String Heap

From C64-Wiki
Jump to navigationJump to search

The String Heap is the BASIC interpreter's data structure, a dynamic memory area which contains strings of a BASIC program. On a C64 −as usual for all 8-bit Microsoft BASIC variants− it starts at right below the BASIC ROM (see the memory map), which is usually the address (MEMTOP) 40960/$A000. Other CBM BASIC variants with more than 64 KB memory might locate the string heap in some other memory bank (sometimes combined with variables or completely separate).
Other well-known names for it are String Storage Area[1], String Space referenced more or less frequently in books, magazines and manuals.

Structure[edit | edit source]

Each element on the string heap is to be considered as string, a collection of characters with length from 1 to 255. They are normally (if in use) referenced by string descriptor structure which is embedded in simple variables, arrays or on the temporary string descriptor stack.

Starting from the strings on the heap there is no way to differentiate between strings in use and the one which aren't in in use anymore (called "garbage").
Since BASIC 3.5 (derived from the old BASIC 4.0 of the PET class) and in later CBM BASIC versions this structural disadvantage has been eliminated:

  1. An additional back-pointer to the string descriptor (back link) is attached to every string and opens the possibility to correct the descriptor if the string has to be moved.
  2. A gap marker designates a gap as unused space with an additional length byte giving the size of the gap.

Newly created strings let grow the heap towards lower addresses until the end of arrays area (STREND) is reached. The current top of heap is referenced by the pointer FRETOP.
If the free space is exhausted (FRETOP reaches STREND) the garbage collection comes into play which tries to compact the heap by removing all the gaps and moving every used strings to the upper heap area.

After processing the following commands in direct mode:

A$="ABC"
B$(2)=A$+"XYZ"

the string heap looks like this in memory:

          ├─             ─┤
MEMSIZ    │               │◄─────┐           BASIC's end of memory, usually at $A000
          └───────────────┘      
          ┌───────────────┐      
$9FFF     │      'C'      ─             ─      'B'      ─             ─      'A'      │◄───┐            string "ABC" from A$
          ├───────────────┤    │        ─┐
          │      'Z'      │    │         │ 
          ├─             ─┤    │         │
          │      'Y'      │    │         │  unused gap ("garbage"), which has been left after string concatenation
          ├─             ─┤    │         │  (not referenced anymore by a descriptor)
          │      'X'      │    │         │
          ├───────────────┤    │        ─┘
          │      'Z'      │    │ ─             ─┤    │       'Y'      │    │ ─             ─┤    │       'X'      │    │ ─             ─┤    │       'C'      │    │ ─             ─┤    │       'B'      │    │ ─             ─┤    │       'A'      │◄─┐ │ ◄┐         concatenated string "ABCXYZ" from B$(2)
          ├───────────────┤  │ │       ─┐
          │               │  │ │        │
          ├─             ─┤  │ │        │
          │               │  │ │         .          │ │         .          │ │        │  free space for new string heap requests
                  .          │ │        │
          │               │  │ │        │
          ├─             ─┤  │ │        │
STREND    │               │  │ │  ◄┐    │
          └───────────────┘  │ │      ─┘
                  .          │ │   
                  .          │ │   
                  .          │ │   
          ├───────────────┤  │ │   
          │   High Byte   ││ │ │   
          ├─             ─┤├─┘ │   
          │   Low Byte    ││   │   
          ├─             ─┤    │    
          │   Length (6)  │    │          string descriptor in array B$(), element B$(2)
          ├───────────────┤    │   
                  .            │   
                  .            │   
                  .            │   
          ├───────────────┤    │   
          │   High Byte   ││   │   
          ├─             ─┤├───┘   
          │   Low Byte    ││       
          ├─             ─┤         
          │   Length (3)  │               string descriptor of variable A$
          ├───────────────┤         
                  .                
                  .                
                  .                
          ┌───────────────┐         
$0038     │   High Byte   ││     │  
          ├─             ─┤├─────┘         MEMSIZ - end of BASIC memory: highest possible address of the string heap
$0037     │   Low Byte    │        
          └───────────────┘         
                  .                 
                  .                 
                  .                 
          ┌───────────────┐         
$0034     │  High Byte    ││       │ 
          ├─             ─┤├───────┘        FRETOP - points to the last requested string (to the first used byte)
$0033     │   Low Byte    │         
          ├───────────────┤          
$0032     │  High Byte    ││         │
          ├─             ─┤├─────────┘       STREND - end of the array area: lowest possible address of the string heap
$0031     │   Low Byte    │                          (the first byte past the arrays)
          └───────────────┘


Legend:
   used
   unused (gap)
   free


References[edit | edit source]