The Garbage Collection (short: GC) is a procedure to clean up dynamic allocated data structures, in particular for strings commonly used in BASIC interpreters. It collects gaps of unused space which could be reclaimed as free space. The GC is run automatically in cases free space runs out or may be started in a controlled way using a triggering action. Some implementations the GC, especially the one used on a C64 with BASIC V2, can cause the computer to freeze for minutes or even hours meanwhile a user isn't able to interact anyhow.
The problem with strings
String on the other hand have a variable length ranging from 0 byte to 255 bytes. If a new value is stored to an existing string variable then it will most likely have a different length. This makes the efficient allocation of memory for strings problematic.
One possible solution is to specify a maximum length for each string used. This method is used in Pascal and some forms of BASIC (e.g. Atari). This can be wasteful if there is a large number of strings. Consider an array of strings used to represent names and suppose that the maximum length of each string is 10 bytes. This means that any name that is larger than 10 bytes will be truncated. However, most names are likely to be shorter than 10 bytes and the remaining bytes in the string is lost as wasted space. This kind of waste can be trimmed by reducing the maximum length of each string further but this leads also to the situation where it turns to be impractical for names.
String storage in C64 BASIC
C64 BASIC uses 3 bytes for each string, which are stored in the 5 byte area of a simple variable located in the variable area or just takes exactly 3 bytes per element in case of a string array. This structure, called string descriptor, consists of one byte containing the length of the string (hence the maximum of 255 bytes) and of two bytes forming a pointer to where the string is kept on the string heap.
With statements such as,
10 A$="A STRING" or strings that are filled by a READ statement, the string descriptor points to program area where the statement is located so no space for the string itself has to be used. Other strings such as those built with an INPUT statement, returned by string functions or those created by joining two or more strings together are stored in the area of memory between the array area and the highest memory used by BASIC which is commonly known as the "string heap". The string pointer points to the start of this string.
Whenever a string that points to string heap is changed, its pointer is changed and the former string that was formerly referenced by this pointer is unused now on and becomes "garbage". To get the new string present on the string heap a new memory allocation will be made - even if the original string would be large enough to contain the new string. In case of statements like
A$=B$ where B$ already points to the string heap then rather setting A$ to point to the same string, a new copy of B$ is created (as a result of the generalized expression evaluation) on the string heap where A$ will be pointing to.
Garbage collection in C64 BASIC
If enough string manipulations are done then sooner or later, the string heap gets filled with string garbage and there won't be enough room to add any more strings. This is where the garbage collector comes in.
The garbage collector scans through the all areas which may contain string descriptors (string stack, simple variable area, array area) looking for the string with the highest address. This string will be moved to the new top of the string heap including the adjustment of the string address in the string descriptor. This search guarantees that string still in use aren't harmed. It then scans the string descriptors for the next one pointing just below the current string heap top and moves that string up too so it is right below the previous relocated string (adjusting the string heap top downwards accordingly). This procedure is repeated as long any string descriptor below the new string heap top could be found.
Thanks to the garbage collection all strings are laying adjacent on the string heap without gaps in between not wasting space anymore.
Garbage collection "gotcha"
For each string (part of simple variables or arrays) pointing to string heap, the garbage collector has to scan all string descriptors. If there are N such string variables then the number of string descriptors has to be scanned N times. These scans occur even than it turns out that a string doesn't need to be moved.
Therefore, the time it takes for the garbage collector to do its job increases with the square of the number of strings actually pointing into the string heap. When there are only a handful of strings, the running time of the garbage collector is negligible. However, If there are hundreds of strings being dealt with the garbage collection could take 20 minutes or far longer while the C64 will appearing quite dead (not even responding to thekey.
Building or manipulating strings can also cause the string heap to rapidly fill up with garbage because C64 BASIC allocates new copies of strings instead of reassigning pointers (where possible).
Both of these factors can be a trap for the unwary programmer who tests some code on a small handful of strings without realizing that a much larger set of strings would cause the program to halt for lengthy periods of time due to garbage collection.
Averting garbage collection delays
Already BASIC 4.0 and later BASIC 3.5 and its successor BASIC 7.0 solved the problem of garbage collection delays by including "back pointers" in the string space which made it possible to do the full set of garbage collections with just a single sweep. However, the Commodore 64 never went beyond BASIC 2.0 so the C64 programmer needs to take some measures   to prevent garbage collection from becoming too intrusive when dealing with a large number of strings.
- When the contents of a string is no longer required, the string should be emptied. A statement like
A$=""causes A$ to point to program area (however, it doesn't matter because the length is 0) where it not only escapes the scrutiny of the garbage collector but also frees the memory which has been previously referenced for reuse on the string heap.
- Instead of manipulating large arrays of strings which would rapidly generate garbage, one could work with index tables to just manipulate the indices instead and leave the actual strings unmoved. The recursion section has an example that shows how this can be done.
- When the FRE function is executed, it forces a garbage collection so that the true amount of free memory can be calculated. Inserting statements like
X=FRE(0)at the appropriate places in a program ensures that garbage collection takes place at a time of the programmer's choice and doesn't catch anybody by surprise. It should be clear that this won't change anything on the time the garbage collection will take.
- If it is known that there is no garbage above a certain memory location then the "top of BASIC" pointer (locations 55 and 56 on the C64) could be temporarily changed to that location before running the garbage collector.
- A program could include a subroutine which is run each time a new string or a copy of a string is added to the string heap. The subroutine would check the difference between the bottom of the string heap pointer (locations 51 and 52 on the C64) and the end of BASIC arrays pointer (locations 49 and 50 on the C64). If this difference falls below a critical level then the subroutine writes all of the strings to disk, empties each string, runs the garbage collector then reads each string back from the disk. This would be faster for a lot of strings than just running the garbage collector alone.
On a C64 the BASIC V2 garbage collector routine can be found at BASIC ROM address $B526/46374 (with extensive commentary). This routine is directly derived from the MS-BASIC implementation as variation "Commodore BASIC 2".
The routine is called
- from the service routine for acquiring space from the string storage, see BASIC ROM at address $B4F4/46324 Allocate Space for String and
- during changes on main Basic areas (e.g. while creating a new array), see BASIC ROM at address $A408/41992: Check Memory Overlap.
Other implementations with the aim to drastically reduce the time of the garbage collection process usually obey one of these two main techniques:
- Collecting is done with the help of an additional temporary buffer area (typically the unused RAM space covered by the ROMs). The buffer size is usually from 4 to 16 kbyte.
Linear run-time: The whole string space could be reoranized in a handful of runs with linear time-behavior.
The space needed for a string does not change which serves best compatibility to every known BASIC program (under the assumption it does not collide with the buffer area).
A more or less large free memory area has to be provided.
Interrupt activities might suffer from a copy-back action while accessing the RAM below ROM with blocked interrupts (depends on the actual implementation).
- A back-pointer attached to each strings (introduced with the BASIC 4.0 implementation) on the string heap will consume 2 additional bytes for each non empty string. The back-pointer denotes the string descriptor which uses the string or marks a gap and its size leading to the following element in the string space. This structure allows the elimination of all gaps in one single pass.
The run-time is short and strictly deterministic.
Each string consumes additional 2 bytes for the back-link which could lead to serious problems for large programs or programs with a huge string demand.
The BASIC ROM has to be patched at least on three places (get space, garbage collect, string house-keeping), either as version which fits into the ROM replacing the old routine or as a stand-alone version).
This technique can be found in
Available implementations for the C64
- Super Garbage Collection (Johann Klasek) - buffer technique: It uses the unused RAM below the KERNAL ROM as buffer space. The runtime-consumption grows linearly. The example below takes just 3,4 sec to run.
- Garbage64 - buffer technique: It uses the unused RAM below BASIC and KERNAL ROM. This version is seriously bugged. This version is limited by design which renders it useless for general practical use.
- Garbcol - back-pointer technique: This implementation is very similar to the initial version introduced with BASIC 4.0, which went into BASIC 3.5 and BASIC 7.0. The published version contains some bugs.
A BASIC demo program for GC, which used a string array with 9000 strings and a string length of 1 (each strings used 4 bytes in the BASIC memory). This program needs the whole BASIC memory:
1 M=9000 10 DIM A$(M) 20 PRINT "FREE STRING MEMORY: " FRE(0) 30 B$=CHR$(65) 40 PRINT "ARRAY SETUP ..." 50 FOR I=0 TO M : A$(I)=B$: NEXT 60 PRINT "GARBAGE COLLECTION ..." 70 TI$="000000" 80 PRINT FRE(0) "BYTES FREE" 90 PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"
An alternative BASIC program for the direct mode:
M=9000 DIM A$(M) FOR I=0 TO M: A$(I)="A": NEXT TI$="000000": PRINT FRE(0) "BYTES FREE" PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"
Each programs need normally on a standard C64 1 hour and 49 minutes!
The first program is also running on the VICE emulator with warp mode, that has a runtime of a few minutes here:
- Compute! Magazine June 1984, p. 131, Jim Butterfield: "Garbage Collection on Commodore Computers - Part 1"
- Compute! Magazine July 1984, p. 144, Jim Butterfield: "Garbage Collection on Commodore Computers - Part 2"
- unusedino.de: BASIC ROM at address $B526/46374: Garbage Collection
- unusedino.de: BASIC ROM at address $B37D/45949: FRE()
- unusedino.de: BASIC-ROM an Adresse $B4F4/46324: Allocate Space for String
- unusedino.de: BASIC-ROM an Adresse $A408/41992: Check Memory Overlap
- zimmers.net: PET Firmeware Sourcecode: Garbage Collection Routine $C61D (archive.org)
- BSOS-8296 Sourcecode: Garbage_Collection ; $c66a
- Plus/4, C16 ROM disassembly: $A954 (by Mike Dailly with additional comments by Johann Klasek)
- C128 ROM disassembly: $92EA Garbage Collection Routine (by Soci with additonal comments by Johann Klasek)
- 64'er-Magazin February 1986, p. 52, "Weg mit dem Müll" - alternative GC implementation "Garbage64"
- 64'er-Magazin October 1988, p. 54, "High-Speed-Strings" - alternative GC implementation "Garbcol"