Garbage Collection

From C64-Wiki
Jump to navigationJump to search

The Garbage Collection (short: GC) is a procedure to clean up dynamic allocated data structures, in particular for strings commonly used in BASIC interpreters. It collects gaps of unused space which could be reclaimed as free space. The GC is run automatically in cases free space runs out or may be started in a controlled way using a triggering action. Some implementation of the GC, especially the one used on a C64 with BASIC V2, can cause the computer to freeze for minutes or even hours meanwhile a user isn't able to interact anyhow.

The direct and controlled GC can be run for example with the BASIC command PRINT FRE(0) (it primarily shows the free BASIC memory).

The problem with strings[edit | edit source]

Integer and float occupy a fixed amount of memory in BASIC so it is easy to reserve memory for these variables.

String on the other hand have a variable length ranging from 0 to 255 bytes. If a new value is stored to an existing string variable then it will most likely have a different length. This makes the efficient allocation of memory for strings problematic.

One possible solution is to specify a maximum length for each string used. This method is used in Pascal and some forms of BASIC (e.g. Atari). This can be wasteful if there is a large number of strings. Consider an array of strings used to represent names and suppose that the maximum length of each string is 10 bytes. This means that any name that is larger than 10 bytes will be truncated. However, most names are likely to be shorter than 10 bytes and the remaining bytes in the string is lost as wasted space. This kind of waste can be trimmed by reducing the maximum length of each string further but this leads also to the situation where it turns to be impractical for names.

String storage in C64 BASIC[edit | edit source]

C64 BASIC uses 3 bytes for each string, which are stored in the 5 byte area of a simple variable located in the variable area or just takes exactly 3 bytes per element in case of a string array. This structure, called string descriptor, consists of one byte containing the length of the string (hence the maximum of 255 bytes) and of two bytes forming a pointer to where the string is kept on the string heap.

With statements such as, 10 A$="A STRING" or strings that are filled by a READ statement, the string descriptor points to program area where the statement is located so no space for the string itself has to be used. Other strings such as those built with an INPUT statement, returned by string functions or those created by joining two or more strings together are stored in the area of memory between the array area and the highest memory used by BASIC which is commonly known as the "string heap". The string pointer points to the start of this string.

Whenever a string that points to string heap is changed, its pointer is changed and the former string that was formerly referenced by this pointer is unused now on and becomes "garbage". To get the new string present on the string heap a new memory allocation will be made - even if the original string would be large enough to contain the new string. In case of statements like A$=B$ where B$ already points to the string heap then rather setting A$ to point to the same string, a new copy of B$ is created (as a result of the generalized expression evaluation) on the string heap where A$ will be pointing to.

Garbage collection in C64 BASIC[edit | edit source]

If enough string manipulations are done then sooner or later, the string heap gets filled with string garbage and there won't be enough room to add any more strings. This is where the garbage collector comes in.

The garbage collector scans through the all areas which may contain string descriptors (string stack, simple variable area, array area) looking for the string with the highest address. This string will be moved to the new top of the string heap including the adjustment of the string address in the string descriptor. This search guarantees that string still in use aren't harmed. It then scans the string descriptors for the next one pointing just below the current string heap top and moves that string up too so it is right below the previous relocated string (adjusting the string heap top downwards accordingly). This procedure is repeated as long any string descriptor below the new string heap top could be found.

Thanks to the garbage collection all strings are laying adjacent on the string heap without gaps in between not wasting space anymore.

Garbage collection "gotcha"[edit | edit source]

For each string (part of simple variables or arrays) pointing to string heap, the garbage collector has to scan all string descriptors. If there are N such string variables then the number of string descriptors has to be scanned N times. These scans occur even than it turns out that a string doesn't need to be moved.

Therefore, the time it takes for the garbage collector to do its job increases with the square of the number of strings actually pointing into the string heap. When there are only a handful of strings, the running time of the garbage collector is negligible. However, If there are hundreds of strings being dealt with the garbage collection could take 20 minutes or far longer while the C64 will appearing quite dead (not even responding to the RUN/STOP  key.

Building or manipulating strings can also cause the string heap to rapidly fill up with garbage because C64 BASIC allocates new copies of strings instead of reassigning pointers (where possible).

Both of these factors can be a trap for the unwary programmer who tests some code on a small handful of strings without realizing that a much larger set of strings would cause the program to halt for lengthy periods of time due to garbage collection.

Averting garbage collection delays[edit | edit source]

Already BASIC 4.0 and later BASIC 3.5 and its successor BASIC 7.0 solved the problem of garbage collection delays by including "back pointers" in the string space which made it possible to do the full set of garbage collections with just a single sweep. However, the Commodore 64 never went beyond BASIC 2.0 so the C64 programmer needs to take some measures [1] [2] to prevent garbage collection from becoming too intrusive when dealing with a large number of strings.

  • When the contents of a string is no longer required, the string should be emptied. A statement like A$="" causes A$ to point to program area (however, it doesn't matter because the length is 0) where it not only escapes the scrutiny of the garbage collector but also frees the memory which has been previously referenced for reuse on the string heap.
  • Instead of manipulating large arrays of strings which would rapidly generate garbage, one could work with index tables to just manipulate the indices instead and leave the actual strings unmoved. The recursion section has an example that shows how this can be done.
  • When the FRE function is executed, it forces a garbage collection so that the true amount of free memory can be calculated. Inserting statements like X=FRE(0) at the appropriate places in a program ensures that garbage collection takes place at a time of the programmer's choice and doesn't catch anybody by surprise. It should be clear that this won't change anything on the time the garbage collection will take.
  • If it is known that there is no garbage above a certain memory location then the "top of BASIC" pointer (locations 55 and 56 on the C64) could be temporarily changed to that location before running the garbage collector.
  • A program could include a subroutine which is run each time a new string or a copy of a string is added to the string heap. The subroutine would check the difference between the bottom of the string heap pointer (locations 51 and 52 on the C64) and the end of BASIC arrays pointer (locations 49 and 50 on the C64). If this difference falls below a critical level then the subroutine writes all of the strings to disk, empties each string, runs the garbage collector then reads each string back from the disk. This would be faster for a lot of strings than just running the garbage collector alone.

Implementation[edit | edit source]

On a C64 the BASIC V2 garbage collector routine can be found at BASIC ROM address $B526/46374 (with extensive commentary).[3] This routine is directly derived from the MS-BASIC implementation as variation "Commodore BASIC 2".

The routine is called

Alternative implementations[edit | edit source]

Other implementations with the aim to drastically reduce the time of the garbage collection process usually obey one of these two main techniques:

  1. Collecting is done with the help of an additional temporary buffer area (typically the unused RAM space covered by the ROMs). The buffer size is usually from 4 to 16 kbyte.
    pro Linear run-time: The whole string space could be reorganized in a handful of runs with linear time-behavior.
    pro The space needed for a string does not change which serves best compatibility to every known BASIC program (under the assumption it does not collide with the buffer area).
    contra A more or less large free memory area has to be provided.
    contra Interrupt activities might suffer from a copy-back action while accessing the RAM below ROM with blocked interrupts (depends on the actual implementation).
  2. A back-pointer attached to each strings (introduced with the BASIC 4.0 implementation) on the string heap will consume 2 additional bytes for each non empty string. The back-pointer denotes the string descriptor which uses the string or marks a gap and its size leading to the following element in the string space. This structure allows the elimination of all gaps in one single pass.
    pro The run-time is short and strictly deterministic.
    contra Each string consumes additional 2 bytes for the back-link which could lead to serious problems for large programs or programs with a huge string demand.
    Exception: The Back-Link-On-Demand implementation don't need these extra bytes!
    contra The BASIC ROM has to be patched at least on three places (get space, garbage collect, string house-keeping, for the BLOD implementation garbage collect, variable assignment, string concatenation and string-part operation), either as version which fits into the ROM replacing the old routine or as a stand-alone version).
    This technique can be found in

Available implementations for the C64[edit | edit source]

  • Super Garbage Collection (Johann Klasek) - buffer technique: There are two hook-methods. Patching the interpreter copy in RAM or interrupt-based (RAM below interpreter remains free). It uses either the unused RAM below the KERNAL or the BASIC ROM as buffer space (depending on the hook-method). The runtime-consumption grows linearly. The example below takes just 3.4 sec to run.
  • Back-Link-On-Demand-GC (Johann Klasek)
    - back-pointer technique: Instead of using extra space for the back-pointer on the string heap it embeds the back-pointer in the existing string data and string descriptor on-demand during the collection. A gap data-structure which marks unused strings will be ensured to exist with the help of the several string-generating operations in the interpreter.
  • Garbage64[11] - buffer technique: It uses the unused RAM below BASIC and KERNAL ROM. This version is seriously bugged. This version is limited by design which renders it useless for general practical use.
  • Garbcol[12] - back-pointer technique: This implementation is very similar to the initial version introduced with BASIC 4.0, which went into BASIC 3.5 and BASIC 7.0. The published version[13] contains some bugs, which has been fixed[14] and optimized later later by others.


Examples[edit | edit source]

A BASIC demo program for GC, which uses a string array consisting of 9000 strings of length 1 (each strings consumes 4 bytes of BASIC memory). Almost the entire BASIC memory is used up. Calling the function FRE(0) triggers the GC which does not free any space but will take a very long time anyway:

 1 M=9000
10 DIM A$(M)
20 PRINT "FREE STRING MEMORY: " FRE(0)
30 B$=CHR$(65)
40 PRINT "ARRAY SETUP ..."
50 FOR I=0 TO M : A$(I)=B$: NEXT
60 PRINT "GARBAGE COLLECTION ..."
70 TI$="000000"
80 PRINT FRE(0) "BYTES FREE"
90 PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"

An alternative for the above program for use in direct mode only:

M=9000
DIM A$(M)
FOR I=0 TO M: A$(I)="A": NEXT
TI$="000000": PRINT FRE(0) "BYTES FREE"
PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"

Both programs take on a standard C64 1 hour and 49 minutes!

These sample programs could be easily tested on the VICE emulator with warp mode, running them in just a few minutes (depending on the emulator's host performance):

Listing and running of the GC test program (German version)

References[edit | edit source]

Links[edit | edit source]

WP-W11.png Wikipedia: Garbage_collection_(computer_science)