Garbage Collection

From C64-Wiki
Jump to: navigation, search

The Garbage Collection (short: GC) is a procedure to clean up dynamic allocated data structures, in particular for strings commonly used in BASIC interpreters. It collects gaps of unused space which could be reclaimed as free space. The GC is run automatically in cases free space runs out or may be started in a controlled way using a triggering action. Some implementations the GC, especially the one used on a C64 with BASIC V2, can cause the computer to freeze for minutes or even hours meanwhile a user isn't able to interact anyhow.

The direct and controlled GC can be run for example with the BASIC command PRINT FRE(0) (it primarily shows the free BASIC memory).

The problem with strings[edit]

Integer and Float occupy a fixed amount of memory in BASIC so it is easy to reserve memory for these variables.

String on the other hand have a variable length ranging from 0 byte to 255 bytes. If a new value is stored to an existing string variable then it will most likely have a different length. This makes the efficient allocation of memory for strings problematic.

One possible solution is to specify a maximum length for each string used. This method is used in Pascal and some forms of BASIC (eg Atari). This can be wasteful if there is a large number of strings. Consider an array of strings used to represent names and suppose that the maximum length of each string is 10 bytes. This means that any name that is larger than 10 bytes will be truncated. However, most names are likely to be shorter than 10 bytes and the remaining bytes in the string is lost as wasted space. This kind of waste can be trimmed by reducing the maximum length of each string further but this leads also to the situation where it turns to be impractical for names.

String storage in C64 BASIC[edit]

C64 BASIC uses 3 bytes for each string, which are stored in the 5 byte area of a simple variable located in the variable area or just takes exactly 3 bytes per element in case of a string array. This structure, called string descriptor, consists of one byte containing the length of the string (hence the maximum of 255 bytes) and of two bytes forming a pointer to where the string is kept on the string heap.

With statements such as, 10 A$="A STRING" or strings that are filled by a READ statement, the string descriptor points to program area where the statement is located so no space for the string itself has to be used. Other strings such as those built with an INPUT statement, returned by string functions or those created by joining two or more strings together are stored in the area of memory between the array area and the highest memory used by BASIC which is commonly known as the "string heap". The string pointer points to the start of this string.

Whenever a string that points to string heap is changed, its pointer is changed and the former string that was formerly referenced by this pointer is unused now on and becomes "garbage". To get the new string present on the string heap a new memory allocation will be made - even if the original string would be large enough to contain the new string. In case of statements like A$=B$ where B$ already points to the string heap then rather setting A$ to point to the same string, a new copy of B$ is created (as a result of the generalized expression evaluation) on the string heap where A$ will be pointing to.

Garbage collection in C64 BASIC[edit]

If enough string manipulations are done then sooner or later, the string heap gets filled with string garbage and there won't be enough room to add any more strings. This is where the garbage collector comes in.

The garbage collector scans through the all areas which may contain string descriptors (string stack, simple variable area, array area) looking for the string with the highest address. This string will be moved to the new top of the string heap including the adjustment of the string address in the string descriptor. This search guarantees that string still in use aren't harmed. It then scans the string descriptors for the next one pointing just below the current string heap top and moves that string up too so it is right below the previous relocated string (adjusting the string heap top downwards accordingly). This procedure is repeated as long any string descriptor below the new string heap top could be found.

Thanks to the garbage collection all strings are laying adjacent on the string heap without gaps in between not wasting space anymore.

Garbage collection "gotcha"[edit]

For each string (part of simple variables or arrays) pointing to string heap, the garbage collector has to scan all string descriptors. If there are N such string variables then the number of string descriptors has to be scanned N times. These scans occur even than it turns out that a string doesn't need to be moved.

Therefore, the time it takes for the garbage collector to do its job increases with the square of the number of strings actually pointing into the string heap. When there are only a handful of strings, the running time of the garbage collector is negligible. However, If there are hundreds of strings being dealt with the garbage collection could take 20 minutes or far longer while the C64 will appearing quite dead (not even responding to the RUN/STOP  key.

Building or manipulating strings can also cause the string heap to rapidly fill up with garbage because C64 BASIC allocates new copies of strings instead of reassigning pointers (where possible).

Both of these factors can be a trap for the unwary programmer who tests some code on a small handful of strings without realizing that a much larger set of strings would cause the program to halt for lengthy periods of time due to garbage collection.

Averting garbage collection delays[edit]

Already BASIC 4.0 and later BASIC 3.5 and its successor BASIC 7.0 solved the problem of garbage collection delays by including "back pointers" in the string space which made it possible to do the full set of garbage collections with just a single sweep. However, the Commodore 64 never went beyond BASIC 2.0 so the C64 programmer needs to take some measures[1] [2] to prevent garbage collection from becoming too intrusive when dealing with a large number of strings.

  • When the contents of a string is no longer required, the string should be emptied. A statement like A$="" causes A$ to point to program area (however, it doesn't matter because the length is 0) where it not only escapes the scrutiny of the garbage collector but also frees the memory which has been previously referenced for reuse on the string heap.
  • Instead of manipulating large arrays of strings which would rapidly generate garbage, one could work with index tables to just manipulate the indices instead and leave the actual strings unmoved. The recursion section has an example that shows how this can be done.
  • When the FRE function is executed, it forces a garbage collection so that the true amount of free memory can be calculated. Inserting statements like X=FRE(0) at the appropriate places in a program ensures that garbage collection takes place at a time of the programmer's choice and doesn't catch anybody by surprise. It should be clear that this won't change anything on the time the garbage collection will take.
  • If it is known that there is no garbage above a certain memory location then the "top of BASIC" pointer (locations 55 and 56 on the C64) could be temporarily changed to that location before running the garbage collector.
  • A program could include a subroutine which is run each time a new string or a copy of a string is added to the string heap. The subroutine would check the difference between the bottom of the string heap pointer (locations 51 and 52 on the C64) and the end of BASIC arrays pointer (locations 49 and 50 on the C64). If this difference falls below a critical level then the subroutine writes all of the strings to disk, empties each string, runs the garbage collector then reads each string back from the disk. This would be faster for a lot of strings than just running the garbage collector alone.


Implementation[edit]

Alternative implementations which drastically reduce the time of the garbage collection process usually obey one of these two main techniques:

  1. Introducing back pointers for strings (according to the BASIC 4.0 implementation) on the heap with consumes 2 additional bytes for each non empty string
  2. Collecting with an additional intermediate collection area (such a buffer can be built with the unused RAM space covered by the ROMs).

Available implementations:

Examples[edit]

A BASIC demo program for GC, which used a string array with 9000 strings and a string length of 1 (each strings used 4 bytes in the BASIC memory). This program needs the whole BASIC memory:

 1 M=9000
10 DIM A$(M)
20 PRINT "FREE STRING MEMORY: " FRE(0)
30 B$=CHR$(65)
40 PRINT "ARRAY SETUP ..."
50 FOR I=0 TO M : A$(I)=B$: NEXT
60 PRINT "GARBAGE COLLECTION ..."
70 TI$="000000"
80 PRINT FRE(0) "BYTES FREE"
90 PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"

An alternative BASIC program for the direct mode:

M=9000
DIM A$(M)
FOR I=0 TO M: A$(I)="A": NEXT
TI$="000000": PRINT FRE(0) "BYTES FREE"
PRINT "TIME (HHMMSS): " TI$ " /" INT(TI/3600+1) "MINUTES"

Each programs need normally on a standard C64 1 hour and 49 minutes!

The first program is also running on the VICE emulator with warp mode, that has a runtime of a few minutes here:

Listing and running of the GC test program (German version)

References[edit]

  1. Compute! Magazine June 1984
  2. Compute! Magazine July 1984 give additional hints on how to minimized garbage collection delays.
  3. 64'er-Magazin February 1986: alternative GC implementation: Garbage64
  4. 64'er-Magazin October 1988: alternative GC implementation: Garbcol

Links[edit]

WP-W11.png Wikipedia: Garbage_collection_(computer_science)