CHRGET

From C64-Wiki

CHRGET is a machine language subroutine on Commodore computers that plays a central role in the BASIC interpreter. It reads characters or tokens one after another from the so-called text of a program, that is, either an input line in direct mode or a stored BASIC program.

CHRGET on C64

Name: CHRGET / CHRGOT
Description: Read characters from BASIC text
Entry points: $73 / 115 (CHRGET), $79 / 121 (CHRGOT)
Passed arguments:
Return values:
  Accumulator: character read
  Carry flag: 0 = decimal digit
  Zero flag: 1 = end of statement

CHRGET copy

On the C64, CHRGET is located in the address range 115-138 ($73-$8A). During the cold start, the CHRGET routine is copied from the ROM range 58274-58297 ($E3A2-$E3B9) into the zero page by the program loop in the KERNAL starting at 58336 ($E3E0). This allows faster processing, but it is also necessary because CHRGET contains self-modifying code: the pointer to the text at 122/123 ($7A/$7B) lies within the routine and is modified by the routine itself.

Details

There are two entry points:

  • CHRGET (115 = $0073) reads the next character, first incrementing the text pointer.
  • CHRGOT (121 = $0079) reads the character at the current position of the text pointer.

The routine first skips all spaces (character code 32 = $20) and then returns the character read in the accumulator. Digits (character codes 48 = $30 to 57 = $39) are indicated by a cleared carry flag. The zero flag indicates the end of the statement, namely when

  • either a colon character (character code 58 = $3A) was found, which means the end of the BASIC statement,
  • or the value 0 was encountered, which marks the end of the input buffer or the end of the line.

The following table shows what can be concluded from the two flags:

Carry flag  Zero flag  Result
0           x          Digit (the zero flag is never set here)
1           0          No digit, but not the end of the statement
1           1          End of statement
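
The flag conventions in the table can be modelled in a few lines of Python. This is a minimal sketch rather than the ROM code: the function name `chrget`, the use of a `bytes` object for the BASIC text, the returned tuple and the sample input line are all assumptions made for illustration.

```python
def chrget(text: bytes, ptr: int):
    """Model of CHRGET: advance the pointer, skip spaces, classify the byte.

    Returns (ptr, char, carry, zero) using the flag meanings from the table.
    """
    while True:
        ptr += 1                      # CHRGET increments before reading
        char = text[ptr]
        if char != 0x20:              # spaces are skipped
            break
    carry = 0 if 0x30 <= char <= 0x39 else 1   # carry clear = decimal digit
    zero = 1 if char in (0x00, 0x3A) else 0    # zero set = end of statement
    return ptr, char, carry, zero


# Hypothetical input line; the pointer starts one position before the text.
line = b"A= 42:\x00"
ptr = -1
results = []
while True:
    ptr, char, carry, zero = chrget(line, ptr)
    results.append((chr(char), carry, zero))
    if char == 0:                     # end of line reached
        break

for entry in results:
    print(entry)
```

The space between "=" and "4" never reaches the caller, the two digits come back with the carry flag clear, and both the colon and the terminating 0 come back with carry and zero flag set.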


The text pointer

The text pointer $7A/$7B is initialized depending on whether direct mode or program mode is active:

  • In direct mode, the BASIC ROM routine starting at 42112 = $A480 (input loop) sets the pointer to $01FF, just before the input buffer (512-600 = $0200-$0258).
  • In program mode, the BASIC ROM routine starting at 42638 = $A68E, which is called by RUN or LOAD, sets the pointer to the address directly before the start of BASIC, which is determined by the vector at 43/44 ($2B/$2C). In the normal case (start = $0801) this is $0800.
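
In both cases the pointer ends up one byte before the first character, because CHRGET increments it before reading. The following Python sketch illustrates the arithmetic only; the function name `initial_text_pointer` is hypothetical and does not correspond to a ROM routine.

```python
def initial_text_pointer(start_lo: int, start_hi: int) -> int:
    """Return the address the text pointer is set to before reading begins.

    start_lo/start_hi are the two bytes of a start address, low byte first,
    as stored in the vector at 43/44 ($2B/$2C).
    """
    start = start_lo + 256 * start_hi
    return start - 1                  # one below, because CHRGET pre-increments


# Program mode with the normal start of BASIC at $0801.
program_mode = initial_text_pointer(0x01, 0x08)
# Direct mode: the input buffer begins at $0200.
direct_mode = initial_text_pointer(0x00, 0x02)

print(hex(program_mode), hex(direct_mode))
```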

Listing

 0073: E6 7A INC $7A ; Increment text pointer, low byte
 0075: D0 02 BNE $0079 ; No overflow: skip the high byte
 0077: E6 7B INC $7B ; Increment text pointer, high byte
 0079: AD 00 08 LDA $0800 ; Read text; the operand at $7A/$7B is the self-modified text pointer
 007C: C9 3A CMP #$3A ; ":" = end of statement, also the first character after "9"
 007E: B0 0A BCS $008A ; ":" or greater, so no digit: carry flag=1, zero flag=1 only for ":"
 0080: C9 20 CMP #$20 ; Space...
 0082: F0 EF BEQ $0073 ; ...is skipped
 0084: 38 SEC ; Prepare subtraction
 0085: E9 30 SBC #$30 ; Subtract digit "0"
 0087: 38 SEC ; Prepare second subtraction
 0088: E9 D0 SBC #$D0 ; Restores the original value; carry flag=1 if it was less than "0", carry flag=0 for digits
 008A: 60 RTS ; Zero flag=1: end of statement, carry flag=1: no digit

Here the text pointer is shown as $0800.
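
The two subtractions at $0085 and $0088 are the least obvious part of the listing: together they subtract $100, so the accumulator gets its original value back, while the carry flag ends up clear only for digits. The Python sketch below replays that arithmetic for a byte that has already been loaded. The helper names `sbc` and `classify` are assumptions for illustration, and only binary (non-decimal) mode is modelled.

```python
def sbc(a: int, operand: int, carry: int):
    """6502 SBC in binary mode: A - operand - (1 - carry). Returns (A, carry)."""
    result = a - operand - (1 - carry)
    return result & 0xFF, 1 if result >= 0 else 0


def classify(a: int):
    """Follow $007C-$008A for a byte already in the accumulator.

    Returns (A, carry, zero), or None for a space, which the routine
    skips by branching back to $0073.
    """
    if a >= 0x3A:                         # CMP #$3A / BCS $008A
        return a, 1, 1 if a == 0x3A else 0
    if a == 0x20:                         # CMP #$20 / BEQ $0073
        return None
    a, _ = sbc(a, 0x30, 1)                # SEC / SBC #$30
    a, carry = sbc(a, 0xD0, 1)            # SEC / SBC #$D0 restores the byte
    return a, carry, 1 if a == 0 else 0


for byte in (ord("5"), ord("$"), ord(":"), ord("A"), 0x00):
    print(hex(byte), classify(byte))
```

For "5" the first subtraction leaves 5, which is below $D0, so the second subtraction borrows and clears the carry flag. For "$" the first subtraction wraps around to $F4, which is at least $D0, so the carry flag stays set.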

Time savings through placement in the zero page

The space in the zero page is particularly valuable because numerous assembler instructions with indirect addressing rely on the zero page and accesses to the zero page are often faster than other accesses. The CHRGET routine takes up almost 10% of the zero page, so there should be a good reason to put it there. Since self-modification also works outside the zero page (for example in the extended zero page), the faster execution time is the deciding factor.

The self-modified, absolute-addressed access at $0079 yields a speed gain of one to two clock cycles compared with code outside the zero page, and it leaves the Y register untouched, which gives the calling code more flexibility when accessing the surrounding BASIC text. A routine outside the zero page would have to either

  1. use indirect, Y-post-indexed zero-page addressing and set the Y register to 0 for it, which would cost additional run time and, as a (not necessarily negative) side effect, would modify the Y register, or
  2. stay unchanged, which costs around 1 clock cycle of run time because of the longer INC instructions at $0073 and $0077. The big disadvantage, especially for the calling code, would be that the roughly 20 places[1] in the interpreter that manipulate or read through the pointer could no longer use indirect indexed accesses and would have to be recoded in a more time-consuming way, because the CHRGET pointer would no longer be in the zero page and the required instructions have no indirect addressing mode for pointers located elsewhere.

An analysis of typical BASIC programs shows that around 53% of all characters in BASIC programs are less than $3A and that spaces almost never occur (string constants are not evaluated with CHRGET). With these values, the time saving works out to around 3%.

That may seem like a lot at first glance, but keep in mind that the CHRGET routine accounts for only about 3% of the execution time of a BASIC program. The overall saving is therefore around 0.1%: a BASIC program that runs for a whole day finishes about a minute sooner. All in all, it is doubtful whether moving the CHRGET routine to the zero page was really justified.
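
The figures above can be reproduced approximately with a short calculation. The Python sketch below is one plausible reconstruction, not necessarily the calculation the estimate was originally based on. It assumes the standard 6502 cycle counts, includes the calling JSR, ignores the rare high-byte increment and the space loop, and compares against an otherwise unchanged routine outside the zero page (one extra cycle for INC). The 53% and 3% figures are taken from the text.

```python
# Standard 6502 cycle counts for the instructions used in the listing.
INC_ZP, INC_ABS = 5, 6
BRANCH_TAKEN, BRANCH_NOT_TAKEN = 3, 2     # no page crossing in this routine
LDA_ABS, IMMEDIATE, SEC, RTS, JSR = 4, 2, 2, 6, 6

# Bytes >= $3A leave early through BCS $008A.
early_exit = JSR + INC_ZP + BRANCH_TAKEN + LDA_ABS + IMMEDIATE + BRANCH_TAKEN + RTS
# Bytes < $3A (other than space) run through both subtractions.
full_path = (JSR + INC_ZP + BRANCH_TAKEN + LDA_ABS
             + IMMEDIATE + BRANCH_NOT_TAKEN   # CMP #$3A / BCS
             + IMMEDIATE + BRANCH_NOT_TAKEN   # CMP #$20 / BEQ
             + 2 * (SEC + IMMEDIATE)          # SEC / SBC, twice
             + RTS)

share_below_3a = 0.53                         # from the text
average = share_below_3a * full_path + (1 - share_below_3a) * early_exit

extra = INC_ABS - INC_ZP                      # cost of INC outside the zero page
saving_in_chrget = extra / (average + extra)

chrget_share = 0.03                           # from the text
overall_saving = saving_in_chrget * chrget_share
seconds_per_day = overall_saving * 24 * 60 * 60

print(f"average call: {average:.2f} cycles")
print(f"saving within CHRGET: {saving_in_chrget:.1%}")
print(f"saving per day of run time: {seconds_per_day:.0f} s")
```

Under these assumptions an average call takes just under 35 cycles, so a single saved cycle amounts to a little under 3% of CHRGET, and the overall effect is on the order of one minute per day.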

Alternative implementations can achieve clearly measurable (if not always noticeable) acceleration here.[2]

CHRGET on C128

On the C128, the routine additionally contains code to switch in the BASIC RAM in bank 0 and to re-enable the ROM afterwards. Furthermore, the routine is not copied into the zero page (it requires at least 30 bytes) but into the memory area starting at $0380. So that the text pointer can still be manipulated efficiently via the zero page, the self-modifying code is dropped and indirect, Y-post-indexed zero-page addressing is used instead. As a side effect, the Y register is always zero after the call, which the calling code can use to its advantage.

Details

As with the C64 variant, there are two entry points:

  • CHRGET (896 = $0380) reads the next character, first incrementing the text pointer.
  • CHRGOT (902 = $0386) reads the character at the current position of the text pointer.

Text pointer

The pointer $3D/$3E in the zero page holds the current position in the BASIC text.

Listing

 0380: E6 3D INC $3D ; Increase text pointer
 0382: D0 02 BNE $0386
 0384: E6 3E INC $3E
 0386: 8D 01 FF STA $FF01 ; Show bank 0
 0389: A0 00 LDY #$00
 038B: B1 3D LDA ($3D),Y ; Read memory at text pointer position
 038D: 8D 03 FF STA $FF03 ; Hide bank 0 again
 0390: C9 3A CMP #$3A
 0392: B0 0A BCS $039E
 0394: C9 20 CMP #$20
 0396: F0 E8 BEQ $0380
 0398: 38 SEC
 0399: E9 30 SBC #$30
 039B: 38 SEC
 039C: E9 D0 SBC #$D0
 039E: 60 RTS

Apart from the differences noted above, the comments on the C64 CHRGET listing apply here as well.

Wedge

Since CHRGET is used to read every BASIC character before a statement is executed, you can intervene here to add new commands for a BASIC extension. Such a modification is commonly referred to as a "wedge". A well-known example of this technique is the DOS Wedge.

This standard technique goes back to the PET, where it was the only way to integrate extensions: only later BASIC interpreters (at least as of BASIC V2) provide vectors that allow an extension to be integrated fully at the token level.
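
The idea of a wedge can be illustrated with a conceptual Python model. It is only a sketch of the control flow: a real wedge typically redirects part of the zero-page routine to its own machine code and usually checks more context before intercepting a character. The names `make_wedge` and `at_command`, the "@" command character and the "DIR" argument are hypothetical.

```python
def chrget(text: bytes, ptr: int):
    """Plain CHRGET model: pre-increment, skip spaces, return (ptr, char)."""
    ptr += 1
    while text[ptr] == 0x20:
        ptr += 1
    return ptr, text[ptr]


def make_wedge(original, extra_commands):
    """Wrap `original` so that the bytes in `extra_commands` are intercepted."""
    def wedged(text, ptr):
        ptr, char = original(text, ptr)
        handler = extra_commands.get(char)
        if handler is None:
            return ptr, char              # not ours: behave exactly like CHRGET
        ptr = handler(text, ptr)          # run the extension command
        return wedged(text, ptr)          # then pass the next byte to the caller
    return wedged


log = []

def at_command(text, ptr):
    """Hypothetical '@' command: record its argument up to the statement end."""
    end = ptr + 1
    while text[end] not in (0x00, 0x3A):
        end += 1
    log.append(text[ptr + 1:end].decode("ascii"))
    return end - 1                        # the next read returns the terminator


wedged_chrget = make_wedge(chrget, {ord("@"): at_command})
ptr, char = wedged_chrget(b"@DIR:\x00", -1)
print(log, ptr, hex(char))
```

The interpreter in this model never sees the "@" statement. The wedge consumes it and returns the colon that follows, so normal statement processing continues from there.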

See also

See also the article 115-138.

Links

References

  1. ROM listing AAY: Use of $7A (CHRGET pointer) (in English)
  2. Thread: Check for digit in CHRGET (C64/C128) on Forum64.de (in German): suggestions for alternative implementations of CHRGET