# Floating point arithmetic

**Floating point arithmetic** is a way to represent and handle a large range of real numbers in binary form. The C64's built-in BASIC interpreter contains a set of subroutines that perform various tasks on numbers in floating point format, allowing BASIC to use real numbers. These routines may also be called from the user's own machine code programs, to handle real numbers in the range ±2.93873588·10^{−39} to ±1.70141183·10^{38}.


## How it works

A real number *T* in the floating point format consists of a *mantissa* *m* and an integer *exponent* *E*, which are "selected" so that

*T*=*m*· 2^{E}

The mantissa is always a number in the range from 1 to 2, so that 1 ≤ *m* < 2, and it is stored as a fixed-point binary fraction: a number that begins with a one and the binary point, followed by several binary digits (31 of them, in the case of the 64's BASIC routines).

The exponent is an integer with some special provisions for handling negative exponents (i.e. floating point real numbers less than 1): The 64 stores the exponent as the number *E* + 129, so that an exponent of 2 is stored as 131 (129 + 2), and an exponent of −2 as 127 (129 − 2). A stored exponent byte of 0 is reserved for representing zero.

Besides the mantissa and exponent, a separate sign bit indicates whether the entire floating point number is positive or negative. Together, the three parts thus cover any real number (within the aforementioned range) except zero. To indicate that a floating point number equals zero, the C64 reserves one stored exponent value, 0 (which would otherwise indicate an exponent of −129), to flag that the whole floating point number is 0, regardless of the value of the accompanying mantissa.
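As a rough illustration of the representation described above, the following Python sketch splits a real number into its sign, stored exponent byte and 32-bit mantissa. The function name `to_fields` is made up for this example, and the sketch truncates instead of rounding, so it is not guaranteed to match the ROM routines bit for bit.

```python
import math

BIAS = 129  # stored exponent = E + 129, as described above

def to_fields(value):
    """Split value into (sign, stored_exponent, mantissa32).

    mantissa32 is the mantissa m scaled by 2**31 (1 <= m < 2), so its
    top bit is always set for non-zero numbers.
    Zero is flagged by a stored exponent of 0.
    """
    if value == 0:
        return 0, 0, 0
    sign = 1 if value < 0 else 0
    # math.frexp gives abs(value) = frac * 2**exp2 with 0.5 <= frac < 1
    frac, exp2 = math.frexp(abs(value))
    m = frac * 2          # rescale so that 1 <= m < 2
    e = exp2 - 1
    stored = e + BIAS
    if not 1 <= stored <= 255:
        raise OverflowError("exponent does not fit in one byte")
    mantissa32 = int(m * 2**31)   # keep 31 binary digits after the point
    return sign, stored, mantissa32

print(to_fields(4.0))    # (0, 131, 2147483648)
print(to_fields(0.25))   # (0, 127, 2147483648)
print(to_fields(-1.5))   # (1, 129, 3221225472)
```

The first two calls reproduce the examples from the text: an exponent of 2 is stored as 131 and an exponent of −2 as 127.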

## In the 64

Two regions in zeropage are allocated for working with floating point numbers:

- One is called FAC, for **F**loating Point **Ac**cumulator:
  - Address 97/$61 is the exponent byte
  - Addresses 98–101/$62–$65 hold the four-byte (32 bit) mantissa
  - Address 102/$66 stores the sign in its most significant bit; off for positive, on for negative.
- The other is called ARG, for Floating Point **ARG**ument. It's arranged in the same way as FAC, only eight bytes further up:
  - Address 105/$69 holds the exponent byte
  - Addresses 106–109/$6A–$6D hold the four-byte mantissa
  - Address 110/$6E holds the sign in its most significant bit; off for positive, on for negative.

Note that this amounts to six bytes per floating point number, but the routines provided for moving numbers between FAC, ARG and arbitrary RAM addresses use a compression "trick" so that floating point numbers stored in RAM only take up five bytes: Since the mantissa is always in the 1-to-2 range, the first binary digit will always be a "1", so there is no need to store it. When storing a number in RAM, that "invariant 1" is replaced by the sign bit, and when reading numbers from RAM, the sign bit is moved to the separate sign byte in FAC or ARG, and the invariant first mantissa digit is restored to "1".
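The compression step can be sketched in Python as follows. This is only a model of the byte shuffling described above, not a transcription of the ROM code; the helper names `pack5` and `unpack5` are invented for the example.

```python
def pack5(exponent, mantissa, sign_byte):
    """Pack FAC-style fields (1 + 4 + 1 bytes) into the 5-byte RAM form.

    exponent  : stored exponent byte (0..255)
    mantissa  : four bytes, most significant first
    sign_byte : sign in bit 7 (clear = positive, set = negative)
    """
    m = bytearray(mantissa)
    # Drop the invariant leading 1 and put the sign bit in its place.
    m[0] = (m[0] & 0x7F) | (sign_byte & 0x80)
    return bytes([exponent]) + bytes(m)

def unpack5(packed):
    """Expand the 5-byte RAM form back into (exponent, mantissa, sign_byte)."""
    exponent = packed[0]
    sign_byte = packed[1] & 0x80      # sign moves to its own byte
    m = bytearray(packed[1:5])
    m[0] |= 0x80                      # restore the invariant leading 1
    return exponent, bytes(m), sign_byte

# 1.5 has stored exponent 129 ($81) and mantissa $C0 $00 $00 $00
print(pack5(0x81, bytes([0xC0, 0, 0, 0]), 0x00).hex())  # 8140000000
print(pack5(0x81, bytes([0xC0, 0, 0, 0]), 0x80).hex())  # 81c0000000
```

Unpacking either result gives back the original three fields, which is why the five-byte form loses no information.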

## Conversion example

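For a number stored in RAM as the five bytes exp, m4, m3, m2, m1 (in that order):

- Exponent: exp − 129
- Mantissa: (m4 >= 128 ? -1 : +1) * (1 + ((m4 & 0x7f) >> 7) + (m3 >> 15) + (m2 >> 23) + (m1 >> 31)), where "x >> y" here means "multiply x by 2^-y as a real number", not an integer shift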

| | exp | m4 | m3 | m2 | m1 |
|---|---|---|---|---|---|
| Some constant in hex | 98 | 35 | 44 | 7A | 00 |
| Same in dec | 152 | 53 | 68 | 122 | 0 |
| Same in bin | 10011000 | 00110101 | 01000100 | 01111010 | 00000000 |

The most significant bit of m4 is the sign bit.

In this case:

- Exponent = 152 − 129 = 23 (dec)
- Mantissa = 1.0110101010001000111101000000000 (bin)
- Mantissa = +1 * (1 + 53 >> 7 + 68 >> 15 + 122 >> 23 + 0 >> 31) (dec) = 1 + 0 * 2^-1 + 1 * 2^-2 + 1 * 2^-3 + 0 * 2^-4 + 1 * 2^-5 + ... = 1.41615223884583

So the number is 1.41615223884583 * 2^23 = 11879546.0
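The same conversion can be checked with a few lines of Python. This is a sketch of the formula above, with `decode5` as a made-up helper name; it uses Python's own double-precision floats for the arithmetic.

```python
def decode5(packed):
    """Convert five stored bytes (exp, m4, m3, m2, m1) to a Python float."""
    exp, m4, m3, m2, m1 = packed
    if exp == 0:
        return 0.0                      # exponent byte 0 flags the value zero
    sign = -1.0 if m4 & 0x80 else 1.0   # top bit of m4 is the sign bit
    mantissa = (1
                + (m4 & 0x7F) / 2**7
                + m3 / 2**15
                + m2 / 2**23
                + m1 / 2**31)
    return sign * mantissa * 2.0 ** (exp - 129)

print(decode5(bytes.fromhex("9835447A00")))  # 11879546.0
```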

## Using floating point routines

Just like the CPU's accumulator plays a central role in much of what the machine does, the FAC and ARG are the "hubs" of floating point calculations: Numbers to be processed are stored in FAC and ARG, and after calling the relevant routine with a JSR the result is "delivered" in FAC.

Where other RAM locations must be specified, the A/Y register combination is used, with the low byte of the memory address in A and the high byte in Y. The A/Y combination is also used when converting to and from 16-bit signed integer values; in that case A holds the high byte and Y the low byte.

Finally, the QINT routine indicated below stores the 32 bit signed value in FAC+1 through FAC+4, with the highest order byte starting in FAC+1 and the lowest order byte in FAC+4.
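The register and byte-order conventions above can be modelled in Python like this. The helper names and the address `0xC000` are arbitrary choices for the example, not part of the ROM.

```python
def address_to_ay(address):
    """Split a 16-bit RAM address into (A, Y) = (low byte, high byte)."""
    return address & 0xFF, (address >> 8) & 0xFF

def qint_result(fac_bytes):
    """Read FAC+1..FAC+4 (highest order byte first) as a 32-bit signed integer."""
    return int.from_bytes(bytes(fac_bytes), byteorder="big", signed=True)

print(address_to_ay(0xC000))                   # (0, 192)
print(qint_result([0x00, 0x00, 0x01, 0x00]))   # 256
print(qint_result([0xFF, 0xFF, 0xFF, 0xFE]))   # -2
```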

### Routines for moving (copying) numbers

| Label | Address (dec.) | Address (hex.) | Description |
|---|---|---|---|
| CONUPK | 47756 | BA8C | Fetch a number from a RAM location to ARG |
| MOVEF | 48143 | BC0F | Copy the number currently in FAC over into ARG |
| MOVFA | 48124 | BBFC | Copy the number currently in ARG over into FAC |
| MOVFM | 48034 | BBA2 | Fetch a number from a RAM location to FAC |
| MOVMF | 48084 | BBD4 | Store the number currently in FAC to a RAM location. Uses X and Y rather than A and Y to point to RAM. |

### Routines for converting between floating point and other formats

| Label | Address (dec.) | Address (hex.) | Description |
|---|---|---|---|
| FACINX | 45482 | B1AA | Convert number in FAC to 16-bit signed integer |
| FIN | 48371 | BCF3 | Convert number expressed as a zero-terminated PETSCII string, to floating point number in FAC |
| FOUT | 48605 | BDDD | Convert number in FAC to a zero-terminated PETSCII string |
| GIVAYF | 45969 | B391 | Convert 16-bit signed integer to floating point number in FAC |
| QINT | 48283 | BC9B | Convert number in FAC to 32-bit signed integer |

### Routines for performing calculations

| Label | Address (dec.) | Address (hex.) | Description |
|---|---|---|---|
| ABS (ROM routine) | 48216 | BC58 | Performs the ABS function on the number in FAC |
| ATN (ROM routine) | 58126 | E30E | Performs the ATN function on the number in FAC |
| COS (ROM routine) | 57956 | E264 | Performs the COS function on the number in FAC |
| DIV10 | 47870 | BAFE | Divides the number held in FAC by 10 |
| EXP (ROM routine) | 49133 | BFED | Performs the EXP function on the number in FAC |
| FADD | 47207 | B867 | Adds the number in FAC to one stored in RAM |
| FADDT | 47210 | B86A | Adds the numbers in FAC and ARG |
| FDIV | 47887 | BB0F | Divides a number stored in RAM by the number in FAC |
| FDIVT | 47890 | BB12 | Divides the number in ARG by the number in FAC |
| FMULT | 47656 | BA28 | Multiplies a number from RAM and FAC (clobbers ARG) |
| FPWR | 49016 | BF78 | Raises a number stored in RAM to the power in FAC |
| FPWRT | 49019 | BF7B | Raises the number in ARG to the power in FAC |
| FSUB | 47184 | B850 | Subtracts the number in FAC from one stored in RAM |
| FSUBT | 47187 | B853 | Subtracts the number in FAC from the number in ARG |
| INT (ROM routine) | 48332 | BCCC | Performs the INT function on the number in FAC |
| LOG (ROM routine) | 47594 | B9EA | Performs the LOG function on the number in FAC |
| NEGOP | 49076 | BFB4 | Switches sign on the number in FAC, if non-zero |
| POLY | 57411 | E043 | Evaluates a polynomial for the value given in FAC |
| POLY2 | 57433 | E059 | Evaluates a polynomial with odd powers only, for the value given in FAC |
| SIN (ROM routine) | 57963 | E26B | Performs the SIN function on the number in FAC |
| SGN (ROM routine) | 48185 | BC39 | Performs the SGN function on the number in FAC |
| SQR (ROM routine) | 49009 | BF71 | Performs the SQR function on the number in FAC |
| TAN (ROM routine) | 58036 | E2B4 | Performs the TAN function on the number in FAC |
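The operand order of the two-operand routines is easy to get backwards, so here is a small Python model of the descriptions in the table. It only mirrors which value plays which role; it uses ordinary Python floats, not the C64's five-byte format, and the `RAM_ROUTINES` table and `model` function are invented for this illustration.

```python
# Each entry maps a routine label to a function (ram, fac) -> new FAC value,
# following the table: the RAM operand comes first, FAC second.
RAM_ROUTINES = {
    "FADD":  lambda ram, fac: ram + fac,
    "FSUB":  lambda ram, fac: ram - fac,   # FAC is subtracted from RAM
    "FMULT": lambda ram, fac: ram * fac,
    "FDIV":  lambda ram, fac: ram / fac,   # RAM is divided by FAC
    "FPWR":  lambda ram, fac: ram ** fac,  # RAM is raised to the power in FAC
}

def model(label, ram, fac):
    """Return the value that would end up in FAC after the routine."""
    return RAM_ROUTINES[label](ram, fac)

print(model("FSUB", 10.0, 3.0))   # 7.0
print(model("FDIV", 10.0, 4.0))   # 2.5
print(model("FPWR", 2.0, 10.0))   # 1024.0
```

So to compute 10 − 3 with FSUB, the 10 goes in RAM and the 3 in FAC. The FSUBT, FDIVT and FPWRT variants follow the same order with ARG in place of the RAM operand.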

### Routines for comparing numbers

| Label | Address (dec.) | Address (hex.) | Description |
|---|---|---|---|
| FCOMP | 48219 | BC5B | Compares the number in FAC against one stored in RAM. The result of the comparison is stored in A. Zero (0) indicates the values were equal. One (1) indicates FAC was greater than RAM and negative one (-1 or $FF) indicates FAC was less than RAM. Also sets processor flags (N,Z) depending on whether the number in FAC is zero, positive or negative |
| SIGN | 48171 | BC2B | Sets processor flags (N,Z) depending on whether the number in FAC is zero, positive or negative |
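As a quick reference for the value FCOMP leaves in A, here is a minimal Python model of the three outcomes listed in the table. The function name `fcomp_result` is made up for the example, and the comparison is done on ordinary Python floats.

```python
def fcomp_result(fac, ram):
    """Return the byte left in A: 0 if equal, 1 if FAC > RAM, $FF if FAC < RAM."""
    if fac == ram:
        return 0x00
    return 0x01 if fac > ram else 0xFF

print(fcomp_result(2.0, 2.0))   # 0
print(fcomp_result(3.0, 2.0))   # 1
print(fcomp_result(1.0, 2.0))   # 255
```

Note that "less than" comes back as 255 ($FF) when the byte is read as unsigned, which is the same bit pattern as −1 in a signed byte.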