Floating point arithmetic

From C64-Wiki

Floating point arithmetic is a way to represent and handle a large range of real numbers in binary form. The C64's built-in BASIC interpreter contains a set of subroutines that perform various tasks on numbers in floating point format, allowing BASIC to use real numbers. These routines may also be called from the user's own machine code programs to handle real numbers in the range ±2.93873588·10^−39 to ±1.70141183·10^38.


How it works

A real number T in the floating point format consists of a mantissa m and an integer exponent E, which are "selected" so that

T = m · 2^E

The mantissa is always a number in the range from 1 to 2, so that 1 ≤ m < 2, and it is stored as a fixed-point binary number: a number that begins with a one and the binary point, followed by several binary digits (31 of them, in the case of the 64's BASIC routines).

The exponent is an integer with a special provision for handling negative exponents (i.e. floating point numbers whose magnitude is less than 1): the 64 stores the exponent as the number E + 129, so that an exponent of 2 is stored as 131 (129 + 2), and an exponent of −2 as 127 (129 − 2). The stored value 0 is reserved for representing zero.

Besides the mantissa and exponent, a separate sign bit indicates whether the entire floating point number is positive or negative. Together, the three parts thus "cover" any real number (within the aforementioned range) except zero. To indicate that a floating point number equals zero, the C-64 reserves one stored exponent value, 0 (which would otherwise indicate an exponent of −129), to "flag" that the whole floating point number is 0, regardless of the value of the accompanying mantissa.
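
The decomposition described above can be sketched in Python. This is only an illustration under stated assumptions, not the ROM's own algorithm: the function name `split_float` and the constant `EXPONENT_BIAS` are hypothetical, and low-order bits that do not fit into the mantissa are simply truncated rather than rounded.

```python
import math

EXPONENT_BIAS = 129  # stored exponent byte = E + 129, as described in the text


def split_float(value):
    """Split a Python float into (sign, exponent_byte, mantissa_bits).

    sign is 0 for positive and 1 for negative. mantissa_bits is a 32-bit
    integer: the top bit is the leading 1, the low 31 bits are the binary
    fraction. Bits that do not fit are truncated (an assumption of this sketch).
    """
    if value == 0:
        return 0, 0, 0  # exponent byte 0 flags the whole number as zero
    sign = 1 if value < 0 else 0
    frac, exp2 = math.frexp(abs(value))  # abs(value) == frac * 2**exp2, 0.5 <= frac < 1
    m = frac * 2                         # rescale so that 1 <= m < 2
    e = exp2 - 1
    exponent_byte = e + EXPONENT_BIAS
    if not 1 <= exponent_byte <= 255:
        raise OverflowError("value is outside the representable range")
    mantissa_bits = int(m * 2**31)       # leading 1 plus 31 fraction bits
    return sign, exponent_byte, mantissa_bits
```

For instance, `split_float(4.0)` yields an exponent byte of 131, matching the "exponent of 2 is stored as 131" case in the text.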

In the 64

Two regions in zeropage are allocated for working with floating point numbers:

  • One is called FAC, for Floating Point Accumulator:
    • Address 97/$61 is the exponent byte
    • Addresses 98–101/$62–$65 hold the four-byte (32 bit) mantissa
    • Address 102/$66 stores the sign in its most significant bit; off for positive, on for negative.
  • The other is called ARG, for Floating Point ARGument. It's arranged in the same way as FAC, only eight bytes further up:
    • Address 105/$69 holds the exponent byte
    • Addresses 106–109/$6A–$6D hold the four-byte mantissa
    • Address 110/$6E holds the sign in its most significant bit; off for positive, on for negative.

Note that this amounts to six bytes per floating point number, but the routines provided for moving numbers between FAC, ARG and arbitrary RAM addresses use a compression "trick" so that floating point numbers stored in RAM only take up five bytes: since the mantissa is always in the 1-to-2 range, its first binary digit is always a "1", so there is no need to store it. When storing a number in RAM, that "invariant 1" is replaced by the sign bit, and when reading numbers from RAM, the sign bit is moved to the separate sign byte in FAC or ARG, and the invariant first mantissa digit is restored to "1".
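
As a rough model of that trick, the following Python sketch converts between the six-byte FAC/ARG layout and the five-byte RAM layout. The function names `pack_to_ram` and `unpack_from_ram` are hypothetical, and the sketch only mirrors the bit shuffling described above; it does not reproduce the ROM routines (for example, it applies no rounding and no special handling of zero).

```python
def pack_to_ram(exponent, mantissa, sign):
    """Six-byte layout (exponent, 4 mantissa bytes, sign byte) -> 5 RAM bytes."""
    m = list(mantissa)
    # Overwrite the always-1 top mantissa bit with the sign bit.
    m[0] = (m[0] & 0x7F) | (sign & 0x80)
    return bytes([exponent] + m)


def unpack_from_ram(ram):
    """5 RAM bytes -> (exponent, 4 mantissa bytes, sign byte)."""
    exponent = ram[0]
    sign = ram[1] & 0x80                 # sign travels in the top mantissa bit
    mantissa = bytes([ram[1] | 0x80]) + bytes(ram[2:5])  # restore the leading 1
    return exponent, mantissa, sign
```

Packing and then unpacking returns the original exponent, mantissa and sign bit, as long as the mantissa's top bit was set to begin with.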

Conversion example

  • Exponent: exp-129
  • Mantissa: (m4 >= 128 ? -1 : +1) * (1 + (m4 & 0x7f) >> 7 + m3 >> 15 + m2 >> 23 + m1 >> 31) ; with "x >> y" as "float multiply x by 2^-y"
     exp       m4       m3       m2       m1
      98       35       44       7A       00      - some constant in hex
     152       53       68      122        0      - same in dec
10011000 00110101 01000100 01111010 00000000      - same in bin
         ^sign bit

In this case:
Exponent = 152 - 129 = 23                                      ; dec
Mantissa = 1.0110101010001000111101000000000                   ; bin
Mantissa = +1 * (1 + 53 >> 7 + 68 >> 15 + 122 >> 23 + 0 >> 31) ; dec
         = 1 + 0 * 2^-1 + 1 * 2^-2 + 1 * 2^-3 + 0 * 2^-4 + 1 * 2^-5 + ...
         = 1.41615223884583

So the number is...
1.41615223884583 * 2^23 = 11879546.0
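
The same conversion can be checked with a short Python sketch. The function name `decode_c64_float` is hypothetical; the sketch assumes the five-byte RAM layout shown above (exponent byte first, then m4 to m1) and uses ordinary Python floats for the arithmetic.

```python
def decode_c64_float(data):
    """Decode five bytes (exp, m4, m3, m2, m1) into a Python float."""
    exponent, m4, m3, m2, m1 = data
    if exponent == 0:
        return 0.0                        # exponent byte 0 means the number is zero
    sign = -1.0 if m4 & 0x80 else 1.0     # top bit of m4 is the sign bit
    # Put the implicit leading 1 back where the sign bit was stored.
    mantissa_bits = ((m4 | 0x80) << 24) | (m3 << 16) | (m2 << 8) | m1
    mantissa = mantissa_bits / 2**31      # 1 <= mantissa < 2
    return sign * mantissa * 2.0 ** (exponent - 129)


print(decode_c64_float(bytes.fromhex("9835447A00")))  # prints 11879546.0
```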

Using floating point routines

Just like the CPU's accumulator plays a central role in much of what the machine does, the FAC and ARG are the "hubs" of floating point calculations: Numbers to be processed are stored in FAC and ARG, and after calling the relevant routine with a JSR the result is "delivered" in FAC.

Where other RAM locations must be specified, the A/Y register combination is used, wherein the low byte of the memory address is stored in A and the high byte is stored in Y. The A/Y combination is also used when converting to and from 16-bit signed integer values, but with the byte order reversed: there A holds the high byte and Y the low byte.

Finally, the QINT routine indicated below stores the 32 bit signed value in FAC+1 through FAC+4, with the highest order byte starting in FAC+1 and the lowest order byte in FAC+4.
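
To make the byte conventions concrete, here is a small Python sketch. Both helper names are hypothetical, and `zero_page` is assumed to be any 256-byte sequence standing in for the C64's zero page; the offsets $62 to $65 are the FAC mantissa addresses listed earlier.

```python
def split_address(address):
    """Return (a, y) for a 16-bit address: low byte for A, high byte for Y."""
    return address & 0xFF, (address >> 8) & 0xFF


def read_qint_result(zero_page):
    """Read the 32-bit signed value left in FAC+1..FAC+4 ($62-$65).

    The highest order byte is in FAC+1, the lowest in FAC+4, so the value
    is read in big-endian order.
    """
    return int.from_bytes(bytes(zero_page[0x62:0x66]), "big", signed=True)
```

For example, a value stored at $C000 would be addressed with A = $00 and Y = $C0.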

Routines for moving (copying) numbers

Name    Decimal  Hex    Description
CONUPK  47756    $BA8C  Fetch a number from a RAM location to ARG
MOVFA   48124    $BBFC  Copy the number currently in ARG into FAC
MOVEF   48143    $BC0F  Copy the number currently in FAC into ARG
MOVFM   48034    $BBA2  Fetch a number from a RAM location to FAC
MOVMF   48084    $BBD4  Store the number currently in FAC to a RAM location. Uses X and Y rather than A and Y to point to RAM.

Routines for converting between floating point and other formats

Name    Decimal  Hex    Description
FACINX  45482    $B1AA  Convert number in FAC to 16-bit signed integer
FIN     48371    $BCF3  Convert number expressed as a zero-terminated PETSCII string to floating point number in FAC
FOUT    48605    $BDDD  Convert number in FAC to a zero-terminated PETSCII string
GIVAYF  45969    $B391  Convert 16-bit signed integer to floating point number in FAC
QINT    48283    $BC9B  Convert number in FAC to 32-bit signed integer

Routines for performing calculations

Name               Decimal  Hex    Description
ABS (ROM routine)  48216    $BC58  Performs the ABS function on the number in FAC
ATN (ROM routine)  58126    $E30E  Performs the ATN function on the number in FAC
COS (ROM routine)  57956    $E264  Performs the COS function on the number in FAC
DIV10              47870    $BAFE  Divides the number held in FAC by 10
EXP (ROM routine)  49133    $BFED  Performs the EXP function on the number in FAC
FADD               47207    $B867  Adds the number in FAC to one stored in RAM
FADDT              47210    $B86A  Adds the numbers in FAC and ARG
FDIV               47887    $BB0F  Divides a number stored in RAM by the number in FAC
FDIVT              47890    $BB12  Divides the number in ARG by the number in FAC
FMULT              47656    $BA28  Multiplies a number from RAM and FAC (clobbers ARG)
FPWR               49016    $BF78  Raises a number stored in RAM to the power in FAC
FPWRT              49019    $BF7B  Raises the number in ARG to the power in FAC
FSUB               47184    $B850  Subtracts the number in FAC from one stored in RAM
FSUBT              47187    $B853  Subtracts the number in FAC from the number in ARG
INT (ROM routine)  48332    $BCCC  Performs the INT function on the number in FAC
LOG (ROM routine)  47594    $B9EA  Performs the LOG function on the number in FAC
NEGOP              49076    $BFB4  Switches sign on the number in FAC, if non-zero
POLY               57411    $E043  Evaluates a polynomial for the value given in FAC
POLY2              57433    $E059  Evaluates a polynomial with odd powers only, for the value given in FAC
SIN (ROM routine)  57963    $E26B  Performs the SIN function on the number in FAC
SGN (ROM routine)  48185    $BC39  Performs the SGN function on the number in FAC
SQR (ROM routine)  49009    $BF71  Performs the SQR function on the number in FAC
TAN (ROM routine)  58036    $E2B4  Performs the TAN function on the number in FAC
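
The operand order of the two-operand routines is easy to get backwards, so here is a minimal Python model of the descriptions in the table above. The names `RAM_ROUTINES` and `model_result` are hypothetical, and Python's double-precision floats stand in for the C64 format, so results will not match the ROM routines bit for bit.

```python
# Each entry maps (ram_value, fac_value) to the value left in FAC,
# following the operand order given in the table above.
RAM_ROUTINES = {
    "FADD": lambda ram, fac: fac + ram,    # FAC + RAM
    "FSUB": lambda ram, fac: ram - fac,    # RAM - FAC
    "FMULT": lambda ram, fac: ram * fac,   # RAM * FAC
    "FDIV": lambda ram, fac: ram / fac,    # RAM / FAC
    "FPWR": lambda ram, fac: ram ** fac,   # RAM raised to the power FAC
}


def model_result(routine, ram_value, fac_value):
    """Return what the named routine would leave in FAC, in this model."""
    return RAM_ROUTINES[routine](ram_value, fac_value)
```

With 10 in RAM and 4 in FAC, for instance, the FDIV entry gives 2.5 rather than 0.4. The FSUBT, FDIVT and FPWRT variants follow the same order with ARG in place of the RAM operand.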

Routines for comparing numbers

Name   Decimal  Hex    Description
FCOMP  48219    $BC5B  Compares the number in FAC against one stored in RAM. The result of the comparison is stored in A. Zero (0) indicates the values were equal. One (1) indicates FAC was greater than RAM and negative one (-1 or $FF) indicates FAC was less than RAM. Also sets processor flags (N,Z) depending on whether the number in FAC is zero, positive or negative
SIGN   48171    $BC2B  Sets processor flags (N,Z) depending on whether the number in FAC is zero, positive or negative
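
The result convention for FCOMP can be modelled in a few lines of Python. The function name `fcomp_result` is hypothetical, and the sketch only reproduces the value described as being left in A; it does not model the processor flags.

```python
def fcomp_result(fac, ram):
    """Return the byte FCOMP is described as leaving in A."""
    if fac == ram:
        return 0x00                        # values are equal
    return 0x01 if fac > ram else 0xFF     # $FF is -1 as a signed byte
```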