Machine language

From C64-Wiki

Machine language, or machine code, is the native "language" of a given CPU: a computer only "understands" instructions, i.e. programs, that are written in the machine language native to the type of CPU used in that computer. Programs written in any other programming language require some form of translation in order to run; for instance, the Commodore 64 has a built-in interpreter enabling it to handle programs written in Commodore BASIC V2, and interpreters and compilers are available for other programming languages as well.

A language of numbers

A machine language is "made up" of numbers, or more precisely bit patterns, that fit in a byte, i.e. integers in the range 0 thru 255 (0 thru FF in hexadecimal). Each number, called an opcode, has some "action" associated with it. For instance, to the CPU inside the C-64, the number 141 ($8D in hexadecimal) means "store a byte to a specific address in RAM (or an I/O device)", much like POKE does in BASIC.
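
To make the number-to-action relationship concrete, here is a small sketch in Python (used purely for illustration; it does not run on the C64) that maps three documented 6510 opcodes to their mnemonics. The table and the `describe` helper are hypothetical and contain only the opcodes used in the type-in example further down, not a complete opcode list.

```python
# A tiny excerpt of the 6510 opcode table, keyed by the opcode's byte value.
# Only the three opcodes used in the type-in example are included.
OPCODES = {
    0xA9: ("LDA", "immediate"),  # load the accumulator with a constant
    0x8D: ("STA", "absolute"),   # store the accumulator to a 16-bit address
    0x60: ("RTS", "implied"),    # return from subroutine (back to BASIC after SYS)
}

def describe(opcode: int) -> str:
    """Show an opcode in decimal and hexadecimal together with its meaning."""
    mnemonic, mode = OPCODES[opcode]
    return f"{opcode:3d} = ${opcode:02X}: {mnemonic} ({mode})"

for op in (169, 141, 96):
    print(describe(op))
# 169 = $A9: LDA (immediate)
# 141 = $8D: STA (absolute)
#  96 = $60: RTS (implied)
```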

Just like the syntax of the POKE command requires an address, the machine language "word", or opcode, for storing a byte in RAM needs to be followed by some information that says where to store that byte; such "details" are called an argument. Other opcodes need no such supplemental information, analogous to certain BASIC commands such as END and RETURN.
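
One detail the POKE analogy leaves out is how an address argument is laid out in memory: the 6502 family, including the C64's 6510, stores a 16-bit address as two bytes, low byte first. A minimal Python sketch, using a hypothetical helper named `encode_sta_absolute`:

```python
def encode_sta_absolute(address: int) -> list[int]:
    """Encode STA <address> (opcode 141/$8D) as three bytes.

    The 16-bit address is split into a low byte and a high byte,
    and the low byte is stored first (little-endian order).
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    low = address & 0xFF   # same as address modulo 256
    high = address >> 8    # same as address divided by 256, rounded down
    return [0x8D, low, high]

print(encode_sta_absolute(53280))  # [141, 32, 208]  (STA $D020)
```

These are exactly the bytes 141,32,208 that appear in the DATA line of the type-in example.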

Advantages and limitations

A machine language instruction takes approximately 2 to 7 microseconds to complete, compared to milliseconds for a BASIC command (mainly because while running a BASIC program, the computer needs to "decode" every single command as it goes). But since each machine language instruction performs only a minuscule task, it often takes many instructions to do what a single BASIC command achieves.
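
The 2 to 7 microsecond figure follows from the clock rate: each instruction takes 2 to 7 clock cycles, and the C64's CPU runs at roughly 1 MHz (about 985 kHz on PAL machines, about 1.02 MHz on NTSC machines). The Python sketch below does that arithmetic and adds up the documented cycle counts of the four instructions in the type-in example; it ignores the overhead of SYS itself.

```python
# Approximate clock rates of the C64's 6510 CPU, in Hz.
PAL_CLOCK_HZ = 985_248
NTSC_CLOCK_HZ = 1_022_727

# Documented cycle counts for the instruction forms used in the type-in example.
CYCLES = {"LDA #imm": 2, "STA abs": 4, "RTS": 6}

def microseconds(cycles: int, clock_hz: int = PAL_CLOCK_HZ) -> float:
    """Convert a number of clock cycles into microseconds."""
    return cycles / clock_hz * 1_000_000

# LDA #1, STA $D020, STA $D021, RTS
program = ["LDA #imm", "STA abs", "STA abs", "RTS"]
total_cycles = sum(CYCLES[name] for name in program)

print(total_cycles)  # 16
print(f"{microseconds(2):.2f} to {microseconds(7):.2f} us per instruction (PAL)")
print(f"{microseconds(total_cycles, NTSC_CLOCK_HZ):.1f} us for the whole program (NTSC)")
```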

BASIC was originally designed to be easy for the (novice) programmer, allowing programs to be formulated in a language much more natural to humans than native machine language. Programming in machine language requires the programmer to break down a given task into much smaller pieces that "fit" into the tiny tasks performed by the available opcodes. Everything in machine code is centered on how the computer works, not how the human works.

Writing and editing machine language programs

In order for the computer to "run", or execute a machine language program, the program needs to be stored as a "row" of opcodes and arguments in consecutive addresses in memory.

BASIC Type-In

This is reflected in the classical machine language "type-in", which looks something like this (though in a "real" type-in, there would be several hundred DATA statements like line 50 in this example):

10 FOR A=49152 TO 49160
20 READ B
30 POKE A,B
40 NEXT
50 DATA 169,1,141,32,208,141,33,208,96
60 SYS 49152

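The FOR-NEXT loop in lines 10 thru 40 READs the nine byte values forming a (very short) machine language program and POKEs them into consecutive vacant RAM addresses starting at 49152 ($C000), before the SYS command in line 60 starts the machine language program, which sets both the border and the background color to white.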
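
The effect of lines 10 thru 40 can be modelled in Python, with a `bytearray` standing in for the C64's 64 KB address space. This sketch only mimics the copying step; nothing in it emulates the CPU, so there is no equivalent of SYS.

```python
# A 64 KB memory image standing in for the C64's address space (all zeros).
memory = bytearray(65536)

# The byte values from the DATA statement in line 50.
data = [169, 1, 141, 32, 208, 141, 33, 208, 96]

# Equivalent of lines 10-40: FOR A=49152 TO 49160: READ B: POKE A,B: NEXT
start = 49152
for offset, value in enumerate(data):
    memory[start + offset] = value

# The program now occupies the consecutive addresses $C000-$C008.
print(" ".join(f"{b:02X}" for b in memory[0xC000:0xC009]))
# A9 01 8D 20 D0 8D 21 D0 60
```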

While it is possible to create and edit machine language programs in this manner with the help of an opcode list, various tools exist to make the programmer's job easier and less error-prone:

Machine language monitor

A machine language monitor is a program that allows the programmer to view and edit code in a kind of primitive assembler language: In this form, the opcodes are given (abbreviated) "names", called mnemonics, and any arguments are presented in a way that makes sense to humans rather than in the form they are stored in memory. Machine language monitors also provide tools to view and edit memory contents other than code, e.g. byte tables, ASCII/PETSCII texts etc.
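
A monitor's disassembly view can be sketched as follows. The hypothetical Python `disassemble` function below knows only the three opcodes used in the type-in example and uses a simplified layout (address, raw bytes, mnemonic, argument); a real monitor covers the whole instruction set, and the exact output format differs from monitor to monitor.

```python
# opcode -> (mnemonic, addressing mode, instruction length in bytes)
TABLE = {
    0xA9: ("LDA", "imm", 2),
    0x8D: ("STA", "abs", 3),
    0x60: ("RTS", "imp", 1),
}

def disassemble(code: bytes, origin: int) -> list[str]:
    """Turn raw bytes into monitor-style lines; unknown opcodes raise KeyError."""
    lines = []
    pc = 0
    while pc < len(code):
        mnemonic, mode, length = TABLE[code[pc]]
        raw = code[pc:pc + length]
        if mode == "imm":
            operand = f"#${raw[1]:02X}"
        elif mode == "abs":
            operand = f"${raw[1] | (raw[2] << 8):04X}"  # low byte first
        else:
            operand = ""
        hex_bytes = " ".join(f"{b:02X}" for b in raw)
        lines.append(f"{origin + pc:04X}  {hex_bytes:<9} {mnemonic} {operand}".rstrip())
        pc += length
    return lines

listing = disassemble(bytes([169, 1, 141, 32, 208, 141, 33, 208, 96]), 0xC000)
for line in listing:
    print(line)
# C000  A9 01     LDA #$01
# C002  8D 20 D0  STA $D020
# C005  8D 21 D0  STA $D021
# C008  60        RTS
```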

Two-pass assembler

Even with the help of a machine language monitor, editing anything but the smallest, most straightforward program soon becomes non-trivial, especially when instructions need to be removed from or (worse) inserted into the midst of existing code. Two-pass assemblers allow the programmer to assign "names", called labels, to specific points in the code and to refer to the label rather than the physical address of that location, which may change every time an instruction is added to or removed from the program. The first pass works out the address of every label; the second pass uses those addresses to generate the machine code, so a label may be referred to even before the point where it is defined.
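
The two passes can be sketched in Python. To stay short, this hypothetical assembler takes its source as (label, mnemonic, argument) tuples instead of parsing text, supports only four instructions, and reads immediate values as decimal; real assemblers parse text source and offer far more. The `JMP skip` line is a forward reference: its target address is only known once pass 1 has measured every instruction.

```python
# (mnemonic, addressing mode) -> (opcode, instruction length in bytes)
INSTRUCTIONS = {
    ("LDA", "imm"): (0xA9, 2),
    ("STA", "abs"): (0x8D, 3),
    ("JMP", "abs"): (0x4C, 3),
    ("RTS", "imp"): (0x60, 1),
}

def mode_of(operand):
    """Classify an argument: None = implied, "#n" = immediate, else absolute."""
    if operand is None:
        return "imp"
    if isinstance(operand, str) and operand.startswith("#"):
        return "imm"
    return "abs"  # a numeric address or a label name

def assemble(source, origin):
    # Pass 1: only measure instruction lengths to learn each label's address.
    labels = {}
    pc = origin
    for label, mnemonic, operand in source:
        if label is not None:
            labels[label] = pc
        pc += INSTRUCTIONS[(mnemonic, mode_of(operand))][1]

    # Pass 2: all labels are known now, so every argument can be encoded.
    code = bytearray()
    for _label, mnemonic, operand in source:
        mode = mode_of(operand)
        opcode, _length = INSTRUCTIONS[(mnemonic, mode)]
        code.append(opcode)
        if mode == "imm":
            code.append(int(operand[1:]))  # "#1" -> 1 (decimal only)
        elif mode == "abs":
            address = labels[operand] if isinstance(operand, str) else operand
            code += bytes([address & 0xFF, address >> 8])  # low byte first
    return bytes(code), labels

program = [
    (None,   "LDA", "#1"),
    (None,   "JMP", "skip"),  # forward reference: "skip" is defined later
    (None,   "STA", 53281),   # $D021, jumped over
    ("skip", "STA", 53280),   # $D020
    (None,   "RTS", None),
]
code, labels = assemble(program, 49152)
print(f"skip = ${labels['skip']:04X}")  # skip = $C008
print(" ".join(f"{b:02X}" for b in code))
```

If an instruction is later inserted before `skip`, rerunning the assembler moves the label and updates the JMP argument automatically, which is exactly the bookkeeping that is tedious to do by hand in a monitor.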

Compiled high-level languages

Compilers represent an attempt to "have the best of both worlds": the speed of machine language and the ease of BASIC (or another high-level language). The compiler accepts a program written in e.g. BASIC and does the "translation" into machine language (in principle) once and for all.
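
As a toy illustration of translating "once and for all", the Python sketch below compiles only statements of the form POKE address,value (numeric constants) into LDA/STA pairs followed by RTS. The function name `compile_pokes` and its restriction to constant POKEs are assumptions made for illustration; actual BASIC compilers for the C64 handle variables, expressions and much more.

```python
import re

# Matches "POKE <address>,<value>" where both are decimal constants.
POKE_PATTERN = re.compile(r"^\s*POKE\s*(\d+)\s*,\s*(\d+)\s*$")

def compile_pokes(lines):
    """Translate constant POKE statements into 6510 machine code.

    Each POKE a,v becomes LDA #v (A9 vv) followed by STA a (8D lo hi);
    a final RTS (60) returns control to BASIC after SYS.
    """
    code = bytearray()
    for line in lines:
        match = POKE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"unsupported statement: {line!r}")
        address, value = int(match.group(1)), int(match.group(2))
        if not (0 <= address <= 0xFFFF and 0 <= value <= 0xFF):
            raise ValueError(f"value out of range: {line!r}")
        code += bytes([0xA9, value, 0x8D, address & 0xFF, address >> 8])
    code.append(0x60)
    return bytes(code)

print(list(compile_pokes(["POKE 53280,1", "POKE 53281,1"])))
# [169, 1, 141, 32, 208, 169, 1, 141, 33, 208, 96]
```

The result is 11 bytes, two more than the hand-written DATA line in the type-in, because this naive translation reloads the accumulator for every POKE. In miniature, this shows why straightforwardly compiled code can end up larger than carefully hand-written machine language.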