**** assem68k.f: A Forth-style assembler for the 68000.

Register Names:

    D0 D1 D2 D3 D4 D5 D6 D7
    A0 A1 A2 A3 A4 A5 A6 A7
    SR CCR UP W IP RP SP

Register usage in the Forth virtual machine:

    D0 D1 D2 D3   Scratch
    D4            Floating Point Stack pointer
    D5 D6         Scratch
    D7            Relocation Base.  16 bit token versions only
    A0 A1 A2      Scratch
    UP (A3)       Base address of User Vector
    W  (A4)       Word pointer.
    IP (A5)       Interpreter Pointer.
    RP (A6)       Return stack Pointer
    SP (A7)       Parameter Stack Pointer

Addressing Modes:

    Forth            Motorola         Name
    Syntax           Syntax

    Dn               Dn               Data Register Direct
    An               An               Address Register Direct
    An )             (An)             Indirect
    An )+            (An)+            Indirect Postincrement
    An -)            -(An)            Indirect Predecrement
    d An D)          d(An)            Indirect Displaced
    d Ri An DI)      d(An,Ri)         Indirect Indexed (Short)
    d Ri An DI)L     d(An,Ri)         Indirect Indexed (Long)
    xxxx #           #xxxx            Immediate (Short)
    xxxxxxxx L#      #xxxxxxxx        Immediate (Long)
    xxxx #)          xxxx.W           Absolute (Short)
    xxxxxxxx L#)     xxxxxxxx.L       Absolute (Long)
    d PCD)           d PC relative    PC relative
    d Ri PCDI)       d PC rel. + Ri   PC relative indexed

    PCD) and PCDI) are not currently implemented (Forth very seldom
    needs to use pc-relative addressing).

Operation names:

    The operation names are the same as in the Motorola manual.

Overall Syntax:

    Motorola:   op  src,dst
    Forth:      src dst op

Operation Size:

    Motorola:   Every operation explicitly specifies its size, as in
                ADD.W or ADD.L

    Forth:      There is a "current size" that is used for operations.
                The current size may be changed with the words

                    BYTE  WORD  LONG  NORMAL

                Once the size has been set, it remains that way until
                changed. NORMAL is either WORD or LONG, depending on
                the width of the Forth stack. For the 16 bit Forth
                system, NORMAL is the same as WORD. For the 32 bit
                Forth system, NORMAL is the same as LONG.

    As a special case, there are 4 versions of the move instruction:

        MOVE  BMOVE  WMOVE  LMOVE

    MOVE is dependent on the current size.
    BMOVE is always byte move.
    WMOVE is always word move.
    LMOVE is always long move.
    BMOVE, WMOVE, and LMOVE do not affect the current size.
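A few fragments may make the operand order and the current-size rule concrete. This is only a sketch based on the syntax described above. It assumes that `\` line comments are available and that the number base is decimal. The Motorola equivalents in the comments are derived from the rules above rather than taken from assembler output.

```forth
\ Fragments only; they are meant to appear inside a code definition.

LONG                  \ the current size is now long
D1 D0 ADD             \ Motorola: ADD.L   D1,D0
A0 )+ D2 MOVE         \ Motorola: MOVE.L  (A0)+,D2  (MOVE follows the current size)
A0 ) D3 BMOVE         \ Motorola: MOVE.B  (A0),D3   (current size is unchanged)
8 A1 D) D0 ADD        \ Motorola: ADD.L   8(A1),D0  (still long)

WORD                  \ the current size is now word
1234 # D0 MOVE        \ Motorola: MOVE.W  #1234,D0
```

Note that the BMOVE in the middle does not disturb the size in effect for the ADD that follows it.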
Conditionals:

    The normal Motorola conditionals are provided (e.g. BLE, SVC, DBNE,
    etc.), but their use is discouraged. A better facility is provided
    instead: the condition and the operation may be specified
    independently. Here are the condition names; both sets (Forth and
    Motorola) are provided.

        Forth:      ALWAYS  NEVER  U>  U<=  U>=  U<  0<>  0=
        Motorola:   ALWAYS  NEVER  HI  LS   CC   CS  NE   EQ

        Forth:      VC  VS  0>=  0<  >=  <   >   <=
        Motorola:   VC  VS  PL   MI  GE  LT  GT  LE

    Any of these condition names can be used preceding one of the
    following conditional operations:

    BRIF  ( where condition -- )
        Assembles a conditional branch to "where"

    SETIF  ( operand condition -- )
        Assembles a "Set according to condition codes"

    DBUNTIL  ( where condition data-register -- )
        Assembles a "Decrement and Branch according to Condition Codes"

Structured conditionals:

    The Forth assembler provides structured conditionals. In the forms
    below, <cond> is one of the condition names described in the
    previous section.

        <cond> IF ... THEN
        <cond> IF ... ELSE ... THEN
        BEGIN ... <cond> UNTIL
        BEGIN ... AGAIN
        BEGIN ... <cond> WHILE ... REPEAT
        BEGIN ... <cond> Dn DBUNTIL

    The structures do not have to fit on one line. They may be nested.
    The range of the branches assembled by these structures is +- 128
    bytes, so the structure must not be extremely large. A conditional
    structure that large ought to be coded in high-level Forth anyway.

Starting assembly:

    The assembler is really just a Forth vocabulary which contains the
    words for assembling 68000 code. It is "activated" by adding the
    assembler vocabulary to the search order. There are also some
    common ways to control assembly which do more than just put the
    assembler vocabulary in the search order.

    ASSEMBLER  ( -- )
        Makes the assembler vocabulary the CONTEXT vocabulary (i.e. puts
        it at the top of the search order), thus making all the
        assembler words accessible.
        See: VOCABULARY

    CODE  ( -- )  ( Input Stream: word-name )
        Defines the word whose name is taken from the input stream and
        invokes the assembler.
        When that word is later executed, the machine code that was
        generated by the assembler will be executed. The word is added
        to the CURRENT vocabulary, and the CONTEXT vocabulary (the one
        that is first in the search order) is initially set to
        ASSEMBLER. The operand size is initially set to NORMAL. This is
        the most common way to begin assembly.

    C;  ( -- )  "c-semi-colon"
        Terminates a code definition and allows the name of the
        corresponding definition to be found in the dictionary. The
        CONTEXT vocabulary is set to the same as the CURRENT vocabulary
        (which removes the ASSEMBLER vocabulary from the search order,
        unless you have explicitly done something funny to the search
        order while assembling the code). The "next" routine, which
        causes the Forth interpreter to continue execution with the
        next word, is automatically added to the end of the word being
        defined. This is the most common way to end assembly.

    END-CODE  ( -- )
        Terminates a code definition and allows the name of the
        corresponding definition to be found in the dictionary. The
        CONTEXT vocabulary is set to the same as the CURRENT vocabulary
        (which removes the ASSEMBLER vocabulary from the search order,
        unless you have explicitly done something funny to the search
        order while assembling the code). The "next" routine is NOT
        automatically added to the end of the code definition. Usually
        you want "next" to be at the end of the definition, but
        sometimes the last thing in the definition is a branch to
        somewhere else, so the "next" at the end is not needed.

    ;CODE  ( -- )  "semi-colon-code"
        Used in the form:

            : <name> ... CREATE ... ;CODE ... C;   (or END-CODE)

        Stops compilation, terminates the defining word <name>,
        executes ASSEMBLER, and sets the operand size to NORMAL. When
        <name> is later executed in the form:

            <name> <new-name>

        to define the new <new-name>, the still later execution of
        <new-name> will cause the machine code sequence following the
        ;CODE to be executed.
        This is analogous to DOES>, except that the behavior of the
        defined words is specified in assembly language instead of
        high-level Forth.
        See: CODE DOES>

    NEXT  ( -- )
        Assembler macro which assembles the "next" routine. For the 16
        bit version, this actually assembles a branch instruction to
        the "next" code. For the 32 bit version, it assembles the
        "next" routine in-line.

Forth Virtual Machine Considerations:

    The Forth parameter stack is implemented with A7, but the name "SP"
    should be used instead of "A7", in case the virtual machine
    implementation should change.

    The return stack is implemented with A6, and the name "RP" should
    be used to refer to it.

    Items are pushed on the stack with the "-)" addressing mode, and
    removed from the stack with the ")+" addressing mode.

    The base address of the user area is in A3 ("UP"). User variable
    number 124 (for instance) may be accessed with the "124 UP D)"
    addressing mode.

    The interpreter pointer is A5 ("IP"). The interpreter is
    post-incrementing, so when a code definition is being executed, IP
    points to the token after the one being executed. A "token" is the
    number that is compiled into the dictionary for each Forth word in
    a definition. For the 32 bit token Forth, a token is a 32 bit
    absolute address. For the 16 bit token Forth, a token is a 16 bit
    address that is relative to the address in the high word of D7.

    Registers D0, D1, D2, A0, A1, and A2 may be used freely within code
    definitions. Their values are not expected to be preserved from one
    word to the next.

    When defining a code sequence using ;CODE, it is usually necessary
    to access some data that is placed in the child word. The "W"
    register (A4) is used for this purpose. When the code sequence
    starts executing, the W register contains the starting address of
    the data area of the child word.

Stack Item Sizes and Relocation:

    In the Forth versions which use 32 bit stacks, numbers on the stack
    are always 32 bits wide. Addresses on either stack are 32 bit
    absolute addresses.
    In the version with 16 bit stacks, numbers on the stack are
    normally 16 bits. "Long" numbers are 32 bits, stored with the
    most-significant 16 bits at the lower address. This makes it
    convenient to access the long stack items with a 68000 long-sized
    instruction.

    In the 16 bit stack version, normal addresses on the stack are 16
    bit offsets from the base of the Forth system. The base is
    constrained to start on a 64K boundary, and the most significant 16
    bits of this base address are always kept in the high word of D7.
    Before a 16 bit address may be used by the machine language, it
    must be relocated. This is accomplished with the following two
    assembly language instructions:

        src D7 WMOVE   D7 An LMOVE

    where src is usually "SP )+" and An is whichever address register
    is going to be used to actually access the operand, frequently A0.
    This works because a word-sized move to a data register does not
    change the high word of the data register, thus the base address
    portion of the data register is not affected by the WMOVE.

    It is possible to write assembly code that will work on either the
    32 bit or the 16 bit Forth systems. Whenever you need to take an
    address off of the stack, use the following word:

    AMOVE  ( source destination -- )
        Assembles an appropriate sequence of instructions to move the
        address specified by the source operand to the destination
        operand, taking into account the size of the stacks on the
        Forth system. For the 16 bit stack system, this generates:

            source D7 WMOVE   D7 destination LMOVE

        For the 32 bit stack system, it generates:

            source destination LMOVE

        Usually used in the form:

            SP )+ A0 AMOVE

        to get an address from the stack into an address register where
        it may be used.

Immediate Operands:

    The following word is intended to assist in the writing of
    assembler code which assembles without change on both 16 bit and 32
    bit systems:

    N#  ( immediate-operand -- )
        On a 16 bit system, equivalent to #.
        On a 32 bit system, equivalent to L#.
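As a sketch of portable code, the following two definitions use AMOVE and N# so that the same source should assemble on either a 16 bit or a 32 bit system. The word names MY@ and PUSH-1000 are hypothetical, chosen only for illustration, and `\` line comments are assumed to be available.

```forth
CODE MY@  ( addr -- n )
   SP )+ A0 AMOVE        \ pop the address; relocated through D7 on a 16 bit system
   A0 ) SP -) MOVE       \ push the cell it points to, using the NORMAL size
C;

CODE PUSH-1000  ( -- n )
   1000 N# SP -) MOVE    \ N# acts as # on a 16 bit system, L# on a 32 bit system
C;
```

Neither definition sets the operand size, so both rely on CODE having set it to NORMAL, which matches the stack width of the system being assembled for.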
Coping with Missing Instructions:

    There are some 68010 instructions that this assembler doesn't know
    about. These instructions may be assembled by writing them in hex
    followed by ",".

    ,  ( 16-bit-value -- )
        Takes a number off the stack and places the low-order 16 bits
        into the dictionary at HERE, and advances HERE to point to the
        following location. This version of "," which is in the
        assembler vocabulary, is distinct from the normal Forth
        version. This version always puts a 16 bit number in the
        dictionary, whereas the normal Forth version puts either a 16
        bit or a 32 bit number, depending on the size of a normal stack
        item.

Macros:

    Assembler macros may be written as Forth : definitions. For
    example, here is a macro which will generate code to relocate a 16
    bit address to 32 bits, using D7 as the base address (with the base
    address constrained to start on a 64K boundary).

        : rmove  ( src-mode dest-mode -- )
           [ forth ] swap  [ assembler ] d7 wmove
           d7  [ forth ] swap  [ assembler ] lmove
        ;

    This could be used while assembling as:

        sp )+ a1 rmove

    Note the switching to the forth vocabulary in order to get the
    Forth version of SWAP. This is necessary because the 68000 assembly
    language also contains a SWAP opcode, and the assembler vocabulary
    has the 68000 opcode version of SWAP.

Oddities, bugs, etc:

    In conventional 68000 assemblers, the numeric operand to the
    "quick" instructions (MOVEQ, ADDQ, etc) is explicitly specified as
    an immediate operand, as in

        MOVEQ #4,D0

    In the Forth assembler, the immediate operator is not necessary or
    even permitted; the correct syntax is:

        4 D0 MOVEQ

    Note that in these instructions, the numeric operand is not really
    an immediate operand, in the narrow sense that an immediate operand
    is specified as Mode 7 Register 4, with the immediate data
    following the opcode as 1 or 2 words.

    The register list for a MOVEM instruction is specified as an
    immediate bit mask, as in:

        00ff # sp -) movem

    which pushes all the address registers.
    The complementary instruction

        sp )+ ff00 # movem

    pops the stack back into the address registers.

    The assembler ought to be smarter in deciding automatically whether
    to use # or L#. The real problem occurs for a 16 bit forth, where a
    32 bit number is stored on the stack as 2 separate 16 bit stack
    items. Since the interpretation of an immediate operand as either
    word-sized or long-sized depends both on the current size and also
    the particular opcode, the # word doesn't know whether it needs to
    take a word (16 bits) or a long (32 bits) off the stack, since it
    hasn't seen the opcode yet.

    Every now and then it would be nice if the assembler had labels.
    Usually the structured conditionals provide a nicer syntax and
    better looking code than labels, but there are times when you want
    to do something unstructured. You can usually work around this by
    pushing an address on the stack and backpatching a branch offset
    into that address later, but it's ugly.

    The PCD) and PCDI) addressing modes are not currently implemented
    (Forth very seldom needs to use pc-relative addressing).

List of opcodes:

    MOVE LMOVE WMOVE BMOVE MOVEM MOVEP MOVEQ
    LINK UNLK TRAP TRAPV ILLEGAL RESET NOP STOP
    DBCC DBGE DBLS DBPL DBCS DBGT DBLT DBT DBEQ
    DBHI DBMI DBVC DBF DBLE DBNE DBVS DBRA DBUNTIL
    BRA BSR BHI BLS BCC BCS BNE BEQ
    BVC BVS BPL BMI BGE BLT BGT BLE
    RTE RTS RTR JSR JMP
    NEGX NEG NBCD CLR NOT TST PEA LEA EXG SWAP EXT EXTW EXTL
    SCC SCS SEQ SF SGE SGT SHI SLE
    SLS SLT SMI SNE SPL ST SVS SVC
    BTST BSET BCHG BCLR CHK TAS
    DIVS DIVU MULS MULU
    ADD SUB ADDQ SUBQ ABCD SBCD ADDX SUBX ADDI SUBI ADDA SUBA
    CMP OR AND EOR CMPI ORI ANDI EORI
    CMPM  ( A-REG-POSTINC A-REG-POSTINC -- )
    CMPA
    ROL ROR ROXL ASL ASR ROXR LSL LSR

Condition Names:

    ALWAYS NEVER HI LS CC CS NE EQ VC VS PL MI GE LT GT LE
    U> U<= U>= U< 0<> 0= 0>= 0< >= < > <=

Conditionals:

    BRIF   ( where condition -- )
    SETIF  ( ea condition -- )   ( set according to condition codes )
    DBUNTIL
    IF ELSE THEN BEGIN WHILE REPEAT UNTIL AGAIN
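To show how the structured conditionals and condition names listed above fit into a code definition, here is a minimal sketch. The names MY-ABS and COUNTDOWN are hypothetical. The sketch assumes that the condition word is written before IF or UNTIL, that for IF it names the case in which the body is executed, and that `\` line comments are available.

```forth
CODE MY-ABS  ( n -- |n| )
   SP ) TST              \ set the condition codes from the top stack item
   0< IF                 \ body runs when the item is negative
      SP ) NEG           \ negate it in place
   THEN
C;

CODE COUNTDOWN  ( n -- )   \ n is assumed to be greater than zero
   SP )+ D0 MOVE         \ pop the count into a scratch register
   BEGIN
      1 D0 SUBQ          \ "quick" operand, written without #
   0= UNTIL              \ repeat until the count reaches zero
C;
```

Both bodies are only a few instructions long, so the branches they assemble stay well inside the +- 128 byte range of the structured conditionals.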