Extended Addressing Wordset

Mitch Bradley, Bill Sebok

March 20, 1986

ABSTRACT

The arrival of affordable 32-bit machines poses some problems for
Forth, which explicitly assumes that addresses and stacks are 16 bits
wide. We propose a consistent scheme for naming Forth addressing
operators in an environment where addresses larger than 16 bits exist.
Portability of Forth code to machines with different address lengths is
explicitly considered in this nomenclature. This naming scheme includes
as a subset the currently-existing addressing operators. This proposal
is based on the authors' experience in using Forth on machines with
address and data sizes other than 16 bits, and on discussions held
during a working group meeting at the 1984 Rochester Forth Applications
Conference.

Address and Data sizes

We define five different sizes:

Byte (Character)
    A byte is 8 bits. The letter `c' (for character) is used to denote
    a byte.

Word
    A word is 16 bits. The letter `w' is used to denote a word.
    (Unfortunately, the word `word' is already used in Forth to mean
    several things, and this adds yet another meaning. This
    nomenclature was chosen anyway, because some Forth vendors are
    already using it, and it is stupid to quibble over minor points.)

Long
    A long is 32 bits. The letter `l' denotes a long.

Normal
    A normal is the same width as the Forth parameter stack. Normals
    are denoted by `n' or `addr', depending on whether the item is a
    data item or an address.

Double Stack Item
    A double is two stack items. It is denoted either by the number
    `2' or by the letter `d'.

Stack Width

The stack width is whatever makes sense for the machine, either 16 bits
or 32 bits. A good choice for the stack width is the size of the
particular machine's address arithmetic. Data or address items that are
not as wide as the stack are stored as the least-significant portion of
a stack item, with the high-order bits set to 0.
Items that are wider than the stack width are stored as more than one
stack item, with the most-significant portion of the item closest to
the top of the stack. Sign extension is considered in a later section.

Default Operators

The usual Forth operators `@' and `!' operate on a `normal' address and
`normal' data, which is to say their address and data operands are the
same width as the stack.

Default-Address Operators

These operators take a `normal' address and use an explicitly-specified
data size.

C@  ( addr -- c )
    fetches a byte using the given `normal' address. The byte is left
    on the stack with the high-order bits set to 0.

C!  ( n addr -- )
    stores the least-significant byte of `n' in the location specified
    by the given `normal' address.

W@  ( addr -- w )
    fetches a word (16 bits) using the given `normal' address. If the
    stack width is wider than 16 bits, the word is left on the stack
    with the high-order bits set to 0.

W!  ( n addr -- )
    stores the least-significant word of `n' in the location specified
    by the given `normal' address.

L@  ( addr -- l )
    fetches a long (32 bits) using the given `normal' address. If the
    stack width is only 16 bits, the long is left on the stack as two
    16-bit words. If the stack width is wider than 32 bits, the long is
    left on the stack with the high-order bits set to 0.

L!  ( l addr -- )
    stores the given long `l' in the location specified by the given
    `normal' address.

2@  ( addr -- n1 n2 )
    fetches 2 stack items using the given `normal' address. The item
    that is left on top of the stack is the one that was at the lower
    memory address.

2!  ( n1 n2 addr -- )
    stores 2 stack items using the given `normal' address. The item
    `n2' is stored at the lower memory address.

Explicit Operators

The first character in the name of an explicit operator specifies the
size of the data operand. The last character specifies the size of the
address operand.
These operators behave similarly to the default operators, except that
the address is explicitly either a `word' or a `long', rather than
being whatever the `normal' address is.

    @W   ( w-addr -- n )          @L   ( l-addr -- n )
    !W   ( n w-addr -- )          !L   ( n l-addr -- )
    C@W  ( w-addr -- c )          C@L  ( l-addr -- c )
    C!W  ( c w-addr -- )          C!L  ( c l-addr -- )
    W@W  ( w-addr -- w )          W@L  ( l-addr -- w )
    W!W  ( w w-addr -- )          W!L  ( w l-addr -- )
    L@W  ( w-addr -- l )          L@L  ( l-addr -- l )
    L!W  ( l w-addr -- )          L!L  ( l l-addr -- )
    2@W  ( w-addr -- n1 n2 )      2@L  ( l-addr -- n1 n2 )
    2!W  ( n1 n2 w-addr -- )      2!L  ( n1 n2 l-addr -- )

Required Implementation

It is not necessary for a particular Forth system to implement all the
operators. Only those address sizes and data sizes which the machine
reasonably supports should be implemented. The default operators ( @
and ! ) and the default-address operators should be implemented.

An application should use the default operators whenever possible. The
explicit operators should be used only when special requirements
dictate a specific address size, with the understanding that the code
may not be portable to some machines. It is hoped that the existence of
explicit operators will actually increase portability, since such
special dependencies will then not be `hidden'. Also, code which uses
explicit operators should be portable to machines which support
addresses and data of the sizes used (regardless of the stack width),
since the definition of the explicit operators is independent of the
stack width.

Address Modification Operators

The use of words like `1+' and `2+' to increment addresses is a serious
hindrance to portability. For one thing, it works poorly on
word-addressed machines, where adding 1 to an address may not be the
right way to select the next byte. Further, it fails if code from a
16-bit machine is run on a machine with a 32-bit-wide stack. We propose
the following operators which modify `normal' addresses.

    A1+   ( addr -- addr-of-next-normal )
    CA1+  ( addr -- addr-of-next-byte )
    WA1+  ( addr -- addr-of-next-word )
    LA1+  ( addr -- addr-of-next-long )
    A+    ( addr n -- addr-of-nth-next-normal )
    CA+   ( addr n -- addr-of-nth-next-byte )
    WA+   ( addr n -- addr-of-nth-next-word )
    LA+   ( addr n -- addr-of-nth-next-long )

These operators are useful for incrementing pointers, because they
scale by the size of the data item specified. Equivalently, they may be
used for indexing into an array containing items of the given type.

Fetch/Store Operation Matrix

                              Operand
    Address     Normal    Byte    Word    Long    Double

    Normal      @         C@      W@      L@      2@
                !         C!      W!      L!      2!

    Word        @W        C@W     W@W     L@W     2@W
                !W        C!W     W!W     L!W     2!W

    Long        @L        C@L     W@L     L@L     2@L
                !L        C!L     W!L     L!L     2!L

Stack Operations

A major problem when dealing with multiple data sizes is managing the
stack. The following stack operations provide a minimal set of
primitives for trying to cope with the problem. Not all possible
combinations are listed here, but it should be obvious how to name
those that aren't included.

    WNOVER  ( w n -- w n w )
    WOVER   ( w1 w2 -- w1 w2 w1 )
    WNSWAP  ( w n -- n w )
    WSWAP   ( w1 w2 -- w2 w1 )
    LNOVER  ( l n -- l n l )
    LOVER   ( l1 l2 -- l1 l2 l1 )
    LNSWAP  ( l n -- n l )
    LSWAP   ( l1 l2 -- l2 l1 )
    W>R     ( w -- )
    WR>     ( -- w )
    L>R     ( l -- )
    LR>     ( -- l )

Sign Extension

When a word item is fetched to a 32-bit stack, it is not sign-extended.
If sign extension is desired, the following words may be used.

N->L  ( n -- l )
    convert a normal stack item to a long.

W->L  ( w -- l )
    convert a word stack item to a long without sign extension.

S->L  ( w -- l.sign-extended )
    convert a word stack item to a long with sign extension.

L->W  ( l -- w )
    convert a long stack item to a word. The most-significant word of
    the long is dropped if the stack is 16 bits wide or zeroed if the
    stack is 32 bits wide.

L->N  ( l -- n )
    convert a long stack item to a normal. If the stack is 16 bits
    wide, this is the same as L->W. If the stack is 32 bits wide, this
    is a no-op.
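
The address modification and conversion words above might be defined
along the following lines. This is only a sketch in Standard Forth, not
the authors' implementation. It assumes a byte-addressed system whose
stack items (cells) are at least 32 bits wide, so that a word or a long
fits in one stack item; a 16-bit or word-addressed system would need
different definitions.

```forth
\ Sketch only. Assumes a byte-addressed Standard Forth system
\ whose cells (stack items) are at least 32 bits wide.

\ Address modification operators: step by the size of the item.
: A1+   ( addr -- addr' )    CELL+ ;      \ next normal (cell)
: CA1+  ( addr -- addr' )    CHAR+ ;      \ next byte
: WA1+  ( addr -- addr' )    2 + ;        \ next 16-bit word
: LA1+  ( addr -- addr' )    4 + ;        \ next 32-bit long
: A+    ( addr n -- addr' )  CELLS + ;    \ nth next normal
: CA+   ( addr n -- addr' )  CHARS + ;    \ nth next byte
: WA+   ( addr n -- addr' )  2 * + ;      \ nth next word
: LA+   ( addr n -- addr' )  4 * + ;      \ nth next long

HEX
\ Conversions between words and longs held in one stack item.
: W->L  ( w -- l )  FFFF AND ;            \ high-order bits zeroed
: S->L  ( w -- l.sign-extended )
   FFFF AND                               \ keep the low 16 bits
   DUP 8000 AND IF  FFFF0000 OR  THEN ;   \ copy bit 15 into bits 16..31
: L->W  ( l -- w )  FFFF AND ;            \ most-significant word zeroed
DECIMAL
```

With these definitions loaded, HEX 8000 S->L U. DECIMAL prints
FFFF8000, while HEX 8000 W->L U. DECIMAL prints 8000.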

Size Constants

The following words behave like constants. They leave on the stack a
number which is the size of a particular operand type. They are useful
in cases where it is necessary to perform explicit address arithmetic,
such as when ALLOT'ing several longs or when using +LOOP to step
through an array. Size is measured in "addressable units", not bytes,
because some machines are not byte-addressable. An "addressable unit"
is the size of the operand that an address normally refers to. On a
byte-addressed machine, it is a byte; on a word-addressed machine, it
is a word.

/C  ( -- n )
    Leaves the size of a character (1 on a byte-addressed machine).

/W  ( -- n )
    Leaves the size of a word (2 on a byte-addressed machine).

/L  ( -- n )
    Leaves the size of a long (4 on a byte-addressed machine).

/N  ( -- n )
    Leaves the size of a stack item.

64 Bit Items

In anticipation of the future, the word `extended' is suggested for
denoting explicit 64-bit items. The associated mnemonic letter is `x'.
The names of words for use with 64 bit items are an obvious extension
of the above proposal.

Usage, or How to Write Portable Code

The default operators (@ !) should be used whenever you don't care if
the operand is 16 bits or 32 bits. Experience shows that this is most
of the time! This way your code will work on both 16 bit and 32 bit
Forth systems.

In those cases where you need a 32 bit number, use the "L" operators
instead of the traditional "2" operators (2@ 2SWAP etc). The "2"
operators should be reserved for their original purpose, i.e. to
manipulate 2 stack items at a time. Remember that "2 stack items" is
NOT the same thing as a 32 bit number. In particular, on a 32 bit
system, a 32 bit number is one stack item. It does not suffice to
define, for example, "2SWAP" to mean a 32 bit number regardless of
stack width, because there are many times when "2SWAP" is needed for
interchanging pairs of stack items.
Thus the "L" operators are really needed for manipulating 32 bit
numbers.

The double number arithmetic operators as listed in the "Double Number
Extension Word Set" of the 83 standard are okay to use on 32 bit
numbers, because their definition is consistent with 32 bit numbers.
For consistency in naming, it is recommended that additional copies of
these operators be made, having names beginning with "L". For instance,

    : L+  D+ ;

The "D" names should be kept around for compatibility with old code.

The use of "ROT" instead of "WLSWAP" is discouraged, as it is not
portable. Similarly, "-ROT" should not be used for "LWSWAP".

Important Point

It is not necessary to implement all the above words. It is important
to agree on a nomenclature, however, so that code may be shared without
confusion about what it means. The implementation of any one of these
words is trivial, so it should be practical to implement them as
needed, rather than keeping the whole package around all the time.

The different address sizes need not all be implemented; in fact, many
machines cannot reasonably support more than one address size. There
are machines that do have different address sizes, so the complete
matrix of words is defined in order to prevent the proliferation of
different words for the same function.

Acknowledgements

The authors are grateful to the members of the File Systems/Data
Structures Working Group at the 1984 Rochester Forth Applications
Conference for the valuable inputs they provided during the discussion
of the topic of extended addressing. In particular, we wish to thank
Dennis Ruffer, David Lindbergh, Gary Nemeth, Ferren MacIntyre, and Walt
Pawley.