Extended Addressing Wordset


                    Mitch Bradley, Bill Sebok


                            _A_B_S_T_R_A_C_T

         The arrival of  affordable  32-bit  machines  poses
     some  problems for Forth, which explicitly assumes that
     addresses and stacks are 16 bits wide.   We  propose  a
     consistent scheme for naming Forth addressing operators
     in an environment where addresses larger than  16  bits
     exist.  Portability of Forth code to machines with dif-
     ferent address lengths is explicitly considered in this
     nomenclature.   This naming scheme includes as a subset
     the currently-existing addressing operators.  This pro-
     posal  is  based  on  the  authors' experience in using
     Forth on machines with address  and  data  sizes  other
     than  16 bits, and on discussions held during a working
     group meeting at the 1984 Rochester Forth  Applications
     Conference.


Address and Data sizes

    We define five different sizes:

Byte (Character)
    A byte is 8 bits.  The letter `c' (for character) is used  to
    denote a byte.

Word
    A word is 16 bits.  The letter `w' is used to denote a  word.
    (Unfortunately,  the  word `word' is already used in Forth to
    mean several things, and this adds yet another meaning.  This
    nomenclature  was  chosen  anyway, because some Forth vendors
    are already using it, and it is stupid to quibble over  minor
    points.)

Long
    A long is 32 bits.   The letter `l' denotes a long.

Normal
    A normal is the same width  as  the  Forth  parameter  stack.
    Normals  are  denoted  by `n' or `addr', depending on whether
    the item is a data item or an address.

Double Stack Item


                         March 20, 1986


    A double is two stack items.  It is  denoted  either  by  the
    number `2' or by the letter `d'.

Stack Width

    The stack width is whatever  makes  sense  for  the  machine,
either  16 bits or 32 bits.  A good choice for the stack width is
the size of the particular machine's address arithmetic.

    Data or address items that are not as wide as the  stack  are
stored as the least-significant portion of a stack item, with the
high-order bits set to 0.  Items that are wider  than  the  stack
width  are  stored  as  more  than one stack item, with the most-
significant portion of the item closest to the top of the stack.

    Sign extension is considered in a later section.
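    The layout rules above can be modelled in a few lines.  The
sketch below is a hypothetical Python model, not part of the
proposal.  A Python list stands in for a 16-bit-wide stack (top at
the end), and push_item and pop_item are illustrative helper names.
It shows a narrow item being zero-extended into one cell, and a wide
item being split across cells with its most-significant portion on
top.

```python
# Hypothetical model of the layout rules, assuming a 16-bit stack.
# A Python list stands in for the stack; its last element is the top.
CELL_BITS = 16
CELL_MASK = (1 << CELL_BITS) - 1


def cells_needed(bits):
    """Number of stack items an unsigned item of `bits` width occupies."""
    return -(-bits // CELL_BITS)  # ceiling division


def push_item(stack, value, bits):
    """Push an unsigned item; narrow items are zero-extended into one cell."""
    value &= (1 << bits) - 1  # keep only the item's own bits
    for i in range(cells_needed(bits)):
        # Least-significant cell first, so the most-significant ends on top.
        stack.append((value >> (i * CELL_BITS)) & CELL_MASK)


def pop_item(stack, bits):
    """Pop an item pushed by push_item and reassemble its value."""
    value = 0
    for i in reversed(range(cells_needed(bits))):
        value |= stack.pop() << (i * CELL_BITS)  # top cell is most significant
    return value
```

With CELL_BITS set to 32, the same 32-bit long would occupy a single
cell, which is the point of the stack-width rule.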

Default Operators

    The usual Forth operators `@' and `!' operate on  a  `normal'
address and `normal' data, which is to say their address and data
operands are the same width as the stack.

Default-Address Operators

    These  operators  take  a  `normal'  address   and   use   an
explicitly-specified data size.

C@ ( addr -- c )
    fetches a byte using the given `normal' address.  The byte is
    left on the stack with the high-order bits set to 0.

C! ( c addr -- )
    stores the least-significant byte of `c' in the location
    specified by the given `normal' address.

W@ ( addr -- w )
    fetches a word (16 bits) using the given `normal' address.
    If the stack width is wider than 16 bits, the word is left on
    the stack with the high-order bits set to 0.

W! ( w addr -- )
    stores the least-significant word of `w' in the location
    specified by the given `normal' address.

L@ ( addr -- l )
    fetches a long (32 bits) using the given `normal' address.
    If the stack width is only 16 bits, the long is left on the
    stack as two 16-bit words, with the most-significant word on
    top.  If the stack width is wider than
    32  bits,  the  long is left on the stack with the high-order
    bits set to 0.

L! ( l addr -- )
    stores the given long `l' in the location specified by the
    given `normal' address.

2@ ( addr -- n1 n2 )
    fetches 2 stack items using the given `normal' address.   The
    item  that is left on top of the stack is the one that was at
    the lower memory address.

2! ( n1 n2 addr -- )
    stores 2 stack items using the given `normal'  address.   The
    item `n2' is stored at the lower memory address.
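    A minimal Python model of the default-address operators may make
the zero-extension and 2@/2! ordering rules concrete.  It assumes
byte-addressed memory, a 32-bit stack and big-endian byte order.
None of these is required by the proposal.  Names such as c_fetch
and two_store are hypothetical Python stand-ins for the Forth words,
and addresses are passed as arguments rather than taken from the
stack.

```python
# Hypothetical model: byte-addressed memory, a 32-bit stack (so a `normal'
# is 4 bytes), and big-endian byte order. All three are assumptions.
MEM = bytearray(64)
N_BYTES = 4  # bytes per normal item


def fetch(addr, size):
    """Read `size` bytes as an unsigned value (zero-extended)."""
    return int.from_bytes(MEM[addr:addr + size], "big")


def store(value, addr, size):
    """Write the least-significant `size` bytes of `value`."""
    value &= (1 << (8 * size)) - 1
    MEM[addr:addr + size] = value.to_bytes(size, "big")


def c_fetch(addr):          # C@ ( addr -- c )
    return fetch(addr, 1)


def c_store(c, addr):       # C! ( c addr -- )
    store(c, addr, 1)


def w_fetch(addr):          # W@ ( addr -- w )
    return fetch(addr, 2)


def w_store(w, addr):       # W! ( w addr -- )
    store(w, addr, 2)


def l_fetch(addr):          # L@ ( addr -- l )
    return fetch(addr, 4)


def l_store(l_val, addr):   # L! ( l addr -- )
    store(l_val, addr, 4)


def two_fetch(stack, addr):   # 2@ ( addr -- n1 n2 )
    stack.append(fetch(addr + N_BYTES, N_BYTES))  # n1, from the higher address
    stack.append(fetch(addr, N_BYTES))            # n2, lower address, on top


def two_store(stack, addr):   # 2! ( n1 n2 addr -- )
    store(stack.pop(), addr, N_BYTES)             # n2 goes to the lower address
    store(stack.pop(), addr + N_BYTES, N_BYTES)   # n1 goes above it
```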

Explicit Operators

    In the name of an explicit operator, any character before the
`@' or `!' specifies the size of the data operand (no character
means a `normal' data operand), and the last character specifies
the size of the address operand.  These operators behave like the
corresponding default and default-address operators, except that
the address is explicitly either a `word' or a `long', rather than
being whatever the `normal' address is.

@W ( w-addr -- n )

@L ( l-addr -- n )

!W ( n w-addr -- )

!L ( n l-addr -- )

C@W ( w-addr -- c )

C@L ( l-addr -- c )

C!W ( c w-addr -- )

C!L ( c l-addr -- )

W@W ( w-addr -- w )

W@L ( l-addr -- w )

W!W ( w w-addr -- )

W!L ( w l-addr -- )

L@W ( w-addr -- l )

L@L ( l-addr -- l )

L!W ( l w-addr -- )

L!L ( l l-addr -- )

2@W ( w-addr -- n1 n2 )

2@L ( l-addr -- n1 n2 )

2!W ( n1 n2 w-addr -- )

2!L ( n1 n2 l-addr -- )
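    An explicit operator is independent of the `normal' address
size.  To illustrate this, here is a hypothetical Python model of
W@L, W!L and C@W on a machine with a 16-bit stack, where a long
address occupies two stack items (most-significant on top).  The
sparse memory dictionary, the big-endian byte order and the function
names are assumptions made for the sketch.

```python
# Hypothetical model: a 16-bit stack and byte-addressed, big-endian memory
# that can be reached through 16-bit (`word') or 32-bit (`long') addresses.
MEM = {}  # sparse memory: address -> byte value; unset bytes read as 0


def pop_long(stack):
    """Pop a long occupying two cells (most-significant cell on top)."""
    hi = stack.pop()
    lo = stack.pop()
    return (hi << 16) | lo


def w_fetch_l(stack):   # W@L ( l-addr -- w )
    addr = pop_long(stack)
    stack.append((MEM.get(addr, 0) << 8) | MEM.get(addr + 1, 0))


def w_store_l(stack):   # W!L ( w l-addr -- )
    addr = pop_long(stack)
    w = stack.pop() & 0xFFFF
    MEM[addr], MEM[addr + 1] = w >> 8, w & 0xFF


def c_fetch_w(stack):   # C@W ( w-addr -- c ): the address is one cell
    addr = stack.pop() & 0xFFFF
    stack.append(MEM.get(addr, 0))
```

On a 32-bit stack only pop_long would change, because the long
address would then be a single cell.  The meaning of W@L itself would
not change.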

Required Implementation

    It is not necessary for a particular Forth system to implement
all the operators.  Only those address sizes and data sizes which
the machine reasonably supports should be implemented.  The
default operators ( @ and ! ) and the default-address operators
should be implemented.  An application should use the default
operators whenever possible.  The explicit operators should be
used only when there is a special requirement for a particular
address size, with the understanding that the code may not be
portable to some machines.  It is hoped that the existence of
explicit operators will actually increase portability, since such
special dependencies will then not be `hidden'.  Also, code which
uses explicit operators should be portable to machines which
support addresses and data of the sizes used (regardless of the
stack width), since the definition of the explicit operators is
independent of the stack width.

Address Modification Operators

    The use of words like `1+' and `2+' to increment addresses is
a serious hindrance to portability.  For one thing, it works
poorly on word-addressed machines, where adding 1 to an address
may not be the right way to select the next byte.  Further, code
from a 16-bit machine that uses `2+' to step to the next stack-width
item fails on a machine with a 32-bit-wide stack, where that item
is 4 bytes wide on a byte-addressed machine.  We propose the
following operators, which modify `normal' addresses.

A1+  ( addr -- addr-of-next-normal )

CA1+ ( addr -- addr-of-next-byte )

WA1+ ( addr -- addr-of-next-word )

LA1+ ( addr -- addr-of-next-long )

A+  ( addr n -- addr-of-nth-next-normal )

CA+ ( addr n -- addr-of-nth-next-byte )

WA+ ( addr n -- addr-of-nth-next-word )

LA+ ( addr n -- addr-of-nth-next-long )

    These operators are useful for incrementing pointers, because
they scale by the size of the data item specified.  Equivalently,
they may be used for indexing into an array containing  items  of
the given type.
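    The address modification operators can be written as
hypothetical Python functions, which makes the scaling explicit.
The size values assume a byte-addressed machine with a 32-bit stack.
They stand in for the size constants /C, /W, /L and /N described
later.

```python
# Hypothetical sizes in addressable units for a byte-addressed machine with
# a 32-bit stack; they stand in for /C /W /L /N.
SIZE_C, SIZE_W, SIZE_L, SIZE_N = 1, 2, 4, 4


def ca_plus(addr, n):   # CA+ ( addr n -- addr-of-nth-next-byte )
    return addr + n * SIZE_C


def wa_plus(addr, n):   # WA+
    return addr + n * SIZE_W


def la_plus(addr, n):   # LA+
    return addr + n * SIZE_L


def a_plus(addr, n):    # A+
    return addr + n * SIZE_N


def ca1_plus(addr):     # CA1+ is CA+ with n = 1; likewise for the others
    return ca_plus(addr, 1)


def wa1_plus(addr):     # WA1+
    return wa_plus(addr, 1)


def la1_plus(addr):     # LA1+
    return la_plus(addr, 1)


def a1_plus(addr):      # A1+
    return a_plus(addr, 1)
```

Porting this model to a 16-bit stack changes only SIZE_N to 2.  Code
that steps through stack-width items with A1+ is unaffected.  Code
that hard-wired a literal `2+' would have to be edited.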


Fetch/Store Operation Matrix


                Data operand size
Address size    Normal  Byte   Word   Long    Double

Normal          @       C@     W@     L@      2@
                !       C!     W!     L!      2!

Word            @W      C@W    W@W    L@W     2@W
                !W      C!W    W!W    L!W     2!W

Long            @L      C@L    W@L    L@L     2@L
                !L      C!L    W!L    L!L     2!L


Stack Operations

    A major problem when dealing  with  multiple  data  sizes  is
managing  the  stack.   The  following stack operations provide a
minimal set of primitives for trying to cope  with  the  problem.
Not  all  possible combinations are listed here, but it should be
obvious how to name those that aren't included.

WNOVER ( w n -- w n w )

WOVER  ( w1 w2 -- w1 w2 w1 )

WNSWAP ( w n -- n w )

WSWAP  ( w1 w2 -- w2 w1 )

LNOVER ( l n -- l n l )

LOVER  ( l1 l2 -- l1 l2 l1 )

LNSWAP ( l n -- n l )

LSWAP  ( l1 l2 -- l2 l1 )

W>R   ( w -- )

WR>   ( -- w )

L>R   ( l -- )

LR>   ( -- l )
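    On a 16-bit stack, a long occupies two stack items, and several
of these operators reduce to familiar double-cell words.  For
example, LSWAP behaves like 2SWAP there, while on a 32-bit stack it
is just SWAP.  The hypothetical Python model below illustrates a few
of them.  It uses a list as the stack, with the top at the end and
the least-significant cell of a long deeper.  The function names are
illustrative only.

```python
# Hypothetical model of a 16-bit stack: a Python list whose last element is
# the top. A long occupies two cells (least-significant cell deeper, most-
# significant on top); a word or normal occupies one cell.


def lswap(s):    # LSWAP ( l1 l2 -- l2 l1 ): behaves like 2SWAP here
    s[-4:] = s[-2:] + s[-4:-2]


def lover(s):    # LOVER ( l1 l2 -- l1 l2 l1 ): behaves like 2OVER here
    s.extend(s[-4:-2])


def lnswap(s):   # LNSWAP ( l n -- n l ): behaves like -ROT here
    s[-3:] = [s[-1]] + s[-3:-1]


def lnover(s):   # LNOVER ( l n -- l n l )
    s.extend(s[-3:-1])


def wnswap(s):   # WNSWAP ( w n -- n w ): a plain SWAP, since w fits a cell
    s[-2:] = [s[-1], s[-2]]
```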

Sign Extension

    When a word item is fetched to a  32-bit  stack,  it  is  not
sign-extended.  If sign extension is desired, the following word
may be used.

<W@ ( addr -- sign-extended-w )
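    The following hypothetical Python sketch shows the difference
between W@ and <W@ on a 32-bit stack.  Big-endian byte order and the
small bytearray used as memory are assumptions of the model.

```python
# Hypothetical model: 32-bit stack, big-endian byte-addressed memory.
MEM = bytearray(16)


def w_fetch(addr):          # W@ ( addr -- w ): zero-extended
    return int.from_bytes(MEM[addr:addr + 2], "big")


def sign_w_fetch(addr):     # <W@ ( addr -- sign-extended-w )
    w = w_fetch(addr)
    if w & 0x8000:          # bit 15 set: a negative 16-bit value
        w |= 0xFFFF0000     # copy the sign into bits 16..31 of the cell
    return w
```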

Type Conversion Operators

    These words convert stack items of one type to stack items of
another  type.  When a conversion from a larger type to a smaller
type occurs, there may be a loss of information, so these  should
be used with care.

N->L ( n -- l )
    convert a normal stack item to a long.

W->L ( w -- l )
    convert a word stack item to a long without sign extension.

S->L ( w -- l.sign-extended )
    convert a word stack item to a long with sign extension.

L->W ( l -- w )
    convert a long stack item to a  word.   The  most-significant
    word  of  the long is dropped if the stack is 16 bits wide or
    zeroed if the stack is 32 bits wide.

L->N ( l -- n )
    convert a long stack item to a normal.  If the  stack  is  16
    bits wide, this is the same as L->W.  If the stack is 32 bits
    wide, this is a no-op.
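    On a 32-bit stack, a long fits in a single cell, so the
conversions reduce to masking and sign extension.  The sketch below
is a hypothetical Python model that holds cells as unsigned integers.
On a 16-bit stack the same words would instead push or drop a cell.

```python
# Hypothetical model of a 32-bit stack: a long fits in one cell, and cells
# are held as unsigned Python ints in the range 0 .. 0xFFFFFFFF.
CELL_MASK = 0xFFFFFFFF


def n_to_l(n):      # N->L ( n -- l ): nothing to do when n is 32 bits
    return n & CELL_MASK


def w_to_l(w):      # W->L ( w -- l ): no sign extension
    return w & 0xFFFF


def s_to_l(w):      # S->L ( w -- l ): copy bit 15 into bits 16..31
    w &= 0xFFFF
    return (w | 0xFFFF0000) if w & 0x8000 else w


def l_to_w(l_val):  # L->W ( l -- w ): most-significant word zeroed
    return l_val & 0xFFFF


def l_to_n(l_val):  # L->N ( l -- n ): a no-op on a 32-bit stack
    return l_val & CELL_MASK
```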

Size Constants

    The following words behave like constants.  They leave on the
stack  a  number  which is the size of a particular operand type.
They are useful in cases where it is necessary to perform  expli-
cit  address  arithmetic, such as when ALLOT'ing several longs or
when using +LOOP to step through an array.  Size is  measured  in
"addressable  units",  not  bytes,  because some machines are not
byte-addressable.  An "addressable  unit"  is  the  size  of  the
operand  that an address normally refers to.  On a byte-addressed
machine, it is a byte; on a word-addressed machine, it is a word.

/C ( -- n )
    Leaves the  size  of  a  character  (1  on  a  byte-addressed
    machine).

/W ( -- n )
    Leaves the size of a word (2 on a byte-addressed machine).

/L ( -- n )
    Leaves the size of a long (4 on a byte-addressed machine).

/N ( -- n )
    Leaves the size of a stack item (on a byte-addressed machine,
    2 with a 16-bit stack or 4 with a 32-bit stack).
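    The ALLOT and +LOOP uses mentioned above can be sketched in
Python.  This is a toy model that assumes a byte-addressed machine
with a 32-bit stack.  `here', allot() and TABLE are hypothetical
stand-ins for the dictionary pointer, ALLOT and a CREATEd array.

```python
# Hypothetical model: a byte-addressed machine with a 32-bit stack.
# SIZE_C .. SIZE_N stand in for /C /W /L /N; `here` is a toy dictionary
# pointer and allot() a toy ALLOT.
SIZE_C, SIZE_W, SIZE_L, SIZE_N = 1, 2, 4, 4

here = 0x2000


def allot(n):
    """Reserve n addressable units, as ALLOT ( n -- ) does."""
    global here
    here += n


table = here
allot(5 * SIZE_L)          # like:  CREATE TABLE  5 /L * ALLOT

# Like:  TABLE 5 /L * +  TABLE  DO  I ...  /L +LOOP
# DO takes ( limit start -- ), and /L +LOOP advances one long at a time.
addrs = list(range(table, table + 5 * SIZE_L, SIZE_L))
```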


64 Bit Items

    In anticipation of the future, the word `extended' is suggested
for denoting explicit 64-bit items.  The associated mnemonic letter
is `x'.  The names of words for use with 64 bit items are an obvious
extension of the above proposal.

Usage, or How to Write Portable Code

    The default operators (@ !) should be used whenever you don't
care if the operand is 16 bits or 32 bits.  Experience shows that
this is most of the time!  This way your code will work  on  both
16 bit and 32 bit Forth systems.

    In those cases where you need a 32 bit number,  use  the  "L"
operators  instead  of  the  traditional  "2" operators (2@ 2SWAP
etc).  The "2" operators should be reserved  for  their  original
purpose,  i.e.  to  manipulate 2 stack items at a time.  Remember
that "2 stack items" is NOT the same thing as a  32  bit  number.
In  particular,  on a 32 bit system, a 32 bit number is one stack
item.  It does not suffice to redefine, for example, "2SWAP" to
operate on a 32 bit number regardless of stack width, because there
are many times when "2SWAP" is needed for interchanging pairs of
stack items.  Thus the "L" operators are really needed for
manipulating 32 bit numbers.

    The double number arithmetic operators as listed in the "Dou-
ble Number Extension Word Set" of the 83 standard are okay to use
on 32 bit numbers, because their definition is consistent with 32
bit  numbers.   For consistency in naming, it is recommended that
additional copies of these operators be made, having names begin-
ning with "L".  For instance,

        : L+ D+ ;

The "D" names should be kept around for  compatibility  with  old
code.
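    Consider a hypothetical Python model of D+ on a 16 bit stack to
see why L+ can simply reuse D+ there.  Each operand is two cells,
with the least-significant cell deeper, which is exactly the layout
of a 32 bit number under the stack width rules.  The carry handling
is a straightforward two-cell add, not taken from any particular
implementation.

```python
# Hypothetical model of D+ on a 16-bit stack (a Python list, top at the
# end). Each operand is two cells: least-significant cell deeper,
# most-significant cell on top.
MASK = 0xFFFF


def d_plus(s):      # D+ ( d1 d2 -- d3 )
    hi2, lo2 = s.pop(), s.pop()
    hi1, lo1 = s.pop(), s.pop()
    lo = lo1 + lo2
    carry = lo >> 16                       # 1 if the low-cell sum overflowed
    s.append(lo & MASK)
    s.append((hi1 + hi2 + carry) & MASK)   # result wraps modulo 2**32


l_plus = d_plus     # Python counterpart of  : L+ D+ ;
```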

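    The use of "ROT" instead of "WLSWAP" is discouraged, as it is
not portable.  ROT moves a word past a long only on a 16 bit stack,
where the long occupies two stack items.  On a 32 bit stack the same
job is done by SWAP.  Similarly, "-ROT" should not be used for
"LWSWAP".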

Important Point

    It is not necessary to implement all the above words.  It  is
important  to  agree on a nomenclature, however, so that code may
be shared without confusion about what it means.  The implementa-
tion  of any one of these words is trivial, so it should be prac-
tical to implement them as needed, rather than keeping the  whole
package  around  all  the time.  The different address sizes need
not all be implemented; in fact, many machines cannot  reasonably
support  more  than one address size.  There are machines that do
have different address sizes, so the complete matrix of words  is
defined  in order to prevent the proliferation of different words
for the same function.


Acknowledgements

    The  authors  are  grateful  to  the  members  of  the   File
Systems/Data Structures Working Group at the 1984 Rochester Forth
Applications Conference for the  valuable  inputs  they  provided
during  the  discussion  of the topic of extended addressing.  In
particular, we wish to thank Dennis Ruffer, David Lindbergh, Gary
Nemeth, Ferren MacIntyre, and Walt Pawley.

