A Portable File System Interface for Forth


                       W. M. Bradley

                       295 Hans Ave.
                  Mountain View, CA 94040
                       (415) 961-1302


                          _A_B_S_T_R_A_C_T

          A file system interface for  Forth  has  been
     designed  and  implemented.  This interface may be
     used for Forth systems which run under almost  any
     native  operating system, or for stand-alone Forth
     systems.  Native operating system files  are  used
     where available.  The words which the program uses
     to access files are the  same  regardless  of  the
     native  operating system.  The interface is simple
     yet general.  Performance  is  excellent;  loading
     from  files  is  roughly the same speed as loading
     from screens, and sometimes faster.  This work  is
     placed in the public domain.


W. M. Bradley          March 25, 1986     Page 1 of 16 pages


_A_b_s_t_r_a_c_t

     A file system interface for Forth has been designed and
implemented.   This  interface may be used for Forth systems
which run under almost any native operating system,  or  for
stand-alone  Forth  systems.   Native operating system files
are used where available.  The words which the program  uses
to  access  files  are  the  same  regardless  of the native
operating system.  The  interface  is  simple  yet  general.
Performance  is excellent; loading from files is roughly the
same speed as loading from screens,  and  sometimes  faster.
This work is placed in the public domain.

_M_o_t_i_v_a_t_i_o_n

     The need for a file system  for  Forth  is  often  dis-
cussed.  Some of the advantages of files over BLOCKs are:

o    Remembering which BLOCKs  make  up  an  application  is
     painful  even  when  the  storage medium is a 200 Kbyte
     floppy.  If you are using a 10 Mbyte hard disk,  it  is
     nearly  impossible.   File  systems  let you remember a
     name instead of several numbers.

o    Making backups with BLOCKs is  difficult,  because  the
     backup medium often does not have the same set of block
     numbers free as the original medium.  File  systems  do
     not have this problem.

o    Portability of source code is reduced, because the load
     screens  reference block numbers, which usually have to
     be changed  when  moving  the  application  to  another
     machine.

o    While 1K blocks tend to constrain definitions to a rea-
     sonable length, they also impose an artificial boundary
     that can lead to hard-to-read  code.   This  can  occur
     when you find yourself having to cram a definition onto
     an almost-full screen.  The first thing to go is  often
     the  spacing  that  contributes  to readable structure,
     followed by the stack comments.

o    BLOCKs waste disk space because screens usually contain
     a  large  percentage  of  trailing blanks which pad the
     line lengths out to C/L characters.

o    Most other operating systems use files.  It  is  diffi-
     cult  to  share data between Forth blocks and the files
     produced by other operating systems.

_R_e_i_n_v_e_n_t_i_n_g _t_h_e _W_h_e_e_l

     A number of proposals for file systems have appeared in
the  literature  [1,2,3,4,5,9].   Why  should  I  invent yet
another file system?  The file  systems  that  I  have  seen
don't meet all the requirements that I consider important.

o    While it is theoretically nice to  define  a  wonderful
     file  system  for  Forth  that  runs  without any other
     "inferior" operating system, in practice other  operat-
     ing  systems exist and co-exist with Forth.  It is very
     useful to be able to share files with an operating sys-
     tem that already exists on your machine.  For this rea-
     son, the Forth file system should use the same files as
     the  "other" operating system, whatever that happens to
     be.  This disqualifies file systems which  "start  from
     scratch" (I don't mean to belittle the contributions of
     others in defining nice file systems, I just think that
     the ability to use other systems' files is too valuable
     to give up).

o    Despite the use of the other operating systems'  files,
     the  words that Forth uses to access those files should
     not vary from system to system.  File system interfaces
     which  just  allow  you to execute the native operating
     system's system calls from Forth lose on this count.

o    Some "file systems" are just  interfaces  to  operating
     system  files  so that BLOCKs may actually be stored in
     files.  This is useful but doesn't solve  most  of  the
     problems presented in the previous section.

o    The file system should be available to  everybody.   In
     the context of Forth, this means PUBLIC DOMAIN code!

_S_o _h_e_r_e _i_t _i_s ...

     I present for your consideration a file  system  inter-
face  which  meets  my  requirements.   It  is in the public
domain.  It can run on top of virtually any operating system
and  use  that  operating  system's native files.  The words
that a Forth application uses to access those files are  the
same no matter which operating system is the base.  A compa-
tible stand-alone version is possible,  although  I  haven't
implemented  that.   The  performance is very good - loading
from a file is approximately the same speed as loading  from
screens.   In  some  cases  the  file  loads faster than the
screens!

     An additional  benefit  is  device  independence.   The
exact  same words that are used to access conventional files
may also be used  to  access  other  I/O  devices,  such  as
printers, modems, terminals, the console, etc.

     This system has been ported to a Z-80 CP/M‡ system and
a 68000 UNIX† system.  I think that it can easily be
implemented on just about any existing operating system.
_________________________
† UNIX is a trademark of Bell Laboratories.

_P_h_i_l_o_s_o_p_h_y

     A file is viewed as a sequence of bytes.  The bytes may
be  accessed  either sequentially or randomly.  Any file may
be accessed either way, and sequential and  random  accesses
may  be  arbitrarily interspersed.  There is no concept of a
"record"; applications which need such  an  abstraction  may
easily  provide  it  themselves  using  the primitives which
exist.  This philosophy is modeled after the UNIX  [6]  file
system.   The simple yet sufficient file model is one of the
reasons why UNIX has achieved such popularity.
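
     As a minimal sketch of an application layering fixed-length
records on the byte model, only an address computation is needed.
The sketch is in ANS-style Forth rather than the paper's code;
RECORD>D and #RECORDS are hypothetical names, and record numbers
and lengths are assumed to be unsigned.

```forth
\ Sketch only: RECORD>D and #RECORDS are hypothetical helpers, not
\ words from the paper.  Record numbers and sizes are unsigned.
: RECORD>D ( rec# reclen -- d.addr )   \ byte address of record rec#
   UM* ;                               \ unsigned single*single -> double
: #RECORDS ( d.size reclen -- n )      \ whole records in d.size bytes
   UM/MOD NIP ;                        \ drop the remainder, keep quotient

\ Intended use with the paper's words (not run here):
\   rec# reclen RECORD>D fd FSEEK    buf reclen fd FGETS
```

     On a 16-bit Forth the byte address of record 1000 with
128-byte records (128000) no longer fits in one cell, which is why
file addresses are 32-bit double numbers.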

     Some operating systems (e.g. CP/M) insist on imposing a
record  structure on their files.  My file system copes with
this, and converts it to a byte-oriented interface  so  that
the  Forth programmer need not worry about the record struc-
ture imposed by the underlying operating system.

     The interface to this file system is specified  at  two
different  levels.  The most important interface (called the
"high-level interface"), the only one that the  applications
programmer  sees, is the set of WORDs that a program uses to
access files.  The other interface (the  "low  level  inter-
face")  is  the  set  of  WORDs that the file system uses to
actually access the data from whatever native operating sys-
tem  is  being  used.   It is this low-level interface which
makes it possible to easily port the  file  system  to  dif-
ferent native operating systems.

_F_i_l_e _N_a_m_e_s

     Since the file system is intended to be compatible with
many different operating systems, and since no two operating
systems use the same naming conventions for files, the  file
system  imposes no restrictions of its own on file names.  A
file name is simply a counted  string  in  the  usual  Forth
sense.   The  string  may contain any sequence of characters
that is a legal name for whatever native operating system is
being  used.  This presents a minor portability problem when
moving an application between operating systems.  My experi-
ence  with  porting  other code between dissimilar operating
systems shows that this is not a terrible problem.

     The principle of carefully specifying the set of words
that are used to access the bytes of a file, while leaving
unspecified the details of the file system naming and
hierarchy, has been used successfully before.  The C
language "standard I/O" library [7] and the primitive
routines used in the "Software Tools" [8] package both use
this concept.  Both of these systems have been ported to a
large number of different machines and operating systems.
_________________________
‡ CP/M is a trademark of Digital Research.

_A_d_d_r_e_s_i_n_g

     The bytes within a file may be thought of as being num-
bered  from 0 up to one less than the number of bytes in the
file.  For random accesses to files, the address  is  speci-
fied  with  each  access.   The  address is a 32-bit number,
allowing files to be up to 2**32 bytes (about 4.3 * 10**9).  If
the  native  operating  system you are using doesn't support
files this big, then it becomes the limiting factor.  A file
may  be  thought  of as an alternate address space, allowing
things like meta-compilation to a file, etc.

_B_u_f_f_e_r_s

     The file system code's primary job is to manage  a  set
of buffers (which are not necessarily the same as the set of
buffers managed by BLOCK, etc.).  The buffers have two  pur-
poses.   First,  they  are  necessary for converting record-
oriented operating system or disk interfaces  to  the  byte-
oriented  interface  provided  by  the file system.  Second,
they are the means whereby the performance of the system  is
kept  high,  since  most accesses to a file are satisfied by
bytes in the buffer, thus reducing the number of times  that
the operating system must be called to perform I/O
operations.  The program using the file system need not
know about the existence of the buffering.  Most
importantly, a Forth WORD may cross a buffer boundary, so
there is no need to make sure that source code lines up
with any boundaries.

_F_i_l_e _D_e_s_c_r_i_p_t_o_r_s

     The primary "handle" that is used to access a  file  is
the  "file  descriptor".   A file descriptor is actually the
address of a memory area that contains information about the
file  and  about  the buffer associated with it.  The system
manages  a  fixed  number  of  file  descriptors  which  are
automatically allocated and freed.

_H_i_g_h _L_e_v_e_l _I_n_t_e_r_f_a_c_e _G_l_o_s_s_a_r_y

     A "file descriptor" is a 16-bit number that is used  to
refer  to  a  file  once  the  file has been opened.  A file
descriptor is denoted by "fd" in stack diagrams.

     A "file name" is a counted string which  is  the  ASCII
representation of the name of a file.  See the section enti-
tled "File Names".  File names  are  passed  around  on  the
stack  as  the  address  of  the  counted string.  The stack
diagram representation is "name".

     The "file pointer" is the number of the byte which will
be returned by the next sequential access to the file.

OPEN ( name mode --- fd )
     Given the name of a  file,  return  a  file  descriptor
     which  may  be  used  to  subsequently access the file.
     "Mode" specifies what sorts of access may be  performed
     on  the file.  Legal values for mode are READ , WRITE ,
     and MODIFY (these names are defined as CONSTANTs).   If
     the file is not accessible (for example, because it
     doesn't exist), then 0 will be returned as the fd value.
     A successful open will set the file pointer to 0, so that
     the next sequential access will occur at the  beginning
     of the file.

CLOSE ( fd --- )
     Stop using the given file descriptor to access the file
     that it is currently associated with.  Since there is a
     limited number of file descriptors, a  file  should  be
     CLOSEd  when  it  is  no longer being used, so that the
     file descriptor may be reused.

FPUTC ( byte fd --- )
     Put the byte on the file associated with  "fd"  at  the
     current file pointer position.  Then increment the file
     pointer.

FGETC ( fd --- byte )
     Get the byte from the file associated with  "fd"  which
     is  at  the current file pointer position.  Then incre-
     ment the file pointer.  If there are no more  bytes  in
     the  file, return EOF (-1) instead.  Note that any byte
     is in the range 0..255, whereas EOF is -1, so they  are
     distinct.

FPUTS ( addr count fd --- )
     Put "count" bytes on the  file  associated  with  "fd".
     Bytes are taken from memory starting at address "addr",
     and are put on the file starting at  the  current  file
     pointer  position.   The  file  pointer  is advanced to
     point to the byte just after the last byte put  on  the
     file.

FGETS ( addr count fd --- nread )
     Get at most "count" bytes from the file associated with
     "fd".   Bytes  are  read  from the file starting at the
     current file pointer position, and are  put  in  memory
     starting  at  address "addr".  FGETS returns the number
     of bytes actually read, which may be less  than  count,
     but  not  more.   If FGETS returns 0, it means that the
     file pointer was already at the end of the file.
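
     The EOF convention of FGETC and the short reads of FGETS can
be tried out without a real file.  The sketch below, in ANS-style
Forth, assumes a hypothetical in-memory stand-in (MEM-OPEN,
MEM-FGETC, MEM-FGETS) whose words drop the fd argument; it is not
the paper's implementation.

```forth
\ Sketch only: MEM-OPEN, MEM-FGETC, MEM-FGETS, and the M- variables are
\ hypothetical; they treat a region of memory as the file and omit fd.
-1 CONSTANT EOF
VARIABLE M-ADDR   VARIABLE M-LEN   VARIABLE M-POS     \ M-POS = file pointer

: MEM-OPEN ( c-addr u -- )  M-LEN !  M-ADDR !  0 M-POS ! ;

: MEM-FGETC ( -- byte | EOF )
   M-POS @ M-LEN @ < IF  M-ADDR @ M-POS @ + C@  1 M-POS +!  ELSE  EOF  THEN ;

: MEM-FGETS ( addr count -- nread )
   M-LEN @ M-POS @ -  MIN  0 MAX  >R            ( addr )  ( R: nread )
   M-ADDR @ M-POS @ +  SWAP  R@ MOVE            \ copy nread bytes to addr
   R@ M-POS +!  R> ;

: COUNT-BYTES ( -- n )         \ stops only on -1, never on a data byte
   0 BEGIN  MEM-FGETC EOF <>  WHILE  1+  REPEAT ;

CREATE RBUF 4 ALLOT
: COUNT-CHUNKED ( -- n )       \ a read returning 0 means end of file
   0 BEGIN  RBUF 4 MEM-FGETS  ?DUP  WHILE  +  REPEAT ;

CREATE SAMPLE  72 C,  255 C,  0 C,  10 C,      \ "H", 255, NUL, newline

\ With the paper's words a copy loop has the same shape (not run here):
\   BEGIN  buf 512 in FGETS  ?DUP  WHILE  buf SWAP out FPUTS  REPEAT
```

     Testing for -1, rather than for zero or for some particular
byte value, is what keeps a 255 or a NUL byte in the file from
ending the loop early.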

FILE! ( byte d.addr fd --- )
     Store the byte into the file associated  with  "fd"  at
     the  position indicated by "d.addr".  "d.addr" is a 32-
     bit number.  The file pointer is moved to d.addr+1.

FILE@ ( d.addr fd --- byte )
     Fetch a byte from the file associated  with  "fd"  from
     the  position indicated by "d.addr".  "d.addr" is a 32-
     bit number.  The file pointer is moved to d.addr+1.

FSEEK ( d.addr fd --- )
     Move the file pointer for the file associated with "fd"
     to  the  position indicated by "d.addr".  "d.addr" is a
     32-bit number.

GETWORD ( addr fd --- addr )
     Get the next sequence of non-white characters from the
     file  associated  with  "fd".   The  characters will be
     stored as  a  counted  string  in  memory  starting  at
     address "addr".  A "non-white" character is any charac-
     ter except blank, tab,  newline,  or  carriage  return.
     Leading "white" characters are skipped.

GETCWORD ( addr delim fd --- addr )
     Get the next sequence of non-delimiter characters  from
     the  file associated with "fd".  The characters will be
     stored as  a  counted  string  in  memory  starting  at
     address  "addr".   Leading  delimiters are NOT ignored.
     If the next character in the file is a "delim", then
     the null string (count=0) will be left at addr!

     The behavior of GETWORD and GETCWORD  with  respect  to
leading  delimiters is consistent with the way that they are
actually used in practice.  Specifically, BL WORD is used to
scan  for the next Forth word, and it is desirable to ignore
leading blanks.  Tabs, newlines, and carriage returns do not
appear in Forth screens, so it is not necessary to deal with
them.  They do appear in files created  by  other  operating
systems,  so  a  file system which attempts to be compatible
with other operating systems must also deal with them.  When
WORD  is  used with some delimiter other than BL, the inten-
tion  is  really  that  leading  delimiters  should  NOT  be
ignored!  For example, when WORD is used to scan for the ")"
which terminates a comment, it is desirable to find one
immediately in case of the null comment ( ).  Similarly, ."
and ABORT" should work with null strings, as in ." " and
ABORT" ".  The usual work-around is to explicitly check the
input stream to see if a delimiter is there before calling
WORD.  This just points to the fact that there should be two
different versions of WORD, one which gets non-white charac-
ters,  ignoring  leading white-space, and one which takes an
explicit delimiter off the stack and terminates as  soon  as
it encounters a delimiter.

     The actual delimiter which terminated  the  GETWORD  or
GETCWORD  is stored in the global variable DELIMITER.  If an
end-of-file condition terminated the operation, the value of
DELIMITER is EOF.
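
     The two delimiter rules can be sketched over an in-memory
source instead of a file.  SRC-OPEN, SRC-GETC, WHITE-CHAR?, and
ADD-CHAR are hypothetical helpers, the fd argument is omitted, and
in this sketch the terminating delimiter is consumed; none of this
is taken from the paper's listing.

```forth
\ Sketch only: all helper names are hypothetical; the fd argument of
\ the paper's GETWORD/GETCWORD is omitted.
-1 CONSTANT EOF
VARIABLE DELIMITER
VARIABLE S-ADDR  VARIABLE S-LEN  VARIABLE S-POS

: SRC-OPEN ( c-addr u -- )  S-LEN !  S-ADDR !  0 S-POS ! ;
: SRC-GETC ( -- c | EOF )
   S-POS @ S-LEN @ < IF  S-ADDR @ S-POS @ + C@  1 S-POS +!  ELSE  EOF  THEN ;

: WHITE-CHAR? ( c -- flag )        \ blank, tab, newline, carriage return
   DUP BL =  OVER 9 = OR  OVER 10 = OR  SWAP 13 = OR ;
: ADD-CHAR ( addr c -- addr )      \ append c to the counted string at addr
   OVER COUNT + C!  DUP C@ 1+ OVER C! ;

: GETWORD ( addr -- addr )
   0 OVER C!
   BEGIN  SRC-GETC  DUP WHITE-CHAR?  WHILE  DROP  REPEAT   \ skip leading white
   BEGIN  DUP EOF =  OVER WHITE-CHAR? OR  0=  WHILE          \ collect non-white
      ADD-CHAR  SRC-GETC
   REPEAT
   DELIMITER ! ;                   \ the white char or EOF that stopped us

: GETCWORD ( addr delim -- addr )  \ leading delimiters are NOT skipped
   >R  0 OVER C!                                            ( R: delim )
   BEGIN  SRC-GETC  DUP EOF =  OVER R@ = OR  0=  WHILE  ADD-CHAR  REPEAT
   DELIMITER !  R> DROP ;
```

     Given the input ") rest", GETCWORD with delimiter ")" returns
the null string and leaves ")" in DELIMITER, which is the
null-comment case described above.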

_F_i_l_e/_d_i_r_e_c_t_o_r_y _m_a_n_a_g_e_m_e_n_t _r_o_u_t_i_n_e_s

     The following words are used for managing entire files.
No  assumption  is  made here about any sort of hierarchy of
files which may be provided by the underlying operating sys-
tem.   In  this respect, the precise behavior of these words
is system-dependent.   Many  systems  have  a  notion  of  a
"current  directory".   If  that  is  the  case, these words
should be implemented so that the files affected are in  the
current  directory unless some other directory is explicitly
specified.  Example:  CP/M has a  "current  default  drive".
Unless the drive name is specified as part of the file name,
the default drive should be assumed.

MAKE ( name mode --- )
     Create a new file whose name is "name".   If  the  file
     already exists, the operation will fail.

DELETE ( name --- f )
     Get rid of the file whose name is "name".

FILES ( --- )
     Show the names of the files in  the  current  directory
     (whatever that means!).

_L_o_w _L_e_v_e_l _I_n_t_e_r_f_a_c_e

     These routines must  be  recoded  for  every  different
native  operating  system.   The  higher-level routines call
these routines through execution vectors which are a part of
the  file  descriptor  structure.  This vectoring allows the
file system to associate different low-level  routines  with
different  types  of  files.   This is the mechanism whereby
device-independent I/O is implemented.  For  instance,  CP/M
provides  totally  different system calls for accessing disk
files and other devices like the terminal  or  the  printer.
By  vectoring  these following routines, the file system can
access any of these devices transparently  to  the  applica-
tions program.


ALIGN ( d.byteno --- d.aligned )
     Round the byte number "d.byteno" down to the nearest
     boundary which must be enforced.  For instance, CP/M
     disk files are only accessible in 128-byte chunks, so
     any request to read a CP/M sector must start on a
     multiple-of-128 boundary.  If the underlying system
     imposes no such restrictions, this routine may just
     return d.byteno unchanged.  Such a do-nothing ALIGN is
     particularly useful for strictly-sequential devices like
     terminals and printers.  If the underlying system allows
     access to files at arbitrary byte positions (this is the
     case for UNIX, for example), it may still be desirable to
     align requests to boundaries which are particularly
     efficient (e.g., 512-byte boundaries are good for UNIX).
     ALIGN is not called by the high-level code, but the
     alignment concept is essential for implementing this
     file system on record-oriented or block-oriented
     operating systems, so it is discussed here.
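
     For CP/M's 128-byte records, ALIGN reduces to masking off
low-order bits.  The sketch below is hypothetical: it names the
word ALIGN-128 because ANS Forth systems already define ALIGN for
data-space alignment, and masking only the low cell is valid
because 128 is a power of two that evenly divides the range of a
cell.

```forth
\ Sketch only: ALIGN-128 and BUF-OFFSET are hypothetical names.
128 CONSTANT CPM-RECORD
: ALIGN-128 ( d.byteno -- d.aligned )
   SWAP  CPM-RECORD NEGATE AND  SWAP ;   \ clear low 7 bits of the low cell;
                                         \ the high cell is already a multiple of 128
: BUF-OFFSET ( d.byteno -- n )           \ position of the byte in its buffer
   2DUP ALIGN-128 D- DROP ;

\ On a byte-addressable device ALIGN can simply do nothing:  : NO-ALIGN ;
```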

FREAD ( bflimit bfbase d.byteno fid ---
        bflimit' bfbase' d.aligned bftop bfend )
     Read a bufferful of data from the file.  "d.byteno" is
     the byte that the upper-level code is really trying to
     get.  "bfbase" is the lowest address in the buffer, and
     "bflimit" is one more than the highest address in the
     buffer.  "fid" is the internal file identifier: the
     identifier used by the underlying operating system to
     identify the file.  The "open" routine (see below)
     discovers this number and stores it in the file
     descriptor so that it may be passed to the low-level
     routines when they are called.

     Returns "bfend", the highest address (plus one) in  the
     buffer  that  is available for writing.  This is nearly
     always equal to the address of the end of  the  buffer,
     but  may  be  set  to  the  address of the start of the
     buffer if the buffer becomes invalid  for  some  reason
     (mainly when the file is first opened).  "bftop" is one
     more than the highest address that contains actual data
     read from the file.  If the read operation was unable
     to get enough data to fill the buffer, bftop may be
     less than bflimit.  "d.aligned" is the file address
     (the 32-bit byte number) which corresponds to the  byte
     which  is  at  the buffer address "bfbase'".  "bfbase'"
     and "bflimit'" are copies of the arguments.   They  are
     returned  in case the FREAD routine wants to be sophis-
     ticated and return a  different  buffer  than  the  one
     passed to it.  This would be useful if you had a system
     which kept a cache of disk blocks, so that accesses  to
     a  block  which  was already in the cache didn't really
     cause  any  I/O,  but  instead  just  substituted   the
     appropriate buffer.

FWRITE ( bfend bftop bfbase fstart fid --- bfwrittenend )
     Write the bytes between bfbase and bftop-1 to the file
     specified by fid.  Fstart is the file address which
     corresponds to the byte at buffer address bfbase.  If
     the underlying system only allows data to be written in
     fixed-size chunks, then data up to address bfend may
     be written.  bfend - bfbase will always be a multiple
     of the chunk size if FREAD works right.  Returns
     bfwrittenend, which should be >= bftop if everything
     goes well.
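
     The rule that data up to bfend may be written amounts to
rounding bftop up to the next chunk boundary measured from bfbase.
A minimal sketch, assuming a 128-byte chunk and a hypothetical word
CHUNK-END:

```forth
\ Sketch only: CHUNK-END and the 128-byte CHUNK-SIZE are assumptions.
128 CONSTANT CHUNK-SIZE
: CHUNK-END ( bftop bfbase -- addr )    \ how far a chunked FWRITE writes
   TUCK -                                     ( bfbase nbytes )
   CHUNK-SIZE 1- +  CHUNK-SIZE NEGATE AND     \ round up (power of two only)
   + ;                                        ( bfbase+rounded )
```

     Since bfend - bfbase is a multiple of the chunk size and
bftop never exceeds bfend, the rounded address never passes bfend;
the bytes between bftop and it are whatever the buffer already
held.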

FCLOSE ( fid --- )
     Do  anything  necessary  to  make  sure  everything  is
     squared-away  when the file is no longer needed.  It is
     not necessary to worry about stuff that  may  be  left-
     over  in  the  higher-level  buffers;  that  will  have
     already been taken care of by the higher-level code.

FSIZE ( fid --- d.size )
     Find out how many bytes are in the file.

OPEN ( name mode --- fd )
     This is the hard one!  OPEN  prepares  the  file  whose
     name  is  "name" for access in mode "mode", returning a
     file descriptor which is used for subsequent access  to
     the file.  If the attempt fails, 0 is returned.  "Mode"
     can be READ-ONLY, WRITE-ONLY, or READ-WRITE.

     OPEN is tricky, because it must do several things.  The
     first  thing  is  to find a file descriptor to use.  My
     implementation of this is rather simple:  there  are  a
     fixed  number of file descriptors which are kept in one
     place.  There is a routine FIND-FD which searches
     through the array of file descriptors until it finds
     one which isn't in use.  (There is a field in the file
     descriptor which tells whether or not it is in use,
     among other things.)

     Another thing that OPEN must do is to set up the execu-
     tion  vectors  for  the  proper access routines for the
     file.  The open routine is very important for providing
     device  independence.   The  way  you  implement device
     independence depends strongly on what you actually have
     to  do  to  access the particular devices.  In general,
     you need a read routine and a write  routine  for  each
     device  that  must  be  accessed  using its own special
     code.  The open routine may need to look  at  the  file
     name  and  decide which read and write routines to use.
     CP/M is a good example here.  In CP/M,  besides  normal
     disk files there are 16 special names that refer to I/O
     devices (e.g. CON:, LST:).  There are several different
     system calls that must be used to access these devices.
     The OPEN routine for CP/M first checks to  see  if  the
     file  name  given  to  it  matches one of these special
     names.  If so, the addresses of appropriate READ and/or
     WRITE  routines  are stored in the execution vectors in
     the file descriptor.  If the name doesn't match one  of
     the  special names, then OPEN tries to open a disk file
     with the specified name.

     Finally, OPEN must set up the file in the proper mode.
     This is done by using special READ and WRITE routines.
     If the mode is WRITE-ONLY, the READ routine is set to
     NULLREAD, which basically does nothing.  If the mode is
     READ-ONLY, the WRITE routine is set to NULLWRITE, which
     causes an error condition.

     Attempts to OPEN a non-existent file fail, returning 0.
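
     The three jobs described for OPEN (finding a free descriptor,
choosing device routines from the name, and installing
mode-dependent vectors) can be sketched together.  Everything here
is an assumption made for illustration: the three-cell descriptor,
FIND-FD's table, OPEN-SKETCH, and the stub routines, which only
record which routine ran.  Only the do-nothing NULLREAD and an
error-raising write routine (called NULLWRITE here) follow the
text, and the mode values reuse those of the paper's source
listing.

```forth
\ Sketch only: layout, names, and stubs are assumptions; no real I/O.
\ Stubs take no arguments, unlike the real FREAD and FWRITE.
\ Names are passed as c-addr u (use COUNT on a counted string).
0 CONSTANT NOT-OPEN
1 CONSTANT READ-ONLY   2 CONSTANT WRITE-ONLY
READ-ONLY WRITE-ONLY OR CONSTANT READ-WRITE    \ 3: both mode bits set

VARIABLE LAST-IO                                \ demo: which stub ran
: CON-READ   ( -- )  1 LAST-IO ! ;
: CON-WRITE  ( -- )  2 LAST-IO ! ;
: DISK-READ  ( -- )  3 LAST-IO ! ;
: DISK-WRITE ( -- )  4 LAST-IO ! ;
: NULLREAD   ( -- ) ;                           \ basically does nothing
: NULLWRITE  ( -- )  TRUE ABORT" file not open for writing" ;

\ Descriptor = mode cell, read vector, write vector; a fixed table of them.
3 CELLS CONSTANT /FD
4 CONSTANT #FDS
CREATE FD-TABLE  #FDS /FD * ALLOT
: FD-MODE  ( fd -- addr ) ;
: FD-READ  ( fd -- addr )  CELL+ ;
: FD-WRITE ( fd -- addr )  2 CELLS + ;
: INIT-FDS ( -- )  #FDS 0 DO  NOT-OPEN  FD-TABLE I /FD * +  FD-MODE !  LOOP ;
INIT-FDS

: FIND-FD ( -- fd | 0 )                         \ first descriptor not in use
   #FDS 0 DO
      FD-TABLE I /FD * +  DUP FD-MODE @ NOT-OPEN = IF  UNLOOP EXIT  THEN  DROP
   LOOP  0 ;

: SET-VECTORS ( r-xt w-xt mode fd -- )          \ install, then apply the mode
   >R  R@ FD-MODE !  R@ FD-WRITE !  R@ FD-READ !
   R@ FD-MODE @ READ-ONLY  AND 0= IF  ['] NULLREAD  R@ FD-READ  !  THEN
   R@ FD-MODE @ WRITE-ONLY AND 0= IF  ['] NULLWRITE R@ FD-WRITE !  THEN
   R> DROP ;

: OPEN-SKETCH ( c-addr u mode -- fd | 0 )
   FIND-FD  DUP 0= IF  NIP NIP NIP EXIT  THEN    \ no free descriptor
   >R >R                                         ( c-addr u ) ( R: fd mode )
   S" CON:" COMPARE 0= IF  ['] CON-READ  ['] CON-WRITE
                       ELSE ['] DISK-READ ['] DISK-WRITE  THEN
   \ A real OPEN would try to open the disk file here, returning 0 on failure.
   R> R@ SET-VECTORS  R> ;

: DO-READ  ( fd -- )  FD-READ  @ EXECUTE ;
: DO-WRITE ( fd -- )  FD-WRITE @ EXECUTE ;
```

     Because READ-WRITE is the OR of the other two modes, each
vector can be chosen by testing one mode bit.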

_S_o_u_r_c_e _C_o_d_e

     This code consists of a portable section and a  system-
specific  section.   The  system-specific code has currently
been implemented for both CP/M and  68000  UNIX.   Only  the
portable section is shown in this paper.  A companion paper,
"Implementations of a Portable File System Interface", gives
the system-specific code for both CP/M and 68000 UNIX.

     This code has no KNOWN functional errors.  It does have
some  rough  edges which I am currently removing, and it may
have errors that I don't know about yet.

     The code is Forth-83.  A few of the words must run fast
in order for the system to perform well.  For these words,
high-level versions, 68000 assembly language versions, and
8080 machine language versions are given.

_E_f_f_i_c_i_e_n_c_y _a_n_d _p_e_r_f_o_r_m_a_n_c_e

     An extremely important performance issue is the speed
of loading from files.  As far as I/O is concerned, the
important thing is the efficiency of WORD.  The Forth word
ENCLOSE does a very efficient job of speeding up WORD when
loading from screens.  ENCLOSE makes an important simplify-
ing assumption:  WORDs do not cross BLOCK boundaries.  This
assumption can not be made for files, because from a user's
perspective, there are no boundaries at all inside the file.
The buffering is completely transparent.  The solution is to
write a code word which collects as much of a word as possi-
ble without going outside  the  current  buffer.   The  high
level  word  "GETWORD" takes care of getting another buffer-
full and calling the code word again if the word being  col-
lected crosses a buffer boundary.  Most of the time the code
word only needs to be called once, since  most  words  don't
cross a buffer boundary.
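
     The split between a scanning word that never leaves the
current buffer and a high-level GETWORD that refills can be
sketched in high-level Forth.  This is an illustration only:
FILL-BUF stands in for FREAD, SCAN-WORD for the paper's CODE word,
the "file" is a string in memory, and the 4-byte buffer is
deliberately tiny so that words cross buffer boundaries.

```forth
\ Sketch only: a 4-byte buffer over an in-memory "file".  FILL-BUF
\ stands in for FREAD and SCAN-WORD for the CODE word; names assumed.
-1 CONSTANT EOF
4 CONSTANT /IOBUF
CREATE IOBUF  /IOBUF ALLOT
VARIABLE BUF-PTR   VARIABLE BUF-TOP      \ next unread byte, end of data
VARIABLE F-ADDR    VARIABLE F-LEN    VARIABLE F-POS

: MEM-FILE ( c-addr u -- )               \ "open" a string as the file
   F-LEN !  F-ADDR !  0 F-POS !  IOBUF BUF-PTR !  IOBUF BUF-TOP ! ;

: FILL-BUF ( -- flag )                   \ next bufferful; true if not empty
   F-LEN @ F-POS @ -  /IOBUF MIN                        ( n )
   F-ADDR @ F-POS @ +  IOBUF  2 PICK  MOVE
   DUP F-POS +!  IOBUF BUF-PTR !  DUP IOBUF + BUF-TOP !  0<> ;

: WHITE-CHAR? ( c -- flag )  DUP BL =  OVER 9 = OR  OVER 10 = OR  SWAP 13 = OR ;

: PEEK-BYTE ( -- c | EOF )               \ current byte, refilling if empty
   BUF-PTR @ BUF-TOP @ = IF  FILL-BUF 0= IF  EOF EXIT  THEN  THEN
   BUF-PTR @ C@ ;

: SCAN-WORD ( addr -- addr )             \ append non-white bytes, stay in IOBUF
   BEGIN  BUF-PTR @ BUF-TOP @ <
          IF  BUF-PTR @ C@ WHITE-CHAR? 0=  ELSE  FALSE  THEN
   WHILE  BUF-PTR @ C@  OVER COUNT + C!  DUP C@ 1+ OVER C!
          1 BUF-PTR +!
   REPEAT ;

: GETWORD ( addr -- addr )
   0 OVER C!
   BEGIN  PEEK-BYTE  DUP EOF <>  SWAP WHITE-CHAR? AND  WHILE  1 BUF-PTR +!  REPEAT
   BEGIN  SCAN-WORD  BUF-PTR @ BUF-TOP @ =   \ stopped at the buffer end?
   WHILE  FILL-BUF 0= IF  EXIT  THEN        \ yes: refill; end of file ends it
   REPEAT ;
```

     With the input "ab  DUPLICATE x", the word DUPLICATE is spread
over three bufferfuls, yet GETWORD returns it whole; SCAN-WORD runs
once per bufferful and FILL-BUF only when a buffer is exhausted.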

     Files have one speed  advantage  over  screens.   Since
lines  on  screens  must  be padded out to C/L characters, a
significant portion of the characters stored  and  processed
are  trailing  blanks.   Lines  in files don't need trailing
blanks, so the total number of characters that must be  han-
dled is less.  Another improvement, though less significant,
is the ability to use TAB characters to replace sequences of
blanks,  further reducing the number of characters that must
be  processed.   GETWORD  treats  blanks,   tabs,   carriage
returns,  and  line  feeds as all being the same.  (It might
even be a good idea to treat ALL control characters as being
equivalent to blanks for the purposes of WORD).

     The net result is that loading from  files  is  roughly
the  same  speed as from screens, and may be somewhat faster
or slower depending on  the  percentage  of  blanks  in  the
screens.

_C_o_m_p_i_l_i_n_g _f_r_o_m _f_i_l_e_s

     Once you have a file system, it is a simple  matter  to
rewrite the text interpreter to use files instead of screens
or the terminal.  It is especially easy if you have  a  vec-
tored  WORD.  All you have to do is open the file and change
WORD so that it executes  NEWWORD  instead.   (NEWWORD  just
looks at the delimiter and calls GETWORD if the delimiter is
BL, or calls GETCWORD otherwise.)  It is  easy  to  write  a
word FLOAD which does this.  FLOAD can even be used inside a
file to "include" another file within the first (similar  to
screens loading other screens).
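
     NEWWORD's dispatch can be sketched as follows.  LOAD-FD and
WORD-BUF are hypothetical, and GETWORD and GETCWORD are stubs that
only record which one ran, so that the dispatch can be run on its
own; the real words take the stack arguments shown in the glossary.

```forth
\ Sketch only: LOAD-FD, WORD-BUF, and LAST-PATH are hypothetical, and
\ GETWORD/GETCWORD are stubs with the glossary's stack effects.
VARIABLE LOAD-FD                      \ fd of the file being loaded
VARIABLE LAST-PATH                    \ demo only: which stub ran
: GETWORD  ( addr fd -- addr )        DROP   1 LAST-PATH ! ;
: GETCWORD ( addr delim fd -- addr )  2DROP  2 LAST-PATH ! ;
CREATE WORD-BUF 82 ALLOT

: NEWWORD ( delim -- addr )           \ WORD's replacement while loading
   DUP BL = IF  DROP  WORD-BUF LOAD-FD @ GETWORD
            ELSE      WORD-BUF SWAP LOAD-FD @ GETCWORD  THEN ;
```

     On a system whose WORD is vectored, pointing that vector at
NEWWORD while a file is being loaded gives the behavior described
above.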

_O_u_t_s_t_a_n_d_i_n_g _I_s_s_u_e_s

     Error reporting needs to be expanded.

     The file system may need to take special action if  the
user's console is opened for output, so that output does not
get stuck in the buffer when the user is expecting to see it
now.   It  is  possible to set the buffer size to some small
number (like 1) if the output is going to the  console,  but
this  is  pretty  slow.  Another alternative is to flush the
buffer whenever a carriage return is sent.  This problem  is
looking for a clean solution.
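
     The flush-on-newline alternative might look like the sketch
below.  OUT-BUF, OUT-COUNT, PUT-CON, and FLUSH-CON are hypothetical
names, TYPE stands in for the low-level console write, and the
buffer is also flushed when it fills.

```forth
\ Sketch only: all names are hypothetical; TYPE plays the console write.
10 CONSTANT NEWLINE
64 CONSTANT /OUT
CREATE OUT-BUF  /OUT ALLOT
VARIABLE OUT-COUNT   0 OUT-COUNT !

: FLUSH-CON ( -- )  OUT-BUF OUT-COUNT @ TYPE  0 OUT-COUNT ! ;
: PUT-CON ( c -- )                       \ buffer c; flush on newline or full
   DUP  OUT-BUF OUT-COUNT @ + C!  1 OUT-COUNT +!      ( c )
   NEWLINE =  OUT-COUNT @ /OUT =  OR  IF  FLUSH-CON  THEN ;
```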

     An editor is needed!  Many people will want to use  the
file  editor already on their system.  Nevertheless, I think
that a public-domain file editor  written  in  Forth  is  an
important thing to have.  Any volunteers?

     The low-level code for a stand-alone version  needs  to
be written.  This could be built on top of BLOCK (in partic-
ular, see [4]).  Some decisions have to  be  made;  specifi-
cally,  what  does  a  file name look like? and what sort of
hierarchy do files fit into?  Many of the previous file
system proposals attempt to answer these questions.  The
important point that I would like  to  make  is  that  these
questions are not the most important ones as far as a useful
and portable file system is concerned.  The important  thing
is  to  standardize  on the set of words to access the bytes
within a file, rather than how files are named.

     It is possible to implement BLOCK on top of  this  file
system,  using  a  file  to store an array of BLOCKs.  While
this approach may have  its  uses,  it  seems  like  a  step
backwards to me.

     It is possible (and easy!) to write a set of  low-level
routines  to  make the system memory look like a file.  This
has interesting implications for meta-compiling.

_D_i_s_c_l_a_i_m_e_r

     There  are  no  new  ideas  presented  in  this  paper.
Rather,  this file system is an attempt to make available to
the Forth community a file system that is based on  already-
proven  principles  and which is implemented in an efficient
and portable fashion.

_A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s

     The design of this file interface owes  much  to  UNIX,
and  in  particular  to  the  design  of  the "Standard I/O"
library for the C language [7].  C Standard I/O has proven to
be  efficient,  portable, and easy to use, so I felt that it
would be a good model.  The system also benefited from  dis-
cussions  with  Christian  Jacobi  concerning  a file system
interface for Modula-2.

_R_e_f_e_r_e_n_c_e_s

1.   James, John S.,  "Pyramid Files: A Proposed FORTH  File
     System".  _P_r_o_c. _1_9_8_1 _F_O_R_M_L _C_o_n_v_e_n_t_i_o_n.

2.   Rible, John and Dowling, Tom, "A File System in FORTH",
     _P_r_o_c. _1_9_8_1 _F_O_R_M_L _C_o_n_v_e_n_t_i_o_n.

3.   Colburn, Don,  "A  Minimum  Automated  Contiguous  File
     Allocation  System  for  FORTH-79 Environments",  _P_r_o_c.
     _1_9_8_1 _F_O_R_M_L _C_o_n_v_e_n_t_i_o_n.

4.   Helmers, Peter, "File Naming System", _F_o_r_t_h _D_i_m_e_n_s_i_o_n_s,
     Vol. II No. 3.

5.   Delwood, Donald, "A FORTH Based File Handling  System",
     _F_o_r_t_h _D_i_m_e_n_s_i_o_n_s, Vol. IV No. 3.

6.   Ritchie, Dennis, and Thompson, Ken, "The UNIX Time-Sharing
     System", _C_o_m_m_u_n_i_c_a_t_i_o_n_s _o_f _t_h_e _A_C_M, Vol. 17 No. 7
     (July 1974).  Also see the October 1983 issue of _B_y_t_e.

7.   Kernighan, Brian, and Ritchie, Dennis.  _T_h_e  _C
     _P_r_o_g_r_a_m_m_i_n_g _L_a_n_g_u_a_g_e.  Prentice-Hall, 1978.

8.   Kernighan, Brian, and Plauger, Bill.  _S_o_f_t_w_a_r_e  _T_o_o_l_s.
     Addison-Wesley, 1976.

9.   See the Proceedings of the 1981 Rochester FORTH Standards
     Conference for several papers about file systems of
     various sorts.


( Buffered I/O  Constants )
-1 CONSTANT EOF
13 CONSTANT CARRET
10 CONSTANT NEWLINE
 9 CONSTANT TAB      ( ASCII horizontal tab )
BL     256 *  TAB     +  CONSTANT  BLTAB
CARRET 256 *  NEWLINE +  CONSTANT  CRLF

( modes for fmode field )
0  CONSTANT NOT-OPEN
1  CONSTANT READ-ONLY  2 CONSTANT WRITE-ONLY
READ-ONLY WRITE-ONLY OR CONSTANT READ-WRITE


( @C@++ )
( &ptr is the address of a pointer.  Fetch the pointed-to )
( character and post-increment the pointer )

\   : @C@++ ( &ptr --- char ) DUP @ C@   1 ROT +! ;
CODE @C@++ ( &ptr --- char )
   H POP   M E MOV  H INX  M D MOV      ( de gets ptr )
           D INX                        ( ptr++ )
           D M MOV  H DCX  E M MOV      ( put back ptr )
           D DCX   XCHG                 ( hl gets orig ptr )
           A XRA   A D MOV  M E MOV
   D PUSH
NEXT JMP END-CODE

( same but pre-decrement the pointer )
\   : @--C@ ( &ptr --- char ) -1 OVER +!  ( &ptr ) @ C@ ;

( @C!++ @--C! )
( &ptr is the address of a pointer.  Store the character into )
( the pointed-to location and post-increment the pointer )

\   : @C!++ ( char &ptr --- ) SWAP OVER @ C!  1 SWAP +! ;
CODE @C!++ ( char &ptr --- )
   H POP   M E MOV  H INX  M D MOV      ( de gets ptr )
           D INX                        ( ptr++ )
           D M MOV  H DCX  E M MOV      ( put back ptr )
           D DCX   XCHG                 ( hl gets orig ptr )
           D POP  E M MOV      ( store char at orig ptr )
NEXT JMP   END-CODE

( same but pre-decrement the pointer )
\   : @--C! ( char &ptr ) -1 OVER +! ( char &ptr ) @ C! ;
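These four words take the address of a pointer rather than the pointer itself, so a loop can keep its scan position in a VARIABLE and let the word advance it. As a minimal sketch in ANS Forth, using the high-level definitions given in the comments above, the hypothetical word REVERSE-COPY reads a string forward with @C@++ while writing it backward with @--C!. SRC, DST, OUT and REVERSE-COPY are illustrative names only, and OUT is assumed to be large enough for the string (16 bytes here).

```forth
\ High-level pointer words, as in the comments of the listing
: @C@++ ( &ptr -- char ) DUP @ C@  1 ROT +! ;       \ fetch, then bump
: @--C@ ( &ptr -- char ) -1 OVER +!  @ C@ ;         \ back up, then fetch
: @C!++ ( char &ptr -- ) SWAP OVER @ C!  1 SWAP +! ; \ store, then bump
: @--C! ( char &ptr -- ) -1 OVER +!  @ C! ;         \ back up, then store

\ Hypothetical example: copy a string into OUT in reverse order
VARIABLE SRC   VARIABLE DST          \ the two pointer variables
CREATE OUT 16 ALLOT                  \ destination; assumes u <= 16
: REVERSE-COPY ( addr u -- )
  DUP OUT + DST !                    \ DST starts just past the copy
  SWAP SRC !                         \ SRC starts at the first char
  0 ?DO  SRC @C@++  DST @--C!  LOOP ;
```

After the loop DST has walked back to OUT and SRC sits just past the source string, which is what lets the same variables drive a following loop.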


( BUFFERING CONSTANTS )
: FIELD ( offset size --- offset+size )
  CREATE OVER , +              ( high-level: DOES> @ + )
  ;CODE ( fd --- fd+offset )
  D INX  XCHG  M E MOV  H INX  M D MOV     ( offset in DE )
  H POP  D DAD  H PUSH  NEXT JMP  END-CODE
: NEWSTRUCT 0 ;  : W 2 ;   : L 4 ;
NEWSTRUCT


W FIELD BFBASE   W FIELD BFLIMIT
W FIELD BFTOP    W FIELD BFEND     W FIELD BFCURRENT
W FIELD BFDIRTY  W FIELD FMODE     W FIELD FTYPE
L FIELD FSTART   L FIELD FTOP
L FIELD FSIZE    W FIELD FID
W FIELD F.READ   W FIELD F.WRITE   W FIELD F.CLOSE
W FIELD F.IOCTL
CONSTANT FDSIZE
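FIELD is a defining word for structure offsets. Each use records the running offset in a new word and advances the offset by the field size; the new word later adds its offset to a descriptor address. The ;CODE version above is the 8080 form of the high-level DOES> @ + noted in its comment, and the result is essentially Forth-2012 +FIELD. On the original 16-bit layout FDSIZE works out to 38 bytes: thirteen 2-byte fields plus three 4-byte fields. The ANS sketch below assumes W and L are one and two cells rather than 2 and 4 bytes, and it declares only a cut-down, illustrative descriptor (DEMO-FDSIZE and DEMO-FD are hypothetical names).

```forth
\ High-level FIELD ( offset size "name" -- offset+size )
\ The child word ( addr -- addr+offset ) adds the stored offset.
: FIELD ( offset size "name" -- offset+size )
  CREATE OVER , +
  DOES> ( addr -- addr' ) @ + ;

: NEWSTRUCT ( -- 0 ) 0 ;
1 CELLS CONSTANT W        \ assumed: one cell per word-sized field
2 CELLS CONSTANT L        \ assumed: two cells per long field

\ A cut-down descriptor, for illustration only
NEWSTRUCT
W FIELD BFBASE   W FIELD BFTOP   W FIELD BFCURRENT
L FIELD FSTART   W FIELD FID
CONSTANT DEMO-FDSIZE

CREATE DEMO-FD  DEMO-FDSIZE ALLOT   \ one descriptor instance
```

Applying a field word to 0 yields the bare offset, which is a quick way to inspect a layout interactively.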


( TOBUFADDR DBETWEEN )
: TOBUFADDR ( daddr fd --- bufaddr )
  >R R@ FSTART 2@ D- ( doffset ) DROP ( offset ) R> BFBASE @ +
;
: DBETWEEN ( d dupper_limit dlower_limit --- f )
( true if d is >= lower and < upper )
  6 PICK 6 PICK ( d dupper dlower d ) D> NOT ( d dupper f )  >R
  D< R> AND
;
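DBETWEEN copies d with 6 PICK 6 PICK because this listing uses the FORTH-79 PICK, where 1 PICK is DUP. FORTH-83 and ANS Forth count from zero (0 PICK is DUP), so a port needs 5 PICK 5 PICK here, and the 3 PICK in COPYIN and GETCWORD needs the same adjustment. A sketch of the ANS form follows. It assumes D> may be unavailable, since it is not in the ANS Double-Number word set, and tests the lower bound with 2SWAP D< 0= instead.

```forth
\ DBETWEEN for ANS Forth, where 0 PICK is DUP.
\ True if dlower <= d < dupper.
: DBETWEEN ( d dupper dlower -- flag )
  5 PICK 5 PICK        ( d dupper dlower d )
  2SWAP D< 0=          ( d dupper d>=dlower )
  >R  D<               ( d<dupper )
  R> AND ;
```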


( SYNC FLUSHBUF                                 wmb,83Aug29 )
: SYNC ( fd --- ) ( if current > top, move up top )
  >R R@ BFTOP @ R@ BFCURRENT @ U<
  IF    R@ BFCURRENT @  DUP R@ BFTOP !  ( bftop@ )
        R@ BFBASE @ - 0   R@ FSTART 2@ D+   R@ FTOP 2!
  THEN  R> DROP ;
: FLUSHBUF ( fd --- successf )
  DUP BFDIRTY @ NOT IF DROP 1 EXIT THEN ( clean: report success )
  >R  R@ SYNC
  R@ BFEND @  R@ BFTOP @  R@ BFBASE @  R@ FSTART 2@  R@ FID @
  R@ F.WRITE @ EXECUTE  ( bfwrittenend )
  R@ BFTOP @ U<
  IF   1 ABORT" FlushBuf error " ( 0 )   THEN
  R@ FSIZE 2@  R@ FTOP 2@  D<
  IF   R@ FTOP 2@ R@ FSIZE 2! THEN  R@ FSTART 2@ R@ FTOP 2!
  0 R@ BFDIRTY !  R@ BFBASE @ DUP R@ BFTOP ! R@ BFEND !
  R> DROP 1 ;

( FILLBUF                                       wmb,83Aug29 )
: FILLBUF ( daddr fd --- )
  >R
  R@ BFLIMIT @  R@ BFBASE @  2SWAP  R@ FID @
  ( bflimit bfbase daddr fid )  R@ F.READ @ EXECUTE
     ( newbflimit newbfbase newfstart newbftop newbfend )
  R@ BFEND  !  R@ BFTOP   !  R@ FSTART 2!
  R@ BFBASE !  R@ BFLIMIT !
  R@ BFTOP @  R@ BFBASE @  - S->D  ( nvalid )
     R@ FSTART 2@  D+   R@ FTOP 2!
  R> DROP
;
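FILLBUF fixes the contract for the device read routine stored in F.READ. The routine is called as ( bflimit bfbase daddr fid ) and must return ( newbflimit newbfbase newfstart newbftop newbfend ). Here newfstart is the file address of the first buffered byte and newbftop marks the end of valid data. Since the paper notes that system memory can be made to look like a file, the ANS sketch below shows one hypothetical read routine for such a device. It makes these assumptions:

o    fid is the address of a two-cell record holding the region's
     base address and size.
o    The caller's buffer is kept.
o    The frame starts exactly at daddr; a disk device would more
     likely align it to a sector.
o    bfend is returned as bflimit, so the whole buffer is writable.

MEMREGION, MEM-BASE, MEM-SIZE and MEM-READ are illustrative names, not part of the listing.

```forth
\ Hypothetical memory "device": fid -> [ base | size ]
: MEMREGION ( base size "name" -- ) CREATE SWAP , , ;
: MEM-BASE ( fid -- base ) @ ;
: MEM-SIZE ( fid -- size ) CELL+ @ ;

\ Read routine with the stack effect FILLBUF expects of F.READ
: MEM-READ ( bflimit bfbase daddr fid -- bflimit bfbase dstart bftop bfend )
  >R  2DUP D>S                 ( bflimit bfbase daddr offset )
  R@ MEM-SIZE OVER - 0 MAX     ( ... offset avail ) \ bytes left in region
  SWAP R@ MEM-BASE +           ( ... avail src )
  5 PICK 5 PICK -              ( ... avail src room ) \ buffer capacity
  ROT MIN                      ( ... src n )
  4 PICK SWAP DUP >R CMOVE R>  ( bflimit bfbase daddr n )
  3 PICK +                     ( bflimit bfbase daddr bftop )
  4 PICK                       ( bflimit bfbase daddr bftop bfend )
  R> DROP ;
```

A read at or past the end of the region copies nothing and returns bftop equal to bfbase, so FILLBUF sets FTOP to FSTART and FSEEK then reports end of file.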


( ALIGNFRAME CLOSE )
: ALIGNFRAME ( daddr fd --- )
  >R
  R@ FLUSHBUF
  IF    R@ FILLBUF
  ELSE  2DROP          THEN
  R> DROP
;


: CLOSE ( fd --- )
  DUP FLUSHBUF DROP
  NOT-OPEN OVER FMODE !
  DUP F.CLOSE @ EXECUTE
;


( INBUF? FSEEK )
: INBUF? ( daddr fd --- f )
  >R R@ FSTART 2@ D- IF DROP 0
  ELSE R@ BFEND @ R@ BFBASE @ - U< THEN R> DROP ;
: FSEEK ( daddr fd --- eoff )
  >R R@ SYNC
  2DUP R@ INBUF? NOT ( daddr f )
  IF    ( desired byte is not buffered )
      2DUP R@ ALIGNFRAME ( daddr )
  THEN        ( daddr )
  2DUP R@ FTOP 2@ D< NOT ( daddr f ) ( at EOF? )
  IF   DDROP R@ FTOP 2@  R@ TOBUFADDR  R> BFCURRENT ! 1
  ELSE                   R@ TOBUFADDR  R> BFCURRENT ! 0
  THEN ;


( FILE@ FGETC fast get next character )
: FILE@ ( daddr fd --- byte )
  DUP >R  FSEEK ( errf )
  IF   EOF  R> DROP
  ELSE R> BFCURRENT @C@++
  THEN ;
: FGETC ( fd --- byte )
  >R   R@ BFCURRENT @   R@ BFTOP @   U<
  ( true flag means desired character is in the buffer )
  IF   R> BFCURRENT @C@++
  ELSE
     R@ FSTART 2@     R@ BFEND @ R@ BFBASE @ -    M+
     R> FILE@
  THEN ;


( FILE! )
: FILE! ( byte daddr fd --- )
  DUP >R FSEEK DROP ( byte )
  ( if daddr is >= eof address, fseek will set up bfcurrent )
  ( at the eof address, so appending to the file will occur )
  R@ BFCURRENT @C!++
  1 R> BFDIRTY !
;


( FPUTC )
: FPUTC ( byte fd --- )
  >R   R@ BFCURRENT @   R@ BFEND @ U<
  IF   R@ BFCURRENT @C!++
       1 R> BFDIRTY !
  ELSE ( desired character is not in the buffer )
       R@ FSTART 2@
          R@ BFCURRENT @ R@ BFBASE @ -
       M+ R> FILE!
  THEN
;


( USEBUF USEFID USEROUTINES USESIZE             wmb,83Jun05 )
: USEBUF ( bufend bufstart fd --- )
  >R 0. R@ FSTART 2!
  R@ BFBASE !   R@ BFLIMIT !
  R@ BFBASE @  DUP R@ BFCURRENT !  DUP R@ BFEND !  R@ BFTOP !
  0. R@ FTOP 2!
  0  R@ BFDIRTY !    R> DROP ;
: USEFID ( id fd --- )  FID ! ;
: NODEV 1 ABORT" NODEV CALLED " ; ( useful as filler routine )
: NULLDEV ; ( for a routine that doesn't need to do anything )
: USEROUTINES ( read write close ioctl fd --- )
  >R   CFA R@ F.IOCTL !   CFA R@ F.CLOSE !
       CFA R@ F.WRITE !   CFA R@ F.READ  !
  R>  DROP  ;
: USESIZE ( dsize fd --- ) FSIZE 2! ;


( BETTER ASSEMBLER CONDITIONALS )
20 NEWSTACK LSTK
ASSEMBLER DEFINITIONS
: RESOLVE-LEAVES    FORTH
  BEGIN
     LSTK POP ?DUP
  WHILE
     HERE SWAP !
  REPEAT            ASSEMBLER
;
: BEGIN  HERE    0 LSTK FORTH PUSH ASSEMBLER ;
: WHILE  C,   HERE LSTK FORTH PUSH ASSEMBLER 0 , ;
: UNTIL  C, ,  RESOLVE-LEAVES ;
: REPEAT JMP   RESOLVE-LEAVES ;      : IF C, HERE 0 , ;
: THEN HERE SWAP ! ;           : ELSE TH C3 IF SWAP THEN ;
FORTH DEFINITIONS

( SKIPWHITE find next non-white in this buffer )
CODE SKIPWHITE ( endaddr addr --- endaddr addr' )
  H POP  D POP D PUSH    B PUSH
  BEGIN
     L A MOV  E CMP 0=  IF    H A MOV  D CMP 0<>
  WHILE                 THEN
     M A MOV
     BLTAB B LXI   B CMP  0<> IF    C CMP  0<> IF
     CRLF  B LXI   B CMP  0<> IF    C CMP  0=
  WHILE                         THEN THEN THEN
     H INX
  REPEAT
  B POP   H PUSH
  NEXT JMP
END-CODE


( NEXTWORD find first white char in this buffer after addr )
( addr' is 1st white char, addr'' is 1st char not included )
CODE NEXTWORD ( endaddr addr --- addr addr' addr'' )
  H POP   D POP ( H has current addr ) H PUSH  B PUSH
  BEGIN       L A MOV  E CMP 0=  IF    H A MOV  D CMP 0<>
  WHILE                          THEN
     M A MOV
     BLTAB  B LXI    B CMP 0<> WHILE   C CMP 0<> WHILE
     CRLF   B LXI    B CMP 0<> WHILE   C CMP 0<>
  WHILE
     H INX
  REPEAT            B POP  H PUSH
  L A MOV  E CMP 0<> IF  H INX  ELSE
  H A MOV  D CMP 0<> IF  H INX  THEN THEN
  H PUSH        NEXT JMP
END-CODE
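SKIPWHITE and NEXTWORD are coded in 8080 assembler for speed, but in high-level terms each is a bounded scan. SKIPWHITE advances past blanks, tabs, carriage returns and newlines. NEXTWORD advances to the next such character and also returns the address just past it, or the buffer end if the scan ran off the buffer. The ANS Forth sketch below illustrates that behaviour. It is not a drop-in replacement checked against the assembler versions. WHITE? and ENDWORD are hypothetical helper names, and tab is taken as ASCII 9.

```forth
\ High-level equivalents of SKIPWHITE and NEXTWORD (ANS Forth)
: WHITE? ( char -- flag )               \ blank, tab, CR or LF
  DUP BL =  OVER 9 = OR  OVER 13 = OR  SWAP 10 = OR ;

: SKIPWHITE ( endaddr addr -- endaddr addr' )
  BEGIN  2DUP <>  WHILE                 \ stop at end of buffer
         DUP C@ WHITE?  WHILE           \ ... or at a non-white char
         1+
  REPEAT THEN ;

: ENDWORD ( endaddr addr' -- addr' addr'' )
  TUCK <> IF DUP 1+ ELSE DUP THEN ;     \ step over delimiter if any

: NEXTWORD ( endaddr addr -- addr addr' addr'' )
  DUP >R                                \ remember start of word
  BEGIN  2DUP <>  WHILE
         DUP C@ WHITE? 0=  WHILE        \ stop at first white char
         1+
  REPEAT THEN
  ENDWORD  R> ROT ROT ;
```

The two WHILEs inside one BEGIN ... REPEAT THEN correspond to the two exits in the assembler loops: end of buffer, and the character test.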


( CCOPY in 8080 code )
( 12.7 sec for 10000 iterations of copy 5 bytes - 2MHz Z80 )
CODE CCOPY ( endaddr startaddr toaddr --- newtoaddr )
  D POP H POP  XTHL  H PUSH
  B H MOV  C L MOV  B POP   XTHL
  ( H start D to B end )
  BEGIN
     L A MOV  C CMP 0=  IF    H A MOV  B CMP 0<>
  WHILE                 THEN
     M A MOV  XCHG   A M MOV
     XCHG H INX  D INX
  REPEAT
  B POP   D PUSH
  NEXT JMP
END-CODE


( CCOPY in high-level Forth )
( 25.1 seconds for 10000 iterations of copy 5 bytes - 2MHz Z80 )
\   : CCOPY   ( FromEnd FromStart To --- ToEnd )
\     >R SWAP ( FromStart FromEnd )
\     OVER -  ( FromStart n )
\     R@ OVER ( FromStart n To n )  >R
\     SWAP    ( FromStart To n ) CMOVE
\     R> R> +
\   ;
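Unlike CMOVE, CCOPY takes the source as an end/start pair and returns the address just past the last byte written. Successive calls can therefore append to one destination without recomputing it, and GETWORD relies on this when it saves the returned address on WORDPTR between buffers. The small ANS sketch below uses the high-level definition above to build a string from two pieces. JOINED and JOIN are hypothetical names, and JOINED is assumed to be large enough (32 bytes here).

```forth
\ High-level CCOPY, as in the comment above (ANS Forth)
: CCOPY ( FromEnd FromStart To -- ToEnd )
  >R SWAP OVER -            ( FromStart n )
  R@ OVER >R                ( FromStart n To )  \ R: To n
  SWAP CMOVE
  R> R> + ;

CREATE JOINED 32 ALLOT      \ destination; assumed big enough
: JOIN ( addr1 u1 addr2 u2 -- addr u )  \ concatenate into JOINED
  2SWAP OVER + SWAP JOINED CCOPY   ( addr2 u2 to' )
  >R OVER + SWAP R> CCOPY          ( to'' )
  JOINED TUCK - ;
```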


( COPYIN                                    wmb,83Oct08 )
: UMIN ( u1 u2 --- umin )
  2DUP U> IF SWAP THEN DROP ;
: COPYIN ( send scurr fd --- send newscurr )
  >R R@ BFEND @  R@ BFCURRENT @  -  ( send scurr bfremaining )
  OVER +                            ( send scurr sallowed )
  3 PICK UMIN                       ( send scurr snew )
  DUP ROT  R@ BFCURRENT @ ( send snew   snew scurr bfcurrent )
  CCOPY                             ( send snew  newbfcurrent )
  R@ BFCURRENT !   1 R> BFDIRTY !
;


( FPUTS                                         wmb,83Oct08 )

: FPUTS ( addr count fd --- )
  >R OVER + SWAP  ( endaddr startaddr )
  BEGIN  R@ COPYIN  2DUP U>
  WHILE
     R@ SYNC  R@ FTOP 2@  R@  FSEEK ( endaddr curraddr eoff )
     DROP ( endaddr curraddr )
  REPEAT
  2DROP R> DROP
;


( FILLIT )
( Fill the buffer with the frame after the current one and )
( return the end-of-data and current addresses in it. )
: FILLIT ( fd --- endaddr' addr' )
  >R
  R@ FSTART 2@  R@ BFEND @ R@ BFBASE @ -   M+ ( fend )
  R@  FSEEK   ( eoff ) DROP
  R@ BFTOP @  R@ BFCURRENT @
  R> DROP
;


( IGNOREWHITE )
( Skip to the next non-white character, crossing buffer )
( boundaries if necessary )
: IGNOREWHITE ( fd --- bufend addr )
  >R R@ SYNC
  R@ BFTOP @  R@ BFCURRENT @  ( bufend addr )
  BEGIN   SKIPWHITE    ( bufend addr_of_first_non-white )
    DUP R@ BFCURRENT !
    2DUP = IF   2DROP  R@ FILLIT  ( bufend bufcurrent )
                2DUP =
           ELSE 1   THEN
  UNTIL  ( bufend next-addr )
  R> DROP
;


( GETWORD )
20 NEWSTACK WORDPTR
: GETWORD ( fd ToAddr --- ToAddr )
  DUP WORDPTR PUSH   1+ WORDPTR PUSH    >R
  R@ IGNOREWHITE  ( bufend curr_addr )
  BEGIN
      NEXTWORD ( FirstCharAddr LastCharAddr+1 next_in_stream )
      R@ BFCURRENT !
      DUP ROT WORDPTR POP ( Last Last First ToAddr )
      CCOPY               ( Last NewToAddr )
      WORDPTR PUSH        ( Last )  R@ BFCURRENT @ =
      IF   R@ FILLIT  DDUP =    IF DDROP 1 ELSE 0 THEN
      ELSE 1  THEN
  UNTIL  WORDPTR POP WORDPTR TOP@ - 1- ( nchars )
  WORDPTR TOP@ C!   WORDPTR POP
  R> DROP ;

( NEXTCWORD find first delim char in this buffer after addr )
( addr' is first delim, addr'' is first char not included )
CODE NEXTCWORD ( endaddr addr delim --- addr addr' addr'' )
  H POP   L A MOV
  H POP   D POP ( H has current addr ) H PUSH  B PUSH A C MOV
  BEGIN
     L A MOV  E CMP 0=  IF    H A MOV  D CMP 0<>
  WHILE                 THEN
     M A MOV   C CMP 0<>
  WHILE
     H INX
  REPEAT            B POP  H PUSH
  L A MOV  E CMP 0<> IF  H INX  ELSE
  H A MOV  D CMP 0<> IF  H INX  THEN THEN
  H PUSH        NEXT JMP
END-CODE
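NEXTCWORD is NEXTWORD with a caller-supplied delimiter in place of the white-space test, and GETCWORD uses it to collect text up to a given character. Below is a corresponding high-level sketch in ANS Forth, offered as an illustration rather than a tested replacement for the assembler version. The delimiter is kept on the return stack during the scan.

```forth
\ High-level equivalent of NEXTCWORD (ANS Forth)
: NEXTCWORD ( endaddr addr delim -- addr addr' addr'' )
  SWAP DUP >R  SWAP >R          ( endaddr addr )  \ R: addr delim
  BEGIN  2DUP <>  WHILE         \ stop at end of buffer
         DUP C@ R@ <>  WHILE    \ ... or at the delimiter
         1+
  REPEAT THEN
  R> DROP                       ( endaddr addr' )
  TUCK <> IF DUP 1+ ELSE DUP THEN   ( addr' addr'' )
  R> ROT ROT ;                  ( addr addr' addr'' )
```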


( GETCWORD )
: GETCWORD ( char fd ToAddr --- ToAddr )
  DUP WORDPTR PUSH   1+ WORDPTR PUSH    >R ( char ) R@ SYNC
  R@ BFTOP @ R@ BFCURRENT @ ( char bufend curr_addr )
  BEGIN   3 PICK
    NEXTCWORD ( c FirstCharAddr LastCharAddr+1 next_in_stream )
    R@ BFCURRENT !
    DUP ROT WORDPTR POP ( c Last Last First ToAddr )
    CCOPY               ( c Last NewToAddr )
    WORDPTR PUSH        ( c Last )  R@ BFCURRENT @  =
    IF   R@ FILLIT DDUP = IF DDROP 1 ELSE 0  THEN
    ELSE 1 THEN
  UNTIL
  WORDPTR POP WORDPTR TOP@ - 1- ( c nchars )
  WORDPTR TOP@ C!   DROP   WORDPTR POP
  R> DROP ;

( Descriptor allocation )
8 CONSTANT #FDS
: INIT-FD ( FD --- ) FMODE NOT-OPEN SWAP ! ;
CREATE FDS  #FDS FDSIZE * ALLOT

: INIT-FDS ( --- )
  #FDS FDSIZE * FDS +   FDS
  DO   I INIT-FD FDSIZE /LOOP ;

INIT-FDS

0 FDSIZE * FDS + CONSTANT STDIN
1 FDSIZE * FDS + CONSTANT STDOUT
2 FDSIZE * FDS + CONSTANT STDERR


( descriptor management )
: ?FOPEN ( FD --- F ) FMODE @ NOT-OPEN = NOT ;
: (FIND-FD ( --- FD | 0 ) ( finds a free fd if there is one )
  #FDS FDSIZE * FDS +   FDS  ( end start )
  BEGIN  DDUP U>  ( check for end of array )
      IF   DUP ?FOPEN ( END FD F )
      ELSE DROP 0 0   ( END 0 0  )  THEN
  WHILE  FDSIZE +     ( END NEXTFD )
  REPEAT  ( END FD  | END 0 )
  SWAP DROP
;
: FIND-FD ( --- FD ) (FIND-FD ?DUP NOT ABORT" All fds used " ;

