A Portable File System Interface for Forth

W. M. Bradley
295 Hans Ave.
Mountain View, CA 94040
(415) 961-1302

ABSTRACT

A file system interface for Forth has been designed and implemented. This interface may be used for Forth systems which run under almost any native operating system, or for stand-alone Forth systems. Native operating system files are used where available. The words which the program uses to access files are the same regardless of the native operating system. The interface is simple yet general. Performance is excellent; loading from files is roughly the same speed as loading from screens, and sometimes faster. This work is placed in the public domain.

W. M. Bradley   March 25, 1986   Page 1 of 16 pages

Motivation

The need for a file system for Forth is often discussed. Some of the advantages of files over BLOCKs are:

o Remembering which BLOCKs make up an application is painful even when the storage medium is a 200 Kbyte floppy. If you are using a 10 Mbyte hard disk, it is nearly impossible. File systems let you remember a name instead of several numbers.
o Making backups with BLOCKs is difficult, because the backup medium often does not have the same set of block numbers free as the original medium. File systems do not have this problem.
o Portability of source code is reduced, because the load screens reference block numbers, which usually have to be changed when moving the application to another machine.
o While 1K blocks tend to constrain definitions to a reasonable length, they also impose an artificial boundary that can lead to hard-to-read code. This can occur when you find yourself having to cram a definition onto an almost-full screen. The first thing to go is often the spacing that contributes to readable structure, followed by the stack comments.
o BLOCKs waste disk space because screens usually contain a large percentage of trailing blanks which pad the line lengths out to C/L characters.
o Most other operating systems use files. It is difficult to share data between Forth blocks and the files produced by other operating systems.

Reinventing the Wheel

A number of proposals for file systems have appeared in the literature [1,2,3,4,5,9]. Why should I invent yet another file system? The file systems that I have seen don't meet all the requirements that I consider important.

o While it is theoretically nice to define a wonderful file system for Forth that runs without any other "inferior" operating system, in practice other operating systems exist and co-exist with Forth. It is very useful to be able to share files with an operating system that already exists on your machine. For this reason, the Forth file system should use the same files as the "other" operating system, whatever that happens to be. This disqualifies file systems which "start from scratch" (I don't mean to belittle the contributions of others in defining nice file systems, I just think that the ability to use other systems' files is too valuable to give up).
o Despite the use of the other operating systems' files, the words that Forth uses to access those files should not vary from system to system.
  File system interfaces which just allow you to execute the native operating system's system call from Forth lose on this count.
o Some "file systems" are just interfaces to operating system files so that BLOCKs may actually be stored in files. This is useful but doesn't solve most of the problems presented in the previous section.
o The file system should be available to everybody. In the context of Forth, this means PUBLIC DOMAIN code!

So here it is ...

I present for your consideration a file system interface which meets my requirements. It is in the public domain. It can run on top of virtually any operating system and use that operating system's native files. The words that a Forth application uses to access those files are the same no matter which operating system is the base. A compatible stand-alone version is possible, although I haven't implemented that. The performance is very good - loading from a file is approximately the same speed as loading from screens. In some cases the file loads faster than the screens!

An additional benefit is device independence. The exact same words that are used to access conventional files may also be used to access other I/O devices, such as printers, modems, terminals, the console, etc.

This system has been ported to a Z-80 CP/M‡ system and a 68000 UNIX† system. I think that it is easily implementable for just about any existing operating system.

† UNIX is a trademark of Bell Laboratories.

Philosophy

A file is viewed as a sequence of bytes. The bytes may be accessed either sequentially or randomly. Any file may be accessed either way, and sequential and random accesses may be arbitrarily interspersed. There is no concept of a "record"; applications which need such an abstraction may easily provide it themselves using the primitives which exist. This philosophy is modeled after the UNIX [6] file system.
The simple yet sufficient file model is one of the reasons why UNIX has achieved such popularity.

Some operating systems (e.g. CP/M) insist on imposing a record structure on their files. My file system copes with this, and converts it to a byte-oriented interface so that the Forth programmer need not worry about the record structure imposed by the underlying operating system.

The interface to this file system is specified at two different levels. The most important interface (called the "high-level interface"), the only one that the applications programmer sees, is the set of WORDs that a program uses to access files. The other interface (the "low-level interface") is the set of WORDs that the file system uses to actually access the data from whatever native operating system is being used. It is this low-level interface which makes it possible to easily port the file system to different native operating systems.

File Names

Since the file system is intended to be compatible with many different operating systems, and since no two operating systems use the same naming conventions for files, the file system imposes no restrictions of its own on file names. A file name is simply a counted string in the usual Forth sense. The string may contain any sequence of characters that is a legal name for whatever native operating system is being used. This presents a minor portability problem when moving an application between operating systems. My experience with porting other code between dissimilar operating systems shows that this is not a terrible problem.

The principle of carefully specifying the set of words that are used to access the bytes of a file, while leaving unspecified the details of the file system naming and hierarchy, has been used successfully before. The C language "standard I/O" library [7] and the primitive routines used in the "Software Tools" [8] package both use this concept. Both of these systems have been ported to a large number of different machines and operating systems.

‡ CP/M is a trademark of Digital Research.

Addressing

The bytes within a file may be thought of as being numbered from 0 up to one less than the number of bytes in the file. For random accesses to files, the address is specified with each access. The address is a 32-bit number, allowing files to be up to 2**32 bytes (about 4 * 10**9). If the native operating system you are using doesn't support files this big, then it becomes the limiting factor. A file may be thought of as an alternate address space, allowing things like meta-compilation to a file, etc.

Buffers

The file system code's primary job is to manage a set of buffers (which are not necessarily the same as the set of buffers managed by BLOCK, etc.). The buffers have two purposes. First, they are necessary for converting record-oriented operating system or disk interfaces to the byte-oriented interface provided by the file system. Second, they are the means whereby the performance of the system is kept high, since most accesses to a file are satisfied by bytes in the buffer, thus reducing the number of times that the operating system must be called to perform I/O operations. The program using the file system need not know about the existence of the buffering. Most importantly, a Forth WORD may cross a buffer boundary, so there is no need to make sure that source code lines up with any boundaries.

File Descriptors

The primary "handle" that is used to access a file is the "file descriptor". A file descriptor is actually the address of a memory area that contains information about the file and about the buffer associated with it. The system manages a fixed number of file descriptors which are automatically allocated and freed.
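
As a small illustration of the byte addressing and record conversion described under Addressing and Buffers, the following sketch splits a 32-bit file address into a record number and an offset within that record. The names RECSIZE and >RECORD are hypothetical and are not part of the interface, and the 128-byte record length is an assumption modeled on CP/M. Because UM/MOD leaves a single-cell quotient, the sketch only works while the record number fits in one cell.

```forth
\ Sketch only: RECSIZE and >RECORD are illustrative names,
\ not words defined by this file system.
128 CONSTANT RECSIZE   \ assumed record length, CP/M style

\ Split a 32-bit byte address into the offset within a record
\ and the number of the record that holds the byte.
: >RECORD ( d.addr --- offset record# )
   RECSIZE UM/MOD ;

300. >RECORD . .   \ byte 300 is offset 44 of record 2: prints 2 44
```

A buffer that holds whole records can then satisfy any byte access whose record number matches the one it currently contains, which is what keeps the record structure invisible to the application.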
High Level Interface Glossary

A "file descriptor" is a 16-bit number that is used to refer to a file once the file has been opened. A file descriptor is denoted by "fd" in stack diagrams.

A "file name" is a counted string which is the ASCII representation of the name of a file. See the section entitled "File Names". File names are passed around on the stack as the address of the counted string. The stack diagram representation is "name".

The "file pointer" is the number of the byte which will be returned by the next sequential access to the file.

OPEN ( name mode --- fd )
     Given the name of a file, return a file descriptor which may be used to subsequently access the file. "Mode" specifies what sorts of access may be performed on the file. Legal values for mode are READ , WRITE , and MODIFY (these names are defined as CONSTANTs). If the file is not accessible, for example because it doesn't exist, then 0 will be returned as the fd value. A successful open will set the file pointer to 0, so that the next sequential access will occur at the beginning of the file.

CLOSE ( fd --- )
     Stop using the given file descriptor to access the file that it is currently associated with. Since there is a limited number of file descriptors, a file should be CLOSEd when it is no longer being used, so that the file descriptor may be reused.

FPUTC ( byte fd --- )
     Put the byte on the file associated with "fd" at the current file pointer position. Then increment the file pointer.

FGETC ( fd --- byte )
     Get the byte from the file associated with "fd" which is at the current file pointer position. Then increment the file pointer. If there are no more bytes in the file, return EOF (-1) instead. Note that any byte is in the range 0..255, whereas EOF is -1, so they are distinct.

FPUTS ( addr count fd --- )
     Put "count" bytes on the file associated with "fd".
     Bytes are taken from memory starting at address "addr", and are put on the file starting at the current file pointer position. The file pointer is advanced to point to the byte just after the last byte put on the file.

FGETS ( addr count fd --- nread )
     Get at most "count" bytes from the file associated with "fd". Bytes are read from the file starting at the current file pointer position, and are put in memory starting at address "addr". FGETS returns the number of bytes actually read, which may be less than count, but not more. If FGETS returns 0, it means that the file pointer was already at the end of the file.

FILE! ( byte d.addr fd --- )
     Store the byte into the file associated with "fd" at the position indicated by "d.addr". "d.addr" is a 32-bit number. The file pointer is moved to d.addr+1.

FILE@ ( d.addr fd --- byte )
     Fetch a byte from the file associated with "fd" from the position indicated by "d.addr". "d.addr" is a 32-bit number. The file pointer is moved to d.addr+1.

FSEEK ( d.addr fd --- )
     Move the file pointer for the file associated with "fd" to the position indicated by "d.addr". "d.addr" is a 32-bit number.

GETWORD ( addr fd --- addr )
     Get the next sequence of non-white characters from the file associated with "fd". The characters will be stored as a counted string in memory starting at address "addr". A "non-white" character is any character except blank, tab, newline, or carriage return. Leading "white" characters are skipped.

GETCWORD ( addr delim fd --- addr )
     Get the next sequence of non-delimiter characters from the file associated with "fd". The characters will be stored as a counted string in memory starting at address "addr". Leading delimiters are NOT ignored. If the next character in the file is a "delim", then the null string (count=0) will be left at addr!
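
The different treatment of leading delimiters in the last two entries can be sketched with in-memory stand-ins. MEM-GETWORD and MEM-GETCWORD below are hypothetical words that scan a string in memory rather than a file, so they take no "addr" or "fd" arguments. They do not set DELIMITER, and consuming the terminating delimiter is an assumption carried over from the usual behavior of WORD.

```forth
\ Sketch only: every name here is illustrative.  The words scan
\ a string in memory instead of a file.
VARIABLE SRC       \ address of the next unread character
VARIABLE SRC-END   \ address just past the last character
CREATE WBUF 80 ALLOT   \ receives the counted string

: MORE?  ( --- f )    SRC @ SRC-END @ U< ;
: PEEK   ( --- char ) SRC @ C@ ;
: SKIP1  ( --- )      1 SRC +! ;
: WHITE? ( char --- f )   \ blank, tab, newline or carriage return
   DUP 32 = OVER 9 = OR OVER 10 = OR SWAP 13 = OR ;
: APPEND ( char --- ) WBUF COUNT + C!  WBUF C@ 1+ WBUF C! ;

: MEM-GETWORD ( --- addr )   \ leading white characters are skipped
   0 WBUF C!
   BEGIN MORE? IF PEEK WHITE? ELSE 0 THEN WHILE SKIP1 REPEAT
   BEGIN MORE? IF PEEK WHITE? 0= ELSE 0 THEN WHILE
      PEEK APPEND SKIP1
   REPEAT
   MORE? IF SKIP1 THEN   \ consume the terminating white character
   WBUF ;

: MEM-GETCWORD ( delim --- addr )   \ leading delimiters are NOT skipped
   0 WBUF C!
   BEGIN MORE? IF DUP PEEK = 0= ELSE 0 THEN WHILE
      PEEK APPEND SKIP1
   REPEAT
   MORE? IF SKIP1 THEN   \ consume the terminating delimiter
   DROP WBUF ;

\ Six characters: two blanks, H, I, one blank, right parenthesis
CREATE TEXT  32 C, 32 C, 72 C, 73 C, 32 C, 41 C,
: RESET ( --- ) TEXT SRC !  TEXT 6 + SRC-END ! ;

RESET
MEM-GETWORD COUNT TYPE    \ the leading blanks are skipped: types HI
41 MEM-GETCWORD C@ .      \ the next character is the delimiter: prints 0
```

The second call returns the null string because the right parenthesis is the very next character, which is the null-comment case.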
The behavior of GETWORD and GETCWORD with respect to leading delimiters is consistent with the way that they are actually used in practice. Specifically, BL WORD is used to scan for the next Forth word, and it is desirable to ignore leading blanks. Tabs, newlines, and carriage returns do not appear in Forth screens, so it is not necessary to deal with them. They do appear in files created by other operating systems, so a file system which attempts to be compatible with other operating systems must also deal with them.

When WORD is used with some delimiter other than BL, the intention is really that leading delimiters should NOT be ignored! For example, when WORD is used to scan for the ")" which terminates a comment, it is desirable to find one immediately in the case of the null comment ( ). Similarly, ." and ABORT" should work with null strings, as in ." " ABORT" " . The usual work-around is to explicitly check the input stream to see if a delimiter is there before calling WORD. This just points to the fact that there should be two different versions of WORD, one which gets non-white characters, ignoring leading white-space, and one which takes an explicit delimiter off the stack and terminates as soon as it encounters a delimiter.

The actual delimiter which terminated the GETWORD or GETCWORD is stored in the global variable DELIMITER. If an end-of-file condition terminated the operation, the value of DELIMITER is EOF.

File/directory management routines

The following words are used for managing entire files. No assumption is made here about any sort of hierarchy of files which may be provided by the underlying operating system. In this respect, the precise behavior of these words is system-dependent. Many systems have a notion of a "current directory".
If that is the case, these words should be implemented so that the files affected are in the current directory unless some other directory is explicitly specified. Example: CP/M has a "current default drive". Unless the drive name is specified as part of the file name, the default drive should be assumed.

MAKE ( name mode --- )
     Create a new file whose name is "name". If the file already exists, the operation will fail.

DELETE ( name --- f )
     Get rid of the file whose name is "name".

FILES ( --- )
     Show the names of the files in the current directory (whatever that means!).

Low Level Interface

These routines must be recoded for every different native operating system. The higher-level routines call these routines through execution vectors which are a part of the file descriptor structure. This vectoring allows the file system to associate different low-level routines with different types of files. This is the mechanism whereby device-independent I/O is implemented. For instance, CP/M provides totally different system calls for accessing disk files and other devices like the terminal or the printer. By vectoring the following routines, the file system can access any of these devices transparently to the applications program.

ALIGN ( d.byteno --- d.aligned )
     Round the byte number "d.byteno" down to the next lowest boundary which must be enforced. For instance, CP/M disk files are only accessible in 128-byte chunks, so any request to read a CP/M sector must start on a multiple-of-128 boundary. If the underlying system imposes no such restrictions, this routine may just return d.byteno unchanged. This is particularly useful for strictly-sequential devices like terminals and printers. If the underlying system allows access to files at arbitrary byte positions (this is the case for UNIX, for example), it may still be desirable to align requests to boundaries which are particularly efficient (e.g. 512-byte boundaries are good for UNIX). ALIGN is not called by the high-level code, but the alignment concept is essential for implementing this file system on record-oriented or block-oriented operating systems, so it is discussed here.

FREAD ( bflimit bfbase d.byteno fid --- bflimit' bfbase' d.aligned bftop bfend )
     Read a bufferful of data from the file. "fid" is the internal file identifier. "d.byteno" is the byte that the upper-level code is really trying to get. "bfbase" is the lowest address in the buffer, and "bflimit" is one more than the highest address in the buffer. "fid" is the identifier used by the underlying operating system to identify the file. The "open" routine (see below) discovers this number and stores it in the file descriptor so that it may be passed to the low-level routines when they are called.
     Returns "bfend", the highest address (plus one) in the buffer that is available for writing. This is nearly always equal to the address of the end of the buffer, but may be set to the address of the start of the buffer if the buffer becomes invalid for some reason (mainly when the file is first opened). "bftop" is one more than the highest address that contains actual data read from the file. If the read operation was unable to get enough data to fill the buffer, bftop may be less than bflimit. "d.aligned" is the file address (the 32-bit byte number) which corresponds to the byte which is at the buffer address "bfbase'".
     "bfbase'" and "bflimit'" are copies of the arguments. They are returned in case the FREAD routine wants to be sophisticated and return a different buffer than the one passed to it. This would be useful if you had a system which kept a cache of disk blocks, so that accesses to a block which was already in the cache didn't really cause any I/O, but instead just substituted the appropriate buffer.

FWRITE ( bfend bftop bfbase fstart fid --- bfwrittenend )
     Write the bytes between bfbase and bftop-1 to the file specified by fid. "fstart" is the file address which corresponds to the byte at buffer address bfbase. If the underlying system only allows data to be written in fixed-sized chunks, then data up to address bfend may be written. bfend - bfbase will always be a multiple of the chunk-size if FREAD works right. Returns bfwrittenend, which should be >= bftop if everything goes well.

FCLOSE ( fid --- )
     Do anything necessary to make sure everything is squared-away when the file is no longer needed. It is not necessary to worry about stuff that may be left over in the higher-level buffers; that will have already been taken care of by the higher-level code.

FSIZE ( fid --- d.size )
     Find out how many bytes are in the file.

OPEN ( name mode --- fd )
     This is the hard one! OPEN prepares the file whose name is "name" for access in mode "mode", returning a file descriptor which is used for subsequent access to the file. If the attempt fails, 0 is returned. "Mode" can be either READ-ONLY, WRITE-ONLY, or READ-WRITE.

OPEN is tricky, because it must do several things. The first thing is to find a file descriptor to use. My implementation of this is rather simple: there are a fixed number of file descriptors which are kept in one place. There is a routine FIND-FD which searches through the array of file descriptors until it finds one which isn't in use. (There is a field in the file descriptor which tells whether or not it is in use, among other things.)

Another thing that OPEN must do is to set up the execution vectors for the proper access routines for the file. The open routine is very important for providing device independence. The way you implement device independence depends strongly on what you actually have to do to access the particular devices.
In general, you need a read routine and a write routine for each device that must be accessed using its own special code. The open routine may need to look at the file name and decide which read and write routines to use.

CP/M is a good example here. In CP/M, besides normal disk files there are 16 special names that refer to I/O devices (e.g. CON:, LST:). There are several different system calls that must be used to access these devices. The OPEN routine for CP/M first checks to see if the file name given to it matches one of these special names. If so, the addresses of appropriate READ and/or WRITE routines are stored in the execution vectors in the file descriptor. If the name doesn't match one of the special names, then OPEN tries to open a disk file with the specified name.

Finally, OPEN must set up the file in the proper mode. This is done by using special READ and WRITE routines. If the mode is WRITE-ONLY, the READ routine is set to NULLREAD, which basically does nothing. If the mode is READ-ONLY, the WRITE routine is set to NULLREAD, which causes an error condition. Attempts to OPEN a non-existent file fail, returning 0.

Source Code

This code consists of a portable section and a system-specific section. The system-specific code has currently been implemented for both CP/M and 68000 UNIX. Only the portable section is shown in this paper. A companion paper, "Implementations of a Portable File System Interface", gives the system-specific code for both CP/M and 68000 UNIX.

This code has no KNOWN functional errors. It does have some rough edges which I am currently removing, and it may have errors that I don't know about yet.

The code is Forth-83. A few of the words must run fast in order for the system to perform well. For these words, high level versions, 68000 assembly language versions, and 8080 machine language versions are given.
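
Returning to the way OPEN installs special READ and WRITE routines according to the mode, the sketch below shows the idea with two execution vectors. RD-VEC and WR-VEC are hypothetical stand-ins for the F.READ and F.WRITE fields of a file descriptor, and the four routines simply return a number identifying themselves; NO-WRITE returns -1 where a real routine would report an error. Only the three mode constants are taken from the portable code.

```forth
\ Sketch only: all names except the three mode constants are
\ illustrative and do not appear in the portable code.
1 CONSTANT READ-ONLY
2 CONSTANT WRITE-ONLY
READ-ONLY WRITE-ONLY OR CONSTANT READ-WRITE

VARIABLE RD-VEC   \ stands in for the F.READ field
VARIABLE WR-VEC   \ stands in for the F.WRITE field

: DISK-READ  ( --- n ) 10 ;   \ placeholder for a real read routine
: DISK-WRITE ( --- n ) 20 ;   \ placeholder for a real write routine
: NO-READ    ( --- n )  0 ;   \ does nothing
: NO-WRITE   ( --- n ) -1 ;   \ placeholder for an error report

\ Install the routines that the mode allows.
: SET-MODE ( mode --- )
   DUP READ-ONLY AND
   IF ['] DISK-READ ELSE ['] NO-READ THEN RD-VEC !
   WRITE-ONLY AND
   IF ['] DISK-WRITE ELSE ['] NO-WRITE THEN WR-VEC ! ;

\ The caller never tests the mode; it just runs the vector.
: DO-READ  ( --- n ) RD-VEC @ EXECUTE ;
: DO-WRITE ( --- n ) WR-VEC @ EXECUTE ;

READ-ONLY SET-MODE
DO-READ .  DO-WRITE .   \ prints 10 -1
```

Choosing a device-specific routine by file name would use the same mechanism, with the name test deciding which addresses are stored in the vectors.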

Efficiency and performance

An extremely important performance issue is the speed of loading from files. As far as I/O is concerned, the important thing is the efficiency of WORD. The Forth word ENCLOSE does a very efficient job of speeding up WORD when loading from screens. ENCLOSE makes an important simplifying assumption: WORDs do not cross BLOCK boundaries. This assumption cannot be made for files, because from a user's perspective, there are no boundaries at all inside the file. The buffering is completely transparent. The solution is to write a code word which collects as much of a word as possible without going outside the current buffer. The high level word "GETWORD" takes care of getting another bufferful and calling the code word again if the word being collected crosses a buffer boundary. Most of the time the code word only needs to be called once, since most words don't cross a buffer boundary.

Files have one speed advantage over screens. Since lines on screens must be padded out to C/L characters, a significant portion of the characters stored and processed are trailing blanks. Lines in files don't need trailing blanks, so the total number of characters that must be handled is less. Another improvement, though less significant, is the ability to use TAB characters to replace sequences of blanks, further reducing the number of characters that must be processed. GETWORD treats blanks, tabs, carriage returns, and line feeds as all being the same. (It might even be a good idea to treat ALL control characters as being equivalent to blanks for the purposes of WORD.)

The net result is that loading from files is roughly the same speed as from screens, and may be somewhat faster or slower depending on the percentage of blanks in the screens.
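
The division of labor between the code word and GETWORD can be sketched in high-level Forth. In this hypothetical model, SCAN-BUF plays the role of the code word and never looks past the current buffer, while GET-WORD fetches another bufferful and calls it again when a word runs off the end. A ten-byte string in memory stands in for the file, the buffer is an assumed four bytes so that a word is forced across a boundary, and skipping of leading white space is omitted because the sample text separates words with single blanks.

```forth
\ Sketch only: every name here is illustrative.
CREATE FILE-DATA   \ the ten characters  AB CDEFG H
   65 C, 66 C, 32 C, 67 C, 68 C, 69 C, 70 C, 71 C, 32 C, 72 C,
10 CONSTANT FILE-LEN
 4 CONSTANT BUF-LEN   \ tiny buffer, to force boundary crossings

VARIABLE FPOS      \ file position of the next bufferful
VARIABLE CUR       \ next unread address in the current buffer
VARIABLE LIM       \ address just past the current buffer
VARIABLE REFILLS   \ how many bufferfuls have been fetched
CREATE WBUF 32 ALLOT   \ receives the counted string

: REWIND ( --- ) 0 FPOS !  0 CUR !  0 LIM !  0 REFILLS ! ;
: WHITE? ( char --- f ) 33 U< ;   \ blank and all control characters
: APPEND ( char --- ) WBUF COUNT + C!  WBUF C@ 1+ WBUF C! ;

\ Make the next window of FILE-DATA the current buffer.
\ Returns false at end of file.
: NEXT-BUF ( --- f )
   FPOS @ FILE-LEN U< 0= IF 0 EXIT THEN
   FILE-DATA FPOS @ + DUP CUR !
   FILE-LEN FPOS @ - BUF-LEN MIN  DUP FPOS +!  + LIM !
   1 REFILLS +!  -1 ;

\ Stand-in for the code word: collect characters from the current
\ buffer only.  True if a white character ended the word, false
\ if the buffer ran out first.
: SCAN-BUF ( --- done? )
   BEGIN CUR @ LIM @ U< WHILE
      CUR @ C@ DUP WHITE? IF DROP 1 CUR +! -1 EXIT THEN
      APPEND 1 CUR +!
   REPEAT 0 ;

\ Stand-in for GETWORD: refill and rescan until the word is complete.
: GET-WORD ( --- addr )
   0 WBUF C!
   BEGIN SCAN-BUF 0= WHILE
      NEXT-BUF 0= IF WBUF EXIT THEN   \ end of file ends the word
   REPEAT WBUF ;

REWIND
GET-WORD COUNT TYPE   \ types AB
GET-WORD COUNT TYPE   \ types CDEFG, which crosses a buffer boundary
```

With a realistic buffer size the inner word is almost always called once per word, so the cost of the outer loop is paid only at the rare boundary crossing.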

Compiling from files

Once you have a file system, it is a simple matter to rewrite the text interpreter to use files instead of screens or the terminal. It is especially easy if you have a vectored WORD. All you have to do is open the file and change WORD so that it executes NEWWORD instead. (NEWWORD just looks at the delimiter and calls GETWORD if the delimiter is BL, or calls GETCWORD otherwise.) It is easy to write a word FLOAD which does this. FLOAD can even be used inside a file to "include" another file within the first (similar to screens loading other screens).

Outstanding Issues

Error reporting needs to be expanded.

The file system may need to take special action if the user's console is opened for output, so that output does not get stuck in the buffer when the user is expecting to see it now. It is possible to set the buffer size to some small number (like 1) if the output is going to the console, but this is pretty slow. Another alternative is to flush the buffer whenever a carriage return is sent. This problem is looking for a clean solution.

An editor is needed! Many people will want to use the file editor already on their system. Nevertheless, I think that a public-domain file editor written in Forth is an important thing to have. Any volunteers?

The low-level code for a stand-alone version needs to be written. This could be built on top of BLOCK (in particular, see [4]). Some decisions have to be made; specifically, what does a file name look like, and what sort of hierarchy do files fit into? Many of the previous file system proposals attempt to answer these questions. The important point that I would like to make is that these questions are not the most important ones as far as a useful and portable file system is concerned. The important thing is to standardize on the set of words to access the bytes within a file, rather than how files are named.

It is possible to implement BLOCK on top of this file system, using a file to store an array of BLOCKs. While this approach may have its uses, it seems like a step backwards to me.

It is possible (and easy!) to write a set of low-level routines to make the system memory look like a file. This has interesting implications for meta-compiling.

Disclaimer

There are no new ideas presented in this paper. Rather, this file system is an attempt to make available to the Forth community a file system that is based on already-proven principles and which is implemented in an efficient and portable fashion.

Acknowledgements

The design of this file interface owes much to UNIX, and in particular to the design of the "Standard I/O" library for the C language [7]. C Standard I/O has proven to be efficient, portable, and easy to use, so I felt that it would be a good model. The system also benefited from discussions with Christian Jacobi concerning a file system interface for Modula-2.

References

1. James, John S., "Pyramid Files: A Proposed FORTH File System", Proc. 1981 FORML Convention.
2. Rible, John and Dowling, Tom, "A File System in FORTH", Proc. 1981 FORML Convention.
3. Colburn, Don, "A Minimum Automated Contiguous File Allocation System for FORTH-79 Environments", Proc. 1981 FORML Convention.
4. Helmers, Peter, "File Naming System", Forth Dimensions, Vol. II No. 3.
5. Delwood, Donald, "A FORTH Based File Handling System", Forth Dimensions, Vol. IV No. 3.
6. Ritchie, Dennis, and Thompson, Ken, "The UNIX Time Sharing System", Communications of the ACM, Vol. 17 No. 7 (July 1974). Also see the October 1983 issue of Byte.
7. Kernighan, Brian, and Ritchie, Dennis, The C Programming Language. Prentice-Hall, 1978.
8. Kernighan, Brian, and Plauger, Bill, Software Tools. Prentice-Hall.
9. See the Proceedings of the 1981 Rochester FORTH Standards Conference for several papers about file systems of various sorts.

( Buffered I/O Constants )
-1 CONSTANT EOF
13 CONSTANT CARRET
10 CONSTANT NEWLINE
 9 CONSTANT TAB
BL 256 * TAB + CONSTANT BLTAB
CARRET 256 * NEWLINE + CONSTANT CRLF
( modes for fmode field )
0 CONSTANT NOT-OPEN
1 CONSTANT READ-ONLY
2 CONSTANT WRITE-ONLY
READ-ONLY WRITE-ONLY OR CONSTANT READ-WRITE

( @C@++ )
( &ptr is the address of a pointer. Fetch the pointed-to )
( character and post-increment the pointer )
\ : @C@++ ( &ptr --- char ) DUP @ C@ 1 ROT +! ;
CODE @C@++ ( &ptr --- char )
   H POP M E MOV H INX M D MOV ( de gets ptr )
   D INX ( ptr++ ) D M MOV H DCX E M MOV ( put back ptr )
   D DCX XCHG ( hl gets orig ptr )
   A XRA A D MOV M E MOV D PUSH
   NEXT JMP END-CODE
( same but pre-decrement the pointer )
\ : @--C@ ( &ptr --- char ) -1 OVER +! ( &ptr ) @ C@ ;

( @C!++ @--C! )
( &ptr is the address of a pointer. Store the character into )
( the pointed-to location and post-increment the pointer )
\ : @C!++ ( char &ptr --- ) SWAP OVER @ C! 1 SWAP +! ;
CODE @C!++ ( char &ptr --- )
   H POP M E MOV H INX M D MOV ( de gets ptr )
   D INX ( ptr++ ) D M MOV H DCX E M MOV ( put back ptr )
   D DCX XCHG ( hl gets orig ptr )
   D POP E M MOV ( store char at orig ptr )
   NEXT JMP END-CODE
( same but pre-decrement the pointer )
\ : @--C! ( char &ptr ) -1 OVER +! ( char &ptr ) @ C! ;

( BUFFERING CONSTANTS )
: FIELD ( offset size --- offset+size )
   CREATE OVER , +
   ( DOES> @ + ( High-level )
   ;CODE ( fd --- fd+offset )
   D INX XCHG M E MOV H INX M D MOV ( offset in DE )
   H POP D DAD H PUSH
   NEXT JMP END-CODE

: NEWSTRUCT 0 ;
: W 2 ;
: L 4 ;
NEWSTRUCT
   W FIELD BFBASE
   W FIELD BFLIMIT
   W FIELD BFTOP
   W FIELD BFEND
   W FIELD BFCURRENT
   W FIELD BFDIRTY
   W FIELD FMODE
   W FIELD FTYPE
   L FIELD FSTART
   L FIELD FTOP
   L FIELD FSIZE
   W FIELD FID
   W FIELD F.READ
   W FIELD F.WRITE
   W FIELD F.CLOSE
   W FIELD F.IOCTL
CONSTANT FDSIZE

( TOBUFADDR DBETWEEN )
: TOBUFADDR ( daddr fd --- bufaddr )
   >R R@ FSTART 2@ D- ( doffset ) DROP ( offset )
   R> BFBASE @ + ;
: DBETWEEN ( d dupper_limit dlower_limit --- f )
   ( true if d is >= lower and < upper )
   6 PICK 6 PICK ( d dupper dlower d )
   D> NOT ( d dupper f )
   >R D< R> AND ;

( SYNC FLUSHBUF wmb,83Aug29 )
: SYNC ( fd --- ) ( if current > top, move up top )
   >R R@ BFTOP @ R@ BFCURRENT @ U<
   IF R@ BFCURRENT @ DUP R@ BFTOP ! ( bftop@ )
      R@ BFBASE @ - 0 R@ FSTART 2@ D+ R@ FTOP 2!
   THEN
   R> DROP ;
: FLUSHBUF ( fd --- successf )
   DUP BFDIRTY @ NOT IF EXIT THEN
   >R R@ SYNC
   R@ BFEND @ R@ BFTOP @ R@ BFBASE @ R@ FSTART 2@ R@ FID @
   R@ F.WRITE @ EXECUTE ( bfwrittenend )
   R@ BFTOP @ U< IF 1 ABORT" FlushBuf error " ( 0 ) THEN
   R@ FSIZE 2@ R@ FTOP 2@ D< IF R@ FTOP 2@ R@ FSIZE 2! THEN
   R@ FSTART 2@ R@ FTOP 2!
   0 R@ BFDIRTY !
   R@ BFBASE @ DUP R@ BFTOP ! R@ BFEND !
   R> DROP 1 ;

( FILLBUF wmb,83Aug29 )
: FILLBUF ( daddr fd --- )
   >R R@ BFLIMIT @ R@ BFBASE @ 2SWAP R@ FID @
   ( bflimit bfbase daddr fid )
   R@ F.READ @ EXECUTE
   ( newbflimit newbfbase newfstart newbftop newbfend )
   R@ BFEND ! R@ BFTOP ! R@ FSTART 2! R@ BFBASE ! R@ BFLIMIT !
   R@ BFTOP @ R@ BFBASE @ - S->D ( nvalid )
   R@ FSTART 2@ D+ R@ FTOP 2!
   R> DROP ;

( ALIGNFRAME CLOSE )
: ALIGNFRAME ( daddr fd --- )
   >R R@ FLUSHBUF IF R@ FILLBUF ELSE 2DROP THEN
   R> DROP ;
: CLOSE ( fd --- )
   DUP FLUSHBUF DROP
   NOT-OPEN OVER FMODE !
   DUP F.CLOSE @ EXECUTE ;

( INBUF? FSEEK )
: INBUF?
( daddr fd --- f ) >R R@ FSTART 2@ D- IF DROP 0 ELSE R@ BFEND @ R@ BFBASE @ - U< THEN R> DROP ; : FSEEK ( daddr fd --- eoff ) >R R@ SYNC 2DUP R@ INBUF? NOT ( daddr f ) IF ( desired byte is not buffered ) 2DUP R@ ALIGNFRAME ( daddr ) THEN ( daddr ) 2DUP R@ FTOP 2@ D< NOT ( daddr f ) ( at EOF? ) IF DDROP R@ FTOP 2@ R@ TOBUFADDR R> BFCURRENT ! 1 ELSE R@ TOBUFADDR R> BFCURRENT ! 0 THEN ; ( FILE@ FGETC fast get next character ) : FILE@ ( daddr fd --- byte ) DUP >R FSEEK ( errf ) IF EOF R> DROP ELSE R> BFCURRENT @C@++ THEN ; : FGETC ( fd --- byte ) >R R@ BFCURRENT @ R@ BFTOP @ < ( true flag means desired character is in the buffer ) IF R> BFCURRENT @C@++ ELSE R@ FSTART 2@ R@ BFEND @ R@ BFBASE @ - M+ R> FILE@ THEN ; ( FILE! ) : FILE! ( byte daddr fd --- ) DUP >R FSEEK ( byte eoff ) ( if daddr is >= eof address, fseek will set up bfcurrent ) ( at the eof address, so appending to the file will occur ) R@ BFCURRENT @C!++ 1 R> BFDIRTY ! ; ( FPUTC ) : FPUTC ( byte fd --- ) >R R@ BFCURRENT @ R@ BFEND @ U< IF R@ BFCURRENT @C!++ 1 R> BFDIRTY ! ELSE ( desired character is not in the buffer ) W. M. Bradley March 25, 1986 Page 19 of 16 pages R@ FSTART 2@ R@ BFCURRENT @ R@ BFBASE @ - M+ R> FILE! THEN ; W. M. Bradley March 25, 1986 Page 20 of 16 pages ( USEBUF USEFID USEROUTINES USESIZE wmb,83Jun05 ) : USEBUF ( bufend bufstart fd --- ) >R 0. R@ FSTART 2! R@ BFBASE ! R@ BFLIMIT ! R@ BFBASE @ DUP R@ BFCURRENT ! DUP R@ BFEND ! R@ BFTOP ! 0. R@ FTOP 2! 0 R@ BFDIRTY ! R> DROP ; : USEFID ( id fd --- ) FID ! ; : NODEV 1 ABORT" NODEV CALLED " ; ( useful as filler routine ) : NULLDEV ; ( for a routine that doesn't need to do anything ) : USEROUTINES ( read write close ioctl fd --- ) >R CFA R@ F.IOCTL ! CFA R@ F.CLOSE ! CFA R@ F.WRITE ! CFA R@ F.READ ! R> DROP ; : USESIZE ( dsize fd --- ) FSIZE 2! ; ( BETTER ASSEMBLER CONDITIONALS ) 20 NEWSTACK LSTK ASSEMBLER DEFINITIONS : RESOLVE-LEAVES FORTH BEGIN LSTK POP ?DUP WHILE HERE SWAP ! 
REPEAT ASSEMBLER ; : BEGIN HERE 0 LSTK FORTH PUSH ASSEMBLER ; : WHILE C, HERE LSTK FORTH PUSH ASSEMBLER 0 , ; : UNTIL C, , RESOLVE-LEAVES ; : REPEAT JMP RESOLVE-LEAVES ; : IF C, HERE 0 , ; : THEN HERE SWAP ! ; : ELSE TH C3 IF SWAP THEN ; FORTH DEFINITIONS ( SKIPWHITE find next non-white in this buffer ) CODE SKIPWHITE ( endaddr addr --- endaddr addr' ) H POP D POP D PUSH B PUSH BEGIN L A MOV E CMP 0= IF H A MOV D CMP 0<> WHILE THEN M A MOV BLTAB B LXI B CMP 0<> IF C CMP 0<> IF CRLF B LXI B CMP 0<> IF C CMP 0= WHILE THEN THEN THEN H INX REPEAT B POP H PUSH NEXT JMP END-CODE ( NEXTWORD find first white char in this buffer after addr ) ( addr' is 1st white char, addr'' is 1st char not included ) CODE NEXTWORD ( endaddr addr --- addr addr' addr'' ) H POP D POP ( H has current addr ) H PUSH B PUSH BEGIN L A MOV E CMP 0= IF H A MOV D CMP 0<> WHILE THEN W. M. Bradley March 25, 1986 Page 21 of 16 pages M A MOV BLTAB B LXI B CMP 0<> WHILE C CMP 0<> WHILE CRLF B LXI B CMP 0<> WHILE C CMP 0<> WHILE H INX REPEAT B POP H PUSH L A MOV E CMP 0<> IF H INX ELSE H A MOV D CMP 0<> IF H INX THEN THEN H PUSH NEXT JMP END-CODE W. M. 

( CCOPY in 8080 code )
( 12.7 sec for 10000 iterations of copy 5 bytes - 2MHz Z80 )
CODE CCOPY ( endaddr startaddr toaddr --- newtoaddr )
   D POP H POP XTHL H PUSH
   B H MOV C L MOV B POP XTHL ( H start D to B end )
   BEGIN
      L A MOV C CMP 0= IF H A MOV B CMP 0<> WHILE THEN
      M A MOV XCHG A M MOV XCHG
      H INX D INX
   REPEAT
   B POP D PUSH
   NEXT JMP
END-CODE

( CCOPY in high-level Forth )
( 25.1 seconds for 10000 iterations of copy 5 bytes - 2MHz Z80 )
\ : CCOPY ( FromEnd FromStart To --- ToEnd )
\    >R SWAP ( FromStart FromEnd )
\    OVER - ( FromStart n )
\    R@ OVER ( FromStart n To n ) >R
\    SWAP ( FromStart To n ) CMOVE
\    R> R> +
\ ;

( COPYIN wmb,83Oct08 )
: UMIN ( u1 u2 --- umin ) 2DUP U> IF SWAP THEN DROP ;
: COPYIN ( send scurr fd --- send newscurr )
   >R
   R@ BFEND @ R@ BFCURRENT @ - ( send scurr bfremaining )
   OVER + ( send scurr sallowed )
   3 PICK UMIN ( send scurr snew )
   DUP ROT R@ BFCURRENT @ ( send snew snew scurr bfcurrent )
   CCOPY ( send snew newbfcurrent )
   R@ BFCURRENT !
   1 R> BFDIRTY ! ;

( FPUTS wmb,83Oct08 )
: FPUTS ( addr count fd --- )
   >R
   OVER + SWAP ( endaddr startaddr )
   BEGIN
      R@ COPYIN 2DUP U>
   WHILE
      R@ SYNC
      R@ FTOP 2@ R@ FSEEK ( endaddr curraddr eoff )
      DROP ( endaddr curraddr )
   REPEAT
   2DROP R> DROP ;

( FILLIT )
( fills the buffer and returns the end and start addresses )
( in the newly-filled buffer. )
: FILLIT ( fd --- endaddr' addr' )
   >R
   R@ FSTART 2@ R@ BFEND @ R@ BFBASE @ - M+ ( fend )
   R@ FSEEK ( eoff ) DROP
   R@ BFTOP @ R@ BFCURRENT @
   R> DROP ;

( IGNOREWHITE )
( Skip to the next non-white character, crossing buffer )
( boundaries if necessary )
: IGNOREWHITE ( fd --- bufend addr )
   >R
   R@ SYNC
   R@ BFTOP @ R@ BFCURRENT @ ( bufend addr )
   BEGIN
      SKIPWHITE ( bufend addr_of_first_non-white )
      DUP R@ BFCURRENT !
      2DUP =
      IF   2DROP R@ FILLIT ( bufend bufcurrent ) 2DUP =
      ELSE 1
      THEN
   UNTIL ( bufend next-addr )
   R> DROP ;

( GETWORD )
20 NEWSTACK WORDPTR
: GETWORD ( fd ToAddr --- ToAddr )
   DUP WORDPTR PUSH 1+ WORDPTR PUSH
   >R
   R@ IGNOREWHITE ( bufend curr_addr )
   BEGIN
      NEXTWORD ( FirstCharAddr LastCharAddr+1 next_in_stream )
      R@ BFCURRENT !
      DUP ROT WORDPTR POP ( Last Last First ToAddr )
      CCOPY ( Last NewToAddr )
      WORDPTR PUSH ( Last )
      R@ BFCURRENT @ =
      IF   R@ FILLIT DDUP = IF DDROP 1 ELSE 0 THEN
      ELSE 1
      THEN
   UNTIL
   WORDPTR POP WORDPTR TOP@ - 1- ( nchars )
   WORDPTR TOP@ C!
   WORDPTR POP
   R> DROP ;

( NEXTCWORD find first delim char in this buffer after addr )
( addr' is first delim, addr'' is first char not included )
CODE NEXTCWORD ( endaddr addr delim --- addr addr' addr'' )
   H POP L A MOV
   H POP D POP ( H has current addr )
   H PUSH B PUSH
   A C MOV
   BEGIN
      L A MOV E CMP 0= IF H A MOV D CMP 0<> WHILE THEN
      M A MOV
      C CMP 0<> WHILE
      H INX
   REPEAT
   B POP H PUSH
   L A MOV E CMP 0<> IF
      H INX
   ELSE
      H A MOV D CMP 0<> IF H INX THEN
   THEN
   H PUSH
   NEXT JMP
END-CODE

( GETCWORD )
: GETCWORD ( char fd ToAddr --- ToAddr )
   DUP WORDPTR PUSH 1+ WORDPTR PUSH
   >R ( char )
   R@ SYNC
   R@ BFTOP @ R@ BFCURRENT @ ( char bufend curr_addr )
   BEGIN
      3 PICK NEXTCWORD ( c LastCharAddr+1 next_in_stream )
      R@ BFCURRENT !
      DUP ROT WORDPTR POP ( c Last Last First ToAddr )
      CCOPY ( c Last NewToAddr )
      WORDPTR PUSH ( c Last )
      R@ BFCURRENT @ =
      IF   R@ FILLIT DDUP = IF DDROP 1 ELSE 0 THEN
      ELSE 1
      THEN
   UNTIL
   WORDPTR POP WORDPTR TOP@ - 1- ( c nchars )
   WORDPTR TOP@ C!
   DROP
   WORDPTR POP
   R> DROP ;

( Descriptor allocation )
8 CONSTANT #FDS
: INIT-FD ( FD --- ) FMODE NOT-OPEN SWAP ! ;
CREATE FDS #FDS FDSIZE * ALLOT
: INIT-FDS ( --- )
   #FDS FDSIZE * FDS + FDS DO I INIT-FD FDSIZE /LOOP ;
INIT-FDS
0 FDSIZE * FDS + CONSTANT STDIN
1 FDSIZE * FDS + CONSTANT STDOUT
2 FDSIZE * FDS + CONSTANT STDERR

( descriptor management )
: ?FOPEN ( FD --- F ) FMODE @ NOT-OPEN = NOT ;
: (FIND-FD ( --- FD | 0 ) ( finds a free fd if there is one )
   #FDS FDSIZE * FDS + FDS ( end start )
   BEGIN
      DDUP > ( check for end of array )
      IF   DUP ?FOPEN ( END FD F )
      ELSE DROP 0 0 ( END 0 0 )
      THEN
   WHILE
      FDSIZE + ( END NEXTFD )
   REPEAT ( END FD | END 0 )
   SWAP DROP ;
: FIND-FD ( --- FD ) (FIND-FD ?DUP NOT ABORT" All fds used " ;
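
The free-descriptor scan in (FIND-FD can be tried in isolation on a Forth that lacks the rest of the listing. The following is a rough sketch, assuming a standard Forth with DO ... +LOOP and LEAVE. SLOTS, #SLOTS, SLOT-SIZE and the one-cell slot layout are hypothetical stand-ins for FDS, #FDS, FDSIZE and the FMODE field, and NOT-OPEN and READ-ONLY are redefined here only so that the fragment loads on its own.

```forth
\ Sketch only: a miniature descriptor table with one mode cell per slot.
0 CONSTANT NOT-OPEN
1 CONSTANT READ-ONLY
4 CONSTANT #SLOTS                      \ hypothetical table length
1 CELLS CONSTANT SLOT-SIZE             \ hypothetical slot size
CREATE SLOTS  #SLOTS SLOT-SIZE * ALLOT

: SLOTS-END  ( -- addr )  SLOTS #SLOTS SLOT-SIZE * + ;

: INIT-SLOTS ( -- )                    \ mark every slot as not open
   SLOTS-END SLOTS DO  NOT-OPEN I !  SLOT-SIZE +LOOP ;

: FIND-SLOT  ( -- slot | 0 )           \ first free slot, or 0 if none
   0                                   \ default result: nothing free
   SLOTS-END SLOTS DO
      I @ NOT-OPEN = IF  DROP I LEAVE  THEN
   SLOT-SIZE +LOOP ;

INIT-SLOTS
READ-ONLY FIND-SLOT !                  \ claim the first slot by setting its mode
```

After the last line the first slot is marked open, so the next FIND-SLOT returns the address of the second slot. A caller would treat a result of 0 as "all descriptors in use", as FIND-FD does.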