9 Structured Data with Bit Fields 9 9 Mitch Bradley 9 A structured data item is a data item composed of several subfields of various types, analogous to the `struct' declaration of C or the `RECORD' types of Pascal and Modula-2. Structured data types are trivial to implement in Forth if each subfield is constrained to occupy an integral number of machine bytes. The problem becomes more interesting if subfields may include groups of bits within a byte or word, as in C `bit fields' or Modula-2 `BITSET's. Such bit fields arise frequently when dealing with hardware registers, which often contain groups of bits which control independent hardware functions. This paper presents a set of words for conveniently defining and accessing structured data, which may contain bit fields. Their utility is shown with a real example -- manipulation of a disk controller. Introduction A number of proposals for data structuring words have recently appeared in the literature [WALK][BASI][JOOS]. The scheme that I present is simpler than most of these, and probably not as powerful. My particular emphasis in this paper is not on data structures in general, but on data structures with imbedded bit fields. Structures or records are usually used to group together related data items. As an example, consider the following frag- ment of C [KERN] code. struct controller-registers { unsigned short data; unsigned short command; unsigned long dma-base; unsigned short dma-count; }; This structure declares that there is a structure type called `controller-registers'. This structure contains four fields, three of them short unsigned integers (2 bytes each), and one of the a long unsigned integer (4 bytes)[1]. _________________________ 9 [1] Those of you familiar with C will no doubt note that 9 March 20, 1986 - 2 - The structure could later be used by declaring a pointer to that structure, as in: struct controller-registers *ptr; To access the dma-count register, one could then use ptr->dma-count From a code generation standpoint, this is not hard to imple- ment. All the compiler has to do is to remember what type of object is represented by `dma-count' and its offset from the start of the structure. When Forth programmers are faced with accessing a data struc- ture like this one, the tendency is to use numerical offsets into the data structure. The offset calcualation is usually done explicitly when the object is referenced, using perhaps `8 +'. Explicitly calculating offsets is markedly inferior to having it done automatically for you, though. Explicit offsets buried in code make the code hard to understand and especially hard to modify. A structure capability that automatically calculates offsets is extremely easy to implement in Forth. : field ( offset size --- ofset+size ) create over , + does> ( fd pfa --- fd+offset ) @ + ; : struct 0 ; This may be used as follows: struct ( controller-registers ) 2 field data 2 field command 4 field dma-base 2 field dma-count constant controller-registers-size The word `field' just keeps a running account of the current offset at compile time, and creates a new word which when exe- cuted will add its offset to the number on the stack. The word `struct' simply initializes the current offset to 0. After the structure definition is finished, it is convenient to remember _________________________ dma-base should probably be a (char *) instead of an (un- signed long). However, it is easier to explain as it stands. March 20, 1986 - 3 - the total size of the structure by defining a constant, using the size that is already on the stack. Later the fields may be used as follows, assuming we have a variable `ptr' which contains the address of an instance of this structure. ptr @ data @ ( fetches the contents of the data field ) 100000. ptr @ dma-base 2! ( sets the dma-base field to 100000. ) Bitfields So far this has been pretty trivial, and hardly worth writing a paper about. So lets make the problem a little more interest- ing. Suppose that we want to define some fields which do not occupy an integral number of bytes. This situation can arise when there are data items that are very short, say 1 bit, and you don't want to waste an entire byte just to store 1 bit. The situation also frequently arises when dealing with hardware registers, which often pack several barely-related bits within a single byte or word. Coping with single bits isn't too bad, because it is easy to operate on the bit with a mask and a boolean operator, setting or clearing the bit as desired. The real pain comes when there is something like a 3-bit field that represents a number, and that field is somewhere in the middle of a byte. Then the problem involves a lot of masking and shifting, which is annoying to write and hard to read. C copes with this problem by providing a `bitfield' defini- tion capability within structures. Bitfields allow the program- mer to give a symbolic name to a group of bits within a word or byte, and then to access that group of bits as though it were a small integer, without having to do explicit chifting and mask- ing. I have attempted to provide this capability within Forth, and have succeeded to some extent. Here's how it works, described by example. This segment of code comes from a driver for a Xylogics Model 450 disk controller controller board[2]. 9_________________________ 9 [2] The funny byte numbering (1 first, then 0) is a result of the fact that the controller manual numbers the bytes in `little endian' order, with the least significant byte of a word at the lowest address, whereas the code is written for a 68000 machine, which is `big endian', with the most signi- ficant byte of a word at the lowest address. This problem is well-known to anybody who has ever tried to write `port- able' code to deal with real I/O devices. It ranks as a ma- jor nuisance. March 20, 1986 - 4 - struct ( xyiopb ) ( Byte 1 ) 1 resbits 1 bits intrall ( interrupt on all iopbs (450) 1 bits intrerr ( interrupt on error ) 1 bits reserve ( reserve dual port drive (450) 1 bits recal ( recalibrate on seek errors (450) 1 bits enabext ( enable extensions (450) 2 bits eccmode ( ECC actions ) ( Byte 0 ) 1 bits autoup ( auto update of IOPB ) 1 bits reloc ( use relocation ) 1 bits chain ( command chaining ) 1 bits ie ( interrupt enable ) 4 bits cmd ( command ) ( Byte 3 ) 1 field errno ( error number ) ( Byte 2 ) 1 bits iserr ( error indicator ) 2 resbits 3 bits ctype ( controller type ) 1 resbits 1 bits complete ( completion code valid (450) ( Byte 5 ) 2 bits drive ( drive type ) 4 resbits 2 bits unit ( unit number ) ( Byte 4 ) 1 bits bytebus ( use byte transfers ) 4 bits intrlv ( interleave - 1 (450) 3 bits throttle ( throttle control ) ( remaining are not bit fields ) 1 field sector ( 7: sector number ) 1 field head ( 6: head number ) 2 field cylinder ( 9,8: cylinder number ) 0 field nsect ( b,a: sector count ) 2 field status ( low byte is status ) 2 field bufoff ( d,c: buffer offset ) 2 field bufrel ( f,e: buffer offset ) 1 + ( 11: reserved byte ) 1 field bhead ( 10: base head (450) 2 field nxtoff ( 13,12: next iopb offset ) 2 field eccpatt ( 15,14: ECC pattern ) 2 field eccaddr ( 17,16: ECC address ) constant xyiopbsize Take a look at `Byte 1'. The 8 bits of this byte control 6 different functions. The most-significant-bit is not used (1 resbits). The next 5 bits control 5 separate modes, and the final 2 bits are encoded to select one of four possible ecc modes. Other bytes have different configurations of bits. The code to control this disk controller could obviously get quite March 20, 1986 - 5 - contorted if the programmer had to do explicit shifting and mask- ing in order to manipulate all these little pieces of bytes. Here is a section of the code which accesses this structure. Note how the use of the bitfield names has made the function of the code obvious. : setiopb ( buf. cnt sec head cyl cmd -- ) iopb xyiopbsize 0 fill ( ... cmd ) iopb cmd bit! ( ... cyl ) iopb cylinder ! ( ... head) iopb head c! ( ... sec ) iopb sector c! ( ... cnt ) iopb nsect ! ( ... bufhi ) 15 and 12 shift iopb bufrel ! ( ... buflo ) iopb bufoff ! drivetype @ iopb drive bit! unitno @ iopb unit bit! xythrottle @ iopb throttle bit! 1 iopb autoup bit! 1 iopb reloc bit! ( specific to the 450 -- not on the 440 ) 1 iopb enabext bit! basehead @ iopb bhead c! 2 iopb eccmode bit! ; The rabid Forth programmer will no doubt complain that this definition is too long. This is a valid point. The excuse is that this bit of code was translated almost verbatim from a known-working C program, and I had no motivation whatsoever to do a stylistic transformation strictly for the sake of `purity'. If it isn't broken, don't fix it. Implementation The code to implement bit field defining, fetching, and stor- ing follows. It is deceptively simple, but its use represents a major improvement in the writeability and understandability of programs that need to deal with such structures. Both high-level and 68000 assembly language versions are given. It was actually easier to write `bit!' in assembly language than in high-level, because the stack manipulations were complicated. This package specifically does not allow bit fields to cross byte boundaries. It would not be hard to extend it to allow bit fields to be anywhere within a word. This particular application benefited from the byte-only restriction, because the disk con- troller hardware had some funny byte-only registers. Undesire- able side effects resulted from using a word operation which March 20, 1986 - 6 - accessed 2 of them at once. I suspect that allowing bit fields to cross n-byte boun- daries, where n is the number of bytes in the machines hardware data paths, would make the code a lot more complicated. Acknowledgement The syntax for defining fields, where the current offset is kept on the stack at compile time, was suggested at a FIG meeting by (I believe) Glenn Tenney. References [BASI] Basile, James, Implementing Datastructures in FORTH _J_o_u_r. _F_O_R_T_H _A_p_p_l. _R_e_s. 1, 2, December, 1983. pp.5-16. [JOOS] Joosten, Rieks, and Nieuwenhuysen, Hans, Vectoring Arrays of Structures _J_o_u_r. _F_O_R_T_H _A_p_p_l. _R_e_s. 1, 2, December 1983. pp.35- 54 [KERN] Kernighan, B., and Ritchie, D., _T_h_e _C _P_r_o_g_r_a_m_m_i_n_g _L_a_n_g_u_a_g_e Prentice-Hall, 1978. [WALK] Walker, Bruce W., PL/1 Data Structures _F_O_R_T_H _D_i_m_e_n_s_i_o_n_s 5, 6, March/April 1984, pp.8-12. March 20, 1986 - 7 - ( Forth 83 ) ( structures with bit fields ) variable bitoffset : bitalign ( offset size -- offset' size ) ( moves to next byte boundary if the bit offset is nonzero ) bitoffset @ if 0 bitoffset ! swap 1+ swap then ; : struct 0 bitoffset ! 0 ; : field ( offset size -- newoffset ) create bitalign over , + does> ( structure-base -- structure-member-address ) @ + ; ( reserves an n-bits field ) : resbits ( offset #bits -- newoffset ) bitoffset @ + 8 /mod ( offset bitoffset_mod_8 bitoffset/8 ) swap bitoffset ! + ; ( bits are allocated from left-right, or from msb to lsb. ) ( however, they are numbered from lsb=0 to msb=7 ) ( left-right generates the correct bit number ) : left-right-bit# ( #bits -- #bits bit# ) bitoffset @ over + 8 swap - dup 0< abort" Bit fields can't cross byte boundaries" ; ( structure describing the layout of information in the parameter field ) ( of a word created by "bits" ) struct ( bit-record ) 2 field bit-offset 2 field bit# 2 field #bits constant bit-record-size ( defines an n-bit field ) : bits ( offset #bits --namex newoffset ) create over , left-right-bit# , dup , ( offset #bits ) resbits does> ( struct_addr pfa -- byte_addr bit# #bits ) >r r@ bit-offset @ + ( byte_addr ) r@ bit# @ ( byte_addr bit# ) r> #bits @ ( byte_addr bit# #bits ) ; hex create masks 0 c, 1 c, 3 c, 7 c, 0f c, 1f c, 3f c, 7f c, ff c, March 20, 1986 - 8 - decimal : bitmask ( n -- mask ) masks + c@ ; : bit@ ( byte_addr bit# #bits -- n ) rot c@ ( bit# #bits byte ) rot negate shift ( #bits shifted-byte ) swap bitmask and ; : bit! ( n byte_addr bit# #bits -- ) bitmask ( n byte_addr bit# mask ) rot >r ( n bit# mask ) rot over and ( bit# mask masked-n ) 2 pick shift ( bit# mask shifted-masked-n ) -rot ( shifted-masked-n bit# mask ) swap shift ( shifted-masked-n shifted-mask ) not ( shifted-masked-n shifted-inverted-mask ) r@ ( shifted-masked-n shifted-inverted-mask byte_addr ) c@ and ( shifted-masked-n masked-old-value ) or ( new-value ) r> c! ; 68000 assembly language version of bit@ and bit! code bit@ ( byte_addr bit# #bits -- n ) word sp )+ d0 move ( d0 is #bits ) masks # d7 move d7 a0 lmove ( relocated address of mask table ) byte 0 d0 a0 di) d0 move ( mask to d0 ) word sp )+ d1 move ( d1 is bit# ) sp )+ d7 move d7 a0 lmove ( relocate 16 bit address to 32 bits ) long d2 clr byte a0 ) d2 move d1 d2 lsr d0 d2 andw word d2 sp -) move c; code bit! ( n byte_addr bit# #bits -- ) word sp )+ d0 move ( d0 is #bits ) masks # d7 move d7 a0 lmove ( relocated address of mask table ) byte 0 d0 a0 di) d0 move ( mask to d0 ) word sp )+ d1 move ( d1 is bit# ) sp )+ d7 move d7 a0 lmove ( relocate 16 bit address to 32 bits ) sp )+ d2 move ( d2 is n ) d0 d2 andw d1 d2 lsl ( masked, shifted, n in d2 ) d1 d0 lsl ( shifted mask in d0 ) March 20, 1986 - 9 - d0 notw ( inverted mask in d0 ) byte d0 a0 ) andw d2 a0 ) orw word c; March 20, 1986