Command Line Editing with Command Completion Mitch Bradley _A_b_s_t_r_a_c_t Given the popularity of screen editors and the `what you see is what you get' principle, it is surprising that most systems do not provide similar capabilities for editing command lines. This paper presents a screen-oriented command line editor, includ- ing a command completion feature. Command completion allows a user to abbreviate any name by typing the first few letters. The computer then completes the name if it is possible to uniquely determine which name was meant. Introduction Given a choice of editors, most people will choose a screen- oriented editor over a line editor. The `what you see is what you get' principle seems well suited to human capabilities. Most computer systems provide a full-featured screen editor for text files, but few provide the same capabilities for editing lines that are being typed to the command interpreter. Since a lot of interaction with a computer system is done by typing command lines, it seems sensible to provide powerful editing capabilities for composing those command lines. Nevertheless, most systems provide only very-limited commmand line editing, usually `erase previous character', `erase entire line', and sometimes `erase previous word'. In FORTH, the equivalent of typing a command line is typing a line to the text interpreter, usually under control of the word EXPECT. I present a full-featured screen-oriented intra-line editor to replace EXPECT. The command-set includes, among oth- ers, `forward/backward character/word', `erase previous/next character/word', and `to beginning/end of line'. The fanciest feature of this editor is called `command completion'. Command completion is a technique for automatically abbrevi- ating word names. With command completion, a user need only type the first few characters of a name, and the system will fill in the rest. Only enough characters to uniquely identify the word from among the possible choices are required. If the user doesn't type enough characters to uniquely identify the word, the March 20, 1986 - 2 - system fills out the word as far as possible, beeping to indicate that more characters are needed to select a unique word. The user may ask at any time to see all the current choices. For instance, if the dictionary contains two words `expand-word' and `expand-consciousness', the user could type `ex' and the command completion command character. The system would expand the word on the display to `expand-' and then beep. The user could then add `w' and again type the command completion command character, at which time the system would complete the word to `expand-word '. Command completion has been around for some time. The TENEX[1] operating system [BOBR] for the DEC-10[2] family of com- puters has it, as do TOPS-20[3], some versions of Emacs, several of Control Data Corporation's products, and even the microcom- puter file transfer program Kermit. It tends to be a very popu- lar feature; people miss it when they move from a system with it to one without. Furthermore, command completion removes the major objection to using long, readable command names, because it is no longer painful to type them. Editor Commands The intra-line editor is based on Emacs [STAL]. Emacs is an extensible screen-oriented editor that was developed about 10 years ago at the MIT AI lab by Richard Stallman. Since then, Emacs or something like it has been implemented on lots of dif- ferent machines. A few microcomputer editors are based on Emacs, for example MINCE (which stands for Mince Is Not Complete Emacs). Emacs tends to inspire fanatical devotion among its fans in the same way that FORTH does. I freely admit to being an Emacs fan myself. (I usually run FORTH as a subprocess INSIDE Emacs!). Emacs editing commands are invoked by typing control charac- ters. When a printable character is typed, it is simply added to the text being edited. When a control character is typed, a table is consulted to determine which editing command to perform. The association between a particular control character and a par- ticular editing function may be easily changed by the user at any time. There is a philosophical debate about the relative merits of editors which rely on control characters versus those which use printable characters as commands and have a separate `insert mode' for typing text. This editor uses control characters. If _________________________ 9 [1] `TENEX' is a registered trademark of Bolt, Beranek, and Newman, Inc. 9 [2] `DEC-10' is a trademark of Digital Equipment Corpora- tion. 9 [3] `TOPS-20' is a trademark of Digital Equipment Corpora- tion. 9 March 20, 1986 - 3 - you prefer the other style, it should be easy to modify to suit your taste. The editor always displays the line being edited on the screen. The cursor may be anywhere within the line. The charac- ter immediately to the left of the cursor is the `previous' char- acter. The character underneath the cursor is the `next' charac- ter. If a printable character is typed, it is inserted between the previous and next characters, and the rest of the line is shifted to the right to make room. It is useful to think of the cursor as being between the previous and next characters, even though it is displayed on top of the next character. A `word' is a sequence of non-blank characters, just like in FORTH. _f_o_r_w_a_r_d-_c_h_a_r_a_c_t_e_r moves the cursor forward one character, unless it's already at the end of the line. _b_a_c_k_w_a_r_d-_c_h_a_r_a_c_t_e_r moves the cursor backward one character, unless it's already at the beginning of the line. _f_o_r_w_a_r_d-_w_o_r_d moves the cursor forward one word, unless it's already at the end of the line. The cursor is left so that the last character in the word is the `previous' character. If the cursor starts out in the middle of a word, it is moved to the end of that word. Otherwise it skips blanks until a word is found, and moves to the end of that word. _b_a_c_k_w_a_r_d-_w_o_r_d moves the cursor backward one word, unless it's already at the beginning of the line. The cursor is left so that the first character in the word is the `next' character. If the cursor starts out in the middle of a word, it is moved to the beginning of that word. Otherwise it skips blanks backward until a word is found, and moves to the beginning of that word. _e_n_d-_o_f-_l_i_n_e moves the cursor to the end of the line. _b_e_g_i_n_n_i_n_g-_o_f-_l_i_n_e moves the cursor to the beginning of the line. _e_r_a_s_e-_n_e_x_t-_c_h_a_r_a_c_t_e_r erases the next character (the one under the cursor), unless the cursor is already at the end of the line, in which case it does nothing. The rest of the line moves to the left to fill in the vacated position. _e_r_a_s_e-_p_r_e_v_i_o_u_s-_c_h_a_r_a_c_t_e_r erases the previous character (the one under the cursor), unless the cursor is already at the begin- ning of the line, in which case it does nothing. The rest of the line moves to the left to fill in the vacated position. _e_r_a_s_e-_n_e_x_t-_w_o_r_d erases the next word, unless the cursor is already at the end of the line, in which case it does nothing. If the cursor starts in the middle of a word, only the end of the word is erased; the portion to the left of the cursor remains intact. If the cursor starts in the middle of a series of March 20, 1986 - 4 - blanks, everything up to the end of the next word is erased. _e_r_a_s_e-_p_r_e_v_i_o_u_s-_w_o_r_d erases the previous word, unless the cur- sor is already at the beginning of the line, in which case it does nothing. If the cursor starts in the middle of a word, only the beginning of the word is erased; the portion to the right of the cursor remains intact. If the cursor starts in the middle of a series of blanks, everything back to the beginning of the pre- vious word is erased. _k_i_l_l-_t_o-_e_n_d-_o_f-_l_i_n_e erases everything to the right of the cursor, including the character under the cursor. _i_n_s_e_r_t-_c_h_a_r_a_c_t_e_r inserts the last character typed between the previous and next characters. The rest of the line is moved to the right to make room for the new character. This is the func- tion that is executed when a printable character is typed. _q_u_o_t_e-_n_e_x_t-_c_h_a_r_a_c_t_e_r gets a new character from the keyboard, and inserts it into the line between the previous and next char- acters, even if the character is a control character. This is the way to force a control character into the line, instead of the control character causing some editing function to be per- formed. _f_i_n_i_s_h-_l_i_n_e terminates the entry of the line. EXPECT then returns the line to its caller. This function is normally exe- cuted when carriage return is typed. _r_e_t_y_p_e-_l_i_n_e redisplays the line on the screen, in case some- thing has happened to mess up the display. _g_r_a_b-_l_a_s_t-_l_i_n_e retrieves the previously-typed line so that it may be edited. This is useful for making minor modifications to the previous line. _e_x_p_a_n_d-_w_o_r_d is the command completion function. The word underneath the cursor is used as a search string. The dictionary is searched according to the CONTEXT search order for words whose first characters match the search string. If there is only one match, the word on the screen is filled out to the matched word and a blank is added to the end. If more than one match is found, the word is filled out as far as all the matches agree, and the terminal bell is rung. If no matches are found, charac- ters are successively removed from the end of the search string until at least one match is found. The word on the screen is truncated to the length of the new wearch string, and the bell is rung. _s_h_o_w-_m_a_t_c_h_e_s displays a list of all those words which match the word underneath the cursor, according to the search rules described in expand-word. March 20, 1986 - 5 - Default key assignments Here is the code that assigns keys to functions. ^x means control-x, i.e. hold down the control key and type x. ^[x means escape-x, i.e. type the escape key then type x[4]. : ^F forward-character ; : ^B backward-character ; : ^A beginning-of-line ; : ^E end-of-line ; : ^D erase-next-character ; : ^H erase-previous-character ; ( backspace ) : ^K kill-to-end-of-line ; : ^M finish-line ; ( carriage return ) : ^Q quote-next-character ; : ^L retype-line ; : ^[h erase-previous-word ; : ^[d erase-next-word ; : ^[f forward-word ; : ^[b backward-word ; : ^[= grab-last-line ; : ^@ expand-word ; : ^_ do-show ; Implementation There are a couple of `tried-and-true' methods that have been frequently used for implementing control-character-driven screen editors. One is to use a case statement or a nested conditional to decide what to do with each control character. Another method is to use a jump table or execution vector table that is indexed by the numerical value of the control character. The first is difficult to modify without recompilation, which is especially painful if the function is deep in the FORTH kernel as EXPECT usually is. The second is easy to modify, but special words are required to conveniently see what functions are currently associ- ated with what keys. The technique used here follows from the observation that the FORTH text interpreter is equivalent to a giant extensible CASE statement, whose CASEs are the words in the dictionary and whose search key is the word being interpreted. When a control key is typed, this editor converts the key to a text string of the form ^X (for control-X), and tries to interpret a word with that name from a special vocabulary. If no such word is found, the editor _________________________ 9 [4] In Emacs parlance, escape-x is sometimes called `meta-x', because there were some funny keyboards at MIT and Stanford that had several shift keys. One shift key was called `meta', and was used in place of escape. One style of keyboard had about five shift keys, and you had to type chords to get certain things done. 9 March 20, 1986 - 6 - just beeps instead of ABORT'ing. This has several nice properties. First, it is very easy to associate a function with a particular control key. For instance, suppose that you want to execute `expand-word' when control-j is typed. Make the control character vocabulary the CURRENT one, and do: : ^J expand-word ; The current set of associations between control characters and functions may be easily seen using existing tools. WORDS (the F83 name for VLIST) will show you the control characters that are defined, and the decompiler will show you the function executed by a particular control character. Keyboard macros are easy to implement using this scheme. Just define the behavior of any control character you wish with a colon definition. The definition can either be in terms of the function names or of previously-defined control characters. Multiple-keystroke commands are also easy. For instance, suppose you want to use escape-x as a command, where escape-x means type the escape key, then type the x key. Since escape is the same as control-[, just define ^[ to get a keyboard charac- ter, append it to the control-character search string, and search the vocabulary again. Any two-charcter escape sequence may then be defined in the obvious way, e.g. : ^[x ( whatever you want escape-x to do ) ; It may be argued that this technique is slow. However, com- pared to the speed at which people can type, it is plenty fast. 120 words-per-minute corresponds to 10 characters-per-second, which is 100 msec-per-character. This is a long time for a com- puter, and hardly anybody can even approach 120 words-per-minute. Besides, the whole dictionary doesn't have to be searched, only the control character vocabulary (presupposing sealed vocabu- laries). Another possible objection is that it may seem difficult to implement. It only took me a few minutes; you may look at the code and judge for yourself about the complexity. The command completion code should be relatively easy to fol- low. The algorithm is pretty simple: given a search string, go through the dictionary and make a list of the NFA's of all the words whose initial substrings match the search string. If the list contains one match, all is well; expand the word to be the same as the matched word. If the list contains several words, find the longest initial substring common to all the matches, and expand the word to that substring. If the list contains no matches, remove the last character from the search string and start over. Most of the hard work is involved in getting the March 20, 1986 - 7 - substring match code right, considering that the headers in the dictionary headers may have tag bits in the character strings. FORTH dialect This code was written using a version of the public domain F83 system by Perry and Laxen. The system has been modified somewhat to run on a 68000 Unix[5] system (actually a Sun Works- tation[6] ). Several implementation dependencies exist. In many cases these were unavoidable, and point out deficiencies in the standard. In particular: It is assumed that EXPECT is vectored so that it may be replaced with the new version. If your EXPECT isn't vectored, you have to find a way to patch the new one in. There is a word `vfind' which takes as arguments the address of a string to look for and the cfa of a vocabulary. It then searches just that vocabulary. This word is easy to implement on any par- ticular system with sealed vocabularies. It may in fact be implementable in 83-standard FORTH, but saving and restoring CON- TEXT is painful, so I implemented it in the obvious way for my system. A well-factored kernel ought to have such a word present. The other implementation dependency is in the command comple- tion package. So far as I know, the 83 standard does not provide anyy standard way of looking through the dictionary other than with FIND. FIND can't cope with looking for matching initial substrings, so there is a word `find-candidates' which looks through the dictionary to find matches. The way that it works is heavily dependent on the way that the search order concept is implemented, the way that words are linked together in the dic- tionary, and the way that hashing, if any, is done. Again, it shouldn't be hard to implement on most systems, but it can't be done as a `standard' program. The code which matches substrings in the dictionary with the search string depends on the way that names are stored in the dictionary, with tag bits and such. This code assumes that names are stored with tag bits at the beginning and end, and the ini- tial length byte may contain an immediate bit. Considering the trouble that dictionary maintenance code must go to in order to cope with these tag bits, I would argue that the space saved is not worth the trouble. I think Forth implementations should use a single consistent format for strings, and that format should be used for names in the dictionary, with no tag bits. I don't think it is possible to implement command completion at all on systems that use the `first three letters and a count' scheme for saving dictionary space. Also, some sorts of hashing _________________________ 9 [5] Unix is a trademark of Bell Laboratories. 9 [6] Sun Workstation is a trademark of Sun Microsystems, Inc. March 20, 1986 - 8 - may make it harder. The hashing function in F83 just uses the first character of the name to do four-way hashing, which is very convenient for command completion purposes. If the hashing func- tion were to require that the entire name be available to calcu- late the hash code, it would be necessary to ignore the hashing and simply search the whole dictionary. Terminal support The editor uses some `smart-terminal' features for manipulat- ing the screen display. There may still be some terminals around that don't support the necessary fuctions, but nearly all modern ones do. The system has been used with Heath/Zenith H-19 termi- nals, ANSI standard terminals, and Televideo 925 terminals. The terminal-dependent code is packaged to make it easy to add sup- port for other terminals. Future Enhancements There are some Emacs features that ought to be added to this editor. Notably, it should be possible to set a marker at some position in the line, move the cursor, and then either delete or copy the section of the line between the marker and the cursor. This wouldn't be very hard to implement. Slightly harder would be command repetition counts, allowing the user to type a number representing how many times a command should be executed. Most of the time in a screen-oriented text editor is spent editing within a line. This means that this editor could serve as the basis for a complete full-screen text editor, basically Emacs-in-Forth. A really complete Emacs implementation would require a lot of work, though, because Emacs usually has the ability to split the screen into a number of non-overlapping win- dows, editing a different file in each window. Acknowledgements I owe a debt of gratitude to the people who designed and evolved Emacs, for providing an excellent model for how an editor should behave. It is my feeling that the hard part of program design is deciding what the program SHOULD do, and that the implementation is usually trivial once the specification is right. In this case, I was absolutely convinced that the Emacs editing model was exactly right; as a consequence, the implemen- tation of the editor other than command completion was completed in one evening. I'm also indebted to the many people who have evolved the command completion concept, so that I was able to just do the implementation, without having to design the user interface. Finally, I'm indebted to Mike Perry and Henry Laxen for the fine job they did with F83. Previous versions of Forth that I have used have lacked many necessary (to me at least) features, March 20, 1986 - 9 - so that I was forced to make extensive modifications to them. I hesitated to publish code for fear of having to explain how my kernel works. F83 does so many things right that I have been able to write a lot of applications code without having to spend too much time fixing the kernel. References [STAL] Stallman, R. M. _E_M_A_C_S _M_a_n_u_a_l _f_o_r _I_T_S _U_s_e_r_s. AI Lab Memo 554, MIT, Cambridge, Mass., (1980). [BOBR] Bobrow, D. G. Tenex: a paged time-sharing system for the PDP-10. _C_o_m_m_u_n_i_c_a_t_i_o_n_s _o_f _t_h_e _A_C_M 15, 3, March 1972, p135-143. March 20, 1986