An audio/visual/spatially oriented user interface

Buzz words:
  Interactive
  Event driven

Hardware
  Audio input
    Fast A/D
    Digitized voice
  Audio output
    Fast D/A
    Digitized voice
    Synthetic voice
    Weird noises
    MIDI

Software

Fast Fourier transform the audio input, and extract parameters from it. Certain changes of parameters over time would signal events. A user would be able to make sounds that the system makes templates from. Templates could be bound to events, and the stream of events would traverse an Emacs-like keymap, invoking functions that could get their parameters from the wildcarded parameters of the templates, or by prompting for more input. (Rough code sketches of the extraction, dispatch, and feedback ideas appear at the end of these notes.)

There should be constant visual feedback on the screen. The user should be able to invoke any function with the keyboard, mouse, or voice. The invocation could give as feedback some indication of the sound to make, key to press, widget to mouse, or whatever other means the user has of invoking the function.

When the user invokes a pie menu with the mouse, the sound that would invoke that menu would be made. A consonant sound, or some higher level pattern, could be used. When browsing the menu, the sound for the choice under the mouse would be made. This could be a vowel sound, so there could be a continuum of vowel sounds representing the directions. This could be the same throughout the whole system. Other components of the sound would provide more dimensions of input. Certain channels, such as loudness, pitch, etc., could have standard meanings placed on them. It should be designed in a very orthogonal, easy to learn fashion.

As the angle of the mouse in the menu changes on the screen, the sound being made changes. When you select from the menu, another consonant sound is made. You can then repeat that sequence of sounds to perform the same menu selection. There should be a way to practice sound sequences, where the machine and the user can adapt to each other as well as each is able.

When a command sound is recognized, an event is generated, and some interested party gets it. A program can track parameters of the sound much the way it might track the mouse (x, y) position upon receiving a mouse click event. When a selection event is recognized, the program makes the choice with the current parameters. Command and selection sounds should be orthogonal to the other voice parameters. They could be consonants, and patterns of consonants, with vowels and other parameters used to differentiate them when that channel is not being used as a parameter.

Event driven
  Mouse input
  Audio input
  Keyboard input
  And anything else
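
As a rough sketch of the parameter extraction described under Software, here is what one FFT analysis step might look like in Python (numpy assumed). The particular channels extracted (loudness, pitch, brightness) and their formulas are illustrative choices, not a fixed design:

    import numpy as np

    def extract_parameters(frame, sample_rate=16000):
        # Reduce one frame of digitized voice to a few parameter channels.
        windowed = frame * np.hanning(len(frame))    # taper to reduce spectral leakage
        spectrum = np.abs(np.fft.rfft(windowed))     # magnitude spectrum
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        loudness = float(np.sqrt(np.mean(frame ** 2)))     # RMS level
        pitch = float(freqs[np.argmax(spectrum)])          # crude: strongest bin
        brightness = float(np.sum(freqs * spectrum)
                           / (np.sum(spectrum) + 1e-12))   # spectral centroid
        return {"loudness": loudness, "pitch": pitch, "brightness": brightness}

Tracking how these channels change from frame to frame, and matching the trajectories against stored templates, is what would turn raw sound into events.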
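
The Emacs-like keymap traversal could be sketched as below. The event shape (a template name plus a dictionary of wildcarded parameters) and the handler signature are hypothetical names invented here for illustration:

    # A keymap maps template names to sub-keymaps or handler functions,
    # just as Emacs keymaps map keys to keymaps or commands.
    def make_keymap():
        return {
            "k-click": {    # consonant-like command sound
                "ahh": lambda params: print("open:", params),
                "ooh": lambda params: print("close:", params),
            },
        }

    def dispatch(keymap, events):
        # Walk the keymap with a stream of (template_name, params) events.
        node = keymap
        for name, params in events:
            node = node.get(name)
            if node is None:
                node = keymap            # unbound sequence; could beep here
                continue
            if callable(node):
                node(params)             # handler reads wildcarded parameters
                node = keymap            # start over at the top-level map

    dispatch(make_keymap(), [("k-click", {}), ("ahh", {"loudness": 0.7})])

Here the consonant-like "k-click" prefixes the command, and the vowel event that follows carries the parameters the handler reads, keeping command sounds orthogonal to the parameter channels.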
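
For the continuum of vowel sounds around a pie menu, one possibility is to interpolate vowel formant targets by menu angle, as in this sketch. The (F1, F2) values are rough textbook approximations, and the four-vowel ring is an arbitrary illustrative choice:

    # Rough (F1, F2) formant targets in Hz for four vowels placed at the
    # compass points of a pie menu; real values vary by speaker.
    VOWEL_RING = [(730, 1090),   # "ah" at 0 degrees
                  (270, 2290),   # "ee" at 90 degrees
                  (440, 1020),   # "oh" at 180 degrees
                  (300, 870)]    # "oo" at 270 degrees

    def formants_for_angle(degrees):
        # Interpolate formant targets as the mouse angle in the menu changes.
        pos = (degrees % 360.0) / 90.0   # position in units of ring segments
        i = int(pos) % 4
        j = (i + 1) % 4
        t = pos - int(pos)               # fraction between the two vowels
        f1 = VOWEL_RING[i][0] * (1 - t) + VOWEL_RING[j][0] * t
        f2 = VOWEL_RING[i][1] * (1 - t) + VOWEL_RING[j][1] * t
        return (f1, f2)

    print(formants_for_angle(45.0))      # halfway between "ah" and "ee": (500.0, 1690.0)

Feeding these targets to a formant synthesizer would make the feedback slide smoothly through vowel space as the mouse moves around the menu, so the sound of each direction is something the user can learn to reproduce by voice.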