Version 5, last updated by simonjudge@work at Nov 21 16:18 2008 UTC

This page may need updating as we understand more.

 Components:

  • Word-list: 'dictionary' of words to search through (Wordlist.txt in the DKey software).
  • .dict file: keycode-to-word mappings.
  • Frequency-list: frequency of these words within a certain corpus.
  • Corpus: body of text from which the frequency list is derived.
  • Key mapping: list of keys (i.e. physical things on a keyboard) to letters, e.g. 1-abc, 2-def, 3-ghij. We could add a level of abstraction so that physical keys get mapped to these numbers; this would allow some re-assignment without rebuilding the lookup from the word-list.
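
 A minimal sketch of the extra level of abstraction mentioned under Key mapping, assuming the 1-abc, 2-def, 3-ghij example. The physical key names ("left", "middle", "right") and the function names are made up for illustration and are not taken from the DKey software.

```python
# Logical key number -> letters, taken from the example in the list above.
KEY_LETTERS = {"1": "abc", "2": "def", "3": "ghij"}

# Hypothetical physical keys -> logical key numbers. Reassigning a key only
# changes this table; codes built from KEY_LETTERS stay valid.
PHYSICAL_TO_KEY = {"left": "1", "middle": "2", "right": "3"}

LETTER_TO_KEY = {ch: key for key, letters in KEY_LETTERS.items() for ch in letters}


def word_to_code(word):
    """Key-number sequence for a word, as it might be stored in the lookup."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def presses_to_code(presses, physical_to_key=PHYSICAL_TO_KEY):
    """Translate physical key presses into the key-number sequence to look up."""
    return "".join(physical_to_key[p] for p in presses)
```

 Because the lookup would be keyed on the key numbers rather than on physical keys, swapping two physical keys means editing PHYSICAL_TO_KEY only.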

 Disambiguating Process:

  1. A sequence of keys is pressed (e.g. 123).
  2. The set of words that could match that key sequence is displayed, in order of likelihood (by frequency).
  3. The user continues pressing keys until the word is complete. If the correct word is highlighted in the dis-list, the user presses 'space' to select it.
  4. If the word is further down the dis-list, the user presses 'next' to cycle to it, then 'space' to select it.
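
 The four steps above can be sketched roughly as follows. This is an illustration only: the DisList class, its method names and the toy corpus are assumptions, and the key mapping is the 1-abc, 2-def, 3-ghij example from the Components list.

```python
import re
from collections import Counter, defaultdict

# Example mapping from the Components list; a real layout would cover a-z.
KEY_LETTERS = {"1": "abc", "2": "def", "3": "ghij"}
LETTER_KEY = {ch: key for key, letters in KEY_LETTERS.items() for ch in letters}


def encode(word):
    """Return the key sequence that types `word`."""
    return "".join(LETTER_KEY[ch] for ch in word)


def build_lookup(corpus):
    """Map key sequence -> words, most frequent first (ties alphabetical)."""
    words = re.findall(r"[a-z]+", corpus.lower())
    # Skip words containing letters this small example mapping cannot type.
    freq = Counter(w for w in words if all(ch in LETTER_KEY for ch in w))
    lookup = defaultdict(list)
    for word, _count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        lookup[encode(word)].append(word)
    return dict(lookup)


class DisList:
    """Hypothetical model of the dis-list: press keys, 'next', then 'space'."""

    def __init__(self, lookup):
        self.lookup = lookup
        self.keys = ""
        self.index = 0

    def press(self, key):
        self.keys += key
        self.index = 0  # a new key press highlights the most likely word again

    def candidates(self):
        return self.lookup.get(self.keys, [])

    def next(self):
        if self.candidates():
            self.index = (self.index + 1) % len(self.candidates())

    def space(self):
        words = self.candidates()
        chosen = words[self.index] if words else ""
        self.keys, self.index = "", 0
        return chosen


dis = DisList(build_lookup("bad ace bad cad bad ace bee"))
for key in "112":
    dis.press(key)
print(dis.candidates())  # words sharing key sequence 112, by frequency
dis.next()
print(dis.space())       # the second candidate
```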


Disambiguation Methods

T9 (Tapir::Exact Only):

  1. Parse the corpus for full words
  2. Generate keycode list:
    1. For each word (e.g. they):
      1. Look up the keycode for each letter, building up the word one letter at a time (e.g. t, th, the, they).
        1. Add keycode::string to the keycode list in order of frequency of occurrence (e.g. 08 t, 084 th, 0843 the, 08439 they).
      2. Compile the keycode list for all parsed words.
    2. Example entry: 08439 they view tidy
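
One way the keycode list could be generated is sketched below. It is not a description of what Tapir actually does. Assumptions: the standard phone keypad (2-abc ... 9-wxyz), which is consistent with the 08439 example; a leading 0 standing for the space before the word; and a part-word's frequency being the sum over all words that start with it. That last choice is only one possible answer to the frequency question raised in these notes.

```python
import re
from collections import Counter, defaultdict

KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}
SPACE_KEY = "0"  # assumed meaning of the leading 0 in the 08439 example


def keycode(text):
    """Keycode for a letter string, prefixed with the space key."""
    return SPACE_KEY + "".join(LETTER_KEY[ch] for ch in text)


def build_keycode_list(corpus):
    """Map keycode -> letter strings (full and part-words), most frequent first."""
    words = Counter(re.findall(r"[a-z]+", corpus.lower()))
    prefix_freq = Counter()
    for word, count in words.items():
        for end in range(1, len(word) + 1):
            # Assumption: a part-word inherits the counts of every word it starts.
            prefix_freq[word[:end]] += count
    keycode_list = defaultdict(list)
    for prefix, _count in sorted(prefix_freq.items(), key=lambda kv: (-kv[1], kv[0])):
        keycode_list[keycode(prefix)].append(prefix)
    return dict(keycode_list)


kl = build_keycode_list("they they they view view tidy")
print(kl["08439"])  # full words sharing this keycode
print(kl["084"])    # part-words sharing this keycode
```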

WHAT TO DO ABOUT CHAR-COMBINATIONS (unigrams)? E.g. the AR of ARE: are these taken care of by parsing the individual letters in each word? When adding char-combinations to the list (e.g. th), do you add frequencies?

NOT SURE TAPIR DOES THIS 'PROPERLY' - CHECK.

Tapir:

The Tapir method differs from T9 in two ways:

  • 'Next' iterates over all words that start with the entered sequence (e.g. th -> the, that, there, their, these), including words longer than the number of keys already pressed.
  • There is a cost equation which can be altered to change the balance between looking at the prediction list (to choose 'next') and minimising keypresses.
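
The first difference can be illustrated with a small sketch that contrasts exact matching with matching on the start of the word. It assumes the standard phone keypad and matches on the key sequence; the word counts are invented for the example, and the cost equation is not modelled because its form is not given here.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def encode(word):
    """Key sequence that types `word` on the assumed keypad."""
    return "".join(LETTER_KEY[ch] for ch in word)


def by_frequency(words, word_freq):
    """Order words most frequent first, ties alphabetical."""
    return sorted(words, key=lambda w: (-word_freq[w], w))


def exact_matches(keys, word_freq):
    """Words typed by exactly `keys` (the exact-only behaviour)."""
    return by_frequency([w for w in word_freq if encode(w) == keys], word_freq)


def prefix_matches(keys, word_freq):
    """Words whose key sequence starts with `keys`, including longer words."""
    return by_frequency([w for w in word_freq if encode(w).startswith(keys)], word_freq)


# Invented counts, for illustration only.
WORD_FREQ = {"the": 50, "that": 30, "there": 10, "their": 9, "these": 8, "tie": 5, "dog": 7}

print(exact_matches("84", WORD_FREQ))   # no complete word is typed by 84 alone
print(prefix_matches("84", WORD_FREQ))  # every word that begins with keys 8, 4
```

Because several letters share a key, matching on keys 84 also returns words such as tie that begin with a different letter pair on the same keys.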

Prefer Exact:

 The lookup list stores each key sequence together with the list of letter sequences that have exactly that sequence as the start of the word (prefix). DESCRIBE PROCESS. TBC.

 Both: 

 Sequences are stored in order of probability - determined by parsing the corpus...

Most Probable:

 

 

Multitap:

 

 

-----

 Questions:

  • Can we build a lookup list from just words rather than part-words?
  • Is the generated list just letter-strings starting with space?