Disambiguation process

OK, this page might need updating as we understand more.

 Components:

 Disambiguating Process:

  1. A sequence of keys is pressed (e.g. 123).
  2. The set of possible words matching that key sequence is displayed in the dis-list, ordered by likelihood (frequency).
  3. The user continues pressing keys until the word is complete. If the correct word is highlighted in the dis-list, the user presses 'space' to select it.
  4. If the word is further down the dis-list, the user presses 'next' to cycle to it, then 'space' to select it.
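
 A minimal sketch of the four steps above, in Python. Everything named here (DIS_LIST, Session, press, next, space) is hypothetical and only illustrates the interaction; it is not taken from Tapir. The candidate order for 8439 follows the example entry on this page, with the leading 0 left out because its meaning is not stated here.

```python
# Hypothetical lookup table: key sequence -> candidate words, most frequent first.
DIS_LIST = {
    "8439": ["they", "view", "tidy"],
}


class Session:
    """Tracks the keys pressed so far and which dis-list entry is highlighted."""

    def __init__(self, table):
        self.table = table
        self.keys = ""    # key sequence typed for the current word
        self.index = 0    # position of the highlighted candidate
        self.text = []    # words accepted so far

    def press(self, key):
        # Steps 1 and 2: extend the sequence; the highlight returns to the top.
        self.keys += key
        self.index = 0

    def candidates(self):
        # Step 2: the dis-list for the current key sequence (may be empty).
        return self.table.get(self.keys, [])

    def next(self):
        # Step 4: cycle the highlight down the dis-list, wrapping at the end.
        cands = self.candidates()
        if cands:
            self.index = (self.index + 1) % len(cands)

    def space(self):
        # Steps 3 and 4: accept the highlighted word and start a new word.
        # Assumption: with no candidates, the raw key sequence is kept.
        cands = self.candidates()
        word = cands[self.index] if cands else self.keys
        self.text.append(word)
        self.keys = ""
        self.index = 0
        return word
```

 For example, pressing 8, 4, 3, 9 and then 'space' would accept "they"; pressing 'next' once before 'space' would accept "view" instead.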


Disambiguation Methods

T9 (Tapir::Exact Only):

  1. Parse the corpus for full words
  2. Generate keycode list:
    1. Parse the words. For each word (e.g. they):
      1. Take each prefix of the word in turn (e.g. t, th, the, they) and look up the keycode for each letter.
        1. Add keycode::string to the keycode list in order of frequency of occurrence (e.g. 08 t, 084 th, 0843 the, 08439 they).
      2. Compile the keycode list for all parsed words.
    2. Example entry: 08439 they view tidy
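
A rough sketch of the keycode-list generation above, in Python. It makes several assumptions that this page does not confirm: the standard phone keypad letter layout, omission of the leading 0 shown in the examples (its meaning is not stated here), and summing the counts of a prefix across all words that share it. The names KEYPAD, keycode and build_keycode_list are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumed standard phone keypad layout (not confirmed by this page).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def keycode(string):
    """Return the key sequence for a letter string, e.g. 'they' -> '8439'."""
    return "".join(LETTER_TO_KEY[ch] for ch in string)


def build_keycode_list(corpus_words):
    """Map each keycode to its strings, most frequent first."""
    counts = Counter()
    for word in corpus_words:
        word = word.lower()
        # Skip empty tokens and anything with characters outside a-z.
        if not word or any(ch not in LETTER_TO_KEY for ch in word):
            continue
        # Count every prefix of the word (t, th, the, they).
        # Assumption: counts for a shared prefix are added together.
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1

    table = defaultdict(list)
    for string, n in counts.items():
        table[keycode(string)].append((string, n))

    # Order each entry by descending count; ties are broken alphabetically.
    return {
        code: [s for s, _ in sorted(entries, key=lambda e: (-e[1], e[0]))]
        for code, entries in table.items()
    }
```

With a toy corpus in which "they" occurs more often than "view", and "view" more often than "tidy", the entry for 8439 would come out in the same order as the example entry: they, view, tidy.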

WHAT TO DO ABOUT CHAR-COMBINATIONS (unigrams)? E.g. AR of ARE - are these taken care of when parsing the individual letters of a word? What to do when adding char-combinations to the list, e.g. th - do you add the frequencies?

NOT SURE TAPIR DOES THIS 'PROPERLY' - CHECK.

Tapir:

The Tapir method differs from T9 in a number of ways:

Prefer Exact:

 The lookup list stores key sequences, each with the list of related letter sequences that have that exact sequence as the start of the word (prefix). DESCRIBE PROCESS. TBC.

 Both: 

 Sequences are stored in order of probability - determined by parsing the corpus...

Most Probable:

 

 

Multitap:

 

 

-----

 Questions: