Disambiguation process


OK, this page might need updating as we understand more.

 Components:

  • Word-list : 'Dictionary' of words to search through. Wordlist.txt in DKey software
  • .dict file: keycode-to-word mappings
  • Frequency-list: frequency of these words within a certain corpus
  • Corpus: lump of text to get frequency list from.
  • Key mapping: List of keys (i.e. physical things on a keyboard) to letters, e.g. 1-abc, 2-def, 3-ghij (we could add a level of abstraction so keys get mapped to these numbers; this allows some re-assignment without rebuilding the lookup from the wordlist)
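
A minimal sketch of the key mapping and the extra level of abstraction suggested above. The letter groups are the three from the example; the physical key names (`left`, `middle`, `right`) and the function names are hypothetical and not taken from the DKey software.

```python
# Letter groups per keycode, copied from the example above (only three groups shown).
CODE_TO_LETTERS = {"1": "abc", "2": "def", "3": "ghij"}

# Hypothetical extra level: physical keys mapped onto keycodes.
# Re-assigning a physical key only changes this table, not the word lookup.
PHYSICAL_TO_CODE = {"left": "1", "middle": "2", "right": "3"}

# Reverse index: letter -> keycode.
LETTER_TO_CODE = {
    letter: code
    for code, letters in CODE_TO_LETTERS.items()
    for letter in letters
}


def word_to_keycode(word):
    """Return the keycode sequence for a word (letters outside the groups raise KeyError)."""
    return "".join(LETTER_TO_CODE[ch] for ch in word.lower())


def presses_to_keycode(presses):
    """Translate a list of physical key presses into a keycode sequence."""
    return "".join(PHYSICAL_TO_CODE[p] for p in presses)
```

With this split, the wordlist lookup is keyed on keycodes only, so swapping which physical key produces "1" does not require rebuilding it.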

 Disambiguating Process:

  1. A sequence of keys is pressed (e.g. 123).
  2. The set of words that could match that key sequence is displayed (in order of likelihood by frequency).
  3. The user continues pressing keys until the word is complete. If the correct word is highlighted in the dis-list, the user selects 'space'...
  4. If the word is further down the dis-list, the user presses 'next' to cycle to it, then 'space' to select it.
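
The four steps above can be sketched as a small state machine. This is an illustration only: the class name `DisList`, its method names, and the toy lookup table are assumptions, not the DKey implementation.

```python
class DisList:
    """Tracks the keys pressed so far and which candidate is highlighted."""

    def __init__(self, lookup):
        # lookup: keycode sequence -> candidate words, most frequent first
        self.lookup = lookup
        self.keys = ""
        self.index = 0

    def candidates(self):
        return self.lookup.get(self.keys, [])

    def highlighted(self):
        cands = self.candidates()
        return cands[self.index] if cands else None

    def press(self, key):
        """Steps 1-3: add a key and reset the highlight to the top candidate."""
        self.keys += key
        self.index = 0
        return self.candidates()

    def press_next(self):
        """Step 4: cycle the highlight to the next candidate, wrapping around."""
        cands = self.candidates()
        if cands:
            self.index = (self.index + 1) % len(cands)
        return self.highlighted()

    def press_space(self):
        """Select the highlighted word and start a new key sequence."""
        word = self.highlighted()
        self.keys = ""
        self.index = 0
        return word


# Toy table holding one entry, for illustration only.
lookup = {"08439": ["they", "view", "tidy"]}
dis = DisList(lookup)
for key in "08439":
    dis.press(key)
```

After the five key presses, `dis.highlighted()` is the first candidate; one `press_next()` followed by `press_space()` selects the second.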


Disambiguation Methods

T9 (Tapir::Exact Only):

  1. Parse the corpus for full words
  2. Generate keycode list:
    1. For each parsed word (e.g. they):
      1. Look up the keycode for each letter, building up the word one letter at a time (e.g. t, th, the, they)
        1. Add keycode::string to the keycode list in order of frequency of occurrence (e.g. 08 t, 084 th, 0843 the, 08439 they)
      2. Compile the keycode list for all parsed words
    2. e.g. 08439 they view tidy
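
A rough sketch of the keycode list generation above. It makes three assumptions that the page does not confirm: the letter groups are the standard phone keypad ones, a leading 0 stands for the space before a word (this reproduces the example codes such as 08439 for they), and the frequency of a part-word is the sum of the frequencies of the words that start with it. The word counts are made up.

```python
from collections import Counter, defaultdict

# Assumed layout: standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def keycode(text):
    """Keycode for a lowercase letter string; the leading "0" is the assumed space."""
    return "0" + "".join(LETTER_TO_KEY[ch] for ch in text)


def build_keycode_list(word_freqs):
    """Map each keycode to its letter strings, most frequent first."""
    prefix_freq = Counter()
    for word, freq in word_freqs.items():
        # t, th, the, they ...
        for end in range(1, len(word) + 1):
            # Assumption: part-word frequencies are summed over all words.
            prefix_freq[word[:end]] += freq

    table = defaultdict(list)
    # most_common() yields strings from highest to lowest frequency.
    for prefix, _ in prefix_freq.most_common():
        table[keycode(prefix)].append(prefix)
    return dict(table)


# Hypothetical counts, for illustration only.
table = build_keycode_list({"they": 50, "view": 30, "tidy": 10})
```

Here `table["08439"]` holds the three full words in frequency order, and `table["084"]` holds the part-words th, vi and ti.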

WHAT TO DO ABOUT CHAR-COMBINATIONS - (unigrams)???? E.g. AR of ARE - are these taken care of in parsing individual letters in word?  What to do when adding char-combinations to list, e.g. th - do you add frequencies???

NOT SURE TAPIR DOES THIS 'PROPERLY' - CHECK.

Tapir:

The tapir method is different from T9 for a number of reasons:

  • 'Next' iterates over all words with the entered prefix (e.g. th-> the, that, there, their, these), including words longer than the number of keys already pressed.
  • There is a cost equation which can be altered to change the balance between looking at the prediction list (to choose next) and minimising keypresses.
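
To illustrate the first difference, the sketch below contrasts exact-only matching with matching on the entered prefix. It is not Tapir's code: the function names and word counts are hypothetical, the keypad is assumed to be the standard phone layout with a leading 0 for the space before a word, and the cost equation is not modelled.

```python
# Assumed layout: standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def keycode(word):
    """Keycode for a lowercase word; the leading "0" is the assumed space."""
    return "0" + "".join(LETTER_TO_KEY[ch] for ch in word)


def exact_only(pressed, word_freqs):
    """Words whose keycode is exactly the keys pressed so far."""
    matches = [w for w in word_freqs if keycode(w) == pressed]
    return sorted(matches, key=lambda w: word_freqs[w], reverse=True)


def with_prefix(pressed, word_freqs):
    """Words whose keycode starts with the keys pressed, longer words included."""
    matches = [w for w in word_freqs if keycode(w).startswith(pressed)]
    return sorted(matches, key=lambda w: word_freqs[w], reverse=True)


# Hypothetical counts, for illustration only.
freqs = {"the": 100, "that": 60, "dog": 40, "there": 30, "their": 25, "these": 20}
after_th = with_prefix("084", freqs)
```

With the keys for "th" pressed, `with_prefix` offers the, that, there, their and these, while `exact_only` offers nothing until a third key completes a word.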

Prefer Exact:

 The lookup list stores key sequences and, for each, the list of related letter sequences that have that exact sequence as the start of the word (prefix).   DESCRIBE PROCESS. TBC.

 Both: 

 Sequences are stored in order of probability - determined by parsing the corpus...

Multitap:

Process

Most Probable:

Generating the lookup table is done by (CHECK!!!!): 

  1. Parse corpus, extract all unique string-sequences (STARTING WITH SPACE????) and freq...
    • Or, parse frequency tables
  2. Map string-sequences to keycodes
    1. Order sequences by order of probability.  This is determined by the disambiguation method.
  3. Create lookup table: combine identical or identical-prefix keycodes, with each list ordered most frequent first.
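
A possible shape for these steps, starting from raw corpus text. Since the process is still marked as needing a check, treat this as one reading of it: it counts whole words only, assumes the standard phone keypad with a leading 0 for the space before a word, and uses a made-up corpus and function names.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def keycode(word):
    """Keycode for a lowercase word; the leading "0" is the assumed space."""
    return "0" + "".join(LETTER_TO_KEY[ch] for ch in word)


def word_frequencies(corpus):
    """Step 1: extract unique words and count how often each occurs."""
    return Counter(re.findall(r"[a-z]+", corpus.lower()))


def build_lookup(corpus):
    """Steps 2-3: map words to keycodes and group them, most frequent first."""
    table = defaultdict(list)
    for word, _ in word_frequencies(corpus).most_common():
        table[keycode(word)].append(word)
    return dict(table)


# Made-up corpus, for illustration only.
corpus = "They view the tidy room. They view it. They stay."
lookup = build_lookup(corpus)
```

Words that share a keycode end up in one list, so `lookup["08439"]` contains they, view and tidy in that order for this corpus.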

Disambiguation:

 

Multitap:

  • Lookup key sequence in table, display list... 

 

 

-----

 Questions:

  • Can we build a lookup list from just words rather than part-words?
  • Is the generated list just letter-strings starting with space?