Spellcorrect exercise part2

jasonbaldridge edited this page Feb 4, 2013 · 7 revisions

If you haven't already completed SpellCorrect-Exercise, do that first.

Step one: rank candidates using cosine similarity

For each spelling error, compute the cosine similarity between the misspelled word and each candidate returned from the inverted index. Then output the 20 candidates with the highest cosine values, in descending order of score.

Example output:

> run-main appliednlp.spell.SpellingCorrector "This Facebook app shows that she is there favorite acress in tonw" /usr/share/dict/words /tmp/masc_vocab.txt
[info] Running appliednlp.spell.SpellingCorrector This Facebook app shows that she is there favorite acress in tonw /usr/share/dict/words /tmp/masc_vocab.txt
Detecting spelling errors in: This Facebook app shows that she is there favorite acress in tonw
ERROR: acress
  Candidates: professed acoin apocrenic adherescence accouple acronym crumblingness gleesomeness unvarnishedness trest
  Top 20 (cress,0.7302967433402214) (acres,0.7302967433402214) (Paracress,0.6804138174397718) (actress,0.6172133998483676) (acre,0.6123724356957946) (acupress,0.5773502691896258) (Press,0.5477225575051661) (tress,0.5477225575051661) (Dress,0.5477225575051661) (press,0.5477225575051661) (dress,0.5477225575051661) (acred,0.5477225575051661) (crestless,0.5443310539518174) (creatress,0.5443310539518174) (acridness,0.5443310539518174) (acrestaff,0.5443310539518174) (acceptress,0.5163977794943223) (restress,0.5163977794943223) (sacredness,0.5163977794943223) (pennycress,0.5163977794943223)
ERROR: tonw
  Candidates: Teutondom belton metapeptone tokened atoningly histon toploftical Eciton toxolysis toxicosis
  Top 20 (ton,0.5773502691896258) (tonk,0.5) (tone,0.5) (tons,0.5) (tony,0.5) (tong,0.5) (tongs,0.4472135954999579) (toned,0.4472135954999579) (tonal,0.4472135954999579) (tonga,0.4472135954999579) (tonic,0.4472135954999579) (tones,0.4472135954999579) (tonus,0.4472135954999579) (toner,0.4472135954999579) (tonsor,0.4082482904638631) (tonish,0.4082482904638631) (tonkin,0.4082482904638631) (tongue,0.4082482904638631) (tonjon,0.4082482904638631) (tonous,0.4082482904638631)
[success] Total time: 14 s, completed Feb 4, 2013 4:20:29 PM
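
The following is a minimal sketch of the ranking step, not the reference solution. It assumes each word is represented as a vector of character trigram counts, with a single boundary marker added on each side. This page does not state that representation; it is inferred from the scores in the example output above (for instance, "acress" and "cress" share 4 of their 6 and 5 trigrams, giving 4/sqrt(30) ≈ 0.7303). The names `CosineRankSketch`, `trigramCounts`, `cosine`, and `topCandidates` are hypothetical, and the candidate list is assumed to come from your own inverted index.

```scala
object CosineRankSketch {

  // Assumption: character trigram counts over the word padded with "^" and "$".
  def trigramCounts(word: String): Map[String, Int] =
    ("^" + word + "$").sliding(3).toSeq
      .groupBy(identity)
      .map { case (gram, occurrences) => (gram, occurrences.size) }

  // Cosine similarity between two sparse count vectors.
  def cosine(a: Map[String, Int], b: Map[String, Int]): Double = {
    val dot = a.iterator.map { case (gram, count) => count.toDouble * b.getOrElse(gram, 0) }.sum
    val normA = math.sqrt(a.values.map(c => c.toDouble * c).sum)
    val normB = math.sqrt(b.values.map(c => c.toDouble * c).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Score every candidate against the typo and keep the k best, highest first.
  def topCandidates(typo: String, candidates: Seq[String], k: Int = 20): Seq[(String, Double)] = {
    val typoVector = trigramCounts(typo)
    candidates
      .map(candidate => (candidate, cosine(typoVector, trigramCounts(candidate))))
      .sortBy { case (_, score) => -score }
      .take(k)
  }
}
```

For example, `CosineRankSketch.topCandidates("tonw", Seq("tonk", "ton", "belton"))` ranks "ton" first, then "tonk", then "belton". Because `sortBy` is stable, candidates with equal scores keep the order in which the inverted index returned them.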

MORE TO COME