<!DOCTYPE html><html>
<head>
<title>Compiling Apertium morphological dictionaries with HFST and using them in HFST applications</title>
<!--Generated on Fri Sep 29 12:59:51 2017 by LaTeXML (version 0.8.2) http://dlmf.nist.gov/LaTeXML/.-->
<!--Document created on Last modifications: September 29, 2017.-->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document ltx_authors_1line">
<h1 class="ltx_title ltx_title_document">Compiling Apertium morphological dictionaries with HFST and using them
in HFST applications
<br class="ltx_break"><span class="ltx_text" style="font-size:90%;">This article was published in the SALTMIL
workshop at LREC 2011 in Malta. Original version:
<span class="ltx_text ltx_font_typewriter">http://ixa2.si.ehu.es/saltmil/</span>.</span></h1>
<div class="ltx_authors">
<span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Tommi A Pirinen
<br class="ltx_break">University of Helsinki
<br class="ltx_break">FI-00014 University of Helsinki Finland
<br class="ltx_break">[email protected]
</span></span>
<span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Francis M. Tyers
<br class="ltx_break">Universitat d’Alacant
<br class="ltx_break">E-03071 Alacant Spain
<br class="ltx_break">[email protected]
</span></span>
</div>
<div class="ltx_date ltx_role_creation">Last modifications: September 29, 2017</div>
<div class="ltx_abstract">
<h6 class="ltx_title ltx_title_abstract">Abstract</h6>
<p class="ltx_p">In this paper we aim to improve the interoperability and re-usability of the
morphological dictionaries of the Apertium machine translation system by
formulating a generic finite-state compilation formula that is implemented in
the HFST finite-state system to compile Apertium dictionaries into general-purpose
finite-state automata. We demonstrate the use of the resulting automaton in
an FST-based spell-checking system.
<br class="ltx_break">Keywords: finite-state, dictionary, spell-checking</p>
</div>
<section id="S1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2>
<div id="S1.p1" class="ltx_para">
<p class="ltx_p">Finite-state automata are one of the most effective formats for representing
natural language morphologies computationally. Once compiled and optimised
through the process of minimisation, finite-state automata are very
effective for parsing running text. This format is also used when running
morphological dictionaries in the machine-translation system
Apertium <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib5" title="Apertium: a free/open-source platform for rule-based machine translation" class="ltx_ref">3</a>]</cite><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_text ltx_font_typewriter">http://www.apertium.org</span></span></span></span>. In this
paper we propose a generic compilation formula to compile the
dictionaries into weighted finite-state automata for use with any FST
tool or application. We implement this system using a free/libre
open-source finite-state API,
HFST <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib6" title="HFST—framework for compiling and applying morphologies" class="ltx_ref">7</a>]</cite><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_text ltx_font_typewriter">http://hfst.sf.net</span></span></span></span>. HFST is a general-purpose
programming interface using a selection of freely available
finite-state libraries for the handling of finite-state automata.</p>
</div>
<div id="S1.p2" class="ltx_para">
<p class="ltx_p">While Apertium uses the dictionaries and the finite-state automata for machine
translation, HFST is used in a multitude of other applications ranging from
basic morphological analysis <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib6" title="HFST—framework for compiling and applying morphologies" class="ltx_ref">7</a>]</cite>
to end-user applications such as spell-checking <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib16" title="Finite-state spell-checking with weighted language and error models" class="ltx_ref">10</a>]</cite> and
predictive text-entry for mobile phones <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib3" title="Improving predictive entry of finnish text messages using irc logs" class="ltx_ref">13</a>]</cite>. In this
article we show how to automatically generate a spell-checker from an Apertium
dictionary and give a rough evaluation of the usability of the automatically generated
spell-checker.</p>
</div>
<div id="S1.p3" class="ltx_para">
<p class="ltx_p">The rest of the article is laid out as follows: In section <a href="#S2" title="2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>
we describe the generic compilation formula for the HFST-based compilation of
Apertium dictionaries and the formula for inducing a spell-checker’s error
model from an Apertium dictionary. In section <a href="#S3" title="3 Materials ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> we introduce
the Apertium dictionary repository and the specific dictionaries we use to
evaluate our systems. In section <a href="#S4" title="4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> we evaluate the speed and
memory usage of compilation and application of our formula against Apertium’s
own system, show that our system has roughly the same coverage, and explain
where the differences arise from.</p>
</div>
</section>
<section id="S2" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">2 </span>Methods</h2>
<div id="S2.p1" class="ltx_para">
<p class="ltx_p">The compilation of Apertium dictionaries is relatively straightforward. We
assume here standard notations for finite-state algebra. The morphological
combinatorics of Apertium dictionaries are defined in the following terms: there is
one set of root morphs (finite strings) and an arbitrary number of named sets of
affix morphs called <span class="ltx_text ltx_font_typewriter">pardef</span>s. Each set of affix morphs is associated with a
name. Each morph can also be associated with a paradigm reference pointing to a
named subset of affixes. As an example, a language of the singular and plural of
<em class="ltx_emph">cat</em> and <em class="ltx_emph">dog</em> in English would be described by a root dictionary
consisting of the morphs <span class="ltx_text ltx_font_typewriter">cat</span> and <span class="ltx_text ltx_font_typewriter">dog</span>, both of which point on the
right-hand side to the pardef named <span class="ltx_text ltx_font_typewriter">number</span>. The number affix morphs are
then defined as a set of two morphs, namely <span class="ltx_text ltx_font_typewriter">s</span> for the plural marker and
the empty string for the singular marker.</p>
</div>
<div id="S2.p2" class="ltx_para">
<p class="ltx_p">Each morph can be compiled into a single-path finite-state automaton<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup>The
full formula allows any finite-state language, compiled from regular
expressions, as a morph; the extension is trivial, but for readability we present
the formula for string morphs</span></span></span> containing the actual morph as a string of UTF-8
arcs <math id="S2.p2.m1" class="ltx_Math" alttext="m" display="inline"><mi>m</mi></math>. The morphs in the root dictionary are extended on the left or right
side by joiner markers if and only if they have a pardef reference on that side, and each affix
dictionary is extended on the left (for suffixes) or right (for prefixes) by
the pardef name marker. In the example <em class="ltx_emph">cats, dogs</em> language this
yields the finite-state paths <span class="ltx_text ltx_font_typewriter">c a t NUMBER</span>, <span class="ltx_text ltx_font_typewriter">d o g NUMBER</span>,
<span class="ltx_text ltx_font_typewriter">NUMBER s</span> and <span class="ltx_text ltx_font_typewriter">NUMBER <math id="S2.p2.m2" class="ltx_Math" alttext="\epsilon" display="inline"><mi mathvariant="normal">ϵ</mi></math></span>, where <math id="S2.p2.m3" class="ltx_Math" alttext="\epsilon" display="inline"><mi>ϵ</mi></math> as usual
marks the zero-length string<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup>In the current implementation we have
temporarily used a special non-epsilon marker, as this decreases the local
non-determinism and thus compilation time</span></span></span>. These sets of roots and affixes can
be compiled into a disjunction of such joiner-delimited morphs. The
morphotactics can now be defined in terms of joiners: a path is valid if it
contains joiners only as pairs of adjacent identical paradigm references, such
as <span class="ltx_text ltx_font_typewriter">c a t NUMBER NUMBER s</span> or <span class="ltx_text ltx_font_typewriter">d o g NUMBER NUMBER <math id="S2.p2.m4" class="ltx_Math" alttext="\epsilon" display="inline"><mi mathvariant="normal">ϵ</mi></math></span>,
but not <span class="ltx_text ltx_font_typewriter">c a t NUMBER d o g NUMBER</span> or <span class="ltx_text ltx_font_typewriter">NUMBER s NUMBER s</span>. The
finite-state formula for this morphotactics is</p>
</div>
<div id="S2.p3" class="ltx_para">
<table id="S2.E1" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E1.m1" class="ltx_Math" alttext="M_{x}=(\Sigma\cup\bigcup_{x\in p}xx)^{\star}," display="block"><mrow><mrow><msub><mi>M</mi><mi>x</mi></msub><mo>=</mo><msup><mrow><mo stretchy="false">(</mo><mrow><mi mathvariant="normal">Σ</mi><mo>∪</mo><mrow><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>x</mi><mo>∈</mo><mi>p</mi></mrow></munder><mrow><mi>x</mi><mo></mo><mi>x</mi></mrow></mrow></mrow><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup></mrow><mo>,</mo></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td>
</tr>
</table>
</div>
<div id="S2.p4" class="ltx_para">
<p class="ltx_p">where <math id="S2.p4.m1" class="ltx_Math" alttext="p" display="inline"><mi>p</mi></math> is the set of pardef names and <math id="S2.p4.m2" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math> is the set
of symbols in morphs, not including the pardef names. The final
dictionary is then simply the composition of these morphotactic rules over the repetition
of affixes and roots:</p>
</div>
<div id="S2.p5" class="ltx_para">
<table id="S2.E2" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E2.m1" class="ltx_Math" alttext="(M_{a}\cup M_{r})^{\star}\circ M_{x}," display="block"><mrow><mrow><msup><mrow><mo stretchy="false">(</mo><mrow><msub><mi>M</mi><mi>a</mi></msub><mo>∪</mo><msub><mi>M</mi><mi>r</mi></msub></mrow><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup><mo>∘</mo><msub><mi>M</mi><mi>x</mi></msub></mrow><mo>,</mo></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td>
</tr>
</table>
</div>
<div id="S2.p6" class="ltx_para">
<p class="ltx_p">where <math id="S2.p6.m1" class="ltx_Math" alttext="M_{a}" display="inline"><msub><mi>M</mi><mi>a</mi></msub></math> is the disjunction of affixes with joiners, <math id="S2.p6.m2" class="ltx_Math" alttext="M_{r}" display="inline"><msub><mi>M</mi><mi>r</mi></msub></math> the
disjunction of roots with joiners, and <math id="S2.p6.m3" class="ltx_Math" alttext="M_{x}" display="inline"><msub><mi>M</mi><mi>x</mi></msub></math> the morphotactics defined in
formula <a href="#S2.E1" title="(1) ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>. This is a variation of the morphology compilation
formula presented in HFST documentation such as <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib6" title="HFST—framework for compiling and applying morphologies" class="ltx_ref">7</a>]</cite>.
</p>
</div>
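<div class="ltx_para">
<p class="ltx_p">As an illustration only, the following Python sketch enumerates the <em class="ltx_emph">cats, dogs</em>
example by brute force rather than by building automata: it concatenates joiner-extended morphs
and keeps only the paths in which joiners occur as adjacent identical pairs, which is the condition
expressed by formula (1). The function names and the <span class="ltx_text ltx_font_typewriter">@0@</span> marker
standing in for the empty suffix are assumptions made for this sketch; they are not part of HFST or Apertium.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Pardef names double as joiner symbols; NUMBER is the example from the text.
PARDEFS = ["NUMBER"]
# Hypothetical non-epsilon marker standing in for the empty suffix.
EMPTY = "@0@"

# Roots carry the joiner on the right, suffixes carry it on the left.
ROOTS = [["c", "a", "t", "NUMBER"], ["d", "o", "g", "NUMBER"]]
AFFIXES = [["NUMBER", "s"], ["NUMBER", EMPTY]]


def is_valid(path):
    """Accept a path only if joiners occur as adjacent identical pairs."""
    i = 0
    while i != len(path):
        if path[i] in PARDEFS:
            if i + 1 == len(path) or path[i + 1] != path[i]:
                return False
            i += 2
        else:
            i += 1
    return True


def surface(path):
    """Drop joiners and the empty-morph marker to get the word form."""
    return "".join(s for s in path if s not in PARDEFS and s != EMPTY)


def language(max_morphs=2):
    """Enumerate morph concatenations and filter them with is_valid."""
    morphs = ROOTS + AFFIXES
    paths = [[]]
    words = set()
    for _ in range(max_morphs):
        paths = [p + m for p in paths for m in morphs]
        words.update(surface(p) for p in paths if is_valid(p))
    return words
```
</pre>
<p class="ltx_p">With these definitions, <span class="ltx_text ltx_font_typewriter">language()</span> yields the four forms
<em class="ltx_emph">cat</em>, <em class="ltx_emph">cats</em>, <em class="ltx_emph">dog</em> and <em class="ltx_emph">dogs</em>;
allowing a third morph adds nothing, because every longer concatenation leaves an unpaired joiner.</p>
</div>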
<section id="S2.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.1 </span>Implementation Details</h3>
<div id="S2.SS1.p1" class="ltx_para">
<p class="ltx_p">There are a lot of finer details we will not thoroughly cover in this article, as
they are mainly engineering details. In this section we briefly summarise the
specific features of HFST-based FST compilation that result in meaningful
differences in automaton structure or behaviour. One of the main sources of
differences is that HFST automata are two-sided and compiled only once from the
source code, whereas Apertium generates two different automata for analysis and
generation. The structure of these two automata may differ, since Apertium
dictionaries have ways of marking morphs as limited to generation or analysis
only, so such morphs will only be included in one of the automata. Our approach to
this is to use special symbols called flag-diacritics <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib44" title="Finite state morphology" class="ltx_ref">1</a>]</cite> to
limit the paths to analysis only or generation only at run time, while still
including all paths in the one transducer that gets compiled.</p>
</div>
<div id="S2.SS1.p2" class="ltx_para">
<p class="ltx_p">Another main difference in processing comes from the special word-initial,
word-final and separate morphs, which in Apertium are contained in separate
automata altogether. HFST tools do not support the use of multiple automata
for analysis, so these special morphs are concatenated optionally to the
beginning or end of the word, or disjoined with the final automaton, respectively.
These special morphs include things like the French article <em class="ltx_emph">l’</em> as a bound
form.</p>
</div>
</section>
<section id="S2.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.2 </span>Creating a Spell-Checker Automatically</h3>
<div id="S2.SS2.p1" class="ltx_para">
<p class="ltx_p">To create a finite-state spell-checker we need two automata: one for the
language model, for which the dictionary compiled as described earlier will do,
and one for the error model <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib16" title="Finite-state spell-checking with weighted language and error models" class="ltx_ref">10</a>]</cite>. A classic baseline error
model is based on the edit distance
algorithm <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib31" title="Binary codes capable of correcting deletions, insertions, and reversals" class="ltx_ref">6</a>, <a href="#bib.bib29" title="A technique for computer detection and correction of spelling errors" class="ltx_ref">2</a>]</cite>, which defines typing errors of
four types: pressing an extra key (insertion), not pressing a key (deletion),
pressing the wrong key (change) and pressing two keys in the wrong order (swap). There
have been many finite-state formulations of this; we use the one defined
in <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib26" title="Fast string correction with levenshtein-automata" class="ltx_ref">12</a>, <a href="#bib.bib16" title="Finite-state spell-checking with weighted language and error models" class="ltx_ref">10</a>]</cite>. The basic version, in which
typing errors of each type are equally likely for every letter, can be
induced from the compiled language model, and this is what we use in this
paper. The induction of this model is relatively straightforward: when
compiling the automaton, save each unique UTF-8 codepoint found in the
morphs<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup>The description format of Apertium requires declaration of
an exemplar character set as well, but as this is only used in the tokenisation
algorithm <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib2" title="Incremental construction and maintenance of morphological analysers based on augmented letter transducers" class="ltx_ref">4</a>]</cite>, which is not going to be used, we induce
the set from the morphs</span></span></span>. For each character, generate the identity arcs in the start
and end states to model correctly typed runs. For each of the error types,
generate one arc from the initial state to the end state modelling that error,
except for swap, which requires one auxiliary state for each character pair.</p>
</div>
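<div class="ltx_para">
<p class="ltx_p">The effect of such an unweighted, single-error model can be sketched in Python by
enumerating strings instead of composing automata. This is only an illustration under that
simplifying assumption, not the finite-state construction used in the paper: the alphabet is induced
from the characters of the known word forms, every string within one insertion, deletion, change or swap
of the input is generated, and the candidates are intersected with the word list. All function names
here are hypothetical.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
def induce_alphabet(morphs):
    """Collect every unique character that occurs in the morphs."""
    return sorted(set("".join(morphs)))


def edits1(word, alphabet):
    """Return all strings at most one edit operation away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Undo an extra key press by dropping one character.
    deletions = [a + b[1:] for a, b in splits if b]
    # Undo two keys pressed in the wrong order.
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if b[1:]]
    # Undo a wrong key by substituting any known character.
    changes = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    # Undo a missed key by inserting any known character.
    insertions = [a + c + b for a, b in splits for c in alphabet]
    return set(deletions + swaps + changes + insertions)


def suggest(word, lexicon):
    """Return the word itself if known, else known words one edit away."""
    if word in lexicon:
        return [word]
    alphabet = induce_alphabet(lexicon)
    return sorted(edits1(word, alphabet).intersection(lexicon))
```
</pre>
<p class="ltx_p">For the four-word <em class="ltx_emph">cats, dogs</em> language, the misspelling
<span class="ltx_text ltx_font_typewriter">cta</span> is corrected to <span class="ltx_text ltx_font_typewriter">cat</span> by a swap,
while <span class="ltx_text ltx_font_typewriter">catt</span> produces both <span class="ltx_text ltx_font_typewriter">cat</span> and
<span class="ltx_text ltx_font_typewriter">cats</span>, since without weights all single errors are equally likely.</p>
</div>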
</section>
</section>
<section id="S3" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">3 </span>Materials</h2>
<div id="S3.p1" class="ltx_para">
<p class="ltx_p">The Apertium project hosts a large number of morphological dictionaries for
each of the languages translated. From these we have selected three
dictionaries to be tested: Basque from the Basque-Spanish pair, as it is
the released dictionary with the largest on-disk size; Norwegian Nynorsk from the Norwegian pair, as a language
that has some additional morphological complexity, such as compounding; and
Manx, as a language that currently lacks spell-checking tools, to
demonstrate the plausibility of automatic conversion of an Apertium dictionary
into a spell-checker<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">6</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">6</sup>We also provide a Makefile script to
recreate the results of this article for any language in Apertium’s repository</span></span></span>.</p>
</div>
<div id="S3.p2" class="ltx_para">
<p class="ltx_p">To evaluate the use of the resulting morphological dictionaries and spell-checkers
we use the following Wikipedia database
dumps<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">7</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">7</sup><span class="ltx_text ltx_font_typewriter">http://download.wikipedia.org/</span></span></span></span>:
<span class="ltx_text ltx_font_typewriter">euwiki-20120219-pages-articles.xml.bz2</span>,
<span class="ltx_text ltx_font_typewriter">nnwiki-20120215-pages-articles.xml.bz2</span>, and
<span class="ltx_text ltx_font_typewriter">gvwiki-20120215-pages-articles.xml.bz2</span>. For the purposes of this
article we performed a very crude cleanup and preprocessing of the Wikipedia data,
picking up the text elements of the articles and naïvely discarding most of the Wikipedia
markup<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">8</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">8</sup>For details see the script in
<span class="ltx_text ltx_font_typewriter">http://hfst.svn.sourceforge.net/viewvc/hfst/trunk/lrec-2011-apertium/</span>.</span></span></span>.</p>
</div>
</section>
<section id="S4" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">4 </span>Test Setting and Evaluation</h2>
<div id="S4.p1" class="ltx_para">
<p class="ltx_p">To see what difference the generic compilation formula makes compared with the
direct automaton building used by Apertium, we first look at the resulting automata; this
also gives a rough idea of what their efficiency might be. In
table <a href="#S4.T1" title="Table 1 ‣ 4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a> we give the counts of nodes and edges, in that
order, in the graphs compiled from the dictionaries. Note that in the case
of Apertium the counts are summed over the states and edges of all the separate
automata. The small differences in the sizes of the graphs are mostly caused by
the different handling of generation vs. analysis mode. The difference in the sizes
of the automata on disk is shown in table <a href="#S4.T2" title="Table 2 ‣ 4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>.
The size of HFST automata can be attributed to the clever compression
algorithm used by HFST <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib36" title="HFST runtime format—a compacted transducer format allowing for fast lookup" class="ltx_ref">14</a>]</cite>.</p>
</div>
<figure id="S4.T1" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_l ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Lang.</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">
<span class="ltx_text ltx_font_bold">Apertium </span><span class="ltx_text ltx_font_typewriter" style="font-size:90%;">LR</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">
<span class="ltx_text ltx_font_bold">Apertium </span><span class="ltx_text ltx_font_typewriter" style="font-size:90%;">RL</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">HFST</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r ltx_border_t">Basque</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">30,114</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">34,005</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">34,824</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row ltx_border_l ltx_border_r"></th>
<td class="ltx_td ltx_align_right ltx_border_r">59,321</td>
<td class="ltx_td ltx_align_right ltx_border_r">68,030</td>
<td class="ltx_td ltx_align_right ltx_border_r">68,347</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Norwegian</th>
<td class="ltx_td ltx_align_right ltx_border_r">56,226</td>
<td class="ltx_td ltx_align_right ltx_border_r">55,722</td>
<td class="ltx_td ltx_align_right ltx_border_r">56,871</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row ltx_border_l ltx_border_r"></th>
<td class="ltx_td ltx_align_right ltx_border_r">138,217</td>
<td class="ltx_td ltx_align_right ltx_border_r">132,475</td>
<td class="ltx_td ltx_align_right ltx_border_r">139,259</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Manx</th>
<td class="ltx_td ltx_align_right ltx_border_r">13,055</td>
<td class="ltx_td ltx_align_right ltx_border_r">12,955</td>
<td class="ltx_td ltx_align_right ltx_border_r">12,920</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row ltx_border_b ltx_border_l ltx_border_r"></th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">28,220</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">27,062</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">27,031</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 1: </span>Size of the HFST-based automata against the Apertium originals (for each
language, the first row is the count of nodes and the second row the count of edges)
</figcaption>
</figure>
<figure id="S4.T2" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_l ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Lang.</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">
<span class="ltx_text ltx_font_bold">Apertium </span><span class="ltx_text ltx_font_typewriter" style="font-size:90%;">LR</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">
<span class="ltx_text ltx_font_bold">Apertium </span><span class="ltx_text ltx_font_typewriter" style="font-size:90%;">RL</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">HFST</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r ltx_border_t">Basque</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">252 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">289 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">1.7 MiB</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Norwegian</th>
<td class="ltx_td ltx_align_right ltx_border_r">558 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_r">535 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_r">3.7 MiB</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_l ltx_border_r">Manx</th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">108 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">110 KiB</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">709 KiB</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>Size of the HFST-based automata against the Apertium originals (file size on disk)
</figcaption>
</figure>
<div id="S4.p2" class="ltx_para">
<p class="ltx_p">To test efficiency we measure the running times of various tasks. The times and
memory usage have been measured using the GNU <span class="ltx_text ltx_font_typewriter">time</span> utility and the
<span class="ltx_text ltx_font_typewriter">getrusage</span> system call’s <span class="ltx_text ltx_font_typewriter">ru_utime</span> field, averaged over three
test runs. The tests were performed on a quad-core Intel Xeon E5450 @ 3.00 GHz
with 64 GiB of RAM.</p>
</div>
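<div class="ltx_para">
<p class="ltx_p">A measurement of this kind can be approximated with Python’s standard
<span class="ltx_text ltx_font_typewriter">resource</span> module (available on Unix-like systems), which exposes
<span class="ltx_text ltx_font_typewriter">getrusage</span> and its <span class="ltx_text ltx_font_typewriter">ru_utime</span> field.
The sketch below is an assumption about how such a harness might look, not the script used for this article;
the function name is hypothetical and the command to be timed is supplied by the caller.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
import resource
import subprocess


def mean_user_time(command, runs=3):
    """Average user CPU time (ru_utime) of a child command over several runs."""
    total = 0.0
    for _ in range(runs):
        # RUSAGE_CHILDREN accumulates usage of children that have been waited for.
        before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
        subprocess.run(command, check=True)
        after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
        total += after - before
    return total / runs
```
</pre>
<p class="ltx_p">The function takes the command as an argument list and returns the mean user time in seconds;
the default of three runs mirrors the averaging described above.</p>
</div>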
<div id="S4.p3" class="ltx_para">
<p class="ltx_p">First we measure the speed of analysing a full corpus with the resulting automaton.
The results are given in Table <a href="#S4.T3" title="Table 3 ‣ 4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>, in
seconds, to the precision available on our system. Interestingly, neither system
has a clear overall advantage: HFST is faster for Basque, whereas Apertium is faster
for Norwegian and Manx, so the better choice for corpus analysis seems to depend on the language.</p>
</div>
<figure id="S4.T3" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_l ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Language</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Apertium</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">HFST</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r ltx_border_t">Basque</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">32.0 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">18.4 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Norwegian</th>
<td class="ltx_td ltx_align_right ltx_border_r">2.4 s</td>
<td class="ltx_td ltx_align_right ltx_border_r">5.5 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_l ltx_border_r">Manx</th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">1.6 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">2.2 s</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 3: </span>Speed of the HFST-based system against the original in corpus analysis
(seconds of user time)
</figcaption>
</figure>
<div id="S4.p4" class="ltx_para">
<p class="ltx_p">Similarly, we measure the speed of the current compilation process in
Table <a href="#S4.T4" title="Table 4 ‣ 4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>. Here there is an obvious advantage to
building the automaton manually (see <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib1" title="Construcción y minimización eficiente de transductores de letras a partir de diccionarios con paradigmas" class="ltx_ref">11</a>]</cite> for the precise algorithm
used) over the finite-state algebra method, which is
in line with earlier results for lexc building in <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib17" title="HFST tools for morphology—an efficient open-source package for construction of morphological analyzers" class="ltx_ref">8</a>]</cite>.</p>
</div>
<figure id="S4.T4" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_l ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Language</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Apertium time</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">HFST time</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r ltx_border_t">Basque</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">35.7 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">160.0 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Norwegian</th>
<td class="ltx_td ltx_align_right ltx_border_r">6.6 s</td>
<td class="ltx_td ltx_align_right ltx_border_r">200.2 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_l ltx_border_r">Manx</th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">0.8 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">11.2 s</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 4: </span>Speed of the HFST-based system against the original in compilation
(seconds of user time)
</figcaption>
</figure>
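<div class="ltx_para">
<p class="ltx_p">To make the comparison in Tables 3 and 4 easier to read, the relative slowdown can be computed directly from the reported times. The sketch below only restates the published figures as ratios; the dictionary and function names are hypothetical and introduced for illustration.</p>
</div>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Figures copied from Tables 3 and 4 (seconds of user time), stored as (Apertium, HFST).
ANALYSIS = {"Basque": (32.0, 18.4), "Norwegian": (2.4, 5.5), "Manx": (1.6, 2.2)}
COMPILATION = {"Basque": (35.7, 160.0), "Norwegian": (6.6, 200.2), "Manx": (0.8, 11.2)}


def hfst_to_apertium_ratio(table):
    """Map each language to HFST time divided by Apertium time (above 1 means HFST is slower)."""
    return {lang: hfst / apertium for lang, (apertium, hfst) in table.items()}


if __name__ == "__main__":
    for name, table in (("analysis", ANALYSIS), ("compilation", COMPILATION)):
        for lang, ratio in hfst_to_apertium_ratio(table).items():
            print(f"{name:12} {lang:10} HFST/Apertium = {ratio:.2f}")
```
</pre>
<div class="ltx_para">
<p class="ltx_p">For compilation the ratio is roughly 4.5 for Basque, 30 for Norwegian and 14 for Manx, whereas for corpus analysis it falls below 1 only for Basque.</p>
</div>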
<div id="S4.p5" class="ltx_para">
<p class="ltx_p">Finally, we evaluate the usability of dictionaries meant for machine translation
as spell-checkers by running the automatically produced finite-state spell-checkers
over a large corpus and measuring both the speed and the
quality of the results. The errors were generated automatically from the correctly spelled words of Wikipedia
text using a simple algorithm that may generate one Levenshtein
error at each character position with a probability of <math id="S4.p5.m1" class="ltx_Math" alttext="\frac{1}{33}" display="inline"><mfrac><mn>1</mn><mn>33</mn></mfrac></math>. This test
gives only rudimentary results on the plausibility of using a machine-translation
dictionary for spell-checking; for a more thorough evaluation of the efficiency of
finite-state spell-checking see <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib4" title="Language independent text correction using finite state automata" class="ltx_ref">5</a>]</cite>.</p>
</div>
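<div class="ltx_para">
<p class="ltx_p">The error-generation step can be illustrated as follows. This is a sketch under stated assumptions rather than the script used in the experiment: the text specifies only the per-position probability of 1/33 and that the errors are Levenshtein edits, so the uniform choice between deletion, insertion and substitution, the lowercase ASCII alphabet, and the function name <span class="ltx_text ltx_font_typewriter">corrupt</span> are all hypothetical.</p>
</div>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
import random
import string


def corrupt(word, rng, p=1 / 33, alphabet=string.ascii_lowercase):
    """Apply at most one random Levenshtein edit at each character position with probability p."""
    out = []
    for ch in word:
        if rng.random() >= p:
            out.append(ch)  # position left untouched
            continue
        # Assumption: the three edit types are equally likely.
        op = rng.choice(("delete", "insert", "substitute"))
        if op == "delete":
            continue  # drop the character
        if op == "insert":
            out.append(rng.choice(alphabet))  # new letter before the original one
            out.append(ch)
        else:
            # Substitute with a letter that differs from the original.
            out.append(rng.choice([c for c in alphabet if c != ch]))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so that a test corpus is reproducible
    for word in ("dictionary", "automaton", "morphology"):
        print(word, corrupt(word, rng))
```
</pre>
<div class="ltx_para">
<p class="ltx_p">With a probability of 1/33 per position most short words pass through unchanged, so a generator of this kind yields a mixture of correct and misspelled tokens.</p>
</div>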
<figure id="S4.T5" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_l ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Language</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t"><span class="ltx_text ltx_font_bold">Speed (words/sec)</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r ltx_border_t">Basque</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">7,900</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_l ltx_border_r">Norwegian</th>
<td class="ltx_td ltx_align_right ltx_border_r">9,200</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_l ltx_border_r">Manx</th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">4,700</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 5: </span>Efficiency of spelling correction in artificial test setup, average
over three runs.</figcaption>
</figure>
</section>
<section id="S5" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">5 </span>Conclusions</h2>
<div id="S5.p1" class="ltx_para">
<p class="ltx_p">In this article we have shown a general formula for compiling morphological
dictionaries from the machine-translation system Apertium in the generic FST system
HFST, and for using the result in an HFST-based spell-checking application.</p>
</div>
</section>
<section id="S6" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">6 </span>Future Work</h2>
<div id="S6.p1" class="ltx_para">
<p class="ltx_p">In this article we showed a basic method for gaining more interoperability between
the generic FST system HFST and the specialised morphological dictionary writing
formalism of the machine-translation system Apertium by implementing a generic
compilation formula to compile the language descriptions. In future research
we will leverage this and other related formulas for automatic optimisation
of the final automata, using the information present in the language description
instead of relying on generic graph algorithms to produce the final minimised
automata.</p>
</div>
<div id="S6.p2" class="ltx_para">
<p class="ltx_p">We demonstrated importing the compiled dictionary as a language model and
inducing an error model for real-world spell-checking applications. Further
development in this direction should aim for interoperable formalisms, formats
and mechanisms for language models and end applications of all relevant
language technology tools.</p>
</div>
</section>
<section id="Sx1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">Acknowledgements</h2>
<div id="Sx1.p1" class="ltx_para">
<p class="ltx_p">We thank the HFST and Apertium contributors for fruitful internet relayed chats,
and the two anonymous reviewers for their helpful suggestions.</p>
</div>
</section>
<section id="bib" class="ltx_bibliography">
<h2 class="ltx_title ltx_title_bibliography">References</h2>
<ul id="L1" class="ltx_biblist">
<li id="bib.bib44" class="ltx_bibitem ltx_bib_book">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[1]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. R. Beesley and L. Karttunen</span><span class="ltx_text ltx_bib_year"> (2003)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Finite state morphology</span>.
</span>
<span class="ltx_bibblock"> <span class="ltx_text ltx_bib_publisher">CSLI publications</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-1575864341</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.SS1.p1" title="2.1 Implementation Details ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.1</span></a>.
</span>
</li>
<li id="bib.bib29" class="ltx_bibitem ltx_bib_article">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[2]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. J. Damerau</span><span class="ltx_text ltx_bib_year"> (1964)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A technique for computer detection and correction of spelling errors</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Commun. ACM</span> (<span class="ltx_text ltx_bib_number">7</span>).
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.SS2.p1" title="2.2 Creating a Spell-Checker Automatically ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.
</span>
</li>
<li id="bib.bib5" class="ltx_bibitem ltx_bib_article">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[3]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. L. Forcada, M. Ginestí-Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J. A. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2011-07)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Apertium: a free/open-source platform for rule-based machine translation</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Machine Translation</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text issn ltx_bib_external">ISSN 0922-6567</span>,
<a href="http://www.springerlink.com/content/h134p1j73377071k/export-citation/" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="http://dx.doi.org/10.1007/s10590-011-9090-0" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p1" title="1 Introduction ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.
</span>
</li>
<li id="bib.bib2" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[4]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Garrido-Alenda, M. L. Forcada and R. C. Carrasco</span><span class="ltx_text ltx_bib_year"> (2002)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Incremental construction and maintenance of morphological analysers based on augmented letter transducers</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of TMI 2002 (Theoretical and Methodological Issues in Machine Translation, Keihanna/Kyoto, Japan)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 53–62</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.SS2.p1" title="2.2 Creating a Spell-Checker Automatically ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.
</span>
</li>
<li id="bib.bib4" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[5]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Hassan, S. Noeman and H. Hassan</span><span class="ltx_text ltx_bib_year"> (2008)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Language independent text correction using finite state automata</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Third International Joint Conference on Natural Language Processing</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">2</span>, <span class="ltx_text ltx_bib_pages"> pp. 913–918</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p5" title="4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.
</span>
</li>
<li id="bib.bib31" class="ltx_bibitem ltx_bib_article">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[6]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">V. I. Levenshtein</span><span class="ltx_text ltx_bib_year"> (1966)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Binary codes capable of correcting deletions, insertions, and reversals</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Soviet Physics—Doklady 10, 707–710. Translated from Doklady Akademii Nauk SSSR</span>, <span class="ltx_text ltx_bib_pages"> pp. 845–848</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.SS2.p1" title="2.2 Creating a Spell-Checker Automatically ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.
</span>
</li>
<li id="bib.bib6" class="ltx_bibitem ltx_bib_inbook">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[7]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Lindén, M. Silfverberg, E. Axelson, S. Hardwick and T. Pirinen</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">HFST—framework for compiling and applying morphologies</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Systems and Frameworks for Computational Morphology</span>, <span class="ltx_text ltx_bib_editor">C. Mahlow and M. Piotrowski (Eds.)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_series">Communications in Computer and Information Science</span>, Vol. <span class="ltx_text ltx_bib_volume">100</span>, <span class="ltx_text ltx_bib_pages"> pp. 67–85</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-3-642-23137-7</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p1" title="1 Introduction ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>,
<a href="#S1.p2" title="1 Introduction ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>,
<a href="#S2.p6" title="2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>.
</span>
</li>
<li id="bib.bib17" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[8]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Lindén, M. Silfverberg and T. Pirinen</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">HFST tools for morphology—an efficient open-source package for construction of morphological analyzers</span>.
</span>
<span class="ltx_bibblock">See <span class="ltx_text ltx_bib_crossref"><cite class="ltx_cite"><a href="#bib.bib42" title="Workshop on systems and frameworks for computational morphology, sfcm 2009, zürich, switzerland, september 2009, proceedings" class="ltx_ref">Workshop on systems and frameworks for computational morphology, sfcm 2009, zürich, switzerland, september 2009, proceedings, Mahlow and Piotrowski</a></cite></span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 28–47</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p4" title="4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.
</span>
</li>
<li id="bib.bib42" class="ltx_bibitem ltx_bib_proceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[9]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_editor">C. Mahlow and M. Piotrowski (Eds.)</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Workshop on systems and frameworks for computational morphology, sfcm 2009, zürich, switzerland, september 2009, proceedings</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_series">Lecture Notes in Computer Science</span>, Vol. <span class="ltx_text ltx_bib_volume">41</span>, <span class="ltx_text ltx_bib_publisher">Springer</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-3-642-04130-3</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#bib.bib17" title="HFST tools for morphology—an efficient open-source package for construction of morphological analyzers" class="ltx_ref">8</a>.
</span>
</li>
<li id="bib.bib16" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[10]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">T. A. Pirinen and K. Lindén</span><span class="ltx_text ltx_bib_year"> (2010)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Finite-state spell-checking with weighted language and error models</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Seventh SaLTMiL workshop on creation and use of basic lexical resources for less-resourced languages</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Valletta, Malta</span>, <span class="ltx_text ltx_bib_pages"> pp. 13–18</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="http://siuc01.si.ehu.es/%5C%7Ejipsagak/SALTMIL2010_Proceedings.pdf" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p2" title="1 Introduction ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>,
<a href="#S2.SS2.p1" title="2.2 Creating a Spell-Checker Automatically ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.
</span>
</li>
<li id="bib.bib1" class="ltx_bibitem ltx_bib_article">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[11]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. O. Rojas, M. L. Forcada and G. R. Sánchez</span><span class="ltx_text ltx_bib_year"> (2005)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Construcción y minimización eficiente de transductores de letras a partir de diccionarios con paradigmas</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Procesamiento del Lenguaje Natural</span> (<span class="ltx_text ltx_bib_number">35</span>), <span class="ltx_text ltx_bib_pages"> pp. 51–57</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p4" title="4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.
</span>
</li>
<li id="bib.bib26" class="ltx_bibitem ltx_bib_article">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[12]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Schulz and S. Mihov</span><span class="ltx_text ltx_bib_year"> (2002)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Fast string correction with levenshtein-automata</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">International Journal of Document Analysis and Recognition</span> <span class="ltx_text ltx_bib_volume">5</span>, <span class="ltx_text ltx_bib_pages"> pp. 67–85</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.SS2.p1" title="2.2 Creating a Spell-Checker Automatically ‣ 2 Methods ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.
</span>
</li>
<li id="bib.bib3" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[13]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. Silfverberg, M. Hyvärinen and T. Pirinen</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Improving predictive entry of finnish text messages using irc logs</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 69–76</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p2" title="1 Introduction ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.
</span>
</li>
<li id="bib.bib36" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[14]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. Silfverberg and K. Lindén</span><span class="ltx_text ltx_bib_year"> (2009-13 July)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">HFST runtime format—a compacted transducer format allowing for fast lookup</span>.
</span>
<span class="ltx_bibblock">See <span class="ltx_text ltx_bib_crossref"><cite class="ltx_cite"><a href="#bib.bib37" title="Pre-proceedings of the eighth international workshop on finite-state methods and natural language processing (fsmnlp 2009), pretoria, south africa, july 21st - 24th 2009" class="ltx_ref">Pre-proceedings of the eighth international workshop on finite-state methods and natural language processing (fsmnlp 2009), pretoria, south africa, july 21st - 24th 2009, Watson<span class="ltx_text ltx_bib_etal"> et al.</span></a></cite></span>,
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="http://www.ling.helsinki.fi/~klinden/pubs/fsmnlp2009runtime.pdf" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p1" title="4 Test Setting and Evaluation ‣ Compiling Apertium morphological dictionaries with HFST and using them in HFST applications\footnotepubrightsThis article was published in saltmil workshop in LREC 2011 in Malta. Original version \urlhttp://ixa2.si.ehu.es/saltmil/." class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.
</span>
</li>
<li id="bib.bib37" class="ltx_bibitem ltx_bib_proceedings">
<span class="ltx_bibtag ltx_bib_key ltx_role_refnum">[15]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_editor">B. Watson, D. Courie, L. Cleophas and P. Rautenbach (Eds.)</span><span class="ltx_text ltx_bib_year"> (2009-13 July)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Pre-proceedings of the eighth international workshop on finite-state methods and natural language processing (fsmnlp 2009), pretoria, south africa, july 21st - 24th 2009</span>.
</span>
<span class="ltx_bibblock">Note: <span class="ltx_text ltx_bib_note">Handout CD-ROM containing the accompanying papers for the presentations during the FSMNLP 2009 workshop. Published by the University of Pretoria, Pretoria, South Africa.</span>
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-1-86854-743-2</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#bib.bib36" title="HFST runtime format—a compacted transducer format allowing for fast lookup" class="ltx_ref">14</a>.
</span>
</li>
</ul>
</section>
</article>
</div>
</div>
</body>
</html>