<!DOCTYPE html><html>
<head>
<title>Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task</title>
<!--Generated on Tue Aug 6 14:37:36 2019 by LaTeXML (version 0.8.4) http://dlmf.nist.gov/LaTeXML/.-->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document ltx_authors_1line">
<h1 class="ltx_title ltx_title_document">Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019
shared task</h1>
<div class="ltx_authors">
<span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Tommi A Pirinen
<br class="ltx_break">Universität Hamburg
<br class="ltx_break">Hamburger Zentrum für Sprachkorpora
<br class="ltx_break"><a href="[email protected]" title="" class="ltx_ref ltx_url ltx_font_typewriter">[email protected]</a>
</span></span>
</div>
<div class="ltx_abstract">
<h6 class="ltx_title ltx_title_abstract">Abstract</h6>
<p class="ltx_p">In this paper I describe a rule-based, bi-directional machine translation
system for the Finnish—English language pair. The original system is
based on the existing data of FinnWordNet, omorfi and apertium-eng.
I have built the disambiguation, lexical selection and translation rules
by hand. The dictionaries and rules have been developed based
on the shared task data. I describe in this article the use of the
shared task data as a kind of test-driven development workflow for RBMT
and show that it fits well into a modern software engineering
continuous integration workflow and yields substantial BLEU score
increases with minimal effort. The system described in the article
is developed mainly during shared tasks.</p>
</div>
<section id="S1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2>
<span id="footnote1" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup>
<span class="ltx_tag ltx_tag_note">1</span>
Official version in ACL Anthology:
<a href="https://www.aclweb.org/anthology/sigs/sigmt/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://www.aclweb.org/anthology/sigs/sigmt/</a> (to appear), CC-BY version
4.0 international: <a href="https://creativecommons.org/licenses/by/4.0/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://creativecommons.org/licenses/by/4.0/</a></span></span></span>
<div id="S1.p1" class="ltx_para">
<p class="ltx_p">This paper describes our submission for Finnish—English language pair to the
machine translation shared task of the <span class="ltx_text ltx_font_italic">Fourth conference on machine
translation</span> (WMT19) at ACL 2019. Traditionally <span class="ltx_text ltx_font_italic">rule-based machine
translation</span> (RBMT) is not the focus of WMT shared tasks; however,
there are two reasons I experimented with this system this year. One is that we
have an extensive amount of lesser-used resources for this pair:
omorfi<span id="footnote2" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup>
<span class="ltx_tag ltx_tag_note">2</span>
<a href="https://github.com/flammie/omorfi" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/flammie/omorfi</a></span></span></span> <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib128" title="Development and use of computational morphology of finnish in the open source and open science era: notes on experiences with omorfi development." class="ltx_ref">11</a>]</cite> has well
over 400,000
lexemes<span id="footnote3" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup>
<span class="ltx_tag ltx_tag_note">3</span>
<a href="https://flammie.github.io/omorfi/statistics.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://flammie.github.io/omorfi/statistics.html</a></span></span></span>,
apertium-eng<span id="footnote4" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup>
<span class="ltx_tag ltx_tag_note">4</span>
<a href="http://wiki.apertium.org/wiki/English" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/English</a></span></span></span> has over
40,000 lexemes and
apertium-fin-eng<span id="footnote5" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup>
<span class="ltx_tag ltx_tag_note">5</span>
<a href="https://github.com/apertium/apertium-fin-eng" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-fin-eng</a></span></span></span>
over 160,000 lexeme-to-lexeme translations. The other is that a
shared task like this provides ideal data for test-driven
development of lexical resources.</p>
</div>
<div id="S1.p2" class="ltx_para">
<p class="ltx_p">One concept I experimented with the shared task is various degrees of
automation—expert supervision for the lexical data enrichment. In this
experiment I used automatic methods to refine the lexical selection of the
machine translation, and semi-automatised workflows for the generation
of the lexical data, as well as some expert-driven development of the more
grammatical rules like noun phrase chunking and determiner generation.
It might be noteworthy that this machine translator I describe in the
article is not actively developed outside the shared tasks,
so the article is moreso motivated as
an exploration of the workflow and methods on semi-automatically generated
shallow RBMT than a description of a fully developed RBMT.</p>
</div>
<div id="S1.p3" class="ltx_para">
<p class="ltx_p">The rest of the article is organised as follows: In Section <a href="#S2" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>
I describe the components of our RBMT pipeline, in Section <a href="#S3" title="3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>
I describe the development workflow and in Section <a href="#S4" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> I show
the shared task results, perform error
analysis and discuss the results, and finally in Section <a href="#S5" title="5 Concluding remarks ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a> I
summarise the findings.</p>
</div>
</section>
<section id="S2" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">2 </span>System description and setup</h2>
<div id="S2.p1" class="ltx_para">
<p class="ltx_p">The morphological analyser for Finnish is based on omorfi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib128" title="Development and use of computational morphology of finnish in the open source and open science era: notes on experiences with omorfi development." class="ltx_ref">11</a>]</cite>, a large
morphological lexical database for Finnish. Data from omorfi has been converted
into Apertium format and is freely available in the
github repository
apertium-fin<span id="footnote6" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">6</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">6</sup>
<span class="ltx_tag ltx_tag_note">6</span>
<a href="https://github.com/apertium/apertium-fin" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-fin</a></span></span></span>. For
English I have used Apertium’s standard English analyser
apertium-eng<span id="footnote7" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">7</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">7</sup>
<span class="ltx_tag ltx_tag_note">7</span>
<a href="https://github.com/apertium/apertium-eng" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-eng</a></span></span></span>. Both
analysers were downloaded from github at the beginning of the shared task and I
have updated and further developed them based on the development data during the
shared task. I developed Apertium’s Finnish-English<span id="footnote8" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">8</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">8</sup>
<span class="ltx_tag ltx_tag_note">8</span>
<a href="https://github.com/apertium/apertium-fin-eng" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-fin-eng</a></span></span></span> dictionary initially
based on FinnWordNet’s translated data, which comprised over 260,000 WordNet-style
lexical items; of these I discarded most entries that had multiple spaces in them or
did not match any source or target words in the Finnish and English dictionaries,
ending up with around 150,000 lexical translations. The sizes of the dictionaries at
the time of writing are summarised in Table <a href="#S2.T1" title="Table 1 ‣ 2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>; however,
more up-to-date numbers can be found in Apertium’s
Wiki <span id="footnote9" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">9</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">9</sup>
<span class="ltx_tag ltx_tag_note">9</span>
<a href="http://wiki.apertium.org/wiki/List_of_dictionaries" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/List_of_dictionaries</a></span></span></span></p>
</div>
<figure id="S2.T1" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt">Dictionary</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt">Lexemes</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt">Manual rules</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left ltx_border_t">Finnish</td>
<td class="ltx_td ltx_align_right ltx_border_t">426,425</td>
<td class="ltx_td ltx_align_right ltx_border_t">143</td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left">English</td>
<td class="ltx_td ltx_align_right">40,185</td>
<td class="ltx_td ltx_align_right">187</td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left ltx_border_bb">Finnish-English</td>
<td class="ltx_td ltx_align_right ltx_border_bb">164,501</td>
<td class="ltx_td ltx_align_right ltx_border_bb">273</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 1: </span>Sizes of dictionaries. The numbers are numbers of unique word
entries or translation entries as defined in the dictionary, e.g., homonymy
judgements have been made by the dictionary
writers. The rule counts are combined counts of all sorts of linguistic
rules: disambiguation, lexical selection, transfer and so
forth.</figcaption>
</figure>
<div id="S2.p2" class="ltx_para">
<p class="ltx_p">The system is based on the Apertium<span id="footnote10" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">10</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">10</sup>
<span class="ltx_tag ltx_tag_note">10</span>
<a href="https://github.com/apertium" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium</a></span></span></span>
machine translation platform <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib40" title="Apertium: a free/open-source platform for rule-based machine translation platform" class="ltx_ref">3</a>]</cite>, a shallow transfer
rule-based machine translation toolkit. For morphological analysis and
generation, HFST<span id="footnote11" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">11</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">11</sup>
<span class="ltx_tag ltx_tag_note">11</span>
<a href="https://hfst.github.io" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://hfst.github.io</a></span></span></span> <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib82" title="Hfst—framework for compiling and applying morphologies" class="ltx_ref">8</a>]</cite> is
used, and for morphological disambiguation VISL
CG-3 <span id="footnote12" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">12</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">12</sup>
<span class="ltx_tag ltx_tag_note">12</span>
<a href="http://visl.sdu.dk/cg3.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://visl.sdu.dk/cg3.html</a></span></span></span> is used. The whole platform
as well as all the linguistic data are licensed under the GNU General Public
Licence (GPL).</p>
</div>
<div id="S2.p3" class="ltx_para">
<p class="ltx_p">Apertium is a modular NLP system based on UNIX command-line ideology. The source
text is processed step-by-step to form a shallow analysis (morphological
analysis), then translated (lexical transfer) and re-arranged (structural
transfer) to target language analyses and finally generated (morphological
generation). Each of the steps can be performed by an arbitrary command-line tool
that transforms the input in the expected formats. All of the steps also involve
ambiguity or one-to-many mappings that require a decision, and while these
decisions can be made using expert-written rules, writing the rules is
a demanding task, and it is interesting to see how much can be achieved by
simply bootstrapping the rulesets using automatic rule acquisition.</p>
</div>
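<div class="ltx_para">
<p class="ltx_p">As an illustration only (this is not Apertium’s actual code), the following Python
sketch mimics the pipe-per-stage architecture described above: each stage is an
independent function over a stream of tokens, so any stage can be replaced by another
tool that reads and writes the same intermediate format. All stage bodies here are
placeholders.</p>
<pre class="ltx_verbatim">
# Hypothetical sketch of Apertium's modular pipeline (illustration only;
# in practice each stage is a separate UNIX tool connected with pipes).
from typing import List

def morph_analysis(text: str) -> List[list]:
    """Tokenise and return all morphological analyses per token."""
    return [[tok] for tok in text.split()]            # placeholder

def disambiguate(analyses: List[list]) -> List[str]:
    """Pick a 1-best analysis per token (constraint grammar in Apertium)."""
    return [options[0] for options in analyses]       # placeholder

def lexical_transfer(tokens: List[str]) -> List[list]:
    """Look up all candidate target-language translations per lemma."""
    return [[tok] for tok in tokens]                   # placeholder

def lexical_selection(candidates: List[list]) -> List[str]:
    """Choose the contextually most suitable translation per token."""
    return [options[0] for options in candidates]      # placeholder

def structural_transfer(tokens: List[str]) -> List[str]:
    """Re-order tokens and rewrite tags for the target language."""
    return tokens                                       # placeholder

def generate(tokens: List[str]) -> str:
    """Generate surface forms with the (bidirectional) morphology."""
    return " ".join(tokens)                             # placeholder

def translate(text: str) -> str:
    # UNIX-pipe style composition of the stages described above.
    return generate(structural_transfer(
        lexical_selection(lexical_transfer(
            disambiguate(morph_analysis(text))))))
</pre>
</div>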
<div id="S2.p4" class="ltx_para">
<p class="ltx_p">To illuminate how apertium does RBMT in Finnish—English, and the kinds of
ambiguities I resolve, I show in Table <a href="#S2.T3" title="Table 3 ‣ 2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> examples of the
ambiguities with an example sentence. The ambiguity of source morphology is the
true ambiguity rate of the language (according to the morphological analyser),
i.e. how many potential interpretations each word has. It is no surprise that
Finnish has a relatively high ambiguity rate; that English is nearly
unambiguous, however, is due more to limitations of apertium’s English dictionary than
to a feature of English per se, given that English has quite productive
zero-derivation, e.g. verbing nouns and vice versa. The lexical selection
ambiguity is the translation dictionary’s rate of choices per source word, and
FinnWordNet suggests on average 5 synonyms per word. The target morphology
ambiguity is the rate of allomorphy or free variation; for Finnish as the target
language there are some systematic problems, such as plural genitives and
partitives, whereas English has literally two instances in the whole dev set:
<span class="ltx_text ltx_font_italic">sown / sowed</span> and <span class="ltx_text ltx_font_italic">fish / fishes</span>. Assuming a perfect RBMT
system would keep all options open until the final decision, the number of
hypotheses at the end would be at least
<math id="S2.p4.m1" class="ltx_Math" alttext="\mathrm{MA_{SL}}\times\mathrm{LS_{SL\rightarrow TL}}\times\mathrm{MA_{TL}}" display="inline"><mrow><msub><mi>MA</mi><mi>SL</mi></msub><mo>×</mo><msub><mi>LS</mi><mrow><mi>SL</mi><mo>→</mo><mi>TL</mi></mrow></msub><mo>×</mo><msub><mi>MA</mi><mi>TL</mi></msub></mrow></math>,
where <math id="S2.p4.m2" class="ltx_Math" alttext="MA" display="inline"><mrow><mi>M</mi><mo></mo><mi>A</mi></mrow></math> is morphological ambiguity rate, <math id="S2.p4.m3" class="ltx_Math" alttext="LS" display="inline"><mrow><mi>L</mi><mo></mo><mi>S</mi></mrow></math> is lexical selection ambiguity
rate, <math id="S2.p4.m4" class="ltx_Math" alttext="{}_{SL}" display="inline"><msub><mi></mi><mrow><mi>S</mi><mo></mo><mi>L</mi></mrow></msub></math> is source language and <math id="S2.p4.m5" class="ltx_Math" alttext="{}_{TL}" display="inline"><msub><mi></mi><mrow><mi>T</mi><mo></mo><mi>L</mi></mrow></msub></math> is target language. For
Finnish—English I show the example figures of the ambiguities based on the
development and test sets in Table <a href="#S2.T2" title="Table 2 ‣ 2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>.</p>
</div>
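<div class="ltx_para">
<p class="ltx_p">As a quick arithmetic check of the lower bound above, the following snippet plugs in
the Finnish dev set figures from Table 2; the small deviation from the
<span class="ltx_text ltx_font_italic">Total</span> column is only due to rounding of the inputs.</p>
<pre class="ltx_verbatim">
# Lower bound on hypotheses per word for the Finnish dev set (Table 2 figures).
ma_sl = 1.68     # source-side morphological ambiguity
ls    = 5.04     # lexical selection ambiguity (Finnish to English)
ma_tl = 1.0002   # target-side (generation) ambiguity

total = ma_sl * ls * ma_tl
print(round(total, 2))   # ~8.47, close to the 8.46 reported in Table 2
</pre>
</div>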
<figure id="S2.T2" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt">Feature:</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt"><span class="ltx_text ltx_font_bold">Source</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt"><span class="ltx_text ltx_font_bold">Lexical</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt"><span class="ltx_text ltx_font_bold">Target</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt"><span class="ltx_text ltx_font_italic">Total</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">Corpus</th>
<td class="ltx_td ltx_align_right"><span class="ltx_text ltx_font_bold">morphology</span></td>
<td class="ltx_td ltx_align_right">selection</td>
<td class="ltx_td ltx_align_right">morphology</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t">Finnish dev set</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.68</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">5.04</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.0002</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">8.46</th>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">Finnish test set</th>
<td class="ltx_td ltx_align_right">1.69</td>
<td class="ltx_td ltx_align_right">4.80</td>
<td class="ltx_td ltx_align_right">1.0003</td>
<td class="ltx_td ltx_align_right">8.13</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t">English dev set</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.04</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.15</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.0013</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1.19</th>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb">English test set</th>
<td class="ltx_td ltx_align_right ltx_border_bb">1.03</td>
<td class="ltx_td ltx_align_right ltx_border_bb">1.12</td>
<td class="ltx_td ltx_align_right ltx_border_bb">1.0006</td>
<td class="ltx_td ltx_align_right ltx_border_bb">1.15</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>Ambiguity influencing RBMT Finnish-to-English and
English-to-Finnish</figcaption>
</figure>
<div id="S2.p5" class="ltx_para">
<p class="ltx_p">The rule-based machine translation process as it is performed by apertium is
shown in Table <a href="#S2.T3" title="Table 3 ‣ 2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>. The first step of the RBMT here is
morphological analysis, in apertium this covers both tokenisation and
morphological analysis as seen here; in apertium-eng the expression ‘in front
of’ is considered to be a single token and is packaged as a preposition (we have
also omitted an ambiguity between the attributive and nominal readings of ‘house’,
since the distinction does not currently make a difference in English-to-Finnish
translation, in order to fit the table in the paper). The morphological analysis
in apertium is performed by finite-state morphological analysis as defined
in <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib9" title="Finite state morphology" class="ltx_ref">1</a>]</cite> and implemented in open source format
by <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib82" title="Hfst—framework for compiling and applying morphologies" class="ltx_ref">8</a>]</cite>. After analysis, the next step is to disambiguate,
i.e. pick 1-best lists of morphological analyses; in apertium this is done by
constraint grammar, as described by <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib61" title="Constraint grammar as a framework for parsing unrestricted text" class="ltx_ref">5</a>]</cite> and
implemented in open source by VISL CG
3.<span id="footnote13" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">13</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">13</sup>
<span class="ltx_tag ltx_tag_note">13</span>
<a href="http://visl.sdu.dk/cg3.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://visl.sdu.dk/cg3.html</a></span></span></span>. In lexical translation phase,
each lemma is looked up from the translation dictionary, and in lexical
selection the translation that is most suitable by the context and statistics is
selected. In the structural transfer phase a number of things is performed: the
English morphological analyses are rewritten into Finnish analyses, e.g. the
adjective and noun will receive a genitive case tag due to the adposition, and
the adposition is moved before the noun phrase since it is a preposition in
Finnish and postposition in English, and the article is just removed, as the use
of articles is non-standard in Finnish. Finally the Finnish analysis is
generated into a surface string using a finite-state morphological analyser,
since they are inherently bidirectional this needs no extra software or
algorithms.</p>
</div>
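<div class="ltx_para">
<p class="ltx_p">The following toy function illustrates the structural transfer step of Table 3 for
the example phrase; it is an illustration only and does not use Apertium’s actual
transfer formalism or rule syntax.</p>
<pre class="ltx_verbatim">
# Toy illustration of the structural transfer in Table 3 (not Apertium's rule format).

def transfer_pp(tokens):
    """tokens: list of (lemma, tags) pairs after lexical selection."""
    out = []
    adposition = None
    for lemma, tags in tokens:
        if tags.startswith('Det'):
            continue                            # Finnish has no articles: drop the determiner
        elif tags.startswith('Post'):
            adposition = (lemma, tags)          # remember the adposition for re-ordering
        elif tags.startswith('Adj'):
            out.append((lemma, 'Adj.Pos.Sg.Gen'))   # adposition governs the genitive case
        elif tags.startswith('N'):
            out.append((lemma, tags + '.Gen'))
        else:
            out.append((lemma, tags))
    if adposition:
        out.append(adposition)                  # postposition follows the noun phrase in Finnish
    return out

print(transfer_pp([('edessä', 'Post'), ('se', 'Det.Def.Sp'),
                   ('iso', 'Adj'), ('talo', 'N.Sg')]))
# [('iso', 'Adj.Pos.Sg.Gen'), ('talo', 'N.Sg.Gen'), ('edessä', 'Post')]
</pre>
</div>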
<figure id="S2.T3" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt"><span class="ltx_text ltx_font_bold">Input:</span></th>
<th class="ltx_td ltx_align_justify ltx_th ltx_th_column ltx_border_tt" style="width:286.2pt;">In front of the big house</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Morphological analysis:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:286.2pt;">In front of.<span class="ltx_text ltx_font_smallcaps">Prep</span> the.<span class="ltx_text ltx_font_smallcaps">Det.Def.Sp</span>
big.<span class="ltx_text ltx_font_smallcaps">Adj</span> house.<span class="ltx_text ltx_wrap ltx_font_smallcaps">N.Sg</span>
</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Morphological disambiguation:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:286.2pt;">In front of.<span class="ltx_text ltx_font_smallcaps">Prep</span> the.<span class="ltx_text ltx_font_smallcaps">Det.Def.Sp</span>
big.<span class="ltx_text ltx_font_smallcaps">Adj</span> house.<span class="ltx_text ltx_wrap ltx_font_smallcaps">N.Sg</span>
</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Lexical translation:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:286.2pt;">In front of.<span class="ltx_text ltx_font_smallcaps">Prep<math id="S2.T3.m1" class="ltx_Math" alttext="\rightarrow" display="inline"><mo mathvariant="normal">→</mo></math></span>Edessä.<span class="ltx_text ltx_font_smallcaps">Post</span>
the.<span class="ltx_text ltx_font_smallcaps">Det.Def.Sp<math id="S2.T3.m2" class="ltx_Math" alttext="\rightarrow" display="inline"><mo mathvariant="normal">→</mo></math></span>se.<span class="ltx_text ltx_font_smallcaps">Det.Def.Sp</span>
big.<span class="ltx_text ltx_font_smallcaps">Adj<math id="S2.T3.m3" class="ltx_Math" alttext="\rightarrow" display="inline"><mo mathvariant="normal">→</mo></math></span>iso<math id="S2.T3.m4" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>raju<math id="S2.T3.m5" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>paha<math id="S2.T3.m6" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>kova<math id="S2.T3.m7" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>…jalomielinen.<span class="ltx_text ltx_font_smallcaps">adj</span>
house.<span class="ltx_text ltx_font_smallcaps">N.Sg<math id="S2.T3.m8" class="ltx_Math" alttext="\rightarrow" display="inline"><mo mathvariant="normal">→</mo></math></span>huone<math id="S2.T3.m9" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>talo<math id="S2.T3.m10" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>suku<math id="S2.T3.m11" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>…edustajainhuone.<span class="ltx_text ltx_wrap ltx_font_smallcaps">N.Sg</span>
</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Lexical selection:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:286.2pt;">Edessä.<span class="ltx_text ltx_font_smallcaps">Post</span> se.<span class="ltx_text ltx_font_smallcaps">Det.Def.Sp</span> iso.<span class="ltx_text ltx_font_smallcaps">Adj</span> talo.<span class="ltx_text ltx_wrap ltx_font_smallcaps">N.Sg</span>
</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Structural transfer:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:286.2pt;">iso.<span class="ltx_text ltx_font_smallcaps">Adj.Pos.Sg.Gen</span> talo.<span class="ltx_text ltx_font_smallcaps">N.Sg.Gen</span>
Edessä.<span class="ltx_text ltx_wrap ltx_font_smallcaps">Post</span>
</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb ltx_border_t"><span class="ltx_text ltx_font_bold">Finnish translation:</span></th>
<td class="ltx_td ltx_align_justify ltx_border_bb ltx_border_t" style="width:286.2pt;">ison talon Edessä</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 3: </span>Translation process for the English phrase ‘In front of the big house’
</figcaption>
</figure>
</section>
<section id="S3" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">3 </span>RBMT development workflow</h2>
<div id="S3.p1" class="ltx_para">
<p class="ltx_p">I present here different levels of automation in the RBMT workflow: in
Subsection <a href="#S3.SS1" title="3.1 Lexical selection training ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.1</span></a> I have automated the generation of
rules, in Subsection <a href="#S3.SS2" title="3.2 Lexicon development workflow ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2</span></a> I have a semi-automated
workflow and finally in Subsection <a href="#S3.SS3" title="3.3 Grammar development ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.3</span></a> I have
an expert-driven development workflow.</p>
</div>
<section id="S3.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.1 </span>Lexical selection training</h3>
<div id="S3.SS1.p1" class="ltx_para">
<p class="ltx_p">One of the key components of this experiment was to try automatic rule-creation
mechanisms for refining the converted WordNet dictionary. A large source of
translation quality issues in the initial converted WordNet dictionary was the
high number of low-frequency ‘synonyms’ among the translations. To overcome this, some
automatic methods were used. For automatic bootstrapping of the lexical
selection rules I used Europarl corpus <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib66" title="Europarl: a parallel corpus for statistical machine translation" class="ltx_ref">6</a>]</cite> data and the
methods demonstrated by <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib150" title="Flexible finite-state lexical selection for rule-based machine translation" class="ltx_ref">12</a>]</cite>. Since the result of this
training also seemed insufficient, I experimented with another system to
generate more rules for lexical
selection.<span id="footnote14" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">14</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">14</sup>
<span class="ltx_tag ltx_tag_note">14</span>
<a href="https://svn.code.sf.net/p/apertium/svn/trunk/apertium-swe-nor/dev/lex-learn-unigram.sh" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://svn.code.sf.net/p/apertium/svn/trunk/apertium-swe-nor/dev/lex-learn-unigram.sh</a></span></span></span>
On top of that, I have updated the lexical selection with some manual rules
for cases that were either not covered by Europarl hits or skewed wrongly for the news
domain; for example, the word ‘letter’ seemed to mainly receive translations of
<span class="ltx_text ltx_font_italic">kirje</span> (a message written on paper), while in all
the sentences I sampled from the development set, a more suitable translation would have been
<span class="ltx_text ltx_font_italic">kirjain</span> (a character of the alphabet). The resulting lexical selection
rule sets are summarised in Table <a href="#S3.T4" title="Table 4 ‣ 3.1 Lexical selection training ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>. The first
method of creating rules is based on n-gram patterns; due to restricted time and
processing resources I have only included bigrams in this model, and the
second model only considers unigrams. The results are given in the table
rows <span class="ltx_text ltx_font_italic">+ bigrams</span> and <span class="ltx_text ltx_font_italic">+ unigrams</span> respectively.</p>
</div>
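<div class="ltx_para">
<p class="ltx_p">The unigram idea can be sketched as follows; this is a deliberately simplified
illustration and not the actual training scripts referenced above: for each source
lemma, count which of its dictionary translations actually occurs in the aligned
target sentence and prefer the most frequent one.</p>
<pre class="ltx_verbatim">
# Simplified sketch of unigram-style lexical selection training (illustration only).
from collections import Counter, defaultdict

def learn_unigram_selection(parallel_sentences, bidix):
    """parallel_sentences: iterable of (source_tokens, target_tokens);
    bidix: dict mapping a source lemma to its candidate target lemmas.
    Returns, per source lemma, the candidate most often attested in the
    corresponding target sentences."""
    counts = defaultdict(Counter)
    for src_tokens, trg_tokens in parallel_sentences:
        trg_set = set(trg_tokens)
        for src in src_tokens:
            for cand in bidix.get(src, []):
                if cand in trg_set:
                    counts[src][cand] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items() if c}

# Toy example: 'letter' should prefer 'kirjain' if that is what the corpus shows.
bidix = {'letter': ['kirje', 'kirjain']}
corpus = [(['a', 'letter', 'of', 'the', 'alphabet'], ['aakkoston', 'kirjain']),
          (['capital', 'letter'], ['iso', 'kirjain'])]
print(learn_unigram_selection(corpus, bidix))   # {'letter': 'kirjain'}
</pre>
</div>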
<figure id="S3.T4" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt">Orig. Fin-Eng</th>
<td class="ltx_td ltx_align_right ltx_border_tt">18,066</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">+ bigrams</th>
<td class="ltx_td ltx_align_right">24,662</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">+ unigrams</th>
<td class="ltx_td ltx_align_right">30,049</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">Orig. Eng-Fin</th>
<td class="ltx_td ltx_align_right ltx_border_t">22</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">+ bigrams</th>
<td class="ltx_td ltx_align_right">24,631</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb">+ unigrams</th>
<td class="ltx_td ltx_align_right ltx_border_bb">25,748</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 4: </span>Lexical selection rules statistically generated</figcaption>
</figure>
</section>
<section id="S3.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.2 </span>Lexicon development workflow</h3>
<div id="S3.SS2.p1" class="ltx_para">
<p class="ltx_p">One of the key components of this experiment is to show that a
<span class="ltx_text ltx_font_italic">shared-task driven development</span> (STDD) is a usable workflow
for the development of the lexical data in a rule-based machine translation
system. As such, the ‘training’ phase of RBMT development has been replaced
by a very simple semi-automated, native-speaker-driven project workflow
consisting of the following:</p>
</div>
<div id="S3.SS2.p2" class="ltx_para">
<ol id="S3.I1" class="ltx_enumerate">
<li id="S3.I1.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">1.</span>
<div id="S3.I1.i1.p1" class="ltx_para">
<p class="ltx_p">Collect all lexemes unknown to source language dictionary, and add them
with necessary morpholexical information</p>
</div>
</li>
<li id="S3.I1.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">2.</span>
<div id="S3.I1.i2.p1" class="ltx_para">
<p class="ltx_p">Collect all lexemes unknown to bilingual translation dictionary,
and add their translations</p>
</div>
</li>
<li id="S3.I1.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">3.</span>
<div id="S3.I1.i3.p1" class="ltx_para">
<p class="ltx_p">Collect all lexemes unknown to the target language dictionary,
and add them to the dictionary with necessary morpholexical information</p>
</div>
</li>
</ol>
</div>
<div id="S3.SS2.p3" class="ltx_para">
<p class="ltx_p">The semi-automation that I have developed lies in collecting the different
unknown lexemes or <span class="ltx_text ltx_font_italic">out-of-vocabulary</span> items (OOVs), and guessing a
lexical entry or multiple plausible entries for them, and then having the dictionary
writer select and correct them.</p>
</div>
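<div class="ltx_para">
<p class="ltx_p">A minimal sketch of the collection step is given below; the
<span class="ltx_text ltx_font_typewriter">analyse</span> function is a hypothetical stand-in for running the
morphological analyser, and the real workflow additionally guesses candidate
entries for the dictionary writer to review.</p>
<pre class="ltx_verbatim">
# Hypothetical sketch of OOV collection (illustration only; analyse() stands in
# for the HFST morphological analyser).
from collections import Counter

def analyse(token):
    """Return the analyses for a token, or an empty list if it is unknown (OOV)."""
    known = {'iso', 'talo'}                      # toy lexicon
    return [token + '.N.Sg'] if token in known else []

def collect_oovs(tokenised_corpus):
    """Count tokens with no analysis, most frequent first, for manual review."""
    oovs = Counter()
    for sentence in tokenised_corpus:
        for token in sentence:
            if not analyse(token):
                oovs[token] += 1
    return oovs.most_common()

print(collect_oovs([['iso', 'talo', 'Kouki'], ['Kouki', 'sanoi']]))
# [('Kouki', 2), ('sanoi', 1)]
</pre>
</div>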
</section>
<section id="S3.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.3 </span>Grammar development</h3>
<div id="S3.SS3.p1" class="ltx_para">
<p class="ltx_p">An expert-driven part of the RBMT workflow in our current methodology is the
grammar development. This consists of manually reading the sentences produced by
the MT system to spot systematic errors caused by grammatical differences
between the languages. For the purposes of this shared task and the workshop, the
linguistics or grammar are not a central concern, so I will not describe them
in detail here. In practice this concerns such grammatical rules as mapping
between the absence of articles in Finnish and articles in English, mapping between case or
possessive suffixes and their corresponding lexical representations in English,
and so forth. The details can be seen in the code that is available on github.</p>
</div>
</section>
</section>
<section id="S4" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">4 </span>Evaluation, error analysis and discussion</h2>
<div id="S4.p1" class="ltx_para">
<p class="ltx_p">The automatic measurements as used by the shared task are given in the
table <a href="#S4.T5" title="Table 5 ‣ 4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>. I show here the BLEU <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib109" title="BLEU: a method for automatic evaluation of machine translation" class="ltx_ref">10</a>]</cite> and the CharacTER scores.
BLEU, as it is a kind of industry standard, and CharacTER <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib154" title="Character: translation edit rate on character level" class="ltx_ref">13</a>]</cite> as it is maybe more suited for
morphologically complex languages. As the automatic scores show, the rule-based
system has still room for improvement.</p>
</div>
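<div class="ltx_para">
<p class="ltx_p">For readers who wish to compute comparable BLEU-style numbers locally, the
<span class="ltx_text ltx_font_typewriter">sacrebleu</span> package can be used as sketched below; note that the
official scores in Table 5 come from the shared task evaluation at
<a href="http://matrix.statmt.org" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://matrix.statmt.org</a>, not from this snippet, and CharacTER is computed
with its own separate tool.</p>
<pre class="ltx_verbatim">
# Local BLEU computation with sacrebleu (illustration only, toy data).
import sacrebleu

hypotheses = ["Kinda swiftly let jobs agreed , Kouki said ."]
references = [["We reached a pretty quick agreement , Kouki said ."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 1))   # corpus-level BLEU on this toy pair
</pre>
</div>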
<figure id="S4.T5" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt"><span class="ltx_text ltx_font_bold">Corpus</span></th>
<td class="ltx_td ltx_align_right ltx_border_tt">BLEU-cased</td>
<td class="ltx_td ltx_align_right ltx_border_tt">CharacTER</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">MSRA.NAO</th>
<td class="ltx_td ltx_align_right ltx_border_t">27.4</td>
<td class="ltx_td ltx_align_right ltx_border_t">0.515</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">HelsinkiNLP RBMT</th>
<td class="ltx_td ltx_align_right">8.9</td>
<td class="ltx_td ltx_align_right">0.650</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">apertium-eng-fin</span></th>
<td class="ltx_td ltx_align_right">4.3</td>
<td class="ltx_td ltx_align_right">0.756</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">USYD</th>
<td class="ltx_td ltx_align_right ltx_border_t">33.0</td>
<td class="ltx_td ltx_align_right ltx_border_t">0.494</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb"><span class="ltx_text ltx_font_bold">apertium-fin-eng</span></th>
<td class="ltx_td ltx_align_right ltx_border_bb">7.6</td>
<td class="ltx_td ltx_align_right ltx_border_bb">0.736</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 5: </span>automatic scores from <a href="http://matrix.statmt.org" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://matrix.statmt.org</a>, we show our scores (boldfaced), the highest
ranking RBMT and the highest ranking NMT for reference.</figcaption>
</figure>
<div id="S4.p2" class="ltx_para">
<p class="ltx_p">I find that a linguistic error analysis is one of the most interesting part of
this experiment. The reason for this is is that the experiment’s scientific
contribution lies more in the extension of linguistic resources and workflows
than machine learning algorithm design. It is noteworthy, that in
the sustainable workflow I demonstrate in this article, error analysis is a
part of the workflow, namely, adding of the lexical data and rules follows the
layout given in Section <a href="#S3" title="3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> and is the same for development and
error analysis phase. I have, to that effect, categorised the errors in
translations along the workflow:</p>
</div>
<div id="S4.p3" class="ltx_para">
<ol id="S4.I1" class="ltx_enumerate">
<li id="S4.I1.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">1.</span>
<div id="S4.I1.i1.p1" class="ltx_para">
<p class="ltx_p">OOV in source language dictionary (including typos and non-words)</p>
</div>
</li>
<li id="S4.I1.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">2.</span>
<div id="S4.I1.i2.p1" class="ltx_para">
<p class="ltx_p">OOV in bilingual dictionary</p>
</div>
</li>
<li id="S4.I1.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">3.</span>
<div id="S4.I1.i3.p1" class="ltx_para">
<p class="ltx_p">OOV in target language dictionary</p>
</div>
</li>
<li id="S4.I1.i4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">4.</span>
<div id="S4.I1.i4.p1" class="ltx_para">
<p class="ltx_p">disambiguation or lexical selection fail</p>
</div>
</li>
<li id="S4.I1.i5" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">5.</span>
<div id="S4.I1.i5.p1" class="ltx_para">
<p class="ltx_p">structural failure or higher level
</p>
</div>
</li>
</ol>
</div>
<div id="S4.p4" class="ltx_para">
<p class="ltx_p">The OOV’s can be calculated automatically from the corpus data, but the higher
level failures need human annotation. A summary of the errors can be seen in the
table <a href="#S4.T6" title="Table 6 ‣ 4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>, this is based on the errors that were fixed as a part
of error analysis process. As a result of this workflow, I have improved the
BLEU points of apertium-fin-eng over the years, as can be seen in the
table <a href="#S4.T7" title="Table 7 ‣ 4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>.</p>
</div>
<figure id="S4.T6" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt">Error</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt">count</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">OOVs in Finnish</th>
<td class="ltx_td ltx_align_right ltx_border_t">763</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">OOVs in English</th>
<td class="ltx_td ltx_align_right">943</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb">OOVs in Fin↔Eng</th>
<td class="ltx_td ltx_align_right ltx_border_bb">2696</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 6: </span>Classification of mainly lexical errors in apertium-fin-eng
submissions for 2019</figcaption>
</figure>
<figure id="S4.T7" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt"><span class="ltx_text ltx_font_bold">Corpus</span></th>
<td class="ltx_td ltx_align_right ltx_border_tt">BLEU-cased</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">apertium-eng-fin 2015</th>
<td class="ltx_td ltx_align_right ltx_border_t">2.9</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_center ltx_th ltx_th_row">2017</th>
<td class="ltx_td ltx_align_right">3.5</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_center ltx_th ltx_th_row">2019</th>
<td class="ltx_td ltx_align_right">4.3</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">apertium-fin-eng 2015</th>
<td class="ltx_td ltx_align_right ltx_border_t">6.9</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_center ltx_th ltx_th_row">2017</th>
<td class="ltx_td ltx_align_right">6.3</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb">2019</th>
<td class="ltx_td ltx_align_right ltx_border_bb">7.6</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 7: </span>Progress of apertium-fin-eng over the years using only the WMT
shared task driven development method.</figcaption>
</figure>
<div id="S4.p5" class="ltx_para">
<p class="ltx_p">The OOV numbers might look moderately large but a major part falls under proper
nouns, which are generally low frequency and do not cause a large problem in
translation pipeline, the untranslated proper noun is recognisable and the
mapping of adpositions and case inflections will fail where applicable. The task
of adding proper nouns to the dictionaries is also simplest, they are easy to
gather from the text, and for English and bilingual dictionaries no further
classification is necessary; for the Finnish dictionary entry generation,
paradigm guessing is necessary, although the paradigms used in foreign names are
much more limited than with other parts-of-speech to be added. In the
<span class="ltx_text ltx_font_italic">newstest 2019</span> data there was a number of words that I decided not to
add to our dictionaries, unlike our usual workflow where I aim at virtual 100
% coverage with gold corpora. The unadded words were for example words like
“Toimiluvanmuodossatoteutettavajulki-senjayksityisensektorinkumppanuus”, which
seems to have a large number of missing spaces and extra hyphen, these as well
as extraneous spaces were quite common in the data in our error analysis as well
as ‘words’ like ‘OIet’, ‘OIi’, ‘OIin’, ‘OIisi’, ‘OIIut’, i.e. forms of ‘olla’
(to be) where lowercase L has been replaced with uppercase I. While I do
account for common spelling mistakes in our dictionaries, these kind of errors
are probably more suited for robustness testing and implemented with spelling
correction methods for specific problematic generated text, such as OCR. We
will look into implementing spelling correction into our pipeline in the
future. Comparing the performance of RBMT to NMT, it can be clearly seen that
contemporary NMT is better suited for error tolerance, in part because it can
be more character-based than token-based, in part because any large training
data set will actually have some OCR errors and run-in tokens.</p>
</div>
<div id="S4.p6" class="ltx_para">
<p class="ltx_p">After OOV-errors one of the biggest easily solvable problems is ambiguity, so
word sense disambiguation and lexical selection. For lexical selection I found
about 200 lexical translations that were still badly wrong and could be solved
without coming up complex context conditions. For disambiguation problems, a
surprisingly common problem was sentence-initial proper noun that is a common
noun as well, as a high frequency example, for the word ‘trump’ meaning a
winning suit in card games (= Finnish ‘valtti’) would get selected over the
POTUS, plausibly when most of the training and development before WMT 2019 did
not contain so many proper noun Trumps. Also rather common problem still is the
ambiguity in English verb forms, and between English zero derivations.</p>
</div>
<div id="S4.p7" class="ltx_para">
<p class="ltx_p">In the structural transfer a large number of errors are caused by long-distance
re-ordering. For example for Finnish to English proper noun phrases regardless
of length of the phrase, the Finnish shows case in last word or postposition
after the last word, English has preposition before the word, but when phrase
gets chunked partially the adpositions or case suffixes end up in the middle
with a rather jarring effect to the translated sentence. The same applies for
other effects where generating correct language depends on correct chunk
detection, e.g. the article generation is very limited in the current code
because the articles need to be generated from nothing, when translating from
Finnish to English, only at the very beginning of specific noun phrases.</p>
</div>
<div id="S4.p8" class="ltx_para">
<p class="ltx_p">Finally a number of problems were caused for such grammatical differences
between languages that do not have a good solution in lexical rule-based machine
translation, such as difference between English noun phrases and corresponding
Finnish compound nouns or for example the common English class of -able suffixed
adjectives that does not have accurate lexical Finnish translation at all.</p>
</div>
<div id="S4.p9" class="ltx_para">
<p class="ltx_p">In terms of where RBMT is perhaps more usable than NMT, one important factor is
how predictable and systematic the errors are when they appear. For example just
looking at the first page of the top-ranking system in
Finnish-to-English<span id="footnote15" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">15</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">15</sup>
<span class="ltx_tag ltx_tag_note">15</span>
<a href="http://matrix.statmt.org/matrix/output/1903?score_id=39757" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://matrix.statmt.org/matrix/output/1903?score_id=39757</a></span></span></span> one can
see the Finnish “Aika nopeasti saatiin hommat sovittua, Kouki sanoi” translated into
“Pretty quickly we got the gays agreed, Kouki said.”, whereas the correct translation
is “We reached a pretty quick agreement, Kouki said.”. The big problem with the neural
translation is that it is deceptively fluent language but conveys something completely
different. Compare the rule-based version, “Kinda swiftly let jobs agreed, Kouki said.”, which
is not fluent at all but does not hallucinate gays there, so it may be more usable for
post-editing. For further research into the problems of NMT in real-world use, see for
example <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib100" title="Translators’ perceptions of literary post-editing using statistical and neural machine translation" class="ltx_ref">9</a>]</cite>.</p>
</div>
<div id="S4.p10" class="ltx_para">
<p class="ltx_p">In comparison to neural and statistical systems, the rule-based approach does not
generally fare well as measured with automatic metrics like BLEU; for a human
evaluation refer to <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib17" title="Findings of the 2019 conference on machine translation (wmt19)" class="ltx_ref">2</a>]</cite>. However, the experiment I describe here is also
not among the most actively developed machine translators; rather, I use the experiment
to gauge the effects the described workflow has on the quality of semi-automatically
generated RBMT. To see how more developed systems fare on the same task, you
should also refer to <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib57" title="Rule-based machine translation from english to finnish" class="ltx_ref">4</a>, <a href="#bib.bib72" title="GF wide-coverage english-finnish mt system for wmt 2015" class="ltx_ref">7</a>]</cite>.</p>
</div>
</section>
<section id="S5" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">5 </span>Concluding remarks</h2>
<div id="S5.p1" class="ltx_para">
<p class="ltx_p">In this article I’ve shown a workflow of <span class="ltx_text ltx_font_italic">shared task driven
development</span> for rule-based machine translations, namely the lexicons and rules.
I show that a small effort to update lexical data based on yearly released gold
corpora increases BLEU points and enlarges dictionaries as well as improves
rulesets sizes and qualities by a significant amount. In future I aim to build
more automatisation for the workflow to make it trivially usable with continuous
integration.</p>
</div>
<div id="S5.p2" class="ltx_para">
<p class="ltx_p">The systems are all available as free/libre open-source software under the GNU
GPL licence, and can be downloaded from the internet.</p>
</div>
</section>
<section id="Sx1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">Acknowledgements</h2>
<div id="Sx1.p1" class="ltx_para">
<p class="ltx_p">This work has been written while employed in the <span class="ltx_text ltx_font_italic">Hamburger Zentrum für
Sprachkorpora</span> by CLARIN-D. Thanks to all contributors of the related projects:
omorfi, FinnWordNet, Apertium, and everyone helping with
apertium-fin, apertium-eng and apertium-fin-eng.</p>
</div>
</section>
<section id="bib" class="ltx_bibliography">
<h2 class="ltx_title ltx_title_bibliography">References</h2>
<ul id="bib.L1" class="ltx_biblist">
<li id="bib.bib9" class="ltx_bibitem ltx_bib_book">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[1]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. R. Beesley and L. Karttunen</span><span class="ltx_text ltx_bib_year"> (2003)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Finite state morphology</span>.
</span>
<span class="ltx_bibblock"> <span class="ltx_text ltx_bib_publisher">CSLI publications</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-1575864341</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.p5" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib17" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[2]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">O. Bojar, C. Federmann, M. Fishel, Y. Graham, B. r. Haddow, M. Huck, P. +Koehn, C. Monz, M. Müller, and M. Post</span><span class="ltx_text ltx_bib_year"> (2019-08)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Findings of the 2019 conference on machine translation (wmt19)</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Fourth Conference on Machine Translation, Volume 2: Shared Task Papers</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Florence, Italy</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p10" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
<li id="bib.bib40" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[3]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. L. Forcada, M. G. Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J. A. Pérez-Ortiz, G. Ramírez-Sánchez, F. Sánchez-Martínez, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Apertium: a free/open-source platform for rule-based machine translation platform</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Machine Translation</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.p2" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib57" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[4]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Hurskainen and J. Tiedemann</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based machine translation from english to finnish</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Second Conference on Machine Translation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 323–329</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p10" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
<li id="bib.bib61" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[5]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Karlsson</span><span class="ltx_text ltx_bib_year"> (1990)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Constraint grammar as a framework for parsing unrestricted text</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 13th International Conference of Computational Linguistics</span>, <span class="ltx_text ltx_bib_editor">H. Karlgren (Ed.)</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">3</span>, <span class="ltx_text ltx_bib_place">Helsinki</span>, <span class="ltx_text ltx_bib_pages"> pp. 168–173</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.p5" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib66" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[6]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">P. Koehn</span><span class="ltx_text ltx_bib_year"> (2005)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Europarl: a parallel corpus for statistical machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">MT summit</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">5</span>, <span class="ltx_text ltx_bib_pages"> pp. 79–86</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p1" title="3.1 Lexical selection training ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib72" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[7]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">P. Kolachina and A. Ranta</span><span class="ltx_text ltx_bib_year"> (2015)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">GF wide-coverage english-finnish mt system for wmt 2015</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Tenth Workshop on Statistical Machine Translation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 141–144</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p10" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
<li id="bib.bib82" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[8]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Lindén, E. Axelson, S. Hardwick, T. A. Pirinen, and M. Silfverberg</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Hfst—framework for compiling and applying morphologies</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Systems and Frameworks for Computational Morphology</span>, <span class="ltx_text ltx_bib_pages"> pp. 67–85</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.p2" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>,
<a href="#S2.p5" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib100" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[9]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">J. Moorkens, A. Toral, S. Castilho, and A. Way</span><span class="ltx_text ltx_bib_year"> (2018)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Translators’ perceptions of literary post-editing using statistical and neural machine translation</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Translation Spaces</span> <span class="ltx_text ltx_bib_volume">7</span> (<span class="ltx_text ltx_bib_number">2</span>), <span class="ltx_text ltx_bib_pages"> pp. 240–262</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p9" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
<li id="bib.bib109" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[10]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Papineni, S. Roukos, T. Ward, and W. Zhu</span><span class="ltx_text ltx_bib_year"> (2002)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">BLEU: a method for automatic evaluation of machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 40th annual meeting on
association for computational linguistics</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 311–318</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p1" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
<li id="bib.bib128" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[11]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">T. A. Pirinen</span><span class="ltx_text ltx_bib_year"> (2015)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Development and use of computational morphology of finnish in the open source and open science era: notes on experiences with omorfi development.</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">SKY Journal of Linguistics</span> <span class="ltx_text ltx_bib_volume">28</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p1" title="1 Introduction ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§1</span></a>,
<a href="#S2.p1" title="2 System description and setup ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib150" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[12]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, F. Sánchez-Martínez, M. L. Forcada, <span class="ltx_text ltx_bib_etal">et al.</span></span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Flexible finite-state lexical selection for rule-based machine translation</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p1" title="3.1 Lexical selection training ‣ 3 RBMT development workflow ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib154" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[13]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">W. Wang, J. Peter, H. Rosendahl, and H. Ney</span><span class="ltx_text ltx_bib_year"> (2016)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Character: translation edit rate on character level</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">2</span>, <span class="ltx_text ltx_bib_pages"> pp. 505–510</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.p1" title="4 Evaluation, error analysis and discussion ‣ Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4</span></a>.
</span>
</li>
</ul>
</section>
</article>
</div>
<footer class="ltx_page_footer">
<div class="ltx_page_logo">Generated on Tue Aug 6 14:37:36 2019 by <a href="http://dlmf.nist.gov/LaTeXML/">LaTeXML <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAsAAAAOCAYAAAD5YeaVAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9wKExQZLWTEaOUAAAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAdpJREFUKM9tkL+L2nAARz9fPZNCKFapUn8kyI0e4iRHSR1Kb8ng0lJw6FYHFwv2LwhOpcWxTjeUunYqOmqd6hEoRDhtDWdA8ApRYsSUCDHNt5ul13vz4w0vWCgUnnEc975arX6ORqN3VqtVZbfbTQC4uEHANM3jSqXymFI6yWazP2KxWAXAL9zCUa1Wy2tXVxheKA9YNoR8Pt+aTqe4FVVVvz05O6MBhqUIBGk8Hn8HAOVy+T+XLJfLS4ZhTiRJgqIoVBRFIoric47jPnmeB1mW/9rr9ZpSSn3Lsmir1fJZlqWlUonKsvwWwD8ymc/nXwVBeLjf7xEKhdBut9Hr9WgmkyGEkJwsy5eHG5vN5g0AKIoCAEgkEkin0wQAfN9/cXPdheu6P33fBwB4ngcAcByHJpPJl+fn54mD3Gg0NrquXxeLRQAAwzAYj8cwTZPwPH9/sVg8PXweDAauqqr2cDjEer1GJBLBZDJBs9mE4zjwfZ85lAGg2+06hmGgXq+j3+/DsixYlgVN03a9Xu8jgCNCyIegIAgx13Vfd7vdu+FweG8YRkjXdWy329+dTgeSJD3ieZ7RNO0VAXAPwDEAO5VKndi2fWrb9jWl9Esul6PZbDY9Go1OZ7PZ9z/lyuD3OozU2wAAAABJRU5ErkJggg==" alt="[LOGO]"></a>
</div></footer>
</div>
</body>
</html>