-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathPirinen-2009-sfcm.html
2007 lines (1900 loc) · 163 KB
/
Pirinen-2009-sfcm.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html><html>
<head>
<title>HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3 </title>
<!--Generated on Thu Sep 28 18:07:52 2017 by LaTeXML (version 0.8.2) http://dlmf.nist.gov/LaTeXML/.-->
<!--Document created on Last modification: September 28, 2017.-->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document ltx_authors_1line">
<h1 class="ltx_title ltx_title_document">
<span class="ltx_text ltx_font_smallcaps">HFST</span> Tools for Morphology – An Efficient Open-Source
Package for Construction of Morphological Analyzers
<span class="ltx_ERROR undefined">\footnotepubrights</span>This article was published in Proceedings of SFCM 2009 in
Zürich <span class="ltx_ERROR undefined">\url</span>http://sfcm2009.org.
<span class="ltx_ERROR undefined">\springerpostprintdoi</span>10.1007/978-3-642-04131-0_3
</h1>
<div class="ltx_authors">
<span class="ltx_creator ltx_role_author">
<span class="ltx_personname"> Krister Lindén
</span></span>
<span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Miikka Silfverberg
</span></span>
<span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Tommi Pirinen
<br class="ltx_break">University of Helsinki
<br class="ltx_break">[email protected]
<br class="ltx_break">
</span></span>
</div>
<div class="ltx_date ltx_role_creation">Last modification: September 28, 2017</div>
<div class="ltx_abstract">
<h6 class="ltx_title ltx_title_abstract">Abstract</h6>
<p class="ltx_p">Morphological analysis of a wide range of languages can be implemented
efficiently using finite-state transducer technologies. Over the last
30 years, a number of attempts have been made to create tools for
computational morphologies. The two main competing approaches have
been parallel vs. cascaded rule application. The parallel rule
application was originally introduced by Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>]</cite> and implemented in tools like <span class="ltx_text ltx_font_smallcaps">TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">LexC</span>. Currently many applications of morphologies could use
dictionaries encoding the a priori likelihoods of words and
expressions as well as the likelihood of relations to other
representations or languages. We have made the choice to create
open-source tools and language descriptions in order to let as many as
possible participate in the effort. The current article presents some
of the main tools that we have created such as <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>,
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>. We evaluate
their efficiency in comparison to some similar tools and libraries. In
particular, we evaluate them using several full-fledged morphological
descriptions. Our tools compare well with similar open source tools,
even if we still have some challenges ahead before we can catch up
with the commercial tools. We demonstrate that for various reasons a
parallel rule approach still seems to be more efficient than a
cascaded rule approach when developing finite-state morphologies.</p>
</div>
<section id="S1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2>
<div id="S1.p1" class="ltx_para">
<p class="ltx_p">Morphological analysis of a wide range of languages can be implemented
efficiently using finite-state technologies based on finite-state
transducers. Our goal is to implement efficient tools for creating and
manipulating finite-state transducer morphologies for different uses
and purposes. The task is daunting and we cannot do it alone.</p>
</div>
<div id="S1.p2" class="ltx_para">
<p class="ltx_p">Over the last 30 years, a number of attempts have been made to create
tools for computational morphologies and some of them have withstood
the test of time better than others. A major effort that has shaped
the landscape and incorporated many lasting ideas is the morphological
development tools created by Xerox. It started with the insight that
we can use transducers to describe or encode phonological processes
and relate various levels of linguistic abstraction using tools like
<span class="ltx_text ltx_font_smallcaps">TwolC</span> introduced by Koskenniemi and Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>, <a href="#bib.bibx2" title="" class="ltx_ref">1987</a>, <a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite>. To efficiently compile
large-scale lexicons into transducers, we need special lexicon
compilers like <span class="ltx_text ltx_font_smallcaps">LexC</span> described by Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx4" title="" class="ltx_ref">1993</a>, <a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite>.</p>
</div>
<div id="S1.p3" class="ltx_para">
<p class="ltx_p">Such tools do not solve all the problems. Writing full-scale
dictionaries in <span class="ltx_text ltx_font_smallcaps">LexC</span> may well be compared to having
programmers write sophisticated applications in C without access to
any of the modern high-level libraries. It is possible, but unless it
is done in some principled way, one may easily end up with
spaghetti-code that is difficult to maintain. This is not the fault of
the lexicon compiler, but the general programming solution is to
create several descriptions that are small and independent,
i.e. modular. With this insight and as computers became more
powerful, the initial calculus that was conceived for abstract objects
like automata and transducers in <span class="ltx_text ltx_font_smallcaps">TwolC</span> and <span class="ltx_text ltx_font_smallcaps">LexC</span> was
expanded and migrated into the lexical programming environment
<span class="ltx_text ltx_font_smallcaps">xfst</span> documented by Beesly and Karttunen <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>,
where smaller lexical modules for various purposes can be tailored and
combined using finite-state calculus operations.</p>
</div>
<div id="S1.p4" class="ltx_para">
<p class="ltx_p">The previous effort is well-worth studying, but currently additional
ideas have established themselves such as weighted transducers for
modeling aspects of language that deal with preferences or trends
rather than strict rules or on/off phenomena. Many applications of
morphologies could use dictionaries encoding the a priori likelihoods
of words and expressions as well as the likelihood of their relations
to phonetic representations or their lexical relations to other words
in the same language or in different languages. The efforts to explore
weighted finite-state transducers for natural language processing are
ongoing in information retrieval, speech processing and machine
translation to name a few of the main application areas involved.</p>
</div>
<div id="S1.p5" class="ltx_para">
<p class="ltx_p">Since we do not pretend that we could develop all the morphologies for
all the languages ourselves, or even all the aspects of the tools
needed to develop these morphologies, we have made the choice to
create open-source tools and language descriptions. We hope as many as
possible will participate in the effort by developing the tools
further for common needs and special purposes.</p>
</div>
<div id="S1.p6" class="ltx_para">
<p class="ltx_p">In addition to the open source tools, we also encourage the commercial
use of the final transducers created by the tools by providing runtime
software<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstRuntimeInterface</span></span></span>
that is free for commercial purposes. Eventually this will allow
software applications simply to select the appropriate transducer in
order to process a language correctly allowing the programmer to
ignore special characteristics of individual languages.</p>
</div>
<div id="S1.p7" class="ltx_para">
<p class="ltx_p">Recently, a number of open-source finite-state processing environments
have emerged, e.g. for unweighted transducers there are the
<em class="ltx_emph">SFST–Stuttgart Finite-State Transducer
Tools</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_ERROR undefined">\url</span>http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html</span></span></span>
by Schmid <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx10" title="" class="ltx_ref">2005</a>]</cite>, <em class="ltx_emph">foma: a finite-state machine toolkit and
library</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup><span class="ltx_ERROR undefined">\url</span>http://foma.sourceforge.net/</span></span></span> by Huldén,
etc., and for weighted transducers there are
<em class="ltx_emph">Vaucanson</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup><span class="ltx_ERROR undefined">\url</span>http://www.lrde.epita.fr/cgi-bin/twiki/view/Projects/Vaucanson</span></span></span>
by Lombardy et al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx9" title="" class="ltx_ref">2004</a>]</cite>, <em class="ltx_emph">OpenFST
Library</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup><span class="ltx_ERROR undefined">\url</span>http://www.openfst.org/</span></span></span> by Allauzen et
al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx12" title="" class="ltx_ref">2007</a>]</cite>, etc. These are valuable contributions to the open
source software that we can build on.</p>
</div>
<div id="S1.p8" class="ltx_para">
<p class="ltx_p">Our particular goal currently is in providing the basic facilities for
efficiently developing, compiling and running morphologies with or
without weights. To achieve our goal, we decided to create a unified
API<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">6</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">6</sup><span class="ltx_ERROR undefined">\url</span>http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/documentation.shtml</span></span></span>,
which is capable of interfacing various weighted and unweighted
finite-state transducer libraries allowing us to incorporate new
libraries as needed. Currently, we have interfaces to <span class="ltx_text ltx_font_smallcaps">SFST</span>
and <span class="ltx_text ltx_font_smallcaps">OpenFST</span>. On top of the unified API, we created a set of
basic
tools<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">7</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">7</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstHome</span></span></span>,
e.g. <span class="ltx_text ltx_font_smallcaps">HFST-TwolC, HFST-LexC, HFST-Compose-Intersect,
HFST-Test, HFST-Lookup</span>, etc. With these tools, we created or used
real full-fledged morphological descriptions of different languages
from different
language-families<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">8</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">8</sup><span class="ltx_ERROR undefined">\url</span>http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/index.shtml</span></span></span>,
e.g. <em class="ltx_emph">English, Finnish, French, Northern Sámi</em> and
<em class="ltx_emph">Swedish</em>. We used the morphological descriptions for testing the
functionality of the tools and for evaluating the performance of the
different libraries through a unified interface on the full-fledged
morphological development and compilation tasks.</p>
</div>
<div id="S1.p9" class="ltx_para">
<p class="ltx_p">The current article presents some of the main tools that we have
created: <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> in Sect. 2, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> in Sect. 3
and <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span> in Sect. 4. For each tool, we
present the main theoretical underpinnings of the implementation and
illustrate them with some examples. We highlight the main design
decisions that influenced the efficiency of the implementation and
how, if at all, our implementations differ from their namesakes. In
Sect. 5, we briefly present the morphological descriptions that we use
for demonstrating and comparing the efficiency of the
implementation. In Sect. 6, we evaluate the performance of our tools
for parallel-rule application and compare them with the performance of
the <span class="ltx_text ltx_font_italic">foma</span> <span class="ltx_text ltx_font_smallcaps">LexC</span> compiler and the <span class="ltx_text ltx_font_italic">Xerox</span> tools,
as well as the cascaded rule compiler of <span class="ltx_text ltx_font_smallcaps">SFST</span>. In Sect. 7, we
discuss the test results and present some aspects of future research
and development. In Sect. 8, we draw the conclusions.</p>
</div>
</section>
<section id="S2" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">2 </span><span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>
</h2>
<div id="S2.p1" class="ltx_para">
<p class="ltx_p">A lexicon compiler is a program that reads sets of morphemes and their
morphotactic combinations in order to create a finite-state transducer
of a lexicon. This finite state transducer is called a <em class="ltx_emph">lexical
transducer</em><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite>. The lexical transducer may be further
adjusted with e.g. phonological rules. The example for our lexicon
compiler is set by <span class="ltx_text ltx_font_smallcaps">LexC</span> of Xerox <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>. In
<span class="ltx_text ltx_font_smallcaps">LexC</span>, morphemes are arranged into named sets called
sub-lexicons. Each entry of a sub-lexicon is a pair of finite
strings<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">9</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">9</sup>Entries of regular expression form are not covered
here to simplify the presentation, but a full definition of an entry
in this formalism allows an entry to be a regular language.</span></span></span>
associated with the name of a sub-lexicon called a <em class="ltx_emph">continuation
class</em>.</p>
</div>
<div id="S2.p2" class="ltx_para">
<p class="ltx_p">Below, we highlight the main design decisions that influenced the
efficiency of the implementation and some of the we present the main
theory for compiling a <span class="ltx_text ltx_font_smallcaps">LexC</span> description into a finite-state
transducer. Our morphology example outlines the nominal inflection of
four Finnish nouns as shown in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>. This
example is a highly simplified version of the actual morphology.</p>
</div>
<figure id="S2.T1" class="ltx_table"><pre class="ltx_verbatim ltx_font_typewriter">
Multichar_Symbols
+noun +1 +a +d +h +m +AV+ +AV- +AVA +AVD +AVH +AVM
+all +gen +ptv +sg ~A ~K ~P
LEXICON Root
akku+noun+1+a:ak~Ku+AVA N1b "battery";
alku+noun+1+d:al~Ku+AVD N1b "beginning";
kumpu+noun+1+h:kum~Pu+AVH N1b "heap";
kyky+noun+1+m:ky~Ky+AVM N1b "capability";
LEXICON N1b
NounSg ;
NounPtvA ;
LEXICON NounPtvA
+sg+ptv:~A+AV+ Ennd ;
LEXICON NounSg
+sg+gen:n+AV- Ennd ;
+sg+all:l+AV-le Ennd ;
:n+AV- Compounding ;
LEXICON Compounding
Root ;
LEXICON Ennd
# ;
</pre>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 1: </span>A simplified <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> lexicon for some Finnish
nouns.</figcaption>
</figure>
<div id="S2.p3" class="ltx_para">
<p class="ltx_p">There are at least three time consuming parts of the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> compilation
process. First the compiler needs to parse the strings representing
the entry morphemes. Traditionally <span class="ltx_text ltx_font_smallcaps">LexC</span> allows multiple
characters in a single symbol. The problem of finding the optimal
partition of a string when compiling it into a finite-state transducer
is optimizing the <em class="ltx_emph">tokenization</em> algorithm. The tokenization is
discussed in Sect. <a href="#S2.SS1" title="2.1 Efficient Tokenizing of a Sub-Lexicon Entry ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.1</span></a>. The set of entries
in each sub-lexicon form a <em class="ltx_emph">union</em>. There are a few alternative
strategies for creating unions, which are briefly outlined in
Sect. <a href="#S2.SS2" title="2.2 Efficient Union of Sub-Lexicon Entries ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>. The combining of sub-lexicons is
described in Sect. <a href="#S2.SS3" title="2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.3</span></a> on morphotax.</p>
</div>
<section id="S2.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.1 </span>Efficient Tokenizing of a Sub-Lexicon Entry</h3>
<div id="S2.SS1.p1" class="ltx_para">
<p class="ltx_p">Lexicon entries are tokenized using a simple left-to-right longest
match algorithm. The entry can be tokenized by going through the entry
string, position by position, and looking up the longest symbols
available using a very simple greedy algorithm. An alternative, but
less efficient, strategy is to compose the entry string with a
tokenizer-transducer, implementing greedy left-to-right matching, to
achieve the correct partition.</p>
</div>
</section>
<section id="S2.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.2 </span>Efficient Union of Sub-Lexicon Entries</h3>
<div id="S2.SS2.p1" class="ltx_para">
<p class="ltx_p">The finite-state form of a sub-lexicon is a union of entry
transducers. Building a union of entry transducers is a relatively
straight-forward process. However, iteratively taking the union of n
entries with the n+1<math id="S2.SS2.p1.m1" class="ltx_Math" alttext="{}^{th}" display="inline"><msup><mi></mi><mrow><mi>t</mi><mo></mo><mi>h</mi></mrow></msup></math> entry is not ideal. A faster approach,
given that all our entries are simple finite strings is to build the
sub-lexicon transducer as a prefix tree, <em class="ltx_emph">trie</em>.</p>
</div>
</section>
<section id="S2.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.3 </span>Efficient Implementation of Morphotax over Sublexicons</h3>
<div id="S2.SS3.p1" class="ltx_para">
<p class="ltx_p">There are two strategies for making the sub-lexicon combinations. A
trivial, and fast, strategy is to connect the entries using
continuation classes to sub-lexicons of the same name with an epsilon
arc. An alternative method, which we implemented, is to use named
auxiliary symbols and the finite-state algebra to create
overgenerating combinations which are filtered by composition to
achieve legal combinations. This is further described in the next
subsection.</p>
</div>
<section id="S2.SS3.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">2.3.1 </span>Combining sublexicons using standard finite-state transducer algebra.</h4>
<div id="S2.SS3.SSS1.p1" class="ltx_para">
<p class="ltx_p">We assume all standard finite state operations to be known. For an
introduction, see Beesly and Karttunen <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>. We use the
following notation: <math id="S2.SS3.SSS1.p1.m1" class="ltx_Math" alttext="\cup" display="inline"><mo>∪</mo></math> is union, <math id="S2.SS3.SSS1.p1.m2" class="ltx_Math" alttext="\cap" display="inline"><mo>∩</mo></math> is intersection, <math id="S2.SS3.SSS1.p1.m3" class="ltx_Math" alttext="\circ" display="inline"><mo>∘</mo></math>
is composition, juxtaposition is concatenation. Latin characters
represent symbols of language and the <math id="S2.SS3.SSS1.p1.m4" class="ltx_Math" alttext="\varepsilon" display="inline"><mi>ε</mi></math> symbol is used for
a zero-length string. Capital Greek letters <math id="S2.SS3.SSS1.p1.m5" class="ltx_Math" alttext="\Sigma,\Gamma" display="inline"><mrow><mi mathvariant="normal">Σ</mi><mo>,</mo><mi mathvariant="normal">Γ</mi></mrow></math>
represent subsets of an alphabet. We define <math id="S2.SS3.SSS1.p1.m6" class="ltx_Math" alttext="\Sigma=\{a,b,\ldots\}" display="inline"><mrow><mi mathvariant="normal">Σ</mi><mo>=</mo><mrow><mo stretchy="false">{</mo><mi>a</mi><mo>,</mo><mi>b</mi><mo>,</mo><mi mathvariant="normal">…</mi><mo stretchy="false">}</mo></mrow></mrow></math> as a subset of the alphabet used for representing the
morphophonology of the language in <span class="ltx_text ltx_font_smallcaps">LexC</span> definitions. <math id="S2.SS3.SSS1.p1.m7" class="ltx_Math" alttext="\Gamma" display="inline"><mi mathvariant="normal">Γ</mi></math>
is the alphabet of the <em class="ltx_emph">auxiliary</em> symbols used in our rules in
the morphotax implementation. We assume that <math id="S2.SS3.SSS1.p1.m8" class="ltx_Math" alttext="\Sigma\cup\Gamma=\emptyset" display="inline"><mrow><mrow><mi mathvariant="normal">Σ</mi><mo>∪</mo><mi mathvariant="normal">Γ</mi></mrow><mo>=</mo><mi mathvariant="normal">∅</mi></mrow></math>. We use the symbol <math id="S2.SS3.SSS1.p1.m9" class="ltx_Math" alttext="J\in\Sigma" display="inline"><mrow><mi>J</mi><mo>∈</mo><mi mathvariant="normal">Σ</mi></mrow></math> for <em class="ltx_emph">joiners</em> to
delimit and combine morphemes in our morphotax. A joiner for an entry
with a continuation class named <math id="S2.SS3.SSS1.p1.m10" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math> is denoted as <math id="S2.SS3.SSS1.p1.m11" class="ltx_Math" alttext="J_{x}" display="inline"><msub><mi>J</mi><mi>x</mi></msub></math> and a joiner
for a sub-lexicon named <math id="S2.SS3.SSS1.p1.m12" class="ltx_Math" alttext="y" display="inline"><mi>y</mi></math> is denoted as <math id="S2.SS3.SSS1.p1.m13" class="ltx_Math" alttext="J_{y}" display="inline"><msub><mi>J</mi><mi>y</mi></msub></math>.</p>
</div>
<div id="S2.SS3.SSS1.p2" class="ltx_para">
<p class="ltx_p">We introduce the compilation of lexicons using the example-lexicon in
Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p3" class="ltx_para">
<p class="ltx_p">A single entry in a sub-lexicon, i.e., a line of code in a
<span class="ltx_text ltx_font_smallcaps">LexC</span> file, is referred to as a morpheme denoted by
<math id="S2.SS3.SSS1.p3.m1" class="ltx_Math" alttext="\mathcal{M}" display="inline"><mi class="ltx_font_mathcaligraphic">ℳ</mi></math>. A morpheme can be a subset of the language
<math id="S2.SS3.SSS1.p3.m2" class="ltx_Math" alttext="\Sigma^{\star}" display="inline"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></math> appended with the joiner of a continuation class
(<a href="#S2.E1" title="(1) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p4" class="ltx_para">
<table id="S2.E1" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E1.m1" class="ltx_Math" alttext="\mathcal{M}=\Sigma^{\star}~{}J" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℳ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo></mo><mi>J</mi></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p5" class="ltx_para">
<p class="ltx_p">E.g. the <span class="ltx_text ltx_font_smallcaps">LexC</span> string entry <em class="ltx_emph">akku:ak<math id="S2.SS3.SSS1.p5.m1" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>Ku+AVA</em>
with a continuation class <em class="ltx_emph">N1b</em> becomes
<em class="ltx_emph">a k k:<math id="S2.SS3.SSS1.p5.m2" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>K u:+AVA <math id="S2.SS3.SSS1.p5.m3" class="ltx_Math" alttext="\varepsilon" display="inline"><mi>ε</mi></math>:<math id="S2.SS3.SSS1.p5.m4" class="ltx_Math" alttext="J_{N1b}" display="inline"><msub><mi>J</mi><mrow><mi>N</mi><mo></mo><mn>1</mn><mo></mo><mi>b</mi></mrow></msub></math></em>.</p>
</div>
<div id="S2.SS3.SSS1.p6" class="ltx_para">
<p class="ltx_p">A sub-lexicon <math id="S2.SS3.SSS1.p6.m1" class="ltx_Math" alttext="\mathcal{L}" display="inline"><mi class="ltx_font_mathcaligraphic">ℒ</mi></math> defined by (<a href="#S2.E2" title="(2) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>) is a union of
morphemes as specified in Sect. <a href="#S2.SS2" title="2.2 Efficient Union of Sub-Lexicon Entries ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p7" class="ltx_para">
<table id="S2.E2" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E2.m1" class="ltx_Math" alttext="\mathcal{L}=J~{}\bigcup_{\mathcal{M}_{x}\in\mathcal{M}}~{}(~{}\mathcal{M}_{x}~%
{})" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℒ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><mi>J</mi></mpadded><mo></mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi class="ltx_font_mathcaligraphic">ℳ</mi><mi>x</mi></msub><mo>∈</mo><mi class="ltx_font_mathcaligraphic">ℳ</mi></mrow></munder></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><msub><mi class="ltx_font_mathcaligraphic">ℳ</mi><mi>x</mi></msub></mpadded><mo stretchy="false">)</mo></mrow></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p8" class="ltx_para">
<p class="ltx_p">E.g. the lexicon named <em class="ltx_emph">Root</em> consisting of <em class="ltx_emph">akku</em> and
<em class="ltx_emph">alku</em> with continuation class <em class="ltx_emph">N1b</em> becomes
<math id="S2.SS3.SSS1.p8.m1" class="ltx_Math" alttext="J_{Root}~{}(~{}a~{}k~{}k~{}u~{}J_{N1b}~{}|~{}a~{}l~{}k~{}u~{}J_{N1b}~{})" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>R</mi><mo></mo><mi>o</mi><mo></mo><mi>o</mi><mo></mo><mi>t</mi></mrow></msub></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><mi>a</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>u</mi></mpadded><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>N</mi><mo></mo><mn>1</mn><mo></mo><mi>b</mi></mrow></msub></mpadded><mo rspace="5.8pt" stretchy="false">|</mo><mpadded width="+3.3pt"><mi>a</mi></mpadded><mpadded width="+3.3pt"><mi>l</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>u</mi></mpadded><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>N</mi><mo></mo><mn>1</mn><mo></mo><mi>b</mi></mrow></msub></mpadded><mo stretchy="false">)</mo></mrow></mrow></math>.</p>
</div>
<div id="S2.SS3.SSS1.p9" class="ltx_para">
<p class="ltx_p">We create a filter <math id="S2.SS3.SSS1.p9.m1" class="ltx_Math" alttext="\mathcal{F}" display="inline"><mi class="ltx_font_mathcaligraphic">ℱ</mi></math> defined by (<a href="#S2.E3" title="(3) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>) for legal
morpheme combinations by pairing up adjacent joiners.</p>
</div>
<div id="S2.SS3.SSS1.p10" class="ltx_para">
<table id="S2.E3" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E3.m1" class="ltx_Math" alttext="\mathcal{F}=\bigcup_{J_{x}\in J}~{}(~{}J_{x}~{}J_{x}~{})" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi>J</mi><mi>x</mi></msub><mo>∈</mo><mi>J</mi></mrow></munder></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mi>x</mi></msub></mpadded><mo></mo><mpadded width="+3.3pt"><msub><mi>J</mi><mi>x</mi></msub></mpadded></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p11" class="ltx_para">
<p class="ltx_p">To account for the special starting lexicon and the special ending
lexicon, we define <math id="S2.SS3.SSS1.p11.m1" class="ltx_Math" alttext="J_{Root}\in J" display="inline"><mrow><msub><mi>J</mi><mrow><mi>R</mi><mo></mo><mi>o</mi><mo></mo><mi>o</mi><mo></mo><mi>t</mi></mrow></msub><mo>∈</mo><mi>J</mi></mrow></math> and <math id="S2.SS3.SSS1.p11.m2" class="ltx_Math" alttext="J_{\#}\notin J" display="inline"><mrow><msub><mi>J</mi><mi mathvariant="normal">#</mi></msub><mo>∉</mo><mi>J</mi></mrow></math>. The root
lexicon can be used in continuation classes as a target, e.g. for the
compounding mechanism, but the end lexicon is not available as a
lexicon name, so it is not part of the regular morphotax. To
accommodate this, we extend the filter definition to
<math id="S2.SS3.SSS1.p11.m3" class="ltx_Math" alttext="\mathcal{F}^{\prime}" display="inline"><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup></math> as in (<a href="#S2.E4" title="(4) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p12" class="ltx_para">
<table id="S2.E4" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E4.m1" class="ltx_Math" alttext="\mathcal{F}^{\prime}=J_{Root}~{}(~{}\Sigma^{\star}~{}\mathcal{F}~{})^{\star}~{%
}\Sigma^{\star}~{}J_{\#}" display="block"><mrow><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup><mo>=</mo><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>R</mi><mo></mo><mi>o</mi><mo></mo><mi>o</mi><mo></mo><mi>t</mi></mrow></msub></mpadded><mo></mo><mpadded width="+3.3pt"><msup><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mrow><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo></mo><mpadded width="+3.3pt"><mi class="ltx_font_mathcaligraphic">ℱ</mi></mpadded></mrow><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup></mpadded><mo></mo><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo></mo><msub><mi>J</mi><mi mathvariant="normal">#</mi></msub></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p13" class="ltx_para">
<p class="ltx_p">This allows us to create the final transducer <math id="S2.SS3.SSS1.p13.m1" class="ltx_Math" alttext="\mathcal{R}" display="inline"><mi class="ltx_font_mathcaligraphic">ℛ</mi></math> with only
legal combinations of sub-lexicons by composition (<a href="#S2.E5" title="(5) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p14" class="ltx_para">
<table id="S2.E5" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E5.m1" class="ltx_Math" alttext="\mathcal{R}=\bigcup_{\mathcal{L}_{x}\in\mathcal{L}}~{}(~{}\mathcal{L}_{x}~{})^%
{\star}~{}\circ~{}\mathcal{F}^{\prime}" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℛ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi class="ltx_font_mathcaligraphic">ℒ</mi><mi>x</mi></msub><mo>∈</mo><mi class="ltx_font_mathcaligraphic">ℒ</mi></mrow></munder></mpadded><mrow><mpadded width="+3.3pt"><msup><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><msub><mi class="ltx_font_mathcaligraphic">ℒ</mi><mi>x</mi></msub></mpadded><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup></mpadded><mo rspace="5.8pt">∘</mo><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(5)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p15" class="ltx_para">
<p class="ltx_p">E.g., for the sublexicons <em class="ltx_emph">Root</em>, <em class="ltx_emph">N1b</em>, <em class="ltx_emph">NounSg</em>, and
<em class="ltx_emph">Ennd</em> in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>, and their entries
<em class="ltx_emph">akku</em> and <em class="ltx_emph">+sg+all:lle</em>, we get the disjunction of lexicons
<math id="S2.SS3.SSS1.p15.m1" class="ltx_Math" alttext="L^{\star}" display="inline"><msup><mi>L</mi><mo>⋆</mo></msup></math>, which we filter using <math id="S2.SS3.SSS1.p15.m2" class="ltx_Math" alttext="L^{\star}\circ F^{\prime}" display="inline"><mrow><msup><mi>L</mi><mo>⋆</mo></msup><mo>∘</mo><msup><mi>F</mi><mo>′</mo></msup></mrow></math> as
shown in Table <a href="#S2.T2" title="Table 2 ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>.</p>
</div>
<figure id="S2.T2" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m1" class="ltx_Math" alttext="L^{\star}=" display="inline"><mrow><msup><mi mathsize="90%">L</mi><mo mathsize="90%" stretchy="false">⋆</mo></msup><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m2" class="ltx_Math" alttext="(~{}~{}J_{Root}~{}a~{}k~{}k~{}u~{}J_{N1b}~{}~{}|~{}~{}J_{N1b}~{}J_{NounSg}~{}~%
{}|" display="inline"><mrow><mo maxsize="90%" minsize="90%" rspace="9.1pt">(</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">a</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">u</mi></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="9.1pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%">|</mo></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m3" class="ltx_Math" alttext="J_{NounSg}~{}\!+\!sg\!:\!l~{}\!+all\!:\!l~{}\varepsilon\!:\!e~{}J_{Ennd}~{}~{}%
|~{}~{}J_{Ennd}~{}J_{\#}~{}~{})^{\star}" display="inline"><mrow><mpadded width="+1.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mi mathsize="90%">s</mi><mpadded width="-1.7pt"><mi mathsize="90%">g</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+1.6pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" stretchy="false">+</mo><mi mathsize="90%">a</mi><mi mathsize="90%">l</mi><mpadded width="-1.7pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+3.3pt"><mi mathsize="90%">l</mi></mpadded><mpadded width="-1.7pt"><mi mathsize="90%">ε</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+3.3pt"><mi mathsize="90%">e</mi></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="9.1pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mi mathsize="90%" mathvariant="normal">#</mi></msub></mpadded><mo maxsize="90%" minsize="90%">)</mo><msup><mi></mi><mo mathsize="90%" stretchy="false">⋆</mo></msup></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m4" class="ltx_Math" alttext="F=" display="inline"><mrow><mi mathsize="90%">F</mi><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m5" class="ltx_Math" alttext="J_{Root}~{}J_{Root}~{}|~{}J_{N1b}~{}J_{N1b}~{}|~{}J_{NounSg}~{}J_{NounSg}~{}|~%
{}J_{Ennd}~{}J_{Ennd}" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m6" class="ltx_Math" alttext="L^{\star}\circ F^{\prime}=" display="inline"><mrow><mrow><msup><mi mathsize="90%">L</mi><mo mathsize="90%" stretchy="false">⋆</mo></msup><mo mathsize="90%" stretchy="false">∘</mo><msup><mi mathsize="90%">F</mi><mo mathsize="90%" stretchy="false">′</mo></msup></mrow><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m7" class="ltx_Math" alttext="J_{Root}~{}a~{}k~{}k~{}u~{}J_{N1b}~{}~{}J_{N1b}~{}J_{NounSg}" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mo></mo><mpadded width="+3.3pt"><mi mathsize="90%">a</mi></mpadded><mo></mo><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mo></mo><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mo></mo><mpadded width="+3.3pt"><mi mathsize="90%">u</mi></mpadded><mo></mo><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo></mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mn mathsize="90%">1</mn><mo></mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo></mo><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m8" class="ltx_Math" alttext="J_{NounSg}~{}\!+\!sg\!:\!l~{}\!+\!all\!:\!l~{}\varepsilon\!:\!e~{}J_{Ennd}~{}~%
{}J_{Ennd}~{}J_{\#}" display="inline"><mrow><mrow><mpadded width="+1.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo></mo><mi mathsize="90%">o</mi><mo></mo><mi mathsize="90%">u</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">S</mi><mo></mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mrow><mi mathsize="90%">s</mi><mo></mo><mpadded width="-1.7pt"><mi mathsize="90%">g</mi></mpadded></mrow></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+1.6pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mrow><mi mathsize="90%">a</mi><mo></mo><mi mathsize="90%">l</mi><mo></mo><mpadded width="-1.7pt"><mi mathsize="90%">l</mi></mpadded></mrow></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+3.3pt"><mi mathsize="90%">l</mi></mpadded><mo></mo><mpadded width="-1.7pt"><mi mathsize="90%">ε</mi></mpadded></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+3.3pt"><mi mathsize="90%">e</mi></mpadded><mo></mo><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo></mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">n</mi><mo></mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo></mo><msub><mi mathsize="90%">J</mi><mi mathsize="90%" mathvariant="normal">#</mi></msub></mrow></mrow></math></td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>Filtering a single path in <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> with a
morphotax filter.</figcaption>
</figure>
<div id="S2.SS3.SSS1.p16" class="ltx_para">
<p class="ltx_p">Finally, all the symbols in <math id="S2.SS3.SSS1.p16.m1" class="ltx_Math" alttext="\Gamma" display="inline"><mi mathvariant="normal">Γ</mi></math> are removed. While this is
trivial, it introduces some indeterminism in the final transducer,
which would otherwise have been introduced by building direct epsilon
arcs. Its influence on the performance is further detailed in
Sect. <a href="#S6" title="6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p17" class="ltx_para">
<p class="ltx_p">According to our experiments, attaching weights to each entry works
without modification of the lexicon compilation method.</p>
</div>
</section>
</section>
</section>
<section id="S3" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">3 </span><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
</h2>
<div id="S3.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_italic">Two-level rules</span> are parallel constraints on symbol-pair
strings governing the realizations of lexical word-forms as
corresponding surface-strings. They were introduced by Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>]</cite> and have been used for modeling the phonology of
numerous natural languages. <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> is an accurate and
efficient open-source two-level rule compiler. It compiles grammars of
two-level rules into sets of finite-state transducers. The rules are
represented as regular-expression operations closely resembling
familiar phonological re-write rules both to appearance and semantics.</p>
</div>
<div id="S3.p2" class="ltx_para">
<p class="ltx_p">The most widely known two-level rule-compiler existing at the moment
is the <em class="ltx_emph">Xerox Two-Level Rule Compiler</em> (later <span class="ltx_text ltx_font_smallcaps">TwolC</span>)
presented by Karttunen el al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite>. It is proprietary
software, which imposes some limitations upon its use. The
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> compiler has been designed to be an open-source
substitute for the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler and has a syntax and
semantics very similar to those of the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler. Hence
existing two-level grammars, designed to compile under the
<span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler, require very few modifications to compile
correctly under <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>.</p>
</div>
<div id="S3.p3" class="ltx_para">
<p class="ltx_p">Besides being an open-source program, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> also has
other benefits compared with the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler. Resolution
of rule-conflicts is an important part of compiling two-level
grammars. We know of at least one instance, where the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler resolves rule-conflicts in an incorrect way (see
Sect. <a href="#S3.SS2.SSS2" title="3.2.2 Resolving rule-conflicts. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.2</span></a>). It also compiles epenthesis
rules in a way, which denies the grammar-writer the full expressive
power of two-level rules (see Sect. <a href="#S3.SS2.SSS1" title="3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.1</span></a>). In
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> we have been able to remedy these shortcomings by
compiling the rules with the <span class="ltx_text ltx_font_italic">generalized
restriction-operation</span> (later <span class="ltx_text ltx_font_italic">GR-operation</span>), presented by
Yli-Jyrä and Koskenniemi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite>. It allows compilation of
two-level rules in a uniform way and makes conflict-resolution easy to
tackle, while still permitting efficient compilation.</p>
</div>
<div id="S3.p4" class="ltx_para">
<p class="ltx_p">In Sect. <a href="#S3.SS1" title="3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.1</span></a>, we demonstrate the syntax and semantics
of a two-level grammar-file using a small example from Finnish
morphology. The example grammar maps the lexical forms given by the
example lexicon in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>, presented in
Sect. <a href="#S2" title="2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>, into surface-forms.</p>
</div>
<div id="S3.p5" class="ltx_para">
<p class="ltx_p">It is not possible to demonstrate all features of <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
in this article, but we try to highlight the few most important
differences to show that it is easy to migrate from the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler to <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>.</p>
</div>
<div id="S3.p6" class="ltx_para">
<p class="ltx_p">We use the GR-operation to compile the grammar-rules in
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>. In Sect. <a href="#S3.SS2.SSS1" title="3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.1</span></a>, we explain how the
different types of two-level rules are compiled. Rule-conflicts and
their resolution are covered in Sect. <a href="#S3.SS2.SSS2" title="3.2.2 Resolving rule-conflicts. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.2</span></a>.</p>
</div>
<section id="S3.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.1 </span>An Example Grammar</h3>
<div id="S3.SS1.p1" class="ltx_para">
<p class="ltx_p">An input-file for <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> consists of five parts:
<span class="ltx_text ltx_font_italic">the Alphabet</span>, <span class="ltx_text ltx_font_italic">the Rule-variables</span>, <span class="ltx_text ltx_font_italic">the Sets</span>,
<span class="ltx_text ltx_font_italic">the Definitions</span> and <span class="ltx_text ltx_font_italic">the Rules</span>. The file-format has
been modeled on the format used by the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler, and
all parts of the grammar are present also in the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler except for the part declaring rule-variables. There are a few
other differences, as well, most of which we will mention below. A
complete list of known differences can be found in the
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
documentation<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">10</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">10</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstTwolC</span></span></span>.</p>
</div>
<figure id="S3.T3" class="ltx_table"><pre class="ltx_verbatim ltx_font_typewriter">
Alphabet
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö
a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö
%+AV%+:0 %+AV%-:0 %+AVA:0 %+AVD:0 %+AVH:0 %+AVM:0
%~A:a %~A:ä
%~K:k %~K:0 %~K:v
%~P:p %~P:m ;
Rule-variables
Cm Cs Cw ;
Sets
Gradations = %+AV%+ %+AV%- %+AVA %+AVD %+AVH %+AVM ;
BackVowels = a o u A O U ;
UpperCaseVowels = A E I O U Y Å Ä Ö ;
LowerCaseVowels = a e i o u y å ä ö ;
Vowels = UpperCaseVowels LowerCaseVowels ;
Definitions
AlphaSeq = [ \Gradations: ]* ;
NonVowelSeq = [ \:Vowels ]* ;
Rules
"~K:0 Gradation"
%~K:0 <=> _ AlphaSeq Gradations:0 AlphaSeq %+AV%-:0 ;
"%~K:v and %~P:m Gradation"
Cs:Cw <=> _ AlphaSeq Cm:0 AlphaSeq %+AV%-:0 ;
where Cs in ( %~K %~P )
Cw in ( v m )
Cm in ( %+AVM %+AVH ) matched ;
"Vowel Harmony"
%~A:a <=> :BackVowels NonVowelSeq _ ;
</pre>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 3: </span>An example <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> grammar governing the surface
realizations of the forms presented in the example lexicon in
Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.</figcaption>
</figure>
<section id="S3.SS1.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.1 </span>The Alphabet.</h4>
<div id="S3.SS1.SSS1.p1" class="ltx_para">
<p class="ltx_p">The alphabet of a two-level grammar contains all lexical symbols
specified in the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> grammar together with their
possible surface realizations. In the example grammar in
Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>, the alphabet contains all
letters used in Finnish words together with the vowel-harmony
archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code>, the gradation morphophonemes <code class="ltx_verbatim ltx_font_typewriter">~K</code> and
<code class="ltx_verbatim ltx_font_typewriter">~P</code>, as well as, the gradation-markers <code class="ltx_verbatim ltx_font_typewriter">+AV+</code>, <code class="ltx_verbatim ltx_font_typewriter">+AV-</code>,
<code class="ltx_verbatim ltx_font_typewriter">+AVA</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVD</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVH</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVM</code>.</p>
</div>
<div id="S3.SS1.SSS1.p2" class="ltx_para">
<p class="ltx_p">All symbols in the grammar may be arbitrary strings of UTF-8
characters, but characters like <code class="ltx_verbatim ltx_font_typewriter">+</code>, <code class="ltx_verbatim ltx_font_typewriter">~</code> or white-space,
which bear special meanings for the compiler need to be escaped using
the escape-character <code class="ltx_verbatim ltx_font_typewriter">%</code>.</p>
</div>
<div id="S3.SS1.SSS1.p3" class="ltx_para">
<p class="ltx_p">The letters in the example-grammar of
Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> always correspond to
themselves on the surface. The gradation-markers always correspond to
zero and the archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code> and the morphophonemes <code class="ltx_verbatim ltx_font_typewriter">~K</code>
and <code class="ltx_verbatim ltx_font_typewriter">~P</code> have various surface-realizations. E.g. <code class="ltx_verbatim ltx_font_typewriter">~A</code> is
always realized as either <code class="ltx_verbatim ltx_font_typewriter">a</code> or <span class="ltx_text ltx_font_typewriter">ä</span>.</p>
</div>
<div id="S3.SS1.SSS1.p4" class="ltx_para">
<p class="ltx_p">Each valid pair of a lexical symbol and its surface-correspondence has
to be listed in the alphabet. This differs from the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler, where pairs may be omitted from the alphabet, if they are
identity-pairs or are already constrained by some rule. Forcing the
grammar-writer to declare all symbol-pairs, may result in some extra
work, but it also prevents the creation of inadvertent pairs.</p>
</div>
<div id="S3.SS1.SSS1.p5" class="ltx_para">
<p class="ltx_p">Declaring all symbol-pairs in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> is mandatory, as we
have not yet implemented an <span class="ltx_text ltx_font_italic">other-symbol</span> like the one in
Xerox <span class="ltx_text ltx_font_smallcaps">TwolC</span> <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite> using the <span class="ltx_text ltx_font_smallcaps">HFST</span>
interface. Besides the grammar-formalism, this also affects the
compile-time for rules, which becomes more dependent on the number of
symbol-pairs in the grammar.</p>
</div>
</section>
<section id="S3.SS1.SSS2" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.2 </span>The Rule-variables.</h4>
<div id="S3.SS1.SSS2.p1" class="ltx_para">
<p class="ltx_p">Like the <span class="ltx_text ltx_font_smallcaps">Xerox</span> compiler, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> supports
defining a set of similar two-level rules using a rule-schema with
variables. During the compilation of the grammar, each schema is
compiled into actual two-level rules, by substituting the variables
with the values specified for them. All rule-variables, which are used
in the grammar, need to be declared in the Rule-variables section.</p>
</div>
</section>
<section id="S3.SS1.SSS3" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.3 </span>The Sets.</h4>
<div id="S3.SS1.SSS3.p1" class="ltx_para">
<p class="ltx_p">It is often convenient to name some classes of symbols, which are used
in many rules. E.g. the class <code class="ltx_verbatim ltx_font_typewriter">BackVowels</code> in the example-grammar
in Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>, which contains all
vowel-segments used in the grammar. The sets in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
and <span class="ltx_text ltx_font_smallcaps">TwolC</span> are very similar constructs.</p>
</div>
<div id="S3.SS1.SSS3.p2" class="ltx_para">
<p class="ltx_p">In <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>, the Cartesian product of sets, or a set and a
symbol, is always limited to the set of symbol-pairs declared in the
alphabet. E.g. the equivalent expressions <code class="ltx_verbatim ltx_font_typewriter">BackVowel:BackVowel</code>
and <code class="ltx_verbatim ltx_font_typewriter">BackVowel</code> will only accept the pairs <code class="ltx_verbatim ltx_font_typewriter">a:a</code>,
<code class="ltx_verbatim ltx_font_typewriter">o:o</code>, <code class="ltx_verbatim ltx_font_typewriter">u:u</code>, <code class="ltx_verbatim ltx_font_typewriter">A:A</code>, <code class="ltx_verbatim ltx_font_typewriter">O:O</code> and
<code class="ltx_verbatim ltx_font_typewriter">U:U</code>. Although it is conceivable, that they would accept
e.g. the pairs <code class="ltx_verbatim ltx_font_typewriter">a:U</code> and <code class="ltx_verbatim ltx_font_typewriter">A:O</code>, they will not, since the
pairs have not been declared.</p>
</div>
<div id="S3.SS1.SSS3.p3" class="ltx_para">
<p class="ltx_p">All sets have to be declared in the sets section of the grammar. Of
the five sets we have defined in the example grammar, the first four
are defined directly using a symbol sequence. The fifth set
<code class="ltx_verbatim ltx_font_typewriter">Vowels</code> is defined as the union of the sets <code class="ltx_verbatim ltx_font_typewriter">SmallVowels</code>
and <code class="ltx_verbatim ltx_font_typewriter">BigVowels</code>.</p>
</div>
</section>
<section id="S3.SS1.SSS4" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.4 </span>The Definitions.</h4>
<div id="S3.SS1.SSS4.p1" class="ltx_para">
<p class="ltx_p">Like character-sets, also regular expressions may be stored under a
name and accessed later using that name. Named regular expressions are
called definitions and may be used freely in the rules. Sets and
previous definitions can be used in the definition of a new
definition. The definitions in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and <span class="ltx_text ltx_font_smallcaps">TwolC</span>
are identical.</p>
</div>
</section>
<section id="S3.SS1.SSS5" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.5 </span>The Rules.</h4>
<div id="S3.SS1.SSS5.p1" class="ltx_para">
<p class="ltx_p">A two-level grammar constrains the surface-realizations of lexical
forms. The constraints are given as two-level rules, whose joint
effect determines the set of valid correspondences for each lexical
form. Each of the rules governs one realization of a lexical symbol in
a context given by a regular expression of pairs of a lexical and
surface symbol.</p>
</div>
<div id="S3.SS1.SSS5.p2" class="ltx_para">
<p class="ltx_p">The syntax and semantics of rules in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">TwolC</span> are very
similar<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">11</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">11</sup><span class="ltx_ERROR undefined">\url</span>http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax.html</span></span></span>.
Except for <span class="ltx_text ltx_font_italic">surface-restrictions</span> concerning epsilon,
i.e. epenthesis rules, the rules also work the same way.</p>
</div>
<div id="S3.SS1.SSS5.p3" class="ltx_para">
<p class="ltx_p">An example of a rule is the rule governing vowel-harmony in our example grammar</p>
<pre class="ltx_verbatim ltx_font_typewriter">
"Vowel Harmony"
%~A:a <=> :BackVowels NonVowelSeq _ ;
</pre>
<p class="ltx_p">It states that the archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code> has to be realized as
<code class="ltx_verbatim ltx_font_typewriter">a</code>, if the surface-vowel immediately preceding it is a
back-vowel. It also disallows the pair <code class="ltx_verbatim ltx_font_typewriter">~A:a</code> in all other
contexts.</p>
</div>
<div id="S3.SS1.SSS5.p4" class="ltx_para">
<p class="ltx_p">The rule accepts the first correspondence in Table <a href="#S3.T4" title="Table 4 ‣ 3.1.5 The Rules. ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> since
the vowel preceding <code class="ltx_verbatim ltx_font_typewriter">~A</code> is <code class="ltx_verbatim ltx_font_typewriter">y</code>, which is not a
back-vowel. It disallows both of the latter correspondences. In the
second correspondence <code class="ltx_verbatim ltx_font_typewriter">~A</code> is realized as <code class="ltx_verbatim ltx_font_typewriter">a</code>, even though
the preceding surface-vowel is not a back-vowel. This violates the
<code class="ltx_verbatim ltx_font_typewriter">=></code> direction of the rule. In the third correspondence,
<code class="ltx_verbatim ltx_font_typewriter">~A</code> is realized as <span class="ltx_text ltx_font_typewriter">ä</span>, but the preceding
surface-vowel is <code class="ltx_verbatim ltx_font_typewriter">u</code>, which is a back-vowel. This violates the
<code class="ltx_verbatim ltx_font_typewriter"><=</code> direction of the rule.</p>
</div>
<figure id="S3.T4" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l ltx_border_t">
<br class="ltx_break"><code class="ltx_verbatim ltx_font_typewriter">k</code>
</td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~K</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~K</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">m</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~P</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l">
<br class="ltx_break"><code class="ltx_verbatim ltx_font_typewriter">k</code>
</td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><span class="ltx_text ltx_font_typewriter">ä<code class="ltx_verbatim"></code></span></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><code class="ltx_verbatim ltx_font_typewriter">a</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">m</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">p</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><span class="ltx_text ltx_font_typewriter">ä<code class="ltx_verbatim"></code></span></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l ltx_border_t">
<br class="ltx_break">
</td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 4: </span>Symbol-pair correspondences for demonstrating the
vowel-harmony rules.</figcaption>
</figure>
<div id="S3.SS1.SSS5.p5" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> allows a set of rules to be defined using
variables or by giving a set of rule-centers. E.g. the rule which
defines the basic constraint of gradation in our example grammar is a
rule with three variables: <code class="ltx_verbatim ltx_font_typewriter">Cs</code>, <code class="ltx_verbatim ltx_font_typewriter">Cv</code> and <code class="ltx_verbatim ltx_font_typewriter">Cm</code>.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
"%~K:v and %~P:m Gradation"
Cs:Cw <=> _ AlphaSeq Cm:0 AlphaSeq %+AV%-:0 ;
where Cs in ( %~K %~P )
Cw in ( v m )
Cm in ( %+AVM %+AVH ) matched ;
</pre>
<p class="ltx_p">Like ordinary alphabet-symbols, variables may be used both in the
center of a rule and in its contexts.
</p>
</div>
<div id="S3.SS1.SSS5.p6" class="ltx_para">
<p class="ltx_p">When a rule with variables is compiled, it is split into
sub-rules. These are obtained by substituting real alphabet symbols
for the variables. The possible values of variables are listed in the
where-clause following the rule.</p>
</div>
</section>
</section>
<section id="S3.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.2 </span>Compiling the Rules and Resolving Rule-Conflicts</h3>
<div id="S3.SS2.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> compiles two-level rules, given as regular
expressions of pairs, into finite-state transducers. All two-level
rules may be constructed from simple surface-requirements,
context-restrictions and surface-prohibitions. The compilation reduces
the two-sided rules and rules with variables into combinations of such
simple constructions, or subrules.</p>
</div>
<div id="S3.SS2.p2" class="ltx_para">
<p class="ltx_p">After compilation, the subrules are intersected, so that finally
equally many rule-transducers are produced as there were original
two-level rules. Intersecting the subrules of the two-level grammar
rules takes up a considerable portion of the compile-time of the
grammar.</p>
</div>
<div id="S3.SS2.p3" class="ltx_para">
<p class="ltx_p">Compilation of the rules is preceded by a phase called
conflict-resolution, which modifies rule-contexts in order to prevent
harmful interactions between the rules. After conflict-resolution the
modified rule-set may be compiled as usual.
</p>
</div>
<div id="S3.SS2.p4" class="ltx_para">
<p class="ltx_p">We use the GR-operation of Yli-Jyrä and Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> to compile rules. Both compiling rules and
conflict-resolution is simplified using the operation.</p>
</div>
<div id="S3.SS2.p5" class="ltx_para">
<p class="ltx_p">The compilation in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> differs from <span class="ltx_text ltx_font_smallcaps">TwolC</span>
when epenthesis rules are compiled. As Yli-Jyrä and Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> point out, epenthesis rules may be compiled as any
other surface-requirement rules using the GR-operation. This increases
the expressive power of the two-level grammar as explained below.</p>
</div>
<div id="S3.SS2.p6" class="ltx_para">
<p class="ltx_p">A general restriction of the pair-alphabet <math id="S3.SS2.p6.m1" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math> is an expression
<math id="S3.SS2.p6.m2" class="ltx_Math" alttext="W\overset{n\diamond}{\Rightarrow}W^{\prime}" display="inline"><mrow><mi>W</mi><mo></mo><mover accent="true"><mo>⇒</mo><mrow><mi>n</mi><mo>⋄</mo></mrow></mover><mo></mo><msup><mi>W</mi><mo>′</mo></msup></mrow></math>, where the precondition <math id="S3.SS2.p6.m3" class="ltx_Math" alttext="W" display="inline"><mi>W</mi></math> and
postcondition <math id="S3.SS2.p6.m4" class="ltx_Math" alttext="W^{\prime}" display="inline"><msup><mi>W</mi><mo>′</mo></msup></math> are unions of expressions of the form <math id="S3.SS2.p6.m5" class="ltx_Math" alttext="V_{1}\diamond V_{2}\ \diamond\text{ ... }\diamond\ V_{n}\subset\Sigma^{*}(%
\diamond\ \Sigma^{*})^{n}" display="inline"><mrow><mrow><msub><mi>V</mi><mn>1</mn></msub><mo>⋄</mo><mpadded width="+5pt"><msub><mi>V</mi><mn>2</mn></msub></mpadded><mo>⋄</mo><mtext> … </mtext><mo>⋄</mo><msub><mi>V</mi><mi>n</mi></msub></mrow><mo>⊂</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo></mo><msup><mrow><mo stretchy="false">(</mo><mrow><mo rspace="7.5pt">⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo stretchy="false">)</mo></mrow><mi>n</mi></msup></mrow></mrow></math>, where <math id="S3.SS2.p6.m6" class="ltx_Math" alttext="\diamond\notin\Sigma" display="inline"><mrow><mo>⋄</mo><mo>∉</mo><mi mathvariant="normal">Σ</mi></mrow></math> is a special
marker-symbol and each <math id="S3.SS2.p6.m7" class="ltx_Math" alttext="V_{i}" display="inline"><msub><mi>V</mi><mi>i</mi></msub></math> is a regular language of the alphabet
<math id="S3.SS2.p6.m8" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math>. Such an expression is compiled into a regular expression
using the GR-operation as in (<a href="#S3.E6" title="(6) ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>).</p>
</div>
<div id="S3.SS2.p7" class="ltx_para">
<table id="S3.E6" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E6.m1" class="ltx_Math" alttext="\Sigma^{*}-\text{delete}_{\diamond}(W-W^{\prime})" display="block"><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>-</mo><mrow><msub><mtext>delete</mtext><mo>⋄</mo></msub><mo></mo><mrow><mo stretchy="false">(</mo><mrow><mi>W</mi><mo>-</mo><msup><mi>W</mi><mo>′</mo></msup></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(6)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.p8" class="ltx_para">
<p class="ltx_p">The operation <math id="S3.SS2.p8.m1" class="ltx_Math" alttext="\text{delete}_{\diamond}" display="inline"><msub><mtext>delete</mtext><mo>⋄</mo></msub></math> in (<a href="#S3.E6" title="(6) ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>) rewrites each
marker-symbol <math id="S3.SS2.p8.m2" class="ltx_Math" alttext="\diamond" display="inline"><mo>⋄</mo></math> into epsilon and leaves all other symbols
intact.</p>
</div>
<div id="S3.SS2.p9" class="ltx_para">
<p class="ltx_p">We do not need the full expressive power of the GR-operation. Instead
we use a restricted version <math id="S3.SS2.p9.m1" class="ltx_Math" alttext="W\overset{2\diamond}{\Rightarrow}W^{\prime}" display="inline"><mrow><mi>W</mi><mo></mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo></mo><msup><mi>W</mi><mo>′</mo></msup></mrow></math>,
which is limited to compiling rules with one center and a number of
contexts with a right and a left part. Hence we operate on
preconditions and postconditions with two diamonds,
i.e. <math id="S3.SS2.p9.m2" class="ltx_Math" alttext="W,W^{\prime}\subseteq\Sigma^{*}\diamond\Sigma^{*}\diamond\Sigma^{*}" display="inline"><mrow><mrow><mi>W</mi><mo>,</mo><msup><mi>W</mi><mo>′</mo></msup></mrow><mo>⊆</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow></mrow></math>.</p>
</div>
<div id="S3.SS2.p10" class="ltx_para">
<p class="ltx_p">We discuss compiling one rule first and then conflict-resolution,
although logically conflict-resolution is done first and then the
rules are compiled. This is easier to explain, because
conflict-resolution is highly dependent on the way the rules are
compiled.</p>
</div>
<section id="S3.SS2.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.2.1 </span>Compiling one rule.</h4>
<div id="S3.SS2.SSS1.p1" class="ltx_para">
<p class="ltx_p">Yli-Jyrä and Koskenniemi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> explain how ordinary
two-level rules can be compiled using the GR-operation. We use slight
variations of the same methods.</p>
</div>
<div id="S3.SS2.SSS1.p2" class="ltx_para">
<p class="ltx_p">Surface-requirement rules and context-restriction rules need to be
compiled in different ways. Surface-prohibition rules can be compiled
in a similar manner as surface-requirement rules and double-sided
rules are compiled, by intersecting the two directions of the rule.</p>
</div>
<div id="S3.SS2.SSS1.p3" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the context-restriction rule
<math id="S3.SS2.SSS1.p3.m1" class="ltx_Math" alttext="a\text{:}b\Rightarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mi>b</mi></mrow><mo>⇒</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo></mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo></mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is given by
(<a href="#S3.E7" title="(7) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p4" class="ltx_para">
<table id="S3.E7" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E7.m1" class="ltx_Math" alttext="\Sigma^{*}\diamond a\text{:}b\diamond\Sigma^{*}\overset{2\diamond}{\Rightarrow%
}\bigcup_{i=0}^{n}L_{i}\diamond\Sigma^{*}\diamond R_{i}" display="block"><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mi>b</mi></mrow><mo>⋄</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo></mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo></mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(7)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p5" class="ltx_para">
<p class="ltx_p">The surface-requirement rule requires an auxiliary definition. We
define the inverse projection <math id="S3.SS2.SSS1.p5.m1" class="ltx_Math" alttext="[x\text{:}]" display="inline"><mrow><mo stretchy="false">[</mo><mrow><mi>x</mi><mo></mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow></math> of the input-symbol <math id="S3.SS2.SSS1.p5.m2" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math>
using (<a href="#S3.E8" title="(8) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">8</span></a>). Here <math id="S3.SS2.SSS1.p5.m3" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math> may be any of the
input-symbols of pairs in <math id="S3.SS2.SSS1.p5.m4" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math>, including epsilon.</p>
</div>
<div id="S3.SS2.SSS1.p6" class="ltx_para">
<table id="S3.E8" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E8.m1" class="ltx_Math" alttext="[x\text{:}]=\{x\text{:}y\ |\ x\text{:}y\in\Sigma\}" display="block"><mrow><mrow><mo stretchy="false">[</mo><mrow><mi>x</mi><mo></mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow><mo>=</mo><mrow><mo stretchy="false">{</mo><mrow><mi>x</mi><mo></mo><mtext>:</mtext><mo></mo><mpadded width="+5pt"><mi>y</mi></mpadded></mrow><mo rspace="7.5pt" stretchy="false">|</mo><mrow><mrow><mi>x</mi><mo></mo><mtext>:</mtext><mo></mo><mi>y</mi></mrow><mo>∈</mo><mi mathvariant="normal">Σ</mi></mrow><mo stretchy="false">}</mo></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(8)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p7" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the surface-requirement rule
<math id="S3.SS2.SSS1.p7.m1" class="ltx_Math" alttext="a\text{:}b\Leftarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mi>b</mi></mrow><mo>⇐</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo></mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo></mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is given by
(<a href="#S3.E9" title="(9) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">9</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p8" class="ltx_para">
<table id="S3.E9" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E9.m1" class="ltx_Math" alttext="\Big{(}\Sigma^{*}\diamond[a\text{:}]-a\text{:}b\ \diamond\Sigma^{*}\cap\bigcup%
_{i=0}^{n}L_{i}\diamond\Sigma^{*}\diamond R_{i}\Big{)}\overset{2\diamond}{%
\Rightarrow}\emptyset" display="block"><mrow><mrow><mo maxsize="160%" minsize="160%">(</mo><mrow><mrow><mrow><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mo stretchy="false">[</mo><mrow><mi>a</mi><mo></mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow></mrow><mo>-</mo><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mpadded width="+5pt"><mi>b</mi></mpadded></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo>∩</mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow><mo maxsize="160%" minsize="160%">)</mo></mrow><mo></mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo></mo><mi mathvariant="normal">∅</mi></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(9)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p9" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the surface-prohibition rule
<math id="S3.SS2.SSS1.p9.m1" class="ltx_Math" alttext="a\text{:}b\/\Leftarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mi>b</mi></mrow><mo>⇐</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo></mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo></mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is similar. It
is given by (<a href="#S3.E10" title="(10) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in Zürich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">10</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p10" class="ltx_para">
<table id="S3.E10" class="ltx_equation ltx_eqn_table">
<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E10.m1" class="ltx_Math" alttext="\Big{(}\Sigma^{*}\diamond a\text{:}b\ \diamond\Sigma^{*}\cap\bigcup_{i=0}^{n}L%
_{i}\diamond\Sigma^{*}\diamond R_{i}\Big{)}\overset{2\diamond}{\Rightarrow}\emptyset" display="block"><mrow><mrow><mo maxsize="160%" minsize="160%">(</mo><mrow><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mi>a</mi><mo></mo><mtext>:</mtext><mo></mo><mpadded width="+5pt"><mi>b</mi></mpadded></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo>∩</mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow><mo maxsize="160%" minsize="160%">)</mo></mrow><mo></mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo></mo><mi mathvariant="normal">∅</mi></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(10)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p11" class="ltx_para">