2009-sfcm/Pirinen-2009-sfcm.html

<!DOCTYPE html><html>
<head>
<title>HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3 </title>
<!--Generated on Thu Sep 28 18:07:52 2017 by LaTeXML (version 0.8.2) http://dlmf.nist.gov/LaTeXML/.-->
<!--Document created on Last modification: September 28, 2017.-->

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document ltx_authors_1line">
<h1 class="ltx_title ltx_title_document">
<span class="ltx_text ltx_font_smallcaps">HFST</span> Tools for Morphology – An Efficient Open-Source
Package for Construction of Morphological Analyzers
<span class="ltx_ERROR undefined">\footnotepubrights</span>This article was published in Proceedings of SFCM 2009 in
ZÃ¼rich <span class="ltx_ERROR undefined">\url</span>http://sfcm2009.org.
<span class="ltx_ERROR undefined">\springerpostprintdoi</span>10.1007/978-3-642-04131-0_3
</h1>
<div class="ltx_authors">
<span class="ltx_creator ltx_role_author">
<span class="ltx_personname"> Krister LindÃ©n
</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Miikka Silfverberg
</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Tommi Pirinen
<br class="ltx_break">University of Helsinki
<br class="ltx_break">firstname.lastname@helsinki.fi
<br class="ltx_break">
</span></span>
</div>
<div class="ltx_date ltx_role_creation">Last modification: September 28, 2017</div>

<div class="ltx_abstract">
<h6 class="ltx_title ltx_title_abstract">Abstract</h6>

<p class="ltx_p">Morphological analysis of a wide range of languages can be implemented
efficiently using finite-state transducer technologies. Over the last
30 years, a number of attempts have been made to create tools for
computational morphologies. The two main competing approaches have
been parallel vs. cascaded rule application. The parallel rule
application was originally introduced by Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>]</cite> and implemented in tools like <span class="ltx_text ltx_font_smallcaps">TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">LexC</span>. Currently many applications of morphologies could use
dictionaries encoding the a priori likelihoods of words and
expressions as well as the likelihood of relations to other
representations or languages. We have made the choice to create
open-source tools and language descriptions in order to let as many as
possible participate in the effort. The current article presents some
of the main tools that we have created such as <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>,
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>. We evaluate
their efficiency in comparison to some similar tools and libraries. In
particular, we evaluate them using several full-fledged morphological
descriptions. Our tools compare well with similar open source tools,
even if we still have some challenges ahead before we can catch up
with the commercial tools. We demonstrate that for various reasons a
parallel rule approach still seems to be more efficient than a
cascaded rule approach when developing finite-state morphologies.</p>

</div>
<section id="S1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2>

<div id="S1.p1" class="ltx_para">
<p class="ltx_p">Morphological analysis of a wide range of languages can be implemented
efficiently using finite-state technologies based on finite-state
transducers. Our goal is to implement efficient tools for creating and
manipulating finite-state transducer morphologies for different uses
and purposes. The task is daunting and we cannot do it alone.</p>
</div>
<div id="S1.p2" class="ltx_para">
<p class="ltx_p">Over the last 30 years, a number of attempts have been made to create
tools for computational morphologies and some of them have withstood
the test of time better than others. A major effort that has shaped
the landscape and incorporated many lasting ideas is the morphological
development tools created by Xerox. It started with the insight that
we can use transducers to describe or encode phonological processes
and relate various levels of linguistic abstraction using tools like
<span class="ltx_text ltx_font_smallcaps">TwolC</span> introduced by Koskenniemi and Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>, <a href="#bib.bibx2" title="" class="ltx_ref">1987</a>, <a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite>. To efficiently compile
large-scale lexicons into transducers, we need special lexicon
compilers like <span class="ltx_text ltx_font_smallcaps">LexC</span> described by Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx4" title="" class="ltx_ref">1993</a>, <a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite>.</p>
</div>
<div id="S1.p3" class="ltx_para">
<p class="ltx_p">Such tools do not solve all the problems. Writing full-scale
dictionaries in <span class="ltx_text ltx_font_smallcaps">LexC</span> may well be compared to having
programmers write sophisticated applications in C without access to
any of the modern high-level libraries. It is possible, but unless it
is done in some principled way, one may easily end up with
spaghetti-code that is difficult to maintain. This is not the fault of
the lexicon compiler, but the general programming solution is to
create several descriptions that are small and independent,
i.e. modular. With this insight and as computers became more
powerful, the initial calculus that was conceived for abstract objects
like automata and transducers in <span class="ltx_text ltx_font_smallcaps">TwolC</span> and <span class="ltx_text ltx_font_smallcaps">LexC</span> was
expanded and migrated into the lexical programming environment
<span class="ltx_text ltx_font_smallcaps">xfst</span> documented by Beesly and Karttunen <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>,
where smaller lexical modules for various purposes can be tailored and
combined using finite-state calculus operations.</p>
</div>
<div id="S1.p4" class="ltx_para">
<p class="ltx_p">The previous effort is well-worth studying, but currently additional
ideas have established themselves such as weighted transducers for
modeling aspects of language that deal with preferences or trends
rather than strict rules or on/off phenomena. Many applications of
morphologies could use dictionaries encoding the a priori likelihoods
of words and expressions as well as the likelihood of their relations
to phonetic representations or their lexical relations to other words
in the same language or in different languages. The efforts to explore
weighted finite-state transducers for natural language processing are
ongoing in information retrieval, speech processing and machine
translation to name a few of the main application areas involved.</p>
</div>
<div id="S1.p5" class="ltx_para">
<p class="ltx_p">Since we do not pretend that we could develop all the morphologies for
all the languages ourselves, or even all the aspects of the tools
needed to develop these morphologies, we have made the choice to
create open-source tools and language descriptions. We hope as many as
possible will participate in the effort by developing the tools
further for common needs and special purposes.</p>
</div>
<div id="S1.p6" class="ltx_para">
<p class="ltx_p">In addition to the open source tools, we also encourage the commercial
use of the final transducers created by the tools by providing runtime
software<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstRuntimeInterface</span></span></span>
that is free for commercial purposes. Eventually this will allow
software applications simply to select the appropriate transducer in
order to process a language correctly allowing the programmer to
ignore special characteristics of individual languages.</p>
</div>
<div id="S1.p7" class="ltx_para">
<p class="ltx_p">Recently, a number of open-source finite-state processing environments
have emerged, e.g. for unweighted transducers there are the
<em class="ltx_emph">SFST–Stuttgart Finite-State Transducer
Tools</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_ERROR undefined">\url</span>http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html</span></span></span>
by Schmid <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx10" title="" class="ltx_ref">2005</a>]</cite>, <em class="ltx_emph">foma: a finite-state machine toolkit and
library</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup><span class="ltx_ERROR undefined">\url</span>http://foma.sourceforge.net/</span></span></span> by Huldén,
etc., and for weighted transducers there are
<em class="ltx_emph">Vaucanson</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup><span class="ltx_ERROR undefined">\url</span>http://www.lrde.epita.fr/cgi-bin/twiki/view/Projects/Vaucanson</span></span></span>
by Lombardy et al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx9" title="" class="ltx_ref">2004</a>]</cite>, <em class="ltx_emph">OpenFST
Library</em><span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup><span class="ltx_ERROR undefined">\url</span>http://www.openfst.org/</span></span></span> by Allauzen et
al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx12" title="" class="ltx_ref">2007</a>]</cite>, etc. These are valuable contributions to the open
source software that we can build on.</p>
</div>
<div id="S1.p8" class="ltx_para">
<p class="ltx_p">Our particular goal currently is in providing the basic facilities for
efficiently developing, compiling and running morphologies with or
without weights. To achieve our goal, we decided to create a unified
API<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">6</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">6</sup><span class="ltx_ERROR undefined">\url</span>http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/documentation.shtml</span></span></span>,
which is capable of interfacing various weighted and unweighted
finite-state transducer libraries allowing us to incorporate new
libraries as needed. Currently, we have interfaces to <span class="ltx_text ltx_font_smallcaps">SFST</span>
and <span class="ltx_text ltx_font_smallcaps">OpenFST</span>. On top of the unified API, we created a set of
basic
tools<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">7</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">7</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstHome</span></span></span>,
e.g. <span class="ltx_text ltx_font_smallcaps">HFST-TwolC, HFST-LexC, HFST-Compose-Intersect,
HFST-Test, HFST-Lookup</span>, etc. With these tools, we created or used
real full-fledged morphological descriptions of different languages
from different
language-families<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">8</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">8</sup><span class="ltx_ERROR undefined">\url</span>http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/index.shtml</span></span></span>,
e.g. <em class="ltx_emph">English, Finnish, French, Northern Sámi</em> and
<em class="ltx_emph">Swedish</em>. We used the morphological descriptions for testing the
functionality of the tools and for evaluating the performance of the
different libraries through a unified interface on the full-fledged
morphological development and compilation tasks.</p>
</div>
<div id="S1.p9" class="ltx_para">
<p class="ltx_p">The current article presents some of the main tools that we have
created: <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> in Sect. 2, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> in Sect. 3
and <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span> in Sect. 4. For each tool, we
present the main theoretical underpinnings of the implementation and
illustrate them with some examples. We highlight the main design
decisions that influenced the efficiency of the implementation and
how, if at all, our implementations differ from their namesakes. In
Sect. 5, we briefly present the morphological descriptions that we use
for demonstrating and comparing the efficiency of the
implementation. In Sect. 6, we evaluate the performance of our tools
for parallel-rule application and compare them with the performance of
the <span class="ltx_text ltx_font_italic">foma</span> <span class="ltx_text ltx_font_smallcaps">LexC</span> compiler and the <span class="ltx_text ltx_font_italic">Xerox</span> tools,
as well as the cascaded rule compiler of <span class="ltx_text ltx_font_smallcaps">SFST</span>. In Sect. 7, we
discuss the test results and present some aspects of future research
and development. In Sect. 8, we draw the conclusions.</p>
</div>
</section>
<section id="S2" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">2 </span><span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>
</h2>

<div id="S2.p1" class="ltx_para">
<p class="ltx_p">A lexicon compiler is a program that reads sets of morphemes and their
morphotactic combinations in order to create a finite-state transducer
of a lexicon. This finite state transducer is called a <em class="ltx_emph">lexical
transducer</em><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite>. The lexical transducer may be further
adjusted with e.g. phonological rules. The example for our lexicon
compiler is set by <span class="ltx_text ltx_font_smallcaps">LexC</span> of Xerox <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>. In
<span class="ltx_text ltx_font_smallcaps">LexC</span>, morphemes are arranged into named sets called
sub-lexicons. Each entry of a sub-lexicon is a pair of finite
strings<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">9</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">9</sup>Entries of regular expression form are not covered
here to simplify the presentation, but a full definition of an entry
in this formalism allows an entry to be a regular language.</span></span></span>
associated with the name of a sub-lexicon called a <em class="ltx_emph">continuation
class</em>.</p>
</div>
<div id="S2.p2" class="ltx_para">
<p class="ltx_p">Below, we highlight the main design decisions that influenced the
efficiency of the implementation and some of the we present the main
theory for compiling a <span class="ltx_text ltx_font_smallcaps">LexC</span> description into a finite-state
transducer. Our morphology example outlines the nominal inflection of
four Finnish nouns as shown in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>. This
example is a highly simplified version of the actual morphology.</p>
</div>
<figure id="S2.T1" class="ltx_table"><pre class="ltx_verbatim ltx_font_typewriter">
Multichar_Symbols

+noun +1 +a +d +h +m +AV+ +AV- +AVA +AVD +AVH +AVM
+all +gen +ptv +sg ~A ~K ~P

LEXICON Root
    akku+noun+1+a:ak~Ku+AVA   N1b "battery";
    alku+noun+1+d:al~Ku+AVD   N1b "beginning";
    kumpu+noun+1+h:kum~Pu+AVH N1b "heap";
    kyky+noun+1+m:ky~Ky+AVM   N1b "capability";

LEXICON N1b
    NounSg ;
    NounPtvA ;


LEXICON NounPtvA
    +sg+ptv:~A+AV+  Ennd ;

LEXICON NounSg
    +sg+gen:n+AV-   Ennd ;
    +sg+all:l+AV-le Ennd ;
    :n+AV-          Compounding ;

LEXICON Compounding
    Root ;

LEXICON Ennd
    # ;
</pre>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 1: </span>A simplified <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> lexicon for some Finnish
nouns.</figcaption>
</figure>
<div id="S2.p3" class="ltx_para">
<p class="ltx_p">There are at least three time consuming parts of the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> compilation
process. First the compiler needs to parse the strings representing
the entry morphemes. Traditionally <span class="ltx_text ltx_font_smallcaps">LexC</span> allows multiple
characters in a single symbol. The problem of finding the optimal
partition of a string when compiling it into a finite-state transducer
is optimizing the <em class="ltx_emph">tokenization</em> algorithm. The tokenization is
discussed in Sect. <a href="#S2.SS1" title="2.1 Efficient Tokenizing of a Sub-Lexicon Entry ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.1</span></a>. The set of entries
in each sub-lexicon form a <em class="ltx_emph">union</em>. There are a few alternative
strategies for creating unions, which are briefly outlined in
Sect. <a href="#S2.SS2" title="2.2 Efficient Union of Sub-Lexicon Entries ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>. The combining of sub-lexicons is
described in Sect. <a href="#S2.SS3" title="2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.3</span></a> on morphotax.</p>
</div>
<section id="S2.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.1 </span>Efficient Tokenizing of a Sub-Lexicon Entry</h3>

<div id="S2.SS1.p1" class="ltx_para">
<p class="ltx_p">Lexicon entries are tokenized using a simple left-to-right longest
match algorithm. The entry can be tokenized by going through the entry
string, position by position, and looking up the longest symbols
available using a very simple greedy algorithm. An alternative, but
less efficient, strategy is to compose the entry string with a
tokenizer-transducer, implementing greedy left-to-right matching, to
achieve the correct partition.</p>
</div>
</section>
<section id="S2.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.2 </span>Efficient Union of Sub-Lexicon Entries</h3>

<div id="S2.SS2.p1" class="ltx_para">
<p class="ltx_p">The finite-state form of a sub-lexicon is a union of entry
transducers. Building a union of entry transducers is a relatively
straight-forward process. However, iteratively taking the union of n
entries with the n+1<math id="S2.SS2.p1.m1" class="ltx_Math" alttext="{}^{th}" display="inline"><msup><mi></mi><mrow><mi>t</mi><mo>⁢</mo><mi>h</mi></mrow></msup></math> entry is not ideal. A faster approach,
given that all our entries are simple finite strings is to build the
sub-lexicon transducer as a prefix tree, <em class="ltx_emph">trie</em>.</p>
</div>
</section>
<section id="S2.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">2.3 </span>Efficient Implementation of Morphotax over Sublexicons</h3>

<div id="S2.SS3.p1" class="ltx_para">
<p class="ltx_p">There are two strategies for making the sub-lexicon combinations. A
trivial, and fast, strategy is to connect the entries using
continuation classes to sub-lexicons of the same name with an epsilon
arc. An alternative method, which we implemented, is to use named
auxiliary symbols and the finite-state algebra to create
overgenerating combinations which are filtered by composition to
achieve legal combinations. This is further described in the next
subsection.</p>
</div>
<section id="S2.SS3.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">2.3.1 </span>Combining sublexicons using standard finite-state transducer algebra.</h4>

<div id="S2.SS3.SSS1.p1" class="ltx_para">
<p class="ltx_p">We assume all standard finite state operations to be known. For an
introduction, see Beesly and Karttunen <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx8" title="" class="ltx_ref">2003</a>]</cite>. We use the
following notation: <math id="S2.SS3.SSS1.p1.m1" class="ltx_Math" alttext="\cup" display="inline"><mo>∪</mo></math> is union, <math id="S2.SS3.SSS1.p1.m2" class="ltx_Math" alttext="\cap" display="inline"><mo>∩</mo></math> is intersection, <math id="S2.SS3.SSS1.p1.m3" class="ltx_Math" alttext="\circ" display="inline"><mo>∘</mo></math>
is composition, juxtaposition is concatenation. Latin characters
represent symbols of language and the <math id="S2.SS3.SSS1.p1.m4" class="ltx_Math" alttext="\varepsilon" display="inline"><mi>ε</mi></math> symbol is used for
a zero-length string. Capital Greek letters <math id="S2.SS3.SSS1.p1.m5" class="ltx_Math" alttext="\Sigma,\Gamma" display="inline"><mrow><mi mathvariant="normal">Σ</mi><mo>,</mo><mi mathvariant="normal">Γ</mi></mrow></math>
represent subsets of an alphabet. We define <math id="S2.SS3.SSS1.p1.m6" class="ltx_Math" alttext="\Sigma=\{a,b,\ldots\}" display="inline"><mrow><mi mathvariant="normal">Σ</mi><mo>=</mo><mrow><mo stretchy="false">{</mo><mi>a</mi><mo>,</mo><mi>b</mi><mo>,</mo><mi mathvariant="normal">…</mi><mo stretchy="false">}</mo></mrow></mrow></math> as a subset of the alphabet used for representing the
morphophonology of the language in <span class="ltx_text ltx_font_smallcaps">LexC</span> definitions. <math id="S2.SS3.SSS1.p1.m7" class="ltx_Math" alttext="\Gamma" display="inline"><mi mathvariant="normal">Γ</mi></math>
is the alphabet of the <em class="ltx_emph">auxiliary</em> symbols used in our rules in
the morphotax implementation. We assume that <math id="S2.SS3.SSS1.p1.m8" class="ltx_Math" alttext="\Sigma\cup\Gamma=\emptyset" display="inline"><mrow><mrow><mi mathvariant="normal">Σ</mi><mo>∪</mo><mi mathvariant="normal">Γ</mi></mrow><mo>=</mo><mi mathvariant="normal">∅</mi></mrow></math>. We use the symbol <math id="S2.SS3.SSS1.p1.m9" class="ltx_Math" alttext="J\in\Sigma" display="inline"><mrow><mi>J</mi><mo>∈</mo><mi mathvariant="normal">Σ</mi></mrow></math> for <em class="ltx_emph">joiners</em> to
delimit and combine morphemes in our morphotax. A joiner for an entry
with a continuation class named <math id="S2.SS3.SSS1.p1.m10" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math> is denoted as <math id="S2.SS3.SSS1.p1.m11" class="ltx_Math" alttext="J_{x}" display="inline"><msub><mi>J</mi><mi>x</mi></msub></math> and a joiner
for a sub-lexicon named <math id="S2.SS3.SSS1.p1.m12" class="ltx_Math" alttext="y" display="inline"><mi>y</mi></math> is denoted as <math id="S2.SS3.SSS1.p1.m13" class="ltx_Math" alttext="J_{y}" display="inline"><msub><mi>J</mi><mi>y</mi></msub></math>.</p>
</div>
<div id="S2.SS3.SSS1.p2" class="ltx_para">
<p class="ltx_p">We introduce the compilation of lexicons using the example-lexicon in
Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p3" class="ltx_para">
<p class="ltx_p">A single entry in a sub-lexicon, i.e., a line of code in a
<span class="ltx_text ltx_font_smallcaps">LexC</span> file, is referred to as a morpheme denoted by
<math id="S2.SS3.SSS1.p3.m1" class="ltx_Math" alttext="\mathcal{M}" display="inline"><mi class="ltx_font_mathcaligraphic">ℳ</mi></math>. A morpheme can be a subset of the language
<math id="S2.SS3.SSS1.p3.m2" class="ltx_Math" alttext="\Sigma^{\star}" display="inline"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></math> appended with the joiner of a continuation class
(<a href="#S2.E1" title="(1) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p4" class="ltx_para">
<table id="S2.E1" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E1.m1" class="ltx_Math" alttext="\mathcal{M}=\Sigma^{\star}~{}J" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℳ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo>⁢</mo><mi>J</mi></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p5" class="ltx_para">
<p class="ltx_p">E.g. the <span class="ltx_text ltx_font_smallcaps">LexC</span> string entry <em class="ltx_emph">akku:ak<math id="S2.SS3.SSS1.p5.m1" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>Ku+AVA</em>
with a continuation class <em class="ltx_emph">N1b</em> becomes
<em class="ltx_emph">a k k:<math id="S2.SS3.SSS1.p5.m2" class="ltx_Math" alttext="\sim" display="inline"><mo>∼</mo></math>K u:+AVA <math id="S2.SS3.SSS1.p5.m3" class="ltx_Math" alttext="\varepsilon" display="inline"><mi>ε</mi></math>:<math id="S2.SS3.SSS1.p5.m4" class="ltx_Math" alttext="J_{N1b}" display="inline"><msub><mi>J</mi><mrow><mi>N</mi><mo>⁢</mo><mn>1</mn><mo>⁢</mo><mi>b</mi></mrow></msub></math></em>.</p>
</div>
<div id="S2.SS3.SSS1.p6" class="ltx_para">
<p class="ltx_p">A sub-lexicon <math id="S2.SS3.SSS1.p6.m1" class="ltx_Math" alttext="\mathcal{L}" display="inline"><mi class="ltx_font_mathcaligraphic">ℒ</mi></math> defined by (<a href="#S2.E2" title="(2) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>) is a union of
morphemes as specified in Sect. <a href="#S2.SS2" title="2.2 Efficient Union of Sub-Lexicon Entries ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p7" class="ltx_para">
<table id="S2.E2" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E2.m1" class="ltx_Math" alttext="\mathcal{L}=J~{}\bigcup_{\mathcal{M}_{x}\in\mathcal{M}}~{}(~{}\mathcal{M}_{x}~%
{})" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℒ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><mi>J</mi></mpadded><mo>⁢</mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi class="ltx_font_mathcaligraphic">ℳ</mi><mi>x</mi></msub><mo>∈</mo><mi class="ltx_font_mathcaligraphic">ℳ</mi></mrow></munder></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><msub><mi class="ltx_font_mathcaligraphic">ℳ</mi><mi>x</mi></msub></mpadded><mo stretchy="false">)</mo></mrow></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p8" class="ltx_para">
<p class="ltx_p">E.g. the lexicon named <em class="ltx_emph">Root</em> consisting of <em class="ltx_emph">akku</em> and
<em class="ltx_emph">alku</em> with continuation class <em class="ltx_emph">N1b</em> becomes
<math id="S2.SS3.SSS1.p8.m1" class="ltx_Math" alttext="J_{Root}~{}(~{}a~{}k~{}k~{}u~{}J_{N1b}~{}|~{}a~{}l~{}k~{}u~{}J_{N1b}~{})" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>R</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>t</mi></mrow></msub></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><mi>a</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>u</mi></mpadded><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>N</mi><mo>⁢</mo><mn>1</mn><mo>⁢</mo><mi>b</mi></mrow></msub></mpadded><mo rspace="5.8pt" stretchy="false">|</mo><mpadded width="+3.3pt"><mi>a</mi></mpadded><mpadded width="+3.3pt"><mi>l</mi></mpadded><mpadded width="+3.3pt"><mi>k</mi></mpadded><mpadded width="+3.3pt"><mi>u</mi></mpadded><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>N</mi><mo>⁢</mo><mn>1</mn><mo>⁢</mo><mi>b</mi></mrow></msub></mpadded><mo stretchy="false">)</mo></mrow></mrow></math>.</p>
</div>
<div id="S2.SS3.SSS1.p9" class="ltx_para">
<p class="ltx_p">We create a filter <math id="S2.SS3.SSS1.p9.m1" class="ltx_Math" alttext="\mathcal{F}" display="inline"><mi class="ltx_font_mathcaligraphic">ℱ</mi></math> defined by (<a href="#S2.E3" title="(3) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>) for legal
morpheme combinations by pairing up adjacent joiners.</p>
</div>
<div id="S2.SS3.SSS1.p10" class="ltx_para">
<table id="S2.E3" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E3.m1" class="ltx_Math" alttext="\mathcal{F}=\bigcup_{J_{x}\in J}~{}(~{}J_{x}~{}J_{x}~{})" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi>J</mi><mi>x</mi></msub><mo>∈</mo><mi>J</mi></mrow></munder></mpadded><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mi>x</mi></msub></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><msub><mi>J</mi><mi>x</mi></msub></mpadded></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p11" class="ltx_para">
<p class="ltx_p">To account for the special starting lexicon and the special ending
lexicon, we define <math id="S2.SS3.SSS1.p11.m1" class="ltx_Math" alttext="J_{Root}\in J" display="inline"><mrow><msub><mi>J</mi><mrow><mi>R</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>t</mi></mrow></msub><mo>∈</mo><mi>J</mi></mrow></math> and <math id="S2.SS3.SSS1.p11.m2" class="ltx_Math" alttext="J_{\#}\notin J" display="inline"><mrow><msub><mi>J</mi><mi mathvariant="normal">#</mi></msub><mo>∉</mo><mi>J</mi></mrow></math>. The root
lexicon can be used in continuation classes as a target, e.g. for the
compounding mechanism, but the end lexicon is not available as a
lexicon name, so it is not part of the regular morphotax. To
accommodate this, we extend the filter definition to
<math id="S2.SS3.SSS1.p11.m3" class="ltx_Math" alttext="\mathcal{F}^{\prime}" display="inline"><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup></math> as in (<a href="#S2.E4" title="(4) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p12" class="ltx_para">
<table id="S2.E4" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E4.m1" class="ltx_Math" alttext="\mathcal{F}^{\prime}=J_{Root}~{}(~{}\Sigma^{\star}~{}\mathcal{F}~{})^{\star}~{%
}\Sigma^{\star}~{}J_{\#}" display="block"><mrow><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup><mo>=</mo><mrow><mpadded width="+3.3pt"><msub><mi>J</mi><mrow><mi>R</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>o</mi><mo>⁢</mo><mi>t</mi></mrow></msub></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><msup><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mrow><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><mi class="ltx_font_mathcaligraphic">ℱ</mi></mpadded></mrow><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><msup><mi mathvariant="normal">Σ</mi><mo>⋆</mo></msup></mpadded><mo>⁢</mo><msub><mi>J</mi><mi mathvariant="normal">#</mi></msub></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p13" class="ltx_para">
<p class="ltx_p">This allows us to create the final transducer <math id="S2.SS3.SSS1.p13.m1" class="ltx_Math" alttext="\mathcal{R}" display="inline"><mi class="ltx_font_mathcaligraphic">ℛ</mi></math> with only
legal combinations of sub-lexicons by composition (<a href="#S2.E5" title="(5) ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>).</p>
</div>
<div id="S2.SS3.SSS1.p14" class="ltx_para">
<table id="S2.E5" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S2.E5.m1" class="ltx_Math" alttext="\mathcal{R}=\bigcup_{\mathcal{L}_{x}\in\mathcal{L}}~{}(~{}\mathcal{L}_{x}~{})^%
{\star}~{}\circ~{}\mathcal{F}^{\prime}" display="block"><mrow><mi class="ltx_font_mathcaligraphic">ℛ</mi><mo>=</mo><mrow><mpadded width="+3.3pt"><munder><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><msub><mi class="ltx_font_mathcaligraphic">ℒ</mi><mi>x</mi></msub><mo>∈</mo><mi class="ltx_font_mathcaligraphic">ℒ</mi></mrow></munder></mpadded><mrow><mpadded width="+3.3pt"><msup><mrow><mo rspace="5.8pt" stretchy="false">(</mo><mpadded width="+3.3pt"><msub><mi class="ltx_font_mathcaligraphic">ℒ</mi><mi>x</mi></msub></mpadded><mo stretchy="false">)</mo></mrow><mo>⋆</mo></msup></mpadded><mo rspace="5.8pt">∘</mo><msup><mi class="ltx_font_mathcaligraphic">ℱ</mi><mo>′</mo></msup></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(5)</span></td>
</tr>
</table>
</div>
<div id="S2.SS3.SSS1.p15" class="ltx_para">
<p class="ltx_p">E.g., for the sublexicons <em class="ltx_emph">Root</em>, <em class="ltx_emph">N1b</em>, <em class="ltx_emph">NounSg</em>, and
<em class="ltx_emph">Ennd</em> in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>, and their entries
<em class="ltx_emph">akku</em> and <em class="ltx_emph">+sg+all:lle</em>, we get the disjunction of lexicons
<math id="S2.SS3.SSS1.p15.m1" class="ltx_Math" alttext="L^{\star}" display="inline"><msup><mi>L</mi><mo>⋆</mo></msup></math>, which we filter using <math id="S2.SS3.SSS1.p15.m2" class="ltx_Math" alttext="L^{\star}\circ F^{\prime}" display="inline"><mrow><msup><mi>L</mi><mo>⋆</mo></msup><mo>∘</mo><msup><mi>F</mi><mo>′</mo></msup></mrow></math> as
shown in Table <a href="#S2.T2" title="Table 2 ‣ 2.3.1 Combining sublexicons using standard finite-state transducer algebra. ‣ 2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>.</p>
</div>
<figure id="S2.T2" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m1" class="ltx_Math" alttext="L^{\star}=" display="inline"><mrow><msup><mi mathsize="90%">L</mi><mo mathsize="90%" stretchy="false">⋆</mo></msup><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m2" class="ltx_Math" alttext="(~{}~{}J_{Root}~{}a~{}k~{}k~{}u~{}J_{N1b}~{}~{}|~{}~{}J_{N1b}~{}J_{NounSg}~{}~%
{}|" display="inline"><mrow><mo maxsize="90%" minsize="90%" rspace="9.1pt">(</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">a</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mpadded width="+3.3pt"><mi mathsize="90%">u</mi></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="9.1pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%">|</mo></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m3" class="ltx_Math" alttext="J_{NounSg}~{}\!+\!sg\!:\!l~{}\!+all\!:\!l~{}\varepsilon\!:\!e~{}J_{Ennd}~{}~{}%
|~{}~{}J_{Ennd}~{}J_{\#}~{}~{})^{\star}" display="inline"><mrow><mpadded width="+1.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mi mathsize="90%">s</mi><mpadded width="-1.7pt"><mi mathsize="90%">g</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+1.6pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" stretchy="false">+</mo><mi mathsize="90%">a</mi><mi mathsize="90%">l</mi><mpadded width="-1.7pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+3.3pt"><mi mathsize="90%">l</mi></mpadded><mpadded width="-1.7pt"><mi mathsize="90%">ε</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mpadded width="+3.3pt"><mi mathsize="90%">e</mi></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="9.1pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mi mathsize="90%" mathvariant="normal">#</mi></msub></mpadded><mo maxsize="90%" minsize="90%">)</mo><msup><mi></mi><mo mathsize="90%" stretchy="false">⋆</mo></msup></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m4" class="ltx_Math" alttext="F=" display="inline"><mrow><mi mathsize="90%">F</mi><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m5" class="ltx_Math" alttext="J_{Root}~{}J_{Root}~{}|~{}J_{N1b}~{}J_{N1b}~{}|~{}J_{NounSg}~{}J_{NounSg}~{}|~%
{}J_{Ennd}~{}J_{Ennd}" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo maxsize="90%" minsize="90%" rspace="5.8pt">|</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_left"><math id="S2.T2.m6" class="ltx_Math" alttext="L^{\star}\circ F^{\prime}=" display="inline"><mrow><mrow><msup><mi mathsize="90%">L</mi><mo mathsize="90%" stretchy="false">⋆</mo></msup><mo mathsize="90%" stretchy="false">∘</mo><msup><mi mathsize="90%">F</mi><mo mathsize="90%" stretchy="false">′</mo></msup></mrow><mo mathsize="90%" stretchy="false">=</mo><mi></mi></mrow></math></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m7" class="ltx_Math" alttext="J_{Root}~{}a~{}k~{}k~{}u~{}J_{N1b}~{}~{}J_{N1b}~{}J_{NounSg}" display="inline"><mrow><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">R</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">t</mi></mrow></msub></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><mi mathsize="90%">a</mi></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><mi mathsize="90%">k</mi></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><mi mathsize="90%">u</mi></mpadded><mo>⁢</mo><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mn mathsize="90%">1</mn><mo>⁢</mo><mi mathsize="90%">b</mi></mrow></msub></mpadded><mo>⁢</mo><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mrow></math></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_left"><math id="S2.T2.m8" class="ltx_Math" alttext="J_{NounSg}~{}\!+\!sg\!:\!l~{}\!+\!all\!:\!l~{}\varepsilon\!:\!e~{}J_{Ennd}~{}~%
{}J_{Ennd}~{}J_{\#}" display="inline"><mrow><mrow><mpadded width="+1.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">N</mi><mo>⁢</mo><mi mathsize="90%">o</mi><mo>⁢</mo><mi mathsize="90%">u</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">S</mi><mo>⁢</mo><mi mathsize="90%">g</mi></mrow></msub></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mrow><mi mathsize="90%">s</mi><mo>⁢</mo><mpadded width="-1.7pt"><mi mathsize="90%">g</mi></mpadded></mrow></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+1.6pt"><mi mathsize="90%">l</mi></mpadded><mo mathsize="90%" rspace="0.8pt" stretchy="false">+</mo><mrow><mi mathsize="90%">a</mi><mo>⁢</mo><mi mathsize="90%">l</mi><mo>⁢</mo><mpadded width="-1.7pt"><mi mathsize="90%">l</mi></mpadded></mrow></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+3.3pt"><mi mathsize="90%">l</mi></mpadded><mo>⁢</mo><mpadded width="-1.7pt"><mi mathsize="90%">ε</mi></mpadded></mrow><mo mathsize="90%" rspace="0.8pt" stretchy="false">:</mo><mrow><mpadded width="+3.3pt"><mi mathsize="90%">e</mi></mpadded><mo>⁢</mo><mpadded width="+6.6pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo>⁢</mo><mpadded width="+3.3pt"><msub><mi mathsize="90%">J</mi><mrow><mi mathsize="90%">E</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">n</mi><mo>⁢</mo><mi mathsize="90%">d</mi></mrow></msub></mpadded><mo>⁢</mo><msub><mi mathsize="90%">J</mi><mi mathsize="90%" mathvariant="normal">#</mi></msub></mrow></mrow></math></td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>Filtering a single path in <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> with a
morphotax filter.</figcaption>
</figure>
<div id="S2.SS3.SSS1.p16" class="ltx_para">
<p class="ltx_p">Finally, all the symbols in <math id="S2.SS3.SSS1.p16.m1" class="ltx_Math" alttext="\Gamma" display="inline"><mi mathvariant="normal">Γ</mi></math> are removed. While this is
trivial, it introduces some indeterminism in the final transducer,
which would otherwise have been introduced by building direct epsilon
arcs. Its influence on the performance is further detailed in
Sect. <a href="#S6" title="6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>.</p>
</div>
<div id="S2.SS3.SSS1.p17" class="ltx_para">
<p class="ltx_p">According to our experiments, attaching weights to each entry works
without modification of the lexicon compilation method.</p>
</div>
</section>
</section>
</section>
<section id="S3" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">3 </span><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
</h2>

<div id="S3.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_italic">Two-level rules</span> are parallel constraints on symbol-pair
strings governing the realizations of lexical word-forms as
corresponding surface-strings. They were introduced by Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx1" title="" class="ltx_ref">1983</a>]</cite> and have been used for modeling the phonology of
numerous natural languages. <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> is an accurate and
efficient open-source two-level rule compiler. It compiles grammars of
two-level rules into sets of finite-state transducers. The rules are
represented as regular-expression operations closely resembling
familiar phonological re-write rules both to appearance and semantics.</p>
</div>
<div id="S3.p2" class="ltx_para">
<p class="ltx_p">The most widely known two-level rule-compiler existing at the moment
is the <em class="ltx_emph">Xerox Two-Level Rule Compiler</em> (later <span class="ltx_text ltx_font_smallcaps">TwolC</span>)
presented by Karttunen el al. <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite>. It is proprietary
software, which imposes some limitations upon its use. The
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> compiler has been designed to be an open-source
substitute for the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler and has a syntax and
semantics very similar to those of the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler. Hence
existing two-level grammars, designed to compile under the
<span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler, require very few modifications to compile
correctly under <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>.</p>
</div>
<div id="S3.p3" class="ltx_para">
<p class="ltx_p">Besides being an open-source program, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> also has
other benefits compared with the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler. Resolution
of rule-conflicts is an important part of compiling two-level
grammars. We know of at least one instance, where the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler resolves rule-conflicts in an incorrect way (see
Sect. <a href="#S3.SS2.SSS2" title="3.2.2 Resolving rule-conflicts. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.2</span></a>). It also compiles epenthesis
rules in a way, which denies the grammar-writer the full expressive
power of two-level rules (see Sect. <a href="#S3.SS2.SSS1" title="3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.1</span></a>). In
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> we have been able to remedy these shortcomings by
compiling the rules with the <span class="ltx_text ltx_font_italic">generalized
restriction-operation</span> (later <span class="ltx_text ltx_font_italic">GR-operation</span>), presented by
Yli-Jyrä and Koskenniemi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite>. It allows compilation of
two-level rules in a uniform way and makes conflict-resolution easy to
tackle, while still permitting efficient compilation.</p>
</div>
<div id="S3.p4" class="ltx_para">
<p class="ltx_p">In Sect. <a href="#S3.SS1" title="3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.1</span></a>, we demonstrate the syntax and semantics
of a two-level grammar-file using a small example from Finnish
morphology. The example grammar maps the lexical forms given by the
example lexicon in Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>, presented in
Sect. <a href="#S2" title="2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>, into surface-forms.</p>
</div>
<div id="S3.p5" class="ltx_para">
<p class="ltx_p">It is not possible to demonstrate all features of <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
in this article, but we try to highlight the few most important
differences to show that it is easy to migrate from the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler to <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>.</p>
</div>
<div id="S3.p6" class="ltx_para">
<p class="ltx_p">We use the GR-operation to compile the grammar-rules in
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>. In Sect. <a href="#S3.SS2.SSS1" title="3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.1</span></a>, we explain how the
different types of two-level rules are compiled. Rule-conflicts and
their resolution are covered in Sect. <a href="#S3.SS2.SSS2" title="3.2.2 Resolving rule-conflicts. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2.2</span></a>.</p>
</div>
<section id="S3.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.1 </span>An Example Grammar</h3>

<div id="S3.SS1.p1" class="ltx_para">
<p class="ltx_p">An input-file for <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> consists of five parts:
<span class="ltx_text ltx_font_italic">the Alphabet</span>, <span class="ltx_text ltx_font_italic">the Rule-variables</span>, <span class="ltx_text ltx_font_italic">the Sets</span>,
<span class="ltx_text ltx_font_italic">the Definitions</span> and <span class="ltx_text ltx_font_italic">the Rules</span>. The file-format has
been modeled on the format used by the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler, and
all parts of the grammar are present also in the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler except for the part declaring rule-variables. There are a few
other differences, as well, most of which we will mention below. A
complete list of known differences can be found in the
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
documentation<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">10</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">10</sup><span class="ltx_ERROR undefined">\url</span>https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstTwolC</span></span></span>.</p>
</div>
<figure id="S3.T3" class="ltx_table"><pre class="ltx_verbatim ltx_font_typewriter">
Alphabet

   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö
   a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö
   %+AV%+:0 %+AV%-:0 %+AVA:0 %+AVD:0 %+AVH:0 %+AVM:0
   %~A:a %~A:ä
   %~K:k %~K:0 %~K:v
   %~P:p %~P:m ;

Rule-variables

   Cm Cs Cw ;

Sets

   Gradations = %+AV%+ %+AV%- %+AVA %+AVD %+AVH %+AVM ;
   BackVowels = a o u A O U ;
   UpperCaseVowels   = A E I O U Y Å Ä Ö ;
   LowerCaseVowels = a e i o u y å ä ö ;

   Vowels = UpperCaseVowels LowerCaseVowels ;

Definitions

   AlphaSeq    = [ \Gradations: ]* ;
   NonVowelSeq = [ \:Vowels ]* ;

Rules

   "~K:0 Gradation"
   %~K:0 &lt;=&gt; _  AlphaSeq Gradations:0 AlphaSeq %+AV%-:0 ;

   "%~K:v and %~P:m Gradation"
   Cs:Cw &lt;=&gt; _ AlphaSeq Cm:0 AlphaSeq %+AV%-:0 ;
      where Cs in (   %~K   %~P )
            Cw in (     v     m )
            Cm in ( %+AVM %+AVH ) matched ;

   "Vowel Harmony"
   %~A:a &lt;=&gt; :BackVowels NonVowelSeq _ ;

</pre>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 3: </span>An example <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> grammar governing the surface
realizations of the forms presented in the example lexicon in
Table <a href="#S2.T1" title="Table 1 ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>.</figcaption>
</figure>
<section id="S3.SS1.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.1 </span>The Alphabet.</h4>

<div id="S3.SS1.SSS1.p1" class="ltx_para">
<p class="ltx_p">The alphabet of a two-level grammar contains all lexical symbols
specified in the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> grammar together with their
possible surface realizations. In the example grammar in
Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>, the alphabet contains all
letters used in Finnish words together with the vowel-harmony
archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code>, the gradation morphophonemes <code class="ltx_verbatim ltx_font_typewriter">~K</code> and
<code class="ltx_verbatim ltx_font_typewriter">~P</code>, as well as, the gradation-markers <code class="ltx_verbatim ltx_font_typewriter">+AV+</code>, <code class="ltx_verbatim ltx_font_typewriter">+AV-</code>,
<code class="ltx_verbatim ltx_font_typewriter">+AVA</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVD</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVH</code>, <code class="ltx_verbatim ltx_font_typewriter">+AVM</code>.</p>
</div>
<div id="S3.SS1.SSS1.p2" class="ltx_para">
<p class="ltx_p">All symbols in the grammar may be arbitrary strings of UTF-8
characters, but characters like <code class="ltx_verbatim ltx_font_typewriter">+</code>, <code class="ltx_verbatim ltx_font_typewriter">~</code> or white-space,
which bear special meanings for the compiler need to be escaped using
the escape-character <code class="ltx_verbatim ltx_font_typewriter">%</code>.</p>
</div>
<div id="S3.SS1.SSS1.p3" class="ltx_para">
<p class="ltx_p">The letters in the example-grammar of
Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> always correspond to
themselves on the surface. The gradation-markers always correspond to
zero and the archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code> and the morphophonemes <code class="ltx_verbatim ltx_font_typewriter">~K</code>
and <code class="ltx_verbatim ltx_font_typewriter">~P</code> have various surface-realizations. E.g. <code class="ltx_verbatim ltx_font_typewriter">~A</code> is
always realized as either <code class="ltx_verbatim ltx_font_typewriter">a</code> or <span class="ltx_text ltx_font_typewriter">ä</span>.</p>
</div>
<div id="S3.SS1.SSS1.p4" class="ltx_para">
<p class="ltx_p">Each valid pair of a lexical symbol and its surface-correspondence has
to be listed in the alphabet. This differs from the <span class="ltx_text ltx_font_smallcaps">TwolC</span>
compiler, where pairs may be omitted from the alphabet, if they are
identity-pairs or are already constrained by some rule. Forcing the
grammar-writer to declare all symbol-pairs, may result in some extra
work, but it also prevents the creation of inadvertent pairs.</p>
</div>
<div id="S3.SS1.SSS1.p5" class="ltx_para">
<p class="ltx_p">Declaring all symbol-pairs in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> is mandatory, as we
have not yet implemented an <span class="ltx_text ltx_font_italic">other-symbol</span> like the one in
Xerox <span class="ltx_text ltx_font_smallcaps">TwolC</span> <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite> using the <span class="ltx_text ltx_font_smallcaps">HFST</span>
interface. Besides the grammar-formalism, this also affects the
compile-time for rules, which becomes more dependent on the number of
symbol-pairs in the grammar.</p>
</div>
</section>
<section id="S3.SS1.SSS2" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.2 </span>The Rule-variables.</h4>

<div id="S3.SS1.SSS2.p1" class="ltx_para">
<p class="ltx_p">Like the <span class="ltx_text ltx_font_smallcaps">Xerox</span> compiler, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> supports
defining a set of similar two-level rules using a rule-schema with
variables. During the compilation of the grammar, each schema is
compiled into actual two-level rules, by substituting the variables
with the values specified for them. All rule-variables, which are used
in the grammar, need to be declared in the Rule-variables section.</p>
</div>
</section>
<section id="S3.SS1.SSS3" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.3 </span>The Sets.</h4>

<div id="S3.SS1.SSS3.p1" class="ltx_para">
<p class="ltx_p">It is often convenient to name some classes of symbols, which are used
in many rules. E.g. the class <code class="ltx_verbatim ltx_font_typewriter">BackVowels</code> in the example-grammar
in Table <a href="#S3.T3" title="Table 3 ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>, which contains all
vowel-segments used in the grammar. The sets in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
and <span class="ltx_text ltx_font_smallcaps">TwolC</span> are very similar constructs.</p>
</div>
<div id="S3.SS1.SSS3.p2" class="ltx_para">
<p class="ltx_p">In <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>, the Cartesian product of sets, or a set and a
symbol, is always limited to the set of symbol-pairs declared in the
alphabet. E.g. the equivalent expressions <code class="ltx_verbatim ltx_font_typewriter">BackVowel:BackVowel</code>
and <code class="ltx_verbatim ltx_font_typewriter">BackVowel</code> will only accept the pairs <code class="ltx_verbatim ltx_font_typewriter">a:a</code>,
<code class="ltx_verbatim ltx_font_typewriter">o:o</code>, <code class="ltx_verbatim ltx_font_typewriter">u:u</code>, <code class="ltx_verbatim ltx_font_typewriter">A:A</code>, <code class="ltx_verbatim ltx_font_typewriter">O:O</code> and
<code class="ltx_verbatim ltx_font_typewriter">U:U</code>. Although it is conceivable, that they would accept
e.g. the pairs <code class="ltx_verbatim ltx_font_typewriter">a:U</code> and <code class="ltx_verbatim ltx_font_typewriter">A:O</code>, they will not, since the
pairs have not been declared.</p>
</div>
<div id="S3.SS1.SSS3.p3" class="ltx_para">
<p class="ltx_p">All sets have to be declared in the sets section of the grammar. Of
the five sets we have defined in the example grammar, the first four
are defined directly using a symbol sequence. The fifth set
<code class="ltx_verbatim ltx_font_typewriter">Vowels</code> is defined as the union of the sets <code class="ltx_verbatim ltx_font_typewriter">SmallVowels</code>
and <code class="ltx_verbatim ltx_font_typewriter">BigVowels</code>.</p>
</div>
</section>
<section id="S3.SS1.SSS4" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.4 </span>The Definitions.</h4>

<div id="S3.SS1.SSS4.p1" class="ltx_para">
<p class="ltx_p">Like character-sets, also regular expressions may be stored under a
name and accessed later using that name. Named regular expressions are
called definitions and may be used freely in the rules. Sets and
previous definitions can be used in the definition of a new
definition. The definitions in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and <span class="ltx_text ltx_font_smallcaps">TwolC</span>
are identical.</p>
</div>
</section>
<section id="S3.SS1.SSS5" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.1.5 </span>The Rules.</h4>

<div id="S3.SS1.SSS5.p1" class="ltx_para">
<p class="ltx_p">A two-level grammar constrains the surface-realizations of lexical
forms. The constraints are given as two-level rules, whose joint
effect determines the set of valid correspondences for each lexical
form. Each of the rules governs one realization of a lexical symbol in
a context given by a regular expression of pairs of a lexical and
surface symbol.</p>
</div>
<div id="S3.SS1.SSS5.p2" class="ltx_para">
<p class="ltx_p">The syntax and semantics of rules in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">TwolC</span> are very
similar<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">11</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">11</sup><span class="ltx_ERROR undefined">\url</span>http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax.html</span></span></span>.
Except for <span class="ltx_text ltx_font_italic">surface-restrictions</span> concerning epsilon,
i.e. epenthesis rules, the rules also work the same way.</p>
</div>
<div id="S3.SS1.SSS5.p3" class="ltx_para">
<p class="ltx_p">An example of a rule is the rule governing vowel-harmony in our example grammar</p>
<pre class="ltx_verbatim ltx_font_typewriter">
   "Vowel Harmony"
   %~A:a &lt;=&gt; :BackVowels NonVowelSeq _ ;
</pre>
<p class="ltx_p">It states that the archphoneme <code class="ltx_verbatim ltx_font_typewriter">~A</code> has to be realized as
<code class="ltx_verbatim ltx_font_typewriter">a</code>, if the surface-vowel immediately preceding it is a
back-vowel. It also disallows the pair <code class="ltx_verbatim ltx_font_typewriter">~A:a</code> in all other
contexts.</p>
</div>
<div id="S3.SS1.SSS5.p4" class="ltx_para">
<p class="ltx_p">The rule accepts the first correspondence in Table <a href="#S3.T4" title="Table 4 ‣ 3.1.5 The Rules. ‣ 3.1 An Example Grammar ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> since
the vowel preceding <code class="ltx_verbatim ltx_font_typewriter">~A</code> is <code class="ltx_verbatim ltx_font_typewriter">y</code>, which is not a
back-vowel. It disallows both of the latter correspondences. In the
second correspondence <code class="ltx_verbatim ltx_font_typewriter">~A</code> is realized as <code class="ltx_verbatim ltx_font_typewriter">a</code>, even though
the preceding surface-vowel is not a back-vowel. This violates the
<code class="ltx_verbatim ltx_font_typewriter">=&gt;</code> direction of the rule. In the third correspondence,
<code class="ltx_verbatim ltx_font_typewriter">~A</code> is realized as <span class="ltx_text ltx_font_typewriter">ä</span>, but the preceding
surface-vowel is <code class="ltx_verbatim ltx_font_typewriter">u</code>, which is a back-vowel. This violates the
<code class="ltx_verbatim ltx_font_typewriter">&lt;=</code> direction of the rule.</p>
</div>
<figure id="S3.T4" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_align_middle">
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l ltx_border_t">
<br class="ltx_break"><code class="ltx_verbatim ltx_font_typewriter">k</code>
</td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~K</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~K</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">m</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~P</code></td>
<td class="ltx_td ltx_align_right ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t"><code class="ltx_verbatim ltx_font_typewriter">~A</code></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l">
<br class="ltx_break"><code class="ltx_verbatim ltx_font_typewriter">k</code>
</td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><span class="ltx_text ltx_font_typewriter">ä<code class="ltx_verbatim"></code></span></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">y</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><code class="ltx_verbatim ltx_font_typewriter">a</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">k</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">m</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">p</code></td>
<td class="ltx_td ltx_align_right"><code class="ltx_verbatim ltx_font_typewriter">u</code></td>
<td class="ltx_td ltx_align_right ltx_border_r"><span class="ltx_text ltx_font_typewriter">ä<code class="ltx_verbatim"></code></span></td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_right ltx_border_l ltx_border_t">
<br class="ltx_break">
</td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_border_t"></td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 4: </span>Symbol-pair correspondences for demonstrating the
vowel-harmony rules.</figcaption>
</figure>
<div id="S3.SS1.SSS5.p5" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> allows a set of rules to be defined using
variables or by giving a set of rule-centers. E.g. the rule which
defines the basic constraint of gradation in our example grammar is a
rule with three variables: <code class="ltx_verbatim ltx_font_typewriter">Cs</code>, <code class="ltx_verbatim ltx_font_typewriter">Cv</code> and <code class="ltx_verbatim ltx_font_typewriter">Cm</code>.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
   "%~K:v and %~P:m Gradation"
   Cs:Cw &lt;=&gt; _ AlphaSeq Cm:0 AlphaSeq %+AV%-:0 ;
      where Cs in (   %~K   %~P )
            Cw in (     v     m )
            Cm in ( %+AVM %+AVH ) matched ;
</pre>
<p class="ltx_p">Like ordinary alphabet-symbols, variables may be used both in the
center of a rule and in its contexts.
</p>
</div>
<div id="S3.SS1.SSS5.p6" class="ltx_para">
<p class="ltx_p">When a rule with variables is compiled, it is split into
sub-rules. These are obtained by substituting real alphabet symbols
for the variables. The possible values of variables are listed in the
where-clause following the rule.</p>
</div>
</section>
</section>
<section id="S3.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.2 </span>Compiling the Rules and Resolving Rule-Conflicts</h3>

<div id="S3.SS2.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> compiles two-level rules, given as regular
expressions of pairs, into finite-state transducers. All two-level
rules may be constructed from simple surface-requirements,
context-restrictions and surface-prohibitions. The compilation reduces
the two-sided rules and rules with variables into combinations of such
simple constructions, or subrules.</p>
</div>
<div id="S3.SS2.p2" class="ltx_para">
<p class="ltx_p">After compilation, the subrules are intersected, so that finally
equally many rule-transducers are produced as there were original
two-level rules. Intersecting the subrules of the two-level grammar
rules takes up a considerable portion of the compile-time of the
grammar.</p>
</div>
<div id="S3.SS2.p3" class="ltx_para">
<p class="ltx_p">Compilation of the rules is preceded by a phase called
conflict-resolution, which modifies rule-contexts in order to prevent
harmful interactions between the rules. After conflict-resolution the
modified rule-set may be compiled as usual.
</p>
</div>
<div id="S3.SS2.p4" class="ltx_para">
<p class="ltx_p">We use the GR-operation of Yli-Jyrä and Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> to compile rules. Both compiling rules and
conflict-resolution is simplified using the operation.</p>
</div>
<div id="S3.SS2.p5" class="ltx_para">
<p class="ltx_p">The compilation in <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> differs from <span class="ltx_text ltx_font_smallcaps">TwolC</span>
when epenthesis rules are compiled. As Yli-Jyrä and Koskenniemi
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> point out, epenthesis rules may be compiled as any
other surface-requirement rules using the GR-operation. This increases
the expressive power of the two-level grammar as explained below.</p>
</div>
<div id="S3.SS2.p6" class="ltx_para">
<p class="ltx_p">A general restriction of the pair-alphabet <math id="S3.SS2.p6.m1" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math> is an expression
<math id="S3.SS2.p6.m2" class="ltx_Math" alttext="W\overset{n\diamond}{\Rightarrow}W^{\prime}" display="inline"><mrow><mi>W</mi><mo>⁢</mo><mover accent="true"><mo>⇒</mo><mrow><mi>n</mi><mo>⋄</mo></mrow></mover><mo>⁢</mo><msup><mi>W</mi><mo>′</mo></msup></mrow></math>, where the precondition <math id="S3.SS2.p6.m3" class="ltx_Math" alttext="W" display="inline"><mi>W</mi></math> and
postcondition <math id="S3.SS2.p6.m4" class="ltx_Math" alttext="W^{\prime}" display="inline"><msup><mi>W</mi><mo>′</mo></msup></math> are unions of expressions of the form <math id="S3.SS2.p6.m5" class="ltx_Math" alttext="V_{1}\diamond V_{2}\ \diamond\text{ ... }\diamond\ V_{n}\subset\Sigma^{*}(%
\diamond\ \Sigma^{*})^{n}" display="inline"><mrow><mrow><msub><mi>V</mi><mn>1</mn></msub><mo>⋄</mo><mpadded width="+5pt"><msub><mi>V</mi><mn>2</mn></msub></mpadded><mo>⋄</mo><mtext> … </mtext><mo>⋄</mo><msub><mi>V</mi><mi>n</mi></msub></mrow><mo>⊂</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><msup><mrow><mo stretchy="false">(</mo><mrow><mo rspace="7.5pt">⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo stretchy="false">)</mo></mrow><mi>n</mi></msup></mrow></mrow></math>, where <math id="S3.SS2.p6.m6" class="ltx_Math" alttext="\diamond\notin\Sigma" display="inline"><mrow><mo>⋄</mo><mo>∉</mo><mi mathvariant="normal">Σ</mi></mrow></math> is a special
marker-symbol and each <math id="S3.SS2.p6.m7" class="ltx_Math" alttext="V_{i}" display="inline"><msub><mi>V</mi><mi>i</mi></msub></math> is a regular language of the alphabet
<math id="S3.SS2.p6.m8" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math>. Such an expression is compiled into a regular expression
using the GR-operation as in (<a href="#S3.E6" title="(6) ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>).</p>
</div>
<div id="S3.SS2.p7" class="ltx_para">
<table id="S3.E6" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E6.m1" class="ltx_Math" alttext="\Sigma^{*}-\text{delete}_{\diamond}(W-W^{\prime})" display="block"><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>-</mo><mrow><msub><mtext>delete</mtext><mo>⋄</mo></msub><mo>⁢</mo><mrow><mo stretchy="false">(</mo><mrow><mi>W</mi><mo>-</mo><msup><mi>W</mi><mo>′</mo></msup></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(6)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.p8" class="ltx_para">
<p class="ltx_p">The operation <math id="S3.SS2.p8.m1" class="ltx_Math" alttext="\text{delete}_{\diamond}" display="inline"><msub><mtext>delete</mtext><mo>⋄</mo></msub></math> in (<a href="#S3.E6" title="(6) ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>) rewrites each
marker-symbol <math id="S3.SS2.p8.m2" class="ltx_Math" alttext="\diamond" display="inline"><mo>⋄</mo></math> into epsilon and leaves all other symbols
intact.</p>
</div>
<div id="S3.SS2.p9" class="ltx_para">
<p class="ltx_p">We do not need the full expressive power of the GR-operation. Instead
we use a restricted version <math id="S3.SS2.p9.m1" class="ltx_Math" alttext="W\overset{2\diamond}{\Rightarrow}W^{\prime}" display="inline"><mrow><mi>W</mi><mo>⁢</mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo>⁢</mo><msup><mi>W</mi><mo>′</mo></msup></mrow></math>,
which is limited to compiling rules with one center and a number of
contexts with a right and a left part. Hence we operate on
preconditions and postconditions with two diamonds,
i.e. <math id="S3.SS2.p9.m2" class="ltx_Math" alttext="W,W^{\prime}\subseteq\Sigma^{*}\diamond\Sigma^{*}\diamond\Sigma^{*}" display="inline"><mrow><mrow><mi>W</mi><mo>,</mo><msup><mi>W</mi><mo>′</mo></msup></mrow><mo>⊆</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow></mrow></math>.</p>
</div>
<div id="S3.SS2.p10" class="ltx_para">
<p class="ltx_p">We discuss compiling one rule first and then conflict-resolution,
although logically conflict-resolution is done first and then the
rules are compiled. This is easier to explain, because
conflict-resolution is highly dependent on the way the rules are
compiled.</p>
</div>
<section id="S3.SS2.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.2.1 </span>Compiling one rule.</h4>

<div id="S3.SS2.SSS1.p1" class="ltx_para">
<p class="ltx_p">Yli-Jyrä and Koskenniemi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx11" title="" class="ltx_ref">2006</a>]</cite> explain how ordinary
two-level rules can be compiled using the GR-operation. We use slight
variations of the same methods.</p>
</div>
<div id="S3.SS2.SSS1.p2" class="ltx_para">
<p class="ltx_p">Surface-requirement rules and context-restriction rules need to be
compiled in different ways. Surface-prohibition rules can be compiled
in a similar manner as surface-requirement rules and double-sided
rules are compiled, by intersecting the two directions of the rule.</p>
</div>
<div id="S3.SS2.SSS1.p3" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the context-restriction rule
<math id="S3.SS2.SSS1.p3.m1" class="ltx_Math" alttext="a\text{:}b\Rightarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇒</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is given by
(<a href="#S3.E7" title="(7) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p4" class="ltx_para">
<table id="S3.E7" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E7.m1" class="ltx_Math" alttext="\Sigma^{*}\diamond a\text{:}b\diamond\Sigma^{*}\overset{2\diamond}{\Rightarrow%
}\bigcup_{i=0}^{n}L_{i}\diamond\Sigma^{*}\diamond R_{i}" display="block"><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⋄</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo>⁢</mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(7)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p5" class="ltx_para">
<p class="ltx_p">The surface-requirement rule requires an auxiliary definition. We
define the inverse projection <math id="S3.SS2.SSS1.p5.m1" class="ltx_Math" alttext="[x\text{:}]" display="inline"><mrow><mo stretchy="false">[</mo><mrow><mi>x</mi><mo>⁢</mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow></math> of the input-symbol <math id="S3.SS2.SSS1.p5.m2" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math>
using (<a href="#S3.E8" title="(8) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">8</span></a>). Here <math id="S3.SS2.SSS1.p5.m3" class="ltx_Math" alttext="x" display="inline"><mi>x</mi></math> may be any of the
input-symbols of pairs in <math id="S3.SS2.SSS1.p5.m4" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math>, including epsilon.</p>
</div>
<div id="S3.SS2.SSS1.p6" class="ltx_para">
<table id="S3.E8" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E8.m1" class="ltx_Math" alttext="[x\text{:}]=\{x\text{:}y\ |\ x\text{:}y\in\Sigma\}" display="block"><mrow><mrow><mo stretchy="false">[</mo><mrow><mi>x</mi><mo>⁢</mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow><mo>=</mo><mrow><mo stretchy="false">{</mo><mrow><mi>x</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mpadded width="+5pt"><mi>y</mi></mpadded></mrow><mo rspace="7.5pt" stretchy="false">|</mo><mrow><mrow><mi>x</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>y</mi></mrow><mo>∈</mo><mi mathvariant="normal">Σ</mi></mrow><mo stretchy="false">}</mo></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(8)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p7" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the surface-requirement rule
<math id="S3.SS2.SSS1.p7.m1" class="ltx_Math" alttext="a\text{:}b\Leftarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇐</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is given by
(<a href="#S3.E9" title="(9) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">9</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p8" class="ltx_para">
<table id="S3.E9" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E9.m1" class="ltx_Math" alttext="\Big{(}\Sigma^{*}\diamond[a\text{:}]-a\text{:}b\ \diamond\Sigma^{*}\cap\bigcup%
_{i=0}^{n}L_{i}\diamond\Sigma^{*}\diamond R_{i}\Big{)}\overset{2\diamond}{%
\Rightarrow}\emptyset" display="block"><mrow><mrow><mo maxsize="160%" minsize="160%">(</mo><mrow><mrow><mrow><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mo stretchy="false">[</mo><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext></mrow><mo stretchy="false">]</mo></mrow></mrow><mo>-</mo><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mpadded width="+5pt"><mi>b</mi></mpadded></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo>∩</mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow><mo maxsize="160%" minsize="160%">)</mo></mrow><mo>⁢</mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo>⁢</mo><mi mathvariant="normal">∅</mi></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(9)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p9" class="ltx_para">
<p class="ltx_p">The general restriction corresponding to the surface-prohibition rule
<math id="S3.SS2.SSS1.p9.m1" class="ltx_Math" alttext="a\text{:}b\/\Leftarrow\bigcup_{i=0}^{n}L_{i}\ \_\ R_{i}" display="inline"><mrow><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇐</mo><mrow><msubsup><mo largeop="true" mathsize="160%" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mrow><mpadded width="+5pt"><msub><mi>L</mi><mi>i</mi></msub></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><msub><mi>R</mi><mi>i</mi></msub></mrow></mrow></mrow></math> is similar. It
is given by (<a href="#S3.E10" title="(10) ‣ 3.2.1 Compiling one rule. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">10</span></a>).</p>
</div>
<div id="S3.SS2.SSS1.p10" class="ltx_para">
<table id="S3.E10" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E10.m1" class="ltx_Math" alttext="\Big{(}\Sigma^{*}\diamond a\text{:}b\ \diamond\Sigma^{*}\cap\bigcup_{i=0}^{n}L%
_{i}\diamond\Sigma^{*}\diamond R_{i}\Big{)}\overset{2\diamond}{\Rightarrow}\emptyset" display="block"><mrow><mrow><mo maxsize="160%" minsize="160%">(</mo><mrow><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mpadded width="+5pt"><mi>b</mi></mpadded></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo>∩</mo><mrow><munderover><mo largeop="true" mathsize="160%" movablelimits="false" stretchy="false" symmetric="true">⋃</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><msub><mi>L</mi><mi>i</mi></msub></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msub><mi>R</mi><mi>i</mi></msub></mrow><mo maxsize="160%" minsize="160%">)</mo></mrow><mo>⁢</mo><mover accent="true"><mo>⇒</mo><mrow><mn>2</mn><mo>⋄</mo></mrow></mover><mo>⁢</mo><mi mathvariant="normal">∅</mi></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(10)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p11" class="ltx_para">
<p class="ltx_p">Using the GR-operation, epenthesis rules have the same semantics as
other surface-requirement rules. The rule <math id="S3.SS2.SSS1.p11.m1" class="ltx_Math" alttext="0\text{:}a\Leftarrow b\ \_\ b" display="inline"><mrow><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>a</mi></mrow><mo>⇐</mo><mrow><mpadded width="+5pt"><mi>b</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><mi>b</mi></mrow></mrow></math> rejects the correspondences <math id="S3.SS2.SSS1.p11.m2" class="ltx_Math" alttext="bb" display="inline"><mrow><mi>b</mi><mo>⁢</mo><mi>b</mi></mrow></math> and <math id="S3.SS2.SSS1.p11.m3" class="ltx_Math" alttext="b0\text{:}cb" display="inline"><mrow><mi>b</mi><mo>⁢</mo><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>c</mi><mo>⁢</mo><mi>b</mi></mrow></math>, but accepts
<math id="S3.SS2.SSS1.p11.m4" class="ltx_Math" alttext="b0\text{:}ab" display="inline"><mrow><mi>b</mi><mo>⁢</mo><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mi>b</mi></mrow></math>.
</p>
</div>
<div id="S3.SS2.SSS1.p12" class="ltx_para">
<p class="ltx_p">The <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler compiles epenthesis rules in a different
way than <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>. In <span class="ltx_text ltx_font_smallcaps">TwolC</span>, the rule <math id="S3.SS2.SSS1.p12.m1" class="ltx_Math" alttext="0\text{:}a\Leftarrow b\ \_\ b" display="inline"><mrow><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>a</mi></mrow><mo>⇐</mo><mrow><mpadded width="+5pt"><mi>b</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><mi>b</mi></mrow></mrow></math> becomes equivalent to the expression <math id="S3.SS2.SSS1.p12.m2" class="ltx_Math" alttext="\Sigma^{*}-(\Sigma^{*}bb\Sigma^{*})" display="inline"><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>-</mo><mrow><mo stretchy="false">(</mo><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><mi>b</mi><mo>⁢</mo><mi>b</mi><mo>⁢</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo stretchy="false">)</mo></mrow></mrow></math>, which means that <math id="S3.SS2.SSS1.p12.m3" class="ltx_Math" alttext="bb" display="inline"><mrow><mi>b</mi><mo>⁢</mo><mi>b</mi></mrow></math> is rejected, but
<math id="S3.SS2.SSS1.p12.m4" class="ltx_Math" alttext="b0\text{:}cb" display="inline"><mrow><mi>b</mi><mo>⁢</mo><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>c</mi><mo>⁢</mo><mi>b</mi></mrow></math> is accepted, provided that the pair <math id="S3.SS2.SSS1.p12.m5" class="ltx_Math" alttext="0\text{:}c" display="inline"><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>c</mi></mrow></math> is
declared in the alphabet <math id="S3.SS2.SSS1.p12.m6" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math>. This makes it impossible to
interpret one epenthesis rule a special case of another epenthesis
rule.</p>
</div>
<div id="S3.SS2.SSS1.p13" class="ltx_para">
<p class="ltx_p">E.g. we might want the pair <math id="S3.SS2.SSS1.p13.m1" class="ltx_Math" alttext="0\text{:}v" display="inline"><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>v</mi></mrow></math> between two vowels, but the pair
<math id="S3.SS2.SSS1.p13.m2" class="ltx_Math" alttext="0\text{:}w" display="inline"><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>w</mi></mrow></math> between two like vowels. This can be expressed by the rules</p>
</div>
<div id="S3.SS2.SSS1.p14" class="ltx_para">
<table id="S3.Ex1" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.Ex1.m1" class="ltx_Math" alttext="0\text{:}v\Leftarrow\text{Vowel}\ \_\ \text{Vowel}\ ;\text{ and
}0\text{:}w\Leftarrow V_{x}\ \_\ V_{x}\text{, }V_{x}\in\text{Vowel}\ ;" display="block"><mrow><mrow><mrow><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>v</mi></mrow><mo>⇐</mo><mrow><mpadded width="+5pt"><mtext>Vowel</mtext></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mtext>Vowel</mtext></mpadded></mrow></mrow><mo>;</mo><mrow><mrow><mtext> and </mtext><mo>⁢</mo><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>w</mi></mrow><mo>⇐</mo><mrow><mpadded width="+5pt"><msub><mi>V</mi><mi>x</mi></msub></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded><mo>⁢</mo><msub><mi>V</mi><mi>x</mi></msub><mo>⁢</mo><mtext>, </mtext><mo>⁢</mo><msub><mi>V</mi><mi>x</mi></msub></mrow><mo>∈</mo><mpadded width="+5pt"><mtext>Vowel</mtext></mpadded></mrow></mrow><mo>;</mo></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS1.p15" class="ltx_para">
<p class="ltx_p">In <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> conflict resolution modifies the context of the
more general rule. A correspondence with <math id="S3.SS2.SSS1.p15.m1" class="ltx_Math" alttext="0\text{:}t" display="inline"><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>t</mi></mrow></math> between like
vowels becomes disallowed, but a correspondence with <math id="S3.SS2.SSS1.p15.m2" class="ltx_Math" alttext="0\text{:}s" display="inline"><mrow><mn>0</mn><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>s</mi></mrow></math>
between like vowels is allowed. In the <span class="ltx_text ltx_font_smallcaps">TwolC</span> compiler this is
not possible.</p>
</div>
</section>
<section id="S3.SS2.SSS2" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">3.2.2 </span>Resolving rule-conflicts.</h4>

<div id="S3.SS2.SSS2.p1" class="ltx_para">
<p class="ltx_p">Rule-conflicts are situations where different rules require a lexical
string to be realized in different ways. Since each correspondence of
a lexical string and surface string needs to be accepted by all rules
in a two-level grammar, such lexical strings are filtered by the
grammar. Using the GR-operation to compile the rules allows separating
the processes of conflict-resolution and rule-compilation. Previously,
these may have been more entangled, which would explain, why the
conflict resolution of the <span class="ltx_text ltx_font_smallcaps">Xerox</span> compiler sometimes works in
an unexpected way (see example below).</p>
</div>
<div id="S3.SS2.SSS2.p2" class="ltx_para">
<p class="ltx_p">Like <span class="ltx_text ltx_font_smallcaps">TwolC</span>, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> only handles two kinds of
conflicts: right-arrow conflicts and left-arrow conflicts. Right-arrow
conflicts occur between context-restrictions with the same
center-pair. Left-arrow conflicts occur between surface-requirements
with centers having the same lexical symbol, but different
surface-symbols.</p>
</div>
<div id="S3.SS2.SSS2.p3" class="ltx_para">
<p class="ltx_p">Consider the rules</p>
<table id="S3.Ex2" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.Ex2.m1" class="ltx_Math" alttext="a\text{:}b\Rightarrow x\ \_\ ;\text{ and }a\text{:}b\Rightarrow y\ \_\ ;" display="block"><mrow><mrow><mrow><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇒</mo><mrow><mpadded width="+5pt"><mi>x</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded></mrow></mrow><mo>;</mo><mrow><mrow><mtext> and </mtext><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇒</mo><mrow><mpadded width="+5pt"><mi>y</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded></mrow></mrow></mrow><mo>;</mo></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS2.p4" class="ltx_para">
<p class="ltx_p">These are in right-arrow conflict with each other. Like Xerox
<span class="ltx_text ltx_font_smallcaps">TwolC</span>, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> interprets both rules as
permissions and replaces them with one rule, whose context is the
union of the contexts of the conflicting rules. Joining the contexts
is easy when the rules are compiled using the GR-operation.</p>
</div>
<div id="S3.SS2.SSS2.p5" class="ltx_para">
<p class="ltx_p">A left-arrow conflict is resolvable exactly when one of the
rule-contexts is a sub-context of the other. A trivial example of a
resolvable left-arrow conflict is given by the rules</p>
<table id="S3.Ex3" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.Ex3.m1" class="ltx_Math" alttext="a\text{:}b\Leftarrow\{d\text{, }e\}\ \_\ ;\text{ and }a\text{:}c\Leftarrow d\ %
\_\ ;" display="block"><mrow><mrow><mrow><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow><mo>⇐</mo><mrow><mrow><mo stretchy="false">{</mo><mrow><mi>d</mi><mo>⁢</mo><mtext>, </mtext><mo>⁢</mo><mi>e</mi></mrow><mo rspace="7.5pt" stretchy="false">}</mo></mrow><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded></mrow></mrow><mo>;</mo><mrow><mrow><mtext> and </mtext><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>c</mi></mrow><mo>⇐</mo><mrow><mpadded width="+5pt"><mi>d</mi></mpadded><mo>⁢</mo><mpadded width="+5pt"><mi mathvariant="normal">_</mi></mpadded></mrow></mrow></mrow><mo>;</mo></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS2.p6" class="ltx_para">
<p class="ltx_p">Here the alphabet <math id="S3.SS2.SSS2.p6.m1" class="ltx_Math" alttext="\Sigma" display="inline"><mi mathvariant="normal">Σ</mi></math> consists of the pairs <math id="S3.SS2.SSS2.p6.m2" class="ltx_Math" alttext="a\text{:}b" display="inline"><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>b</mi></mrow></math>,
<math id="S3.SS2.SSS2.p6.m3" class="ltx_Math" alttext="a\text{:}c" display="inline"><mrow><mi>a</mi><mo>⁢</mo><mtext>:</mtext><mo>⁢</mo><mi>c</mi></mrow></math>, <math id="S3.SS2.SSS2.p6.m4" class="ltx_Math" alttext="d" display="inline"><mi>d</mi></math> and <math id="S3.SS2.SSS2.p6.m5" class="ltx_Math" alttext="e" display="inline"><mi>e</mi></math>. This is resolved by replacing the more
general context with the difference of that context and the more
specific context as given by (<a href="#S3.E11" title="(11) ‣ 3.2.2 Resolving rule-conflicts. ‣ 3.2 Compiling the Rules and Resolving Rule-Conflicts ‣ 3 HFST-TwolC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">11</span></a>), where we have
written the contexts as generalized restriction contexts.</p>
</div>
<div id="S3.SS2.SSS2.p7" class="ltx_para">
<table id="S3.E11" class="ltx_equation ltx_eqn_table">

<tr class="ltx_equation ltx_eqn_row ltx_align_baseline">
<td class="ltx_eqn_cell ltx_eqn_center_padleft"></td>
<td class="ltx_eqn_cell ltx_align_center"><math id="S3.E11.m1" class="ltx_Math" alttext="(\Sigma^{*}\{d\text{, }e\}\diamond\Sigma^{*}\diamond\Sigma^{*})-(\Sigma^{*}d%
\diamond\Sigma^{*}\diamond\Sigma^{*})=\Sigma^{*}e\diamond\Sigma^{*}\diamond%
\Sigma^{*}" display="block"><mrow><mrow><mrow><mo stretchy="false">(</mo><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><mrow><mo stretchy="false">{</mo><mrow><mi>d</mi><mo>⁢</mo><mtext>, </mtext><mo>⁢</mo><mi>e</mi></mrow><mo stretchy="false">}</mo></mrow></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo stretchy="false">)</mo></mrow><mo>-</mo><mrow><mo stretchy="false">(</mo><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><mi>d</mi></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow><mo stretchy="false">)</mo></mrow></mrow><mo>=</mo><mrow><mrow><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⁢</mo><mi>e</mi></mrow><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup><mo>⋄</mo><msup><mi mathvariant="normal">Σ</mi><mo>*</mo></msup></mrow></mrow></math></td>
<td class="ltx_eqn_cell ltx_eqn_center_padright"></td>
<td rowspan="1" class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right"><span class="ltx_tag ltx_tag_equation ltx_align_right">(11)</span></td>
</tr>
</table>
</div>
<div id="S3.SS2.SSS2.p8" class="ltx_para">
<p class="ltx_p">This example does not compile as expected under
<span class="ltx_text ltx_font_smallcaps">TwolC</span>. Conflict-resolution results in a grammar, which
rejects all lexical strings containing <math id="S3.SS2.SSS2.p8.m1" class="ltx_Math" alttext="a" display="inline"><mi>a</mi></math> or <math id="S3.SS2.SSS2.p8.m2" class="ltx_Math" alttext="e" display="inline"><mi>e</mi></math>.</p>
</div>
<div id="S3.SS2.SSS2.p9" class="ltx_para">
<p class="ltx_p">Right-arrow conflict-resolution may result in large rule-contexts
which may be time-consuming to determinize. Left-arrow conflict
resolution requires testing all pairs of surface-restriction rules
concerning the same lexical symbol. This means that the worst-case
time-requirement is quadratic w.r.t. to the number of rules in the
grammar.</p>
</div>
</section>
</section>
</section>
<section id="S4" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">4 </span><span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>
</h2>

<div id="S4.p1" class="ltx_para">
<p class="ltx_p">A lexicon compiled using <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> and a grammar of two-level
rules compiled using <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> are combined using the
program <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>. It is an implementation of
the <span class="ltx_text ltx_font_italic">intersecting composition</span> algorithm presented by Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite>. The result of the operation is equivalent to the
composition of the lexicon-transducer with the intersection of the
rule-transducers.</p>
</div>
<div id="S4.p2" class="ltx_para">
<p class="ltx_p">Karttunen <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx5" title="" class="ltx_ref">1994</a>]</cite> observed that the intersection of the
rule-transducers alone may be extremely large and computing it may
take a long time, whereas intersecting composition allows the lexicon
to restrict the intersection of the rule-transducers. This reduces
compilation time significantly.</p>
</div>
<div id="S4.p3" class="ltx_para">
<p class="ltx_p">Although computers have become considerably faster since 1994 and they
have more memory, computing the intersection of the rule-transducers
can still be very space-consuming. We intersected the rule-transducers
of the two-level implementation of
<span class="ltx_text ltx_font_smallcaps">OMorFi<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">12</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">12</sup><span class="ltx_ERROR undefined">\url</span><span class="ltx_text ltx_font_upright">http://kitwiki.csc.fi/kitwiki/Main/OMorFiHome</span></span></span></span></span>,
i.e. Pirinen’s <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx13" title="" class="ltx_ref">2008</a>]</cite> morphological analyzer for
Finnish. Without the intersecting composition, the rule intersection
took eleven hours using the same machine we used for conducting our
other performance-tests. Hence we believe that intersecting
composition is still a necessary operation when developing full-scale
two-level morphological analyzers.</p>
</div>
</section>
<section id="S5" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">5 </span>Full-Scale Morphological Analyzers using <span class="ltx_text ltx_font_smallcaps">HFST</span> Morphological
Tools</h2>

<div id="S5.p1" class="ltx_para">
<p class="ltx_p">We test the performance of the <span class="ltx_text ltx_font_smallcaps">HFST</span> tools by building three
full-scale morphological analyzers of varying complexities for French,
Finnish, and Northern Sámi. All of them highlight different
aspects of the compilation process. To verify the correctness of the
compilation results, we analyzed corpora using the lexical
transducers.</p>
</div>
<div id="S5.p2" class="ltx_para">
<p class="ltx_p">The French analyzer was built from the existing morphological
full-form lexicon
Morphalou<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">13</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">13</sup><span class="ltx_ERROR undefined">\url</span>http://www.cnrtl.fr/lexiques/morphalou/</span></span></span>. The
lexicon was translated into the <span class="ltx_text ltx_font_smallcaps">LexC</span> format and it contains
some 550,000 entries in a single lexicon. Each entry represents a word
form and its analysis. We chose this lexicon for testing
<span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> with a large number of real entries.</p>
</div>
<div id="S5.p3" class="ltx_para">
<p class="ltx_p">The Finnish analyzer has two implementations, i.e. the version using
the <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler format of <span class="ltx_text ltx_font_smallcaps">OMorFi</span> which is
Pirinen’s <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx13" title="" class="ltx_ref">2008</a>]</cite> original analyzer for Finnish, and a
reformulated version using a <span class="ltx_text ltx_font_smallcaps">LexC</span> lexicon and a
<span class="ltx_text ltx_font_smallcaps">TwolC</span> grammar format. The reformulation was done manually by
converting the morpheme sets of the original code into <span class="ltx_text ltx_font_smallcaps">LexC</span>
sublexicons and rewriting the phonological rules from replace cascades
into <span class="ltx_text ltx_font_smallcaps">TwolC</span> rule sets. While care has been taken to ensure
the similarity of the implementations, it should be noted that the
versions are not totally equivalent. We still think they are close
enough for a meaningful comparison of the two approaches.</p>
</div>
<div id="S5.p4" class="ltx_para">
<p class="ltx_p">The Northern Sámi analyzer is an original <span class="ltx_text ltx_font_smallcaps">LexC</span> and
<span class="ltx_text ltx_font_smallcaps">TwolC</span> based morphological analyzer developed in the Divvun
Project<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">14</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">14</sup><span class="ltx_ERROR undefined">\url</span>http://www.divvun.no</span></span></span>. It is a full-fledged
analyzer developed totally independently of the <span class="ltx_text ltx_font_smallcaps">HFST</span> project
and it has both a large number of sublexicons and a large number of
rules.</p>
</div>
<div id="S5.p5" class="ltx_para">
<p class="ltx_p">The characteristics of the analyzers of the three languages are
summarized in Table <a href="#S5.T5" title="Table 5 ‣ 5 Full-Scale Morphological Analyzers using HFST Morphological Tools ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>. The first three of the
columns summarize the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> lexicons stating the numbers
of sublexicons, lexicon-entries and symbols used in the lexicons. The
remaining three columns summarize the <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> grammars.
They state the numbers of symbol-pairs, rules and subrules in the
double-sided rules and rules with variables. The example for French
has no entries in the last three columns, since it has no two-level
grammar.</p>
</div>
<figure id="S5.T5" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t"></th>
<th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_t" colspan="3"><span class="ltx_text ltx_font_smallcaps">HFST-LexC</span></th>
<th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_t" colspan="3"><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span></th>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_r">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column">  Sublexicons</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column">  Entries</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r">  Symbols</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column">  Pairs</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column">  Rules</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column">  Subrules</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t">French</th>
<td class="ltx_td ltx_align_right ltx_border_t">1</td>
<td class="ltx_td ltx_align_right ltx_border_t">553,158</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">87</td>
<td class="ltx_td ltx_align_right ltx_border_t">—</td>
<td class="ltx_td ltx_align_right ltx_border_t">—</td>
<td class="ltx_td ltx_align_right ltx_border_t">—</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r">Finnish</th>
<td class="ltx_td ltx_align_right">213</td>
<td class="ltx_td ltx_align_right">94,278</td>
<td class="ltx_td ltx_align_right ltx_border_r">301</td>
<td class="ltx_td ltx_align_right">169</td>
<td class="ltx_td ltx_align_right">12</td>
<td class="ltx_td ltx_align_right">76</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b">870</td>
<td class="ltx_td ltx_align_right ltx_border_b">105,503</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">428</td>
<td class="ltx_td ltx_align_right ltx_border_b">313</td>
<td class="ltx_td ltx_align_right ltx_border_b">105</td>
<td class="ltx_td ltx_align_right ltx_border_b">555</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 5: </span>Some numbers characterizing the lexicons and two-level
grammars we used for testing.</figcaption>
</figure>
</section>
<section id="S6" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">6 </span>Performance Evaluation</h2>

<div id="S6.p1" class="ltx_para">
<p class="ltx_p">The goal of the performance evaluation is is to see how far we still
have to go before we reach industrial-strength performance.
Additionally, we wish to see how the performance of the <span class="ltx_text ltx_font_smallcaps">LexC</span>
and <span class="ltx_text ltx_font_smallcaps">TwolC</span> approach with parallel-rules compares to the
approach with cascaded-rules. To achieve these goals, we compare
<span class="ltx_text ltx_font_smallcaps">HFST</span> with some other open source tools and an industrial
strength implementation by Xerox. By compiling the analyzers
mentioned in the previous section, we can also collect performance
figures on real full-fledged morphologies for identifying the most
significant remaining bottle-necks in our tools.</p>
</div>
<div id="S6.p2" class="ltx_para">
<p class="ltx_p">The <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> and <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> tools mimic many of the
Xerox <span class="ltx_text ltx_font_smallcaps">LexC</span> and <span class="ltx_text ltx_font_smallcaps">TwolC</span> functionalities, so the
input-files for the <span class="ltx_text ltx_font_smallcaps">HFST</span> tools require very small
modifications in order to compile using the Xerox tools, and vice
versa. This makes it is easy to compare the performance of the
<span class="ltx_text ltx_font_smallcaps">HFST</span> tools with the Xerox versions.</p>
</div>
<div id="S6.p3" class="ltx_para">
<p class="ltx_p">As the Finnish <span class="ltx_text ltx_font_smallcaps">OMorFi</span> analyzer has two almost identical
versions: one using replace-rules for the <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler and
one using two-level rules for the <span class="ltx_text ltx_font_smallcaps">LexC</span> and <span class="ltx_text ltx_font_smallcaps">TwolC</span>
tools, we are able to compare the efficiency of the two approaches to
building morphological analyzers.</p>
</div>
<div id="S6.p4" class="ltx_para">
<p class="ltx_p">Below, we have five tables summarizing the results of the performance
tests. The first Table <a href="#S6.T6" title="Table 6 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a>
compares the total compile times of the analyzers using the
<span class="ltx_text ltx_font_smallcaps">HFST</span> tools, the Xerox tools, the foma <span class="ltx_text ltx_font_smallcaps">LexC</span> tool and
the <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler. For foma, only the compile-time for the
analyzer of French is given, as foma only comes with a
<span class="ltx_text ltx_font_smallcaps">LexC<span class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">15</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">15</sup><span class="ltx_text ltx_font_upright">foma also has an </span>xfst<span class="ltx_text ltx_font_upright"> interface.</span></span></span></span></span>
interface. For the <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler, only the compile-time for
the Finnish lexicon is given, as our only implementation with cascaded
rules is for Finnish.</p>
</div>
<figure id="S6.T6" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">  <span class="ltx_text ltx_font_smallcaps">HFST</span> tools</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">  foma <span class="ltx_text ltx_font_smallcaps">LexC</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">  <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">  Xerox tools</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t">French</th>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">45.92 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">  16.87 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">—</td>
<td class="ltx_td ltx_align_right ltx_border_t">5.46 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r">Finnish</th>
<td class="ltx_td ltx_align_right ltx_border_r">25.42 s</td>
<td class="ltx_td ltx_align_right ltx_border_r">—</td>
<td class="ltx_td ltx_align_right ltx_border_r">  1682.04 s</td>
<td class="ltx_td ltx_align_right">1.83 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">  287.21 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">—</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">—</td>
<td class="ltx_td ltx_align_right ltx_border_b">  24.61 s</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 6: </span>Total compile-times using <span class="ltx_text ltx_font_smallcaps">HFST</span> tools, foma
<span class="ltx_text ltx_font_smallcaps">LexC</span>, <span class="ltx_text ltx_font_smallcaps">SFST</span> compiler and Xerox tools to compile
lexical transducers. Times are in
seconds.</figcaption>
</figure>
<div id="S6.p5" class="ltx_para">
<p class="ltx_p">The following three Tables <a href="#S6.T7" title="Table 7 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>,
<a href="#S6.T8" title="Table 8 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">8</span></a> and <a href="#S6.T9" title="Table 9 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">9</span></a> give the
HFST compile-times for the analyzers of Finnish, French and
Northern Sámi. The times have been broken down into sub-phases of the
compilation in order to see where the bottle-necks are. The phases are
explained below the tables.</p>
</div>
<figure id="S6.T7" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">2</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">3</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">4</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">Total</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t">French</th>
<td class="ltx_td ltx_align_right ltx_border_t">  19.27 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">  1.18 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">0.08 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">  25.40 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">  45.92 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r">Finnish</th>
<td class="ltx_td ltx_align_right">3.59 s</td>
<td class="ltx_td ltx_align_right">0.19 s</td>
<td class="ltx_td ltx_align_right">0.29 s</td>
<td class="ltx_td ltx_align_right ltx_border_r">17.99 s</td>
<td class="ltx_td ltx_align_right">22.05 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b">3.74 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">0.23 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">  2.00 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">78.84 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">84.81 s</td>
</tr>
</tbody>
</table>
<ol id="I1" class="ltx_enumerate ltx_centering">
<li id="I1.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">1.</span>
<div id="I1.i1.p1" class="ltx_para">
<p class="ltx_p">The entry parsing and compilation
(cf. Sect <a href="#S2.SS1" title="2.1 Efficient Tokenizing of a Sub-Lexicon Entry ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.1</span></a>)</p>
</div>
</li>
<li id="I1.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">2.</span>
<div id="I1.i2.p1" class="ltx_para">
<p class="ltx_p">Union of entries (cf. Sect <a href="#S2.SS2" title="2.2 Efficient Union of Sub-Lexicon Entries ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.2</span></a>)</p>
</div>
</li>
<li id="I1.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">3.</span>
<div id="I1.i3.p1" class="ltx_para">
<p class="ltx_p">Morphotactic filtering
(cf. Sect <a href="#S2.SS3" title="2.3 Efficient Implementation of Morphotax over Sublexicons ‣ 2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2.3</span></a>)</p>
</div>
</li>
<li id="I1.i4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">4.</span>
<div id="I1.i4.p1" class="ltx_para">
<p class="ltx_p">Other phases (Alphabet discovery, minimizing results, etc.)</p>
</div>
</li>
</ol>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 7: </span><span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> performance broken into the different
phases of the compilation process. Times are in
seconds.</figcaption>
</figure>
<figure id="S6.T8" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">2</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">3</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">Total</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t">Finnish</th>
<td class="ltx_td ltx_align_right ltx_border_t">  0.10 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">  0.04 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">1.27 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">1.41 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b">2.11 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">1.35 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">  24.77 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">  28.23 s</td>
</tr>
</tbody>
</table>
<ol id="I2" class="ltx_enumerate ltx_centering">
<li id="I2.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">1.</span>
<div id="I2.i1.p1" class="ltx_para">
<p class="ltx_p">Reading the input-file and creating auxiliary
data-structures. Compiling rule-contexts into transducers.</p>
</div>
</li>
<li id="I2.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">2.</span>
<div id="I2.i2.p1" class="ltx_para">
<p class="ltx_p">Identifying and resolving rule-conflicts.</p>
</div>
</li>
<li id="I2.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">3.</span>
<div id="I2.i3.p1" class="ltx_para">
<p class="ltx_p">Combining contexts and centers of single
surface-requirements and context-restrictions. Intersecting subrules
of rules with variables and double-sided rules, in order to form
the final rule-transducers. Minimizing and storing the
rule-transducers.</p>
</div>
</li>
</ol>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 8: </span><span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> performance broken into the different
phases of the compilation process. Times are in
seconds.</figcaption>
</figure>
<figure id="S6.T9" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">1</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">2</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">3</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_r ltx_border_t">4</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">Total</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t">Finnish</th>
<td class="ltx_td ltx_align_right ltx_border_t">0.10 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">1.44 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">0.36 s</td>
<td class="ltx_td ltx_align_right ltx_border_r ltx_border_t">0.07 s</td>
<td class="ltx_td ltx_align_right ltx_border_t">1.97 s</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b">  0.90 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">  154.60 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">  18.26 s</td>
<td class="ltx_td ltx_align_right ltx_border_b ltx_border_r">  0.41 s</td>
<td class="ltx_td ltx_align_right ltx_border_b">  174.17 s</td>
</tr>
</tbody>
</table>
<ol id="I3" class="ltx_enumerate ltx_centering">
<li id="I3.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">1.</span>
<div id="I3.i1.p1" class="ltx_para">
<p class="ltx_p">Reading lexicon-transducer and rule-transducers.</p>
</div>
</li>
<li id="I3.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">2.</span>
<div id="I3.i2.p1" class="ltx_para">
<p class="ltx_p">Computing intersecting composition.</p>
</div>
</li>
<li id="I3.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">3.</span>
<div id="I3.i3.p1" class="ltx_para">
<p class="ltx_p">Determinizing and minimizing the result of the operation.</p>
</div>
</li>
<li id="I3.i4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_enumerate">4.</span>
<div id="I3.i4.p1" class="ltx_para">
<p class="ltx_p">Storing the minimized result of the operation.</p>
</div>
</li>
</ol>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 9: </span><span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span> performance broken down
into the different phases of the compilation process. Times are in
seconds.</figcaption>
</figure>
<div id="S6.p6" class="ltx_para">
<p class="ltx_p">Finally, Table <a href="#S6.T10" title="Table 10 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">10</span></a> gives an indication of the
maximal memory consumption during the lexicon compilations using the
<span class="ltx_text ltx_font_smallcaps">HFST</span> tools and the Xerox tools.</p>
</div>
<figure id="S6.T10" class="ltx_table">
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t">Language</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">  <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">  <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>
</th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t">  <span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>
</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">French</th>
<td class="ltx_td ltx_align_right ltx_border_t">596 MB</td>
<td class="ltx_td ltx_align_right ltx_border_t">—</td>
<td class="ltx_td ltx_align_right ltx_border_t">—</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">Finish</th>
<td class="ltx_td ltx_align_right">181 MB</td>
<td class="ltx_td ltx_align_right">13 MB</td>
<td class="ltx_td ltx_align_right">48 MB</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">Northern Sámi</th>
<td class="ltx_td ltx_align_right">  180 MB</td>
<td class="ltx_td ltx_align_right">  291 MB</td>
<td class="ltx_td ltx_align_right">  1090 MB (1.1 GB)</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">Language</th>
<td class="ltx_td ltx_align_right ltx_border_t">  Xerox <span class="ltx_text ltx_font_smallcaps">LexC</span>
</td>
<td class="ltx_td ltx_align_right ltx_border_t">  Xerox <span class="ltx_text ltx_font_smallcaps">TwolC</span>
</td>
<td class="ltx_td ltx_border_t"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">French</th>
<td class="ltx_td ltx_align_right ltx_border_t">  85 MB</td>
<td class="ltx_td ltx_align_right ltx_border_t">  —</td>
<td class="ltx_td ltx_align_right ltx_border_t">  —</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row">Finnish</th>
<td class="ltx_td ltx_align_right">  28 MB</td>
<td class="ltx_td ltx_align_right">  3 MB</td>
<td class="ltx_td ltx_align_right">  —</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b">Northern Sámi</th>
<td class="ltx_td ltx_align_right ltx_border_b">  13 MB</td>
<td class="ltx_td ltx_align_right ltx_border_b">  12 MB</td>
<td class="ltx_td ltx_align_right ltx_border_b">  —</td>
</tr>
</tbody>
</table>
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 10: </span>Maximum space required using <span class="ltx_text ltx_font_smallcaps">HFST</span> and Xerox
utilities to compile the transducers. Space indications are in
megabytes (MB).</figcaption>
</figure>
<div id="S6.p7" class="ltx_para">
<p class="ltx_p">All tests were conducted on an Intel computer with a Xeon E5450 64 bit
3.00 GHz CPU and 64 GB of memory. For the <span class="ltx_text ltx_font_smallcaps">HFST</span> tools, the
times were extracted using the C language <code class="ltx_verbatim ltx_font_typewriter">clock</code> function. For
other tools, the GNU <code class="ltx_verbatim ltx_font_typewriter">time</code> command was used. In order to monitor
the memory consumption, we used the GNU <code class="ltx_verbatim ltx_font_typewriter">top</code> command.</p>
</div>
<div id="S6.p8" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_smallcaps">HFST</span> has both a weighted and an unweighted implementation,
but the current tests were performed using only the unweighted
implementation of <span class="ltx_text ltx_font_smallcaps">HFST</span>, i.e. in practice we used the
unweighted <span class="ltx_text ltx_font_smallcaps">SFST</span> library implementation of the <span class="ltx_text ltx_font_smallcaps">HFST</span>
tools.</p>
</div>
</section>
<section id="S7" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">7 </span>Discussion and Future Research</h2>

<div id="S7.p1" class="ltx_para">
<p class="ltx_p">In this section we discuss the evaluation results and suggest some
further lines of research and development of the tools that the
evaluation figures seem to indicate. Comparing the total compilation
times for <span class="ltx_text ltx_font_smallcaps">HFST</span> tools, Xerox tools, foma <span class="ltx_text ltx_font_smallcaps">LexC</span> and
<span class="ltx_text ltx_font_smallcaps">SFST</span> compiler, shows that <span class="ltx_text ltx_font_smallcaps">HFST</span> is still a magnitude
slower than the Xerox tools. However, <span class="ltx_text ltx_font_smallcaps">HFST</span> compares well with
the other open-source tools. The decrease in compile-time for the
Finnish lexicon, when parallel-rules are used, is considerable by
improving performance with almost two magnitudes. Most importantly,
using the <span class="ltx_text ltx_font_smallcaps">HFST</span> tools is sufficiently quick not to slow down
the development of full-scale morphological analyzers. Even large
morphological analyzers like the analyzers for French and Northern
Sámi already compile in less than ten minutes.</p>
</div>
<section id="S7.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">7.1 </span>HFST-LexC Performance</h3>

<div id="S7.SS1.p1" class="ltx_para">
<p class="ltx_p">Comparing the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> compilation times for French and
Northern Sámi given in Table <a href="#S6.T7" title="Table 7 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>,
we see that the entry parsing is almost linear. The lexicon for French
has about five times as many entries as the lexicon for Northern
Sámi. This is a result of the trie-union described in
Sect. <a href="#S2" title="2 HFST-LexC ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>, which speeds up the building of
sublexicons.</p>
</div>
<div id="S7.SS1.p2" class="ltx_para">
<p class="ltx_p">We also see that the number of sublexicons is currently very
influential on the <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> compile-time. The lexicon for
French only has one sub-lexicon, whereas the lexicon for Northern
Sámi has 870 sub-lexicons. This indicates that the current
implementation of the morphotax filtering still slows down the
compilation with some unnecessary copying.</p>
</div>
<div id="S7.SS1.p3" class="ltx_para">
<p class="ltx_p">There are two main parts that dominate the time consumption of the
fourth column of Table <a href="#S6.T7" title="Table 7 ‣ 6 Performance Evaluation ‣ HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers \footnotepubrightsThis article was published in Proceedings of SFCM 2009 in ZÃ¼rich \urlhttp://sfcm2009.org. \springerpostprintdoi10.1007/978-3-642-04131-0_3" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a>,
i.e. alphabet discovery and final determinization and minimization.
Since the <span class="ltx_text ltx_font_smallcaps">HFST</span> API and underlying libraries need the full
alphabet to be known prior to the compilation of the entries and the
<span class="ltx_text ltx_font_smallcaps">LexC</span> file format provides no declaration of the alphabet,
<span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> needs to make an initial pass through the entries
before the actual compilation phase to gather the alphabet. In
addition, the determinization and minimization of the final result
consumes well over half of the compile-time of the Northern Sámi
lexicon (46,85 seconds), which we estimate is caused by a large number
of lexicons containing epsilon entries giving rise to indeterminism.
</p>
</div>
</section>
<section id="S7.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">7.2 </span>HFST-TwolC Performance</h3>

<div id="S7.SS2.p1" class="ltx_para">
<p class="ltx_p">Examining the <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> compile-times for Finnish and
Northern Sámi shows, that the last phase, i.e. combining contexts
and intersecting subrules, takes up approximately 90 % of the
compile-time. The compile-time for this phase depends heavily on the
intersection, subtraction and determinization algorithms used when
implementing the <span class="ltx_text ltx_font_smallcaps">HFST</span> API.</p>
</div>
<div id="S7.SS2.p2" class="ltx_para">
<p class="ltx_p">In <span class="ltx_text ltx_font_smallcaps">HFST</span>, we have not yet implemented an <span class="ltx_text ltx_font_italic">other-symbol</span>
like the one in the Xerox <span class="ltx_text ltx_font_smallcaps">TwolC</span> presented by Karttunen
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx3" title="" class="ltx_ref">1992</a>]</cite>, the rule-transducers encode a number of
unnecessary transitions. This slows down intersection, difference and
determinization among other operations. It is probably the single most
important factor slowing down <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span>. Like intersection,
intersecting composition is also affected by the lack of an
<span class="ltx_text ltx_font_italic">other-symbol</span>, since intersecting composition is sensitive to
the number of transitions in the rule-transducers.</p>
</div>
</section>
<section id="S7.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">7.3 </span>Parallel Rules vs. Cascaded Rules</h3>

<div id="S7.SS3.p1" class="ltx_para">
<p class="ltx_p">It is interesting to see, that the two-level <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> and
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> approach to compiling the <span class="ltx_text ltx_font_smallcaps">OMorFi</span> analyzer
for Finnish is so much more efficient than the cascade of
replace-rules, which constitutes the <span class="ltx_text ltx_font_smallcaps">SFST</span> implementation of
<span class="ltx_text ltx_font_smallcaps">OMorFi</span>. We know, that the difference lies in the approach to
compiling the lexicon and the rules, as the unweighted <span class="ltx_text ltx_font_smallcaps">HFST</span>
morphology tools ultimately perform their transducer operations using
the <span class="ltx_text ltx_font_smallcaps">SFST</span> library, even if this happens through the
<span class="ltx_text ltx_font_smallcaps">HFST</span> API.</p>
</div>
<div id="S7.SS3.p2" class="ltx_para">
<p class="ltx_p">One possible reason for the speed-up is that <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span> and
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> are more constrained environments than the
<span class="ltx_text ltx_font_smallcaps">SFST</span> utility <span class="ltx_text ltx_font_typewriter">fst-compiler</span> by Schmid <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bibx10" title="" class="ltx_ref">2005</a>]</cite>,
which is used for compiling the <span class="ltx_text ltx_font_smallcaps">SFST</span> version of the
<span class="ltx_text ltx_font_smallcaps">OMorFi</span> analyzer. We suspect that the great liberty in
constructing rules using <span class="ltx_text ltx_font_smallcaps">SFST</span> may tempt the user to indulge
in unnecessarily unconstrained ways of expressing replacements with a
very local area of application. This manifests itself among other
things as an increased compile-time.</p>
</div>
<div id="S7.SS3.p3" class="ltx_para">
<p class="ltx_p">Converting the <span class="ltx_text ltx_font_smallcaps">LexC</span> and <span class="ltx_text ltx_font_smallcaps">TwolC</span> version of the Finnish
lexicon from the <span class="ltx_text ltx_font_smallcaps">SFST</span> lexicon compiler files took
approximately a week of manual work. While doing this, we were able to
slightly modify and improve the rules in order to remove some of the
incorrect readings that were coming through as analyses of the Finnish
cascaded rule analyzer, which had not been corrected before. This
indicates that the parallel rule set may be easier to test and debug
than the cascaded rules, but first and foremost it confirms the
well-known effect that a reduced compile-time is very significant for
the development process as it allows an increased number of test
cycles during a fixed time-span.</p>
</div>
</section>
<section id="S7.SS4" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">7.4 </span>The <span class="ltx_text ltx_font_italic">Other-symbol</span>
</h3>

<div id="S7.SS4.p1" class="ltx_para">
<p class="ltx_p">We have demonstrated, that the morphology tools <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>,
<span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and <span class="ltx_text ltx_font_smallcaps">HFST-Intersect-Compose</span> provide a
realistic open-source alternative for constructing morphological
analyzers in the two-level framework. Still, there is room for
improvement, as the performance of the Xerox tools show.</p>
</div>
<div id="S7.SS4.p2" class="ltx_para">
<p class="ltx_p">Currently the performance of both <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">HFST-Intersect-Compose</span> correlates strongly with the number of
symbol pairs in the alphabet of the two-level grammar. A significant
optimization for these <span class="ltx_text ltx_font_smallcaps">HFST</span> tools would be the introduction
of an <span class="ltx_text ltx_font_italic">other-symbol</span>, which can represent the class of pairs
bearing no special meaning to a rule. Such a symbol would decrease the
number of transitions in rule-transducers. In case the number of
symbol-pairs of the alphabet is large, this has a significant impact
on the performance of both <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">HFST-Intersect-Compose</span>. In practice, the introduction of the
<span class="ltx_text ltx_font_italic">other-symbol</span> makes both tools insensitive to the number of
symbols in the alphabet of the grammar. We believe, that this may help
us achieve rule compile-times closer to those of Xerox.
</p>
</div>
</section>
<section id="S7.SS5" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">7.5 </span>Future Directions</h3>

<div id="S7.SS5.p1" class="ltx_para">
<p class="ltx_p">In our future research, we intend to look at various aspects of and
methods for integrating the creation and use of weighted transducers
in morphologies. It is already possible to compile both weighted
two-level lexicons and grammars using <span class="ltx_text ltx_font_smallcaps">HFST</span> tools. These can
be combined into weighted lexical transducers using the weighted
version of <span class="ltx_text ltx_font_smallcaps">HFST-Intersect-Compose</span>. It is also possible to
adjoin meaningful weights to lexicon-entries in <span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>.
Currently <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> only provides a way to compile weighted
rules with zero-weights. However, even this small beginning allows us
to combine weighted lexicons and two-level grammars using weighted
intersecting composition. We are currently working on useful ways to
attach weights to two-level rules with applications for weighted
two-level grammars.</p>
</div>
<div id="S7.SS5.p2" class="ltx_para">
<p class="ltx_p">We were able to compare the performance of a cascaded rule approach
with a parallel rule approach using the same underlying finite-state
library. However, using our full-fledged morphologies, we could also
compare different underlying finite-state libraries on real
compilation tasks in order to compare different algorithms and their
implementations. A future task, would be to compare the performance of
e.g. the <span class="ltx_text ltx_font_smallcaps">SFST</span> library with that of <span class="ltx_text ltx_font_smallcaps">OpenFST</span>. Our
preliminary evaluation results show that efficient and well-suited
determinization and minimization algorithms have a significant impact
on the real-world morphology compilation task.</p>
</div>
</section>
</section>
<section id="S8" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">8 </span>Conclusions</h2>

<div id="S8.p1" class="ltx_para">
<p class="ltx_p">We have chosen to create open-source tools and language descriptions
in order to let as many as possible participate in the effort of
providing morphological analyzers for the languages of the world. The
current article present some of the main tools that we have created
based on our unified API for finite-state libraries. The tools include
<span class="ltx_text ltx_font_smallcaps">HFST-LexC</span>, <span class="ltx_text ltx_font_smallcaps">HFST-TwolC</span> and
<span class="ltx_text ltx_font_smallcaps">HFST-Compose-Intersect</span>. We have evaluated the efficiency of
the current implementations in comparison with some of the similar
tools and libraries available using several full-fledged morphological
descriptions. Our tools compare well with other similar open source
tools, even if we still have some challenges ahead before we can catch
up with the commercial tools. We demonstrate that for various reasons
a parallel rule approach seems to be more efficient than the cascaded
rule approach when developing finite-state morphologies.</p>
</div>
</section>
<section id="Sx1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">Acknowledgments</h2>

<div id="Sx1.p1" class="ltx_para">
<p class="ltx_p"><math id="Sx1.p1.m1" class="ltx_Math" alttext="\ldots" display="inline"><mi mathvariant="normal">…</mi></math></p>
</div>
</section>
<section id="bib" class="ltx_bibliography">
<h2 class="ltx_title ltx_title_bibliography">References</h2>

<ul class="ltx_biblist">

<li id="bib.bibx1" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1983</span>

<span class="ltx_bibblock">
Koskenniemi, K.:
Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production.
University of Helsinki, Department of General Linguistics (1983).

</span>
      </li>

<li id="bib.bibx2" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1987</span>

<span class="ltx_bibblock">
Karttunen, L., Koskenniemi, K., Kaplan, R.:
A Compiler for Two-Level Phonological Rules.
CSLI Publications (1987)
<span class="ltx_ERROR undefined">\url</span>http://www2.parc.com/istl/members/karttune/publications/archive/twolcomp.pdf.

</span>
      </li>

<li id="bib.bibx3" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1992</span>

<span class="ltx_bibblock">
Karttunen, L.:
Two-Level Rule Compiler, Technical Report ISTL-92-2, Xerox Palo Alto Research Center (1992).
<span class="ltx_ERROR undefined">\url</span>http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/twolc-92/twolc92.html.

</span>
      </li>

<li id="bib.bibx4" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1993</span>

<span class="ltx_bibblock">
Karttunen, L.:
Finite-State Lexicon Compiler.
Technical Report, ISTL-NLTT2993-04-02, Xerox Palo Alto Research Center (1993)
Palo Alto, California.

</span>
      </li>

<li id="bib.bibx5" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1994</span>

<span class="ltx_bibblock">
Karttunen, L.:
Constructing Lexical Transducers.
The Proceedings of the 15th International Conference on Computational Linguistics COLING 94, I, 406–411 (1994)

</span>
      </li>

<li id="bib.bibx6" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">1997</span>

<span class="ltx_bibblock">
Mohri, M.:
Finite-state transducers in language and speech processing.
Computational Linguistics 23(2), (1997)

</span>
      </li>

<li id="bib.bibx7" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2002</span>

<span class="ltx_bibblock">
Mohri, M., Riley. M.:
An efficient algorithm for the n-best-strings problem.
Proceedings of the International Conference on Spoken Language Processing 2002, ICSLP ’02 (2002)

</span>
      </li>

<li id="bib.bibx8" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2003</span>

<span class="ltx_bibblock">
Beesley, K., Karttunen, L.:
Finite State Morphology.
CSLI Publications (2003).
<span class="ltx_ERROR undefined">\url</span>http://www.fsmbook.com.

</span>
      </li>

<li id="bib.bibx9" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2004</span>

<span class="ltx_bibblock">
Lombardy, S., Régis-Gianas, Y., Sakharovitch, J.:
Introducing Vaucanson.
Theoretical Computer Science 328, 77–96 (2004)

</span>
      </li>

<li id="bib.bibx10" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2005</span>

<span class="ltx_bibblock">
Schmid, H.:
A programming language for finite state transducers.
Proceedings of the 5th International Workshop on Finite State Methods in Natural Language Processing, FSMNLP 2005, (2005).
Helsinki, Finland.

</span>
      </li>

<li id="bib.bibx11" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2006</span>

<span class="ltx_bibblock">
Yli-Jyrä, A., Koskenniemi, K.:
Compiling Generalized Two-Level Rules and Grammars.
Advances in Natural Language Processing, LNCS, 174–185 (2006)

</span>
      </li>

<li id="bib.bibx12" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2007</span>

<span class="ltx_bibblock">
Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., Mohri, M.:
OpenFst: A general and efficient weighted finite-state transducer library.
Proceedings of the Ninth International Conference on Implementation and Application of Automata, CIAA 2007, vol. 4783 LNCS, 11–23 (2007)
<span class="ltx_ERROR undefined">\url</span>http://www.openfst.org.

</span>
      </li>

<li id="bib.bibx13" class="ltx_bibitem">
        <span class="ltx_bibtag ltx_role_refnum">2008</span>

<span class="ltx_bibblock">
Pirinen, T.:
Suomen kielen äärellistilainen automaattinen morfologia avoimen lähdekoodin menetelmin.
Master’s thesis, Helsingin yliopisto (2008)

</span>
      </li>

</ul>
</section>
<div class="ltx_pagination ltx_role_newpage"></div>
</article>
</div>
<footer class="ltx_page_footer">
<div class="ltx_page_logo">Generated  on Thu Sep 28 18:07:52 2017 by <a href="http://dlmf.nist.gov/LaTeXML/">LaTeXML <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAsAAAAOCAYAAAD5YeaVAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9wKExQZLWTEaOUAAAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAdpJREFUKM9tkL+L2nAARz9fPZNCKFapUn8kyI0e4iRHSR1Kb8ng0lJw6FYHFwv2LwhOpcWxTjeUunYqOmqd6hEoRDhtDWdA8ApRYsSUCDHNt5ul13vz4w0vWCgUnnEc975arX6ORqN3VqtVZbfbTQC4uEHANM3jSqXymFI6yWazP2KxWAXAL9zCUa1Wy2tXVxheKA9YNoR8Pt+aTqe4FVVVvz05O6MBhqUIBGk8Hn8HAOVy+T+XLJfLS4ZhTiRJgqIoVBRFIoric47jPnmeB1mW/9rr9ZpSSn3Lsmir1fJZlqWlUonKsvwWwD8ymc/nXwVBeLjf7xEKhdBut9Hr9WgmkyGEkJwsy5eHG5vN5g0AKIoCAEgkEkin0wQAfN9/cXPdheu6P33fBwB4ngcAcByHJpPJl+fn54mD3Gg0NrquXxeLRQAAwzAYj8cwTZPwPH9/sVg8PXweDAauqqr2cDjEer1GJBLBZDJBs9mE4zjwfZ85lAGg2+06hmGgXq+j3+/DsixYlgVN03a9Xu8jgCNCyIegIAgx13Vfd7vdu+FweG8YRkjXdWy329+dTgeSJD3ieZ7RNO0VAXAPwDEAO5VKndi2fWrb9jWl9Esul6PZbDY9Go1OZ7PZ9z/lyuD3OozU2wAAAABJRU5ErkJggg==" alt="[LOGO]"></a>
</div></footer>
</div>
</body>
</html>