
<!DOCTYPE html><html>
<head>
<title>Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages</title>
<!--Generated on Fri Feb  4 08:41:29 2022 by LaTeXML (version 0.8.5) http://dlmf.nist.gov/LaTeXML/.-->
<!--Document created on This version: February 4, 2022.-->

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document ltx_authors_1line">
<h1 class="ltx_title ltx_title_document">Recent advances in Apertium, a free / open-source rule-based machine
translation platform for low-resource languages
<span id="footnote1" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup>
        <span class="ltx_tag ltx_tag_note">1</span>
        
        
        
      Springer Open Access publication. This version from pre-print
latex form does not contain some changes made in the editorial process.
Published version available:
<a href="https://link.springer.com/article/10.1007/s10590-021-09260-6" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://link.springer.com/article/10.1007/s10590-021-09260-6</a></span></span></span>
</h1>
<div class="ltx_authors">
<span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Tanmai Khanna 
<br class="ltx_break">Language Technologies Research Centre
<br class="ltx_break">IIIT Hyderabad, Telangana India 500032
<br class="ltx_break"><a href="tanmai.khanna@research.iiit.ac.in" title="" class="ltx_ref ltx_url ltx_font_typewriter">tanmai.khanna@research.iiit.ac.in</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Jonathan N Washington 
<br class="ltx_break">Swarthmore College
<br class="ltx_break">Swarthmore, PA USA 19081
<br class="ltx_break"><a href="jonathan.washington@swarthmore.edu" title="" class="ltx_ref ltx_url ltx_font_typewriter">jonathan.washington@swarthmore.edu</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Francis M Tyers 
<br class="ltx_break">Indiana University
<br class="ltx_break">Bloomington, IN USA 47401
<br class="ltx_break"><a href="ftyers@iu.edu" title="" class="ltx_ref ltx_url ltx_font_typewriter">ftyers@iu.edu</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Sevilay Bayatlı 
<br class="ltx_break">Beykent Üniversitesi
<br class="ltx_break">İstanbul, Turkey
<br class="ltx_break"><a href="sewaletaha@beykent.edu.tr" title="" class="ltx_ref ltx_url ltx_font_typewriter">sewaletaha@beykent.edu.tr</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Daniel G Swanson 
<br class="ltx_break">Swarthmore College
<br class="ltx_break">Swarthmore, PA USA 19081
<br class="ltx_break"><a href="dswanso1@gmail.com" title="" class="ltx_ref ltx_url ltx_font_typewriter">dswanso1@gmail.com</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Flammie A Pirinen 
<br class="ltx_break">UiT—Norgga árktalaš universitehta
<br class="ltx_break">NO-9000, Romssa
<br class="ltx_break"><a href="tommi.pirinen@uit.no" title="" class="ltx_ref ltx_url ltx_font_typewriter">tommi.pirinen@uit.no</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Irene Tang 
<br class="ltx_break">University of Chicago
<br class="ltx_break">Chicago, IL USA 60637
<br class="ltx_break"><a href="itang1@uchicago.edu" title="" class="ltx_ref ltx_url ltx_font_typewriter">itang1@uchicago.edu</a>

</span></span>
<span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author">
<span class="ltx_personname">Hèctor Alòs i Font 
<br class="ltx_break">Centre de Recerca en Sociolingüística i Comunicació
<br class="ltx_break">Universitat de Barcelona
<br class="ltx_break"><a href="hectoralos@gmail.com" title="" class="ltx_ref ltx_url ltx_font_typewriter">hectoralos@gmail.com</a>

</span></span>
</div>
<div class="ltx_dates">(This version: February 4, 2022)</div>

<div class="ltx_abstract">
<h6 class="ltx_title ltx_title_abstract">Abstract</h6>
    
<p class="ltx_p">This paper presents an overview of Apertium, a free and open-source
rule-based machine translation platform. Translation in Apertium happens
through a pipeline of modular tools, and the platform continues to be
improved as more language pairs are added. Several advances have been
implemented since the last publication, including some new optional modules:
a module that allows rules to process recursive structures at the structural
transfer stage, a module that deals with contiguous and discontiguous
multi-word expressions, and a module that resolves anaphora to aid
translation. Also highlighted is the hybridisation of Apertium through
statistical modules that augment the pipeline, and statistical methods that
augment existing modules. This includes morphological disambiguation,
weighted structural transfer, and lexical selection modules that learn from
limited data. The paper also discusses how a platform like Apertium can be
a critical part of access to language technology for so-called low-resource
languages, which might be ignored or deemed unapproachable by popular
corpus-based translation technologies. Finally, the paper presents some of
the released and unreleased language pairs, concluding with a brief look at
some supplementary Apertium tools that prove valuable to users as well as
language developers. All Apertium-related code, including language data, is
free/open-source and available at <a href="https://github.com/apertium" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium</a>.
</p>
    
<p class="ltx_p">Keywords: machine translation low-resource languages rule-based
machine translation hybrid machine translation</p>
  
</div>
<section id="S1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2>
<span id="footnote2" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup>
        <span class="ltx_tag ltx_tag_note">2</span>
        
        
        
      Several of the advances described in this paper were supported by
Google Summer of Code funding, for which the authors are very grateful.</span></span></span>
<div id="S1.p1" class="ltx_para">
<p class="ltx_p">Apertium <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib60" title="Apertium: a free/open-source platform for rule-based machine translation platform" class="ltx_ref">10</a>]</cite> is a free/open-source platform for
rule-based machine translation (RBMT). It was designed to use the shallow
transfer based approach to translation, and most modules in the pipeline work on
rules written by language developers and linguists. The platform provides an
accessible way to create language data and rules, such that apart from language
developers, speakers of a language with a limited understanding of programming
and/or linguistics can create decent translation systems for their languages as
well. This is a superior model for creating translation systems for low-resource
languages both because it involves stakeholders from the language communities,
and because the languages lack widely available corpora that would be needed for
fully data-driven approaches. Apart from developing RBMT systems for
low-resource languages, the Apertium open source organisation also develops and
supports tools for the creation of RBMT systems.</p>
</div>
<div id="S1.p2" class="ltx_para">
<p class="ltx_p">Several advances to the Apertium platform (Release version 3.6) have been
implemented since the previous publication <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib60" title="Apertium: a free/open-source platform for rule-based machine translation platform" class="ltx_ref">10</a>]</cite>. These
include organisational improvements, additional tools, additional methods to
augment RBMT with corpus-based methods, new modules for more precise
translation, a few additional tools not directly involved in the RBMT pipeline,
and resources for many more languages and translation pairs.</p>
</div>
<div id="S1.p3" class="ltx_para">
<p class="ltx_p">Organisational changes include a migration of the codebase from subversion
(hosted by SourceForge) to git (hosted by GitHub), a switch from two-letter ISO
codes (ISO 639-1) to three-letter ISO codes (ISO 639-3), and a three-directory
model for translation pairs (one directory for the components specific to each of
the two languages, and one for the components common to the pair). Additionally, morphological transducers for a
number of languages make use of Helsinki Finite-State Technology (HFST)
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib116" title="Hfst—framework for compiling and applying morphologies" class="ltx_ref">20</a>]</cite>, morphological disambiguation has been improved in many
languages by using Visual Interactive Syntax Learning Constraint Grammar (VISL
CG-3) <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib28" title="CG-3 – beyond classical constraint grammar" class="ltx_ref">6</a>]</cite>, and several new features have been incorporated into
the lexical selection module.</p>
</div>
<div id="S1.p4" class="ltx_para">
<p class="ltx_p">Section <a href="#S2" title="2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a> overviews the design of the Apertium RBMT platform.
Section <a href="#S3" title="3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> discusses modules used by Apertium to
augment RBMT using corpus-based methods. Section <a href="#S4" title="4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> introduces
the new modules in the pipeline: a module that allows rules to process recursive
structures at the structural transfer stage, a module that deals with contiguous
and discontiguous multiword expressions, and one that resolves anaphors to aid
translation. Section <a href="#S5" title="5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a> discusses Apertium’s
contribution to language revitalisation and reclamation efforts.
Section <a href="#S6" title="6 Supplementary tools ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6</span></a> introduces several supplementary Apertium
tools. Section <a href="#S7" title="7 Conclusion ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">7</span></a> concludes.</p>
</div>
</section>
<section id="S2" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">2 </span>Overview of the Apertium platform</h2>

<div id="S2.p1" class="ltx_para">
<p class="ltx_p">The overall design of Apertium is a pipeline with a series of modules. Each
stage of the pipeline reads from and writes to text streams in a consistent
format so that modules can easily be added or removed according to the needs of
the languages in question.</p>
</div>
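<div class="ltx_para">
<p class="ltx_p">The text-stream design described above can be illustrated with a minimal Python sketch. This is not Apertium code: the stage functions, the helper <span class="ltx_text ltx_font_typewriter">build_pipeline</span>, and the toy token markers are all hypothetical and serve only to show why a shared text-in, text-out interface makes it easy to add or remove modules.</p>
</div>

```python
from functools import reduce
from typing import Callable, List

# A stage is any function that reads a text stream and returns a text stream.
Stage = Callable[[str], str]


def build_pipeline(stages: List[Stage]) -> Stage:
    """Compose stages left to right into a single text-to-text function."""
    def run(text: str) -> str:
        return reduce(lambda stream, stage: stage(stream), stages, text)
    return run


# Hypothetical toy stages (illustration only, not real Apertium modules).
def lowercase(stream: str) -> str:
    return stream.lower()


def mark_tokens(stream: str) -> str:
    # Wrap every whitespace-separated token in toy delimiters.
    return " ".join(f"^{tok}$" for tok in stream.split())


def drop_marks(stream: str) -> str:
    return stream.replace("^", "").replace("$", "")


# Because every stage shares the same interface, an optional stage can be
# left out without changing any of the others.
full = build_pipeline([lowercase, mark_tokens, drop_marks])
without_optional = build_pipeline([lowercase, drop_marks])
```

<div class="ltx_para">
<p class="ltx_p">In this sketch, both <span class="ltx_text ltx_font_typewriter">full</span> and <span class="ltx_text ltx_font_typewriter">without_optional</span> are ordinary functions from text to text, which mirrors how a language pair can include or omit optional modules.</p>
</div>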
<div id="S2.p2" class="ltx_para">
<p class="ltx_p">Apertium consists of both the management of the pipeline (the main
<span class="ltx_text ltx_font_typewriter">apertium</span> executable) and all the stages in this pipeline, except where
outside tools (such as HFST for morphological analysis and generation, or CG for
morphological disambiguation) are used. Each stage consists of a general
processor which modifies the stream based on hand-crafted “rules” (coded
linguistic generalisations) for a given language or language pair.
Figure <a href="#S2.F1" title="Figure 1 ‣ 2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a> shows the entire pipeline, including optional modules.</p>
</div>
<figure id="S2.F1" class="ltx_figure"><img src="x1.png" id="S2.F1.g1" class="ltx_graphics ltx_centering" width="674" height="237" alt="The architecture of Apertium, a transfer-based machine translation
system. Each rounded box is a module available for language-specific or
pair-specific development. Broken lines show optional modules. Lines with
arrows represent the flow of data through the pipeline. The stages in the
pipeline are grouped by whether they are relevant to source-language
analysis, bilingual transfer, or target-language generation—the three
logical sections of the pipeline. The deformatter and reformatter are
language-agnostic and provided by Apertium.">
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>The architecture of Apertium, a transfer-based machine translation
system. Each rounded box is a module available for language-specific or
pair-specific development. Broken lines show optional modules. Lines with
arrows represent the flow of data through the pipeline. The stages in the
pipeline are grouped by whether they are relevant to source-language
analysis, bilingual transfer, or target-language generation—the three
logical sections of the pipeline. The deformatter and reformatter are
language-agnostic and provided by Apertium.</figcaption>
</figure>
<div id="S2.p3" class="ltx_para">
<p class="ltx_p">A short overview of each of the stages of the pipeline is provided below. The
new ones are discussed in further detail in Section <a href="#S4" title="4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.</p>
</div>
<div id="S2.p4" class="ltx_para">
<ul id="S2.I1" class="ltx_itemize">
<li id="S2.I1.ix1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item"><math id="S2.I1.ix1.m1" class="ltx_Math" alttext="-" display="inline"><mo>-</mo></math></span> 
<div id="S2.I1.ix1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Deformatter:</span> Encapsulates any document formatting tags so
that they go through the rest of the translation pipeline untouched.
This is a language-agnostic part of Apertium.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p5" class="ltx_para">
<ul id="S2.I2" class="ltx_itemize">
<li id="S2.I2.ix1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item"><math id="S2.I2.ix1.m1" class="ltx_Math" alttext="-" display="inline"><mo>-</mo></math></span> 
<div id="S2.I2.ix1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Source Language morphological analyser:</span> Segments the
surface form of text (words or multi-word lexical units) using a
finite-state transducer (FST) and delivers one or more lexical forms (or
“analyses”), each of which includes a lemma and a part-of-speech label
(encoded as a “tag”), as well as any relevant subcategory and
grammatical (e.g., inflectional) information (also encoded as tags).</p>
</div>
</li>
</ul>
</div>
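<div class="ltx_para">
<p class="ltx_p">As a rough illustration of what the analyser delivers, the following Python sketch maps a surface form to one or more lexical forms, each with a lemma and a sequence of tags. A plain dictionary stands in for the finite-state transducer, and the lexicon entries, tag names, and the <span class="ltx_text ltx_font_typewriter">analyse</span> function are assumptions made for the example rather than data from any Apertium language package.</p>
</div>

```python
from typing import Dict, List, NamedTuple, Tuple


class LexicalForm(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]  # first tag is the part of speech


# Hypothetical toy lexicon; a real analyser compiles its dictionary to an FST.
TOY_LEXICON: Dict[str, List[LexicalForm]] = {
    "books": [
        LexicalForm("book", ("n", "pl")),
        LexicalForm("book", ("vblex", "pres", "p3", "sg")),
    ],
    "the": [LexicalForm("the", ("det", "def"))],
}


def analyse(surface: str) -> List[LexicalForm]:
    """Return every analysis of a surface form (empty list if unknown)."""
    return TOY_LEXICON.get(surface.lower(), [])


# An ambiguous surface form yields several analyses; choosing between
# them is left to the morphological disambiguator.
analyses = analyse("Books")
```

<div class="ltx_para">
<p class="ltx_p">Here <span class="ltx_text ltx_font_typewriter">analyses</span> holds a noun reading and a verb reading of the same surface form, which is the kind of ambiguity the next stage has to resolve.</p>
</div>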
<div id="S2.p6" class="ltx_para">
<ul id="S2.I3" class="ltx_itemize">
<li id="S2.I3.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I3.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Source Language morphological disambiguator:</span> Tries to choose
the best sequence of morphological analyses for an ambiguous sentence.

<br class="ltx_break">The original Apertium disambiguator used a first-order hidden Markov
model (HMM). Other statistical models, such as averaged weighted
Perceptron, have since been added and are currently in use for various
languages. Additionally, CG <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib28" title="CG-3 – beyond classical constraint grammar" class="ltx_ref">6</a>]</cite> is often combined with a
statistical model for a two-step process. The different approaches are
discussed in Section <a href="#S3.SS1" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.1</span></a>.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p7" class="ltx_para">
<ul id="S2.I4" class="ltx_itemize">
<li id="S2.I4.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I4.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Source Language retokenization:</span> Adjusts token boundaries for
multi-word expressions, which can be non-contiguous (such as separable
verbs in Germanic languages), in preparation for translation. Often
this consists of combining component parts into single multi-word
expressions. This module is discussed in more detail in
Section <a href="#S4.SS2" title="4.2 Processing multi-word expressions ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.2</span></a>.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p8" class="ltx_para">
<ul id="S2.I5" class="ltx_itemize">
<li id="S2.I5.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I5.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Lexical transfer:</span> Reads each source-language (SL) lexical
form and delivers a set of corresponding target-language (TL) lexical
forms by looking it up in a bilingual dictionary, implemented as an FST.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p9" class="ltx_para">
<ul id="S2.I6" class="ltx_itemize">
<li id="S2.I6.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I6.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Lexical selection:</span> Based on context rules, chooses the most
adequate translation of ambiguous SL lexical forms. The original
module <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib231" title="Flexible finite-state lexical selection for rule-based machine translation" class="ltx_ref">43</a>]</cite> has been extended with new features
like macros. This is discussed in more detail in
Section <a href="#S3.SS2" title="3.2 Lexical selection ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2</span></a>.</p>
</div>
</li>
</ul>
</div>
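<div class="ltx_para">
<p class="ltx_p">The interplay between lexical transfer and lexical selection can be sketched as follows. The bilingual entries, the single context rule, the window size, and the fall-back to the first candidate are assumptions made for the example; the actual modules use compiled FSTs and their own rule format.</p>
</div>

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bilingual dictionary: SL lemma to a list of TL candidates.
BIDIX: Dict[str, List[str]] = {
    "estación": ["station", "season"],
    "tren": ["train"],
    "año": ["year"],
}

# Hypothetical context rules: (SL lemma, lemma seen nearby) gives a TL choice.
RULES: Dict[Tuple[str, str], str] = {
    ("estación", "año"): "season",
}


def lexical_transfer(lemma: str) -> List[str]:
    """Look up all TL candidates for an SL lemma."""
    return BIDIX.get(lemma, [])


def lexical_selection(lemmas: Sequence[str], index: int, window: int = 3) -> str:
    """Pick one TL candidate for lemmas[index] using nearby lemmas as context."""
    source = lemmas[index]
    candidates = lexical_transfer(source)
    if not candidates:
        return source  # unknown word: passed through unchanged in this toy
    start = max(0, index - window)
    context = list(lemmas[start:index]) + list(lemmas[index + 1:index + 1 + window])
    for neighbour in context:
        choice = RULES.get((source, neighbour))
        if choice in candidates:
            return choice
    return candidates[0]  # assumed default: the first listed translation
```

<div class="ltx_para">
<p class="ltx_p">With these toy data, <span class="ltx_text ltx_font_typewriter">estación</span> next to <span class="ltx_text ltx_font_typewriter">año</span> is translated as "season", while in any other context the default "station" is kept.</p>
</div>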
<div id="S2.p10" class="ltx_para">
<ul id="S2.I7" class="ltx_itemize">
<li id="S2.I7.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I7.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Source Language anaphora resolution:</span> Resolves references to
earlier items in discourse. Using saliency metrics, this module attaches
the lexical unit of the antecedent to its corresponding anaphor to aid
translation. This module is discussed in more detail in
Section <a href="#S4.SS3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3</span></a>.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p11" class="ltx_para">
<ul id="S2.I8" class="ltx_itemize">
<li id="S2.I8.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I8.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Shallow structural transfer:</span> Apertium’s shallow structural
transfer module implements a sequence of one or more finite-state
constraint rules on the output of the lexical selection module. It
generally consists of three sub-modules: a chunker mode, an interchunk
mode, and a postchunk mode.</p>
</div>
<div id="S2.I8.i1.p2" class="ltx_para">
<p class="ltx_p">Apertium 1.0 had a single structural transfer step. This was
considered sufficient for the first Apertium translators, which covered the
closely related Iberian Romance languages. The one-step strategy is still
used in the currently released
versions of many of them, including the Catalan-Spanish translation
pair, which has been continuously improved since then and is widely
used.<span id="footnote3" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup>
                  <span class="ltx_tag ltx_tag_note">3</span>
                  
                  
                  
                In 2020, the Softcatalà-hosted Apertium translators
served an average of 4.6 million requests per month from Spanish to
Catalan and 1.1 million from Catalan to Spanish (data kindly provided by
Xavier Ivars).</span></span></span>
<br class="ltx_break">Beginning with the implementation of the
Spanish-English and Catalan-English language pairs, a three-step
transfer architecture was developed, leading to the release of Apertium
2.0. The first step creates chunks in the source language and reorders
words inside the chunk as per the transfer rules. The second step
reorders chunks based on the target language syntax, and the final step
makes the stream ready for the generator. This is currently the standard
Apertium structural transfer architecture. Several pairs have additional
transfer steps, such as Catalan-Esperanto (5 steps) and French-Occitan
(4 steps).</p>
</div>
<div id="S2.I8.i1.p3" class="ltx_para">
<p class="ltx_p">In the Catalan-Esperanto translator there are three “interchunk” steps
aimed at a deeper syntactic analysis, with the overarching objective of
generating the correct case morphology on various types of nominals in
the target language (Esperanto), since the source language (Catalan)
lacks case morphology except in its pronominal system. The shallow
transfer system is used creatively in other ways as well.</p>
</div>
</li>
</ul>
</div>
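<div class="ltx_para">
<p class="ltx_p">The three-step architecture described above can be sketched in Python as three small functions. This is only an illustration under simplifying assumptions: tokens are pairs of a target-language lemma and a part-of-speech tag, and the two hard-coded rules (noun-adjective reordering inside a chunk, and moving a final verb between two noun phrases) are hypothetical stand-ins for the rules that language developers write for a given pair.</p>
</div>

```python
from typing import List, NamedTuple, Tuple

Token = Tuple[str, str]  # (target-language lemma, part-of-speech tag)


class Chunk(NamedTuple):
    label: str
    tokens: List[Token]


def chunker(tokens: List[Token]) -> List[Chunk]:
    """Step 1: group tokens into chunks and reorder words inside a chunk."""
    chunks: List[Chunk] = []
    i = 0
    while i != len(tokens):
        pos = tokens[i][1]
        has_next = i + 1 != len(tokens)
        if pos == "n" and has_next and tokens[i + 1][1] == "adj":
            # Toy rule: noun + adjective becomes adjective + noun.
            chunks.append(Chunk("NP", [tokens[i + 1], tokens[i]]))
            i += 2
        else:
            label = {"n": "NP", "vblex": "V"}.get(pos, "OTHER")
            chunks.append(Chunk(label, [tokens[i]]))
            i += 1
    return chunks


def interchunk(chunks: List[Chunk]) -> List[Chunk]:
    """Step 2: reorder whole chunks according to target-language syntax."""
    if [c.label for c in chunks] == ["NP", "NP", "V"]:
        subject, obj, verb = chunks
        return [subject, verb, obj]  # toy rule: verb-final to verb-medial
    return chunks


def postchunk(chunks: List[Chunk]) -> List[str]:
    """Step 3: flatten the chunks into a stream ready for generation."""
    return [lemma for chunk in chunks for lemma, _ in chunk.tokens]


def transfer(tokens: List[Token]) -> List[str]:
    return postchunk(interchunk(chunker(tokens)))
```

<div class="ltx_para">
<p class="ltx_p">Keeping the steps separate means that word-internal reordering, chunk-level reordering, and clean-up can each be changed without touching the others, which is also what makes it possible for some pairs to add further steps.</p>
</div>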
<div id="S2.p12" class="ltx_para">
<ul id="S2.I9" class="ltx_itemize">
<li id="S2.I9.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I9.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Recursive structural transfer:</span>
This module is a recently developed alternative to the shallow
structural transfer module (chunker, interchunk, and postchunk). Its
linguistic data is specified as context-free grammars (CFGs) and it uses
a Generalized Left-to-right Rightmost-derivation (GLR) parser rather than
finite-state chunking to more effectively implement long-distance
reordering. This module is discussed further in
Section <a href="#S4.SS1" title="4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.1</span></a>.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p13" class="ltx_para">
<ul id="S2.I10" class="ltx_itemize">
<li id="S2.I10.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I10.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Target Language retokenization:</span> Adjusts token boundaries for
multi-word expressions, which can be non-contiguous (such as separable
verbs in Germanic languages), in preparation for target-language
morphological generation. Often this consists of separating multi-word
expressions into their component parts. This module is discussed in
more detail in Section <a href="#S4.SS2" title="4.2 Processing multi-word expressions ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.2</span></a>.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p14" class="ltx_para">
<ul id="S2.I11" class="ltx_itemize">
<li id="S2.I11.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I11.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Target Language morphological generator:</span> Delivers the
sequence of TL surface forms for each corresponding TL lexical form
received from earlier modules in the pipeline.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p15" class="ltx_para">
<ul id="S2.I12" class="ltx_itemize">
<li id="S2.I12.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I12.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Target Language Post-generator:</span> Performs mainly orthographic
operations across tokens, for example elision (such as <span class="ltx_text ltx_font_italic">lo + òme
= l’òme</span> in Occitan), fusion (such as <span class="ltx_text ltx_font_italic">da + il = dal</span> in
Italian), epenthesis (such as <span class="ltx_text ltx_font_italic">a → an</span> in English, or <span class="ltx_text ltx_font_italic">с →
со</span> and <span class="ltx_text ltx_font_italic">о → об</span> in Russian), or dissimilation (such as
<span class="ltx_text ltx_font_italic">la + agua → el agua</span> in Spanish).</p>
</div>
</li>
</ul>
</div>
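<div class="ltx_para">
<p class="ltx_p">Two of the cross-token operations mentioned above can be sketched in Python. The rules below are deliberately simplistic and are assumptions made for the example: the English rule looks only at spelling (so it would wrongly produce "an" before a word such as "university"), and the fusion table contains a single Italian entry. Real post-generation rules are written per language in Apertium's own dictionary format.</p>
</div>

```python
from typing import Dict, List, Tuple

VOWEL_LETTERS = "aeiou"

# Hypothetical fusion table: a pair of adjacent tokens becomes one token.
FUSIONS: Dict[Tuple[str, ...], str] = {("da", "il"): "dal"}


def postgen_epenthesis(tokens: List[str]) -> List[str]:
    """Toy English rule: 'a' becomes 'an' before a vowel letter."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 != len(tokens) else ""
        if tok == "a" and nxt and nxt[0].lower() in VOWEL_LETTERS:
            out.append("an")
        else:
            out.append(tok)
    return out


def postgen_fusion(tokens: List[str]) -> List[str]:
    """Toy Italian rule: merge adjacent tokens listed in FUSIONS."""
    out: List[str] = []
    i = 0
    while i != len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in FUSIONS:
            out.append(FUSIONS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

<div class="ltx_para">
<p class="ltx_p">Both functions operate on already generated surface forms and look only at neighbouring tokens, which is why this kind of adjustment can be handled after morphological generation rather than during transfer.</p>
</div>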
<div id="S2.p16" class="ltx_para">
<ul id="S2.I13" class="ltx_itemize">
<li id="S2.I13.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="S2.I13.i1.p1" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_bold">Reformatter:</span> Restores the encapsulated formatting information to
produce the final formatted document in the target language. This is a
language-agnostic part of Apertium.</p>
</div>
</li>
</ul>
</div>
<div id="S2.p17" class="ltx_para">
<p class="ltx_p">The reader is referred to the Apertium
wiki<span id="footnote4" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup>
            <span class="ltx_tag ltx_tag_note">4</span>
            
            
            
          <a href="http://wiki.apertium.org/wiki/Pipeline" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Pipeline</a></span></span></span> for more information
about file naming conventions, mode naming conventions, and dates of
introduction for each stage of the pipeline. Any further additions to the
pipeline will be documented on this wiki.</p>
</div>
<div id="S2.p18" class="ltx_para">
<p class="ltx_p">It should be added that a major difference in the organisation of Apertium
language pairs as compared to the original model is the three-directory
structure currently used for most (but not all) of the released translation
pairs. Initially, every translation pair was developed in a single
self-contained repository that included all relevant linguistic data. Currently,
monolingual data, such as morphological dictionaries, morphological
disambiguators and post-generators, are shared by different translators,
allowing much easier reuse of data and cooperation in the improvement of
linguistic resources <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib124" title="FST trimming: ending dictionary redundancy in Apertium" class="ltx_ref">24</a>]</cite>. Thus, for instance, compiling the
<span class="ltx_text ltx_font_typewriter">apertium-spa-cat</span> pair now depends on the <span class="ltx_text ltx_font_typewriter">apertium-spa</span> and
<span class="ltx_text ltx_font_typewriter">apertium-cat</span> modules, which are also used by other translation pairs.</p>
</div>
</section>
<section id="S3" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">3 </span>Use of corpus-based approaches in Apertium modules</h2>

<div id="S3.p1" class="ltx_para">
<p class="ltx_p">Several methods of incorporating corpus-based approaches into Apertium RBMT
systems are available. These methods fall into the domains of morphological
disambiguation (Section <a href="#S3.SS1" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.1</span></a>), lexical selection
(Section <a href="#S3.SS2" title="3.2 Lexical selection ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.2</span></a>), and structural transfer
(Section <a href="#S3.SS3" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3.3</span></a>).</p>
</div>
<section id="S3.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.1 </span>Morphological disambiguation</h3>

<div id="S3.SS1.p1" class="ltx_para">
<p class="ltx_p">The goal of morphological disambiguation is to choose the correct morphological
analysis if there are multiple possible analyses for a given lexical unit.</p>
</div>
<div id="S3.SS1.p2" class="ltx_para">
<p class="ltx_p">The oldest and most commonly used morphological disambiguation method in
Apertium <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib198" title="Speeding up target language driven part-of-speech tagger training for machine translation" class="ltx_ref">37</a>, <a href="#bib.bib199" title="Training part-of-speech taggers to build machine translation systems for less-resourced language pairs" class="ltx_ref">33</a>]</cite> is a module that relies on patterns
learned from a corpus. This bigram-based morphological disambiguator chooses one
analysis from among those returned by the morphological analyser based on a
probabilistic model of sequences of part-of-speech tags given the surrounding
context.</p>
</div>
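<div class="ltx_para">
<p class="ltx_p">The idea of choosing among analyses with a model of tag sequences can be
sketched as follows. This is only a minimal illustration, not the code of the
Apertium tagger: the function name <span class="ltx_text ltx_font_typewriter">disambiguate</span>, the <span class="ltx_text ltx_font_typewriter">BOS</span>
start symbol, the fallback probability for unseen bigrams, and the toy
probabilities in the usage note are all assumptions made for the example. It
runs a Viterbi search over the candidate tags of each lexical unit, scoring each
path by the sum of tag-bigram log-probabilities.</p>
</div>

```python
import math

UNSEEN = math.log(1e-6)  # assumed fallback log-probability for unseen tag bigrams


def disambiguate(analyses, bigram_logp, start="BOS"):
    """Return the most probable tag sequence.

    analyses    -- list with one entry per lexical unit: the candidate tags
    bigram_logp -- dict mapping (previous tag, tag) to a log-probability
    """
    # best[tag] = (score of the best path ending in tag, that path)
    best = {start: (0.0, [])}
    for candidates in analyses:
        step = {}
        for tag in candidates:
            score, path = max(
                (prev_score + bigram_logp.get((prev, tag), UNSEEN), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[tag] = (score, path + [tag])
        best = step
    return max(best.values())[1]
```

<div class="ltx_para">
<p class="ltx_p">With hypothetical probabilities in which a noun is far more likely than a verb
after a determiner, an ambiguous unit such as English <em class="ltx_emph ltx_font_italic">books</em> following
<em class="ltx_emph ltx_font_italic">the</em> would be resolved to the noun reading.</p>
</div>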
<div id="S3.SS1.p3" class="ltx_para">
<p class="ltx_p">Some Apertium disambiguators are implemented instead using statistical methods
based on Hidden Markov Models (HMMs), which process the result of the
application of constraint-grammar rules <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib87" title="Constraint grammar: a language-independent system for parsing unrestricted text" class="ltx_ref">15</a>]</cite>. The
perceptron tagger <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib262" title="Syntactic processing using the generalized perceptron and beam search" class="ltx_ref">52</a>]</cite> in the English language
module (<span class="ltx_text ltx_font_typewriter">apertium-eng</span>) follows one such method.</p>
</div>
<div id="S3.SS1.p4" class="ltx_para">
<p class="ltx_p">Furthermore, VISL CG-3 <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib28" title="CG-3 – beyond classical constraint grammar" class="ltx_ref">6</a>]</cite> has become a popular method among
Apertium developers of implementing morphological disambiguation using
hand-crafted heuristics. For many languages, it is combined with one of the
other methods for a two-step disambiguation stage.</p>
</div>
</section>
<section id="S3.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.2 </span>Lexical selection</h3>

<div id="S3.SS2.p1" class="ltx_para">
<p class="ltx_p">The goal of lexical selection is to choose an adequate translation in the target
language from among several possible translations for a given source-language
lexical unit. An FST-based module that allows the writing of rules has been in
use for some time <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib231" title="Flexible finite-state lexical selection for rule-based machine translation" class="ltx_ref">43</a>]</cite>.</p>
</div>
<div id="S3.SS2.p2" class="ltx_para">
<p class="ltx_p">Apart from manually written rules, a system has also been developed that learns
rules through a maximum-entropy model trained in an unsupervised
manner <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib233" title="Unsupervised training of maximum-entropy models for lexical selection in rule-based machine translation" class="ltx_ref">42</a>]</cite>. The training method requires only a source language
corpus, a statistical target-language language model, and the RBMT system
itself. All possible translations are scored against the TL language model, and
these scores are normalized to provide fractional counts to train
source-language maximum-entropy lexical selection models.</p>
</div>
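<div class="ltx_para">
<p class="ltx_p">The normalisation step described above can be sketched as follows. This is a
minimal illustration under stated assumptions rather than the published training
code: the function names, the shape of the input, and the example lemma and
probabilities are hypothetical, and the maximum-entropy training that would
consume the resulting counts is not shown.</p>
</div>

```python
from collections import defaultdict


def fractional_counts(lm_probs):
    """Normalise the TL language-model probabilities of the candidate
    translations of one source-language unit so that they sum to 1."""
    total = sum(lm_probs.values())
    return {translation: p / total for translation, p in lm_probs.items()}


def accumulate(events):
    """Sum fractional counts over a corpus.

    events -- iterable of (SL lemma, SL context, {translation: LM probability})
    """
    counts = defaultdict(float)
    for lemma, context, lm_probs in events:
        for translation, fraction in fractional_counts(lm_probs).items():
            counts[(lemma, context, translation)] += fraction
    return dict(counts)
```

<div class="ltx_para">
<p class="ltx_p">For a single occurrence whose two candidate translations receive
language-model probabilities in a ratio of 3 to 1, the sketch assigns fractional
counts of 0.75 and 0.25, instead of a full count for one translation.</p>
</div>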
</section>
<section id="S3.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">3.3 </span>Structural transfer module</h3>

<div id="S3.SS3.p1" class="ltx_para">
<p class="ltx_p">Structural transfer handles differences between the source and target languages
in terms of word order and morphological information by applying transfer rules.
In the chunker module, these transfer rules function by matching a source
language pattern of lexical items, creating chunks and applying a sequence of
actions to convert the word order and morphological properties of the chunk as
per the target language. There can, however, be more than one potential sequence
of actions for each source language pattern, as well as overlapping patterns. To
generate an accurate translation, transfer rules are applied to the input using
a left-to-right, longest-match (LRLM) algorithm.</p>
</div>
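<div class="ltx_para">
<p class="ltx_p">A minimal sketch of left-to-right, longest-match rule application is given
below. It assumes that patterns are non-empty tuples of part-of-speech tags and
that unmatched units are passed through one at a time; the function name
<span class="ltx_text ltx_font_typewriter">lrlm_match</span> and the data layout are illustrative and do not reflect
the actual implementation of the chunker.</p>
</div>

```python
def lrlm_match(tags, patterns):
    """Segment a tag sequence from left to right, always taking the longest
    pattern that matches at the current position.

    tags     -- list of part-of-speech tags
    patterns -- iterable of non-empty tuples of tags (one per transfer rule)
    Returns a list of (matched pattern or None, covered tags).
    """
    patterns = list(patterns)
    i, chunks = 0, []
    while i < len(tags):
        best = max(
            (p for p in patterns if tuple(tags[i:i + len(p)]) == p),
            key=len,
            default=None,
        )
        if best is None:
            chunks.append((None, tags[i:i + 1]))  # no rule applies here
            i += 1
        else:
            chunks.append((best, tags[i:i + len(best)]))
            i += len(best)
    return chunks
```

<div class="ltx_para">
<p class="ltx_p">Because the longest pattern always wins, a shorter overlapping pattern such as
DET N is never chosen where DET ADJ N also matches at the same position. The
weighting approach described in the following paragraphs relaxes this fixed
choice.</p>
</div>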
<div id="S3.SS3.p2" class="ltx_para">
<p class="ltx_p">Work has been done to extract, or “learn”, chunking rules using Alignment
Templates <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib82" title="Using alignment templates to infer shallow-transfer machine translation rules" class="ltx_ref">36</a>, <a href="#bib.bib200" title="Automatic induction of shallow-transfer rules for open-source machine translation" class="ltx_ref">34</a>, <a href="#bib.bib127" title="Using unsupervised corpus-based methods to build rule-based machine translation systems" class="ltx_ref">23</a>, <a href="#bib.bib202" title="Inferring shallow-transfer machine translation rules from small parallel corpora" class="ltx_ref">35</a>, <a href="#bib.bib203" title="A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora" class="ltx_ref">32</a>]</cite>. A
parallel corpus is searched for sequences of lexical units that exhibit
differences in order or morphological information.</p>
</div>
<div id="S3.SS3.p3" class="ltx_para">
<p class="ltx_p">In addition, chunker rules can now be weighted so as to apply different rules in
different overlapping lexical environments. These weights can be learned using
an unsupervised maximum entropy approach <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib17" title="Unsupervised weighting of transfer rules in rule-based machine translation using maximum-entropy approach" class="ltx_ref">4</a>]</cite>.</p>
</div>
<div id="S3.SS3.p4" class="ltx_para">
<p class="ltx_p">The basic goal of this method is to choose between conflicting structural
transfer rules based on the lexical environment. For example, the Spanish
sentence <em class="ltx_emph ltx_font_italic">Encontré el pastel muy bueno</em> has (at least) two different
(hypothetical) translations to English depending on the syntactic parse in
Spanish: (a) “I found the cake very good” or (b) “I found the very good
cake”. That is, <em class="ltx_emph ltx_font_italic">muy bueno</em> may be parsed (a) as a complement to the verb
<em class="ltx_emph ltx_font_italic">encontré</em> or (b) as a modifier to the noun <em class="ltx_emph ltx_font_italic">el pastel</em>. These
parses correspond to different sets of transfer rules, each of which could be
matched: (a) a single verb phrase consisting of V DET N ADV ADJ, or (b) a verb
V, followed by a noun phrase DET N ADV ADJ. The noun phrase rule would specify
that the elements be output in a different order, DET ADV ADJ N, and both rules
that match the verb would add a lexical unit for the pronominal subject.</p>
</div>
<div id="S3.SS3.p5" class="ltx_para">
<p class="ltx_p">A model is produced by running SL text through all possible transfer rules,
comparing the potential translations that are output to a TL language model, and
dividing the scores by the series of SL lemmas that matched each transfer rule
pattern for a given potential translation. If the example above were part of
the training data, then the potential translation <em class="ltx_emph ltx_font_italic">I found the cake very
good</em> would score higher than <em class="ltx_emph ltx_font_italic">I found the very good cake</em> against an
English language model due to having a higher probability. These different
scores are then distributed as weights, along with the input lemmas, attached to
the rules that each translation is the result of. In this example, the weight
assigned to the V DET N ADV ADJ rule for the Spanish lemmas is higher than the
sum of the weights assigned to the V and DET N ADV ADJ rules for these same
lemmas, and hence the V DET N ADV ADJ rule will be selected.</p>
</div>
<div id="S3.SS3.p6" class="ltx_para">
<p class="ltx_p">During translation, when a string of SL text matches multiple transfer rules,
the system is able to choose between them (infer the “correct” one) based on
the weights associated with the rules that the SL lemmas trigger. For example,
if this same sentence were being translated, the V DET N ADV ADJ rule would be
matched, resulting in the output “I found the cake very good”.</p>
</div>
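<div class="ltx_para">
<p class="ltx_p">The comparison made at translation time can be sketched as follows. This is
an illustration only: the representation of a rule set as a list of (pattern,
lemmas) pairs, the function name, and the weight values in the usage note are
assumptions, not the data structures of the module itself.</p>
</div>

```python
def choose_rule_set(candidates, weights):
    """Return the candidate rule set with the highest summed weight.

    candidates -- list of rule sets; each rule set is a list of
                  (rule pattern, tuple of SL lemmas matched) pairs
    weights    -- dict mapping such a pair to its learned weight
    """
    def total(rule_set):
        # Pairs never seen in training contribute a weight of zero.
        return sum(weights.get(rule, 0.0) for rule in rule_set)

    return max(candidates, key=total)
```

<div class="ltx_para">
<p class="ltx_p">For the sentence discussed above, one candidate contains the single V DET N
ADV ADJ rule and the other contains the V rule and the DET N ADV ADJ rule. If
the learned weight of the first exceeds the sum of the weights of the other
two, the first is selected.</p>
</div>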
<div id="S3.SS3.p7" class="ltx_para">
<p class="ltx_p">A contrasting example, <em class="ltx_emph ltx_font_italic">Encontré un pastel muy bueno</em>, would match the same
two sets of rules, but would result in translation occurring through the other
rule set. This is because the lemmas of the potential translation <em class="ltx_emph ltx_font_italic">I found
a very good cake</em> would result in higher combined weights for the V and DET N
ADV ADJ rules than for the V DET N ADV ADJ rule. The reason for this is that
translations containing these lemmas would have scored higher against an English
language model than translations like <em class="ltx_emph ltx_font_italic">I found a cake very good</em>, resulting
in higher weights for this set of Spanish lemmas attached to this set of rules.</p>
</div>
<div id="S3.SS3.p8" class="ltx_para">
<p class="ltx_p">In both examples of Spanish inputs, using this approach and a suitable corpus to
train an English language model, the set of transfer rules that results in the
more likely English translation is chosen.</p>
</div>
<div id="S3.SS3.p9" class="ltx_para">
<p class="ltx_p">This method has been tested using the Kazakh–Turkish, Kyrgyz–Turkish, and
Spanish–English translation pairs, and it has been observed that the results
are better when there is a greater number of ambiguous rules. The module has
not yet been included in any released translation system.</p>
</div>
</section>
</section>
<section id="S4" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">4 </span>New modules</h2>

<div id="S4.p1" class="ltx_para">
<p class="ltx_p">Several previously unpublished modules are now available for the Apertium
pipeline. Discussed in this section are <span class="ltx_text ltx_font_typewriter">apertium-recursive</span>, which
provides for true recursive transfer; <span class="ltx_text ltx_font_typewriter">apertium-separable</span>, which enables
the processing of multi-word expressions; and <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span>, which
allows the resolution of anaphors in the source text.</p>
</div>
<section id="S4.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">4.1 </span>Recursive structural transfer</h3>

<figure id="S4.F4" class="ltx_figure"><span class="ltx_inline-para ltx_minipage ltx_align_middle" style="width:433.6pt;">
<span id="S4.F4.p1" class="ltx_para ltx_align_center"><pre class="ltx_verbatim ltx_font_typewriter">
            NP -&gt; det n { 2 + 1 } |
                  NP PP { 2 _ 1 } ;

            PP -&gt; pr NP { 2 + 1 } ;
    
</pre>
</span></span>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>A simple set of recursive rules translating a subset of noun
phrases and prepositional phrases from English to Basque. A noun phrase
(<span class="ltx_text ltx_font_typewriter">NP</span>) in the source language consists of a determiner (<span class="ltx_text ltx_font_typewriter">det</span>)
and a noun (<span class="ltx_text ltx_font_typewriter">n</span>), and may optionally include a prepositional phrase
(<span class="ltx_text ltx_font_typewriter">PP</span>), and a prepositional phrase consists of a preposition
(<span class="ltx_text ltx_font_typewriter">pr</span>) and a noun phrase. All three output rules reverse the order
of the two nodes: the order of a determiner and a noun is reversed, the
order of a noun phrase and a prepositional phrase is reversed, and the order
of an adposition (preposition/postposition) and a noun phrase is reversed.
The action part of the rules (building up the target translation) appears
between braces <span class="ltx_text ltx_font_typewriter">{…}</span>. The indices, <span class="ltx_text ltx_font_typewriter">1</span> and <span class="ltx_text ltx_font_typewriter">2</span>,
indicate the position of the unit matched in the input, <span class="ltx_text ltx_font_typewriter">_</span>
represents a space in the output, and <span class="ltx_text ltx_font_typewriter">+</span> indicates that the words on
either side of it will be conjoined without a space. </figcaption><span class="ltx_inline-para ltx_minipage ltx_align_middle" style="width:433.6pt;">
<span id="S4.F4.p2" class="ltx_para ltx_align_center">
<span class="ltx_p">[Missing Figure: forest: 1]
</span>
</span></span>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 3: </span>A source language parse tree for the phrase <em class="ltx_emph ltx_font_italic">the house by
the side of that road</em> built using the rules in
figure <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>. When no further application of the rules is
possible, this tree will be transformed into the tree shown in
figure <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>.</figcaption><span class="ltx_inline-para ltx_minipage ltx_align_middle" style="width:433.6pt;">
<span id="S4.F4.p3" class="ltx_para ltx_align_center">
<span class="ltx_p">[Missing Figure: forest: 2]
</span>
</span></span>
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 4: </span>The target language tree resulting from applying the action
steps of the rules in figure <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a> to the tree in
figure <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>. The analyses yielded by this tree will
generate the Basque phrase <em class="ltx_emph ltx_font_italic">kale haren ertzeko etxea</em> ‘the house by
the side of that road’. The final step of combining definite articles and
postpositions with the immediately preceding words is not
shown.</figcaption>
</figure>
<div id="S4.SS1.p1" class="ltx_para">
<p class="ltx_p">Given the range of possible syntactic structures, it is common for any two
languages to have significantly different word orders. For example, in Welsh,
verbs tend to be at the beginning of a sentence; in English they tend to be in
the middle; and in Kyrgyz, they tend to be at the end.</p>
</div>
<div id="S4.SS1.p2" class="ltx_para">
<p class="ltx_p">These differences are problematic for Apertium’s finite-state chunking module,
which matches fixed sequences of words that must be contiguous. This limitation
means it is fairly easy to write rules which perform operations such as changing
the order of nouns and adjectives, since these are usually adjacent, but
changing larger structures is much harder. Switching the order of the subject
and the main verb, for instance, would generally require writing a rule for each
sequence of words that can make up each of those parts. The English-Spanish pair
has more than 30 chunking rules for handling noun phrases with different numbers
of determiners and adjectives, and those rules do not attempt to deal with all
structures that may occur in noun phrases, such as relative clauses.</p>
</div>
<div id="S4.SS1.p3" class="ltx_para">
<p class="ltx_p">To deal with the limitations of finite-state chunking, the
<span class="ltx_text ltx_font_typewriter">apertium-recursive</span> module <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib220" title="A tree-based structural transfer module for the Apertium machine translation platform" class="ltx_ref">38</a>]</cite> was developed by
Daniel Swanson as part of Google Summer of Code
2019<span id="footnote5" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup>
              <span class="ltx_tag ltx_tag_note">5</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2019/projects/6746718069063680/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2019/projects/6746718069063680/</a>.</span></span></span>
to apply structural transfer rules recursively using context-free grammars
(CFGs) and a Generalized LR (GLR) parser. This makes it
possible to process nested structures such as relative clauses or prepositional
phrases within prepositional phrases. An example of the latter is shown in
Figures <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a> and <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>, with the relevant
rules in Figure <a href="#S4.F4" title="Figure 4 ‣ 4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>. In this example, the word order of a set
of nested prepositional phrases needs to be completely reversed (or in
linguistic terms, the order of noun phrases (NPs) and adpositional phrases (PPs)
each needs to be reversed), regardless of the number of prepositional phrases
involved, in order to translate from English to Basque.</p>
</div>
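<div class="ltx_para">
<p class="ltx_p">The effect of applying the output actions of the three rules recursively can
be sketched as follows. This is a simplified illustration, not the module's
implementation: it takes an already-built parse tree (the GLR parsing step is
not shown), uses English words as stand-ins for the lexical units, and the
<span class="ltx_text ltx_font_typewriter">RULES</span> table and <span class="ltx_text ltx_font_typewriter">transfer</span> function are hypothetical names
introduced for the example.</p>
</div>

```python
# (parent category, child categories) -> (output order, joiner).
# The order (2, 1) and the joiners mirror the three actions
# { 2 + 1 }, { 2 _ 1 } and { 2 + 1 }; "_" is rendered as a space.
RULES = {
    ("NP", ("det", "n")): ((2, 1), "+"),
    ("NP", ("NP", "PP")): ((2, 1), " "),
    ("PP", ("pr", "NP")): ((2, 1), "+"),
}


def transfer(node):
    """node is (category, word) for a leaf, or (category, [children])."""
    category, content = node
    if isinstance(content, str):
        return content
    child_categories = tuple(child[0] for child in content)
    order, joiner = RULES[(category, child_categories)]
    # Indices in the rules are 1-based positions in the matched input.
    return joiner.join(transfer(content[i - 1]) for i in order)


def np(det, noun):
    """Helper building a simple determiner-noun phrase."""
    return ("NP", [("det", det), ("n", noun)])


# "the house by the side of that road"
tree = ("NP", [
    np("the", "house"),
    ("PP", [("pr", "by"), ("NP", [
        np("the", "side"),
        ("PP", [("pr", "of"), np("that", "road")]),
    ])]),
])

print(transfer(tree))  # road+that+of side+the+by house+the
```

<div class="ltx_para">
<p class="ltx_p">The same three table entries handle any depth of nesting, since each phrase is
linearised from the already-reordered output of its children.</p>
</div>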
<div id="S4.SS1.p4" class="ltx_para">
<p class="ltx_p">A recursive approach to transfer can be helpful for translation pairs between
syntactically more similar languages as well. For example, in the case of the
English-Spanish noun phrase rules mentioned above, the more than 30 rules
required for handling determiners, adjectives, and nouns can be simplified to
fewer than 10 rules in <span class="ltx_text ltx_font_typewriter">apertium-recursive</span> because more complicated
structures can be handled by composing simpler ones. In fact, the majority of
these can be covered by just 3 rules saying that a noun phrase is composed of a
noun, or an adjective and a noun phrase, or a determiner and a noun phrase.
These 3 rules can immediately handle any number of determiners and adjectives.</p>
</div>
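<div class="ltx_para">
<p class="ltx_p">The way in which three recursive rules cover an unbounded number of
determiners and adjectives can be illustrated with a small recogniser. It is a
sketch only, written as a plain recursive function rather than in the rule
format of <span class="ltx_text ltx_font_typewriter">apertium-recursive</span>, and the tag names are assumed for the
example.</p>
</div>

```python
def is_np(tags):
    """Recognise NP -> n | adj NP | det NP over a list of tags."""
    if tags == ["n"]:
        return True                      # NP -> n
    return (
        len(tags) > 1
        and tags[0] in ("adj", "det")    # NP -> adj NP | det NP
        and is_np(tags[1:])
    )
```

<div class="ltx_para">
<p class="ltx_p">A finite-state formulation would instead need one pattern for each combination
of determiners and adjectives that is to be supported.</p>
</div>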
</section>
<section id="S4.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">4.2 </span>Processing multi-word expressions</h3>

<div id="S4.SS2.p1" class="ltx_para">
<p class="ltx_p">Multi-word expressions (MWEs) are compound expressions composed of two or more
words, such as phrasal verbs (<em class="ltx_emph ltx_font_italic">take out</em>, <em class="ltx_emph ltx_font_italic">wake up</em>, <em class="ltx_emph ltx_font_italic">make a
call</em>) and phrasal nouns (<em class="ltx_emph ltx_font_italic">telephone pole</em>). Separable multi-word
expressions are those that may be split by an intermediary word or phrase (such
as <em class="ltx_emph ltx_font_italic">take out</em> in <em class="ltx_emph ltx_font_italic">take the trash out</em>). This phenomenon can be seen in
a number of languages. In English, the multi-word expression “take away” can remain
unified, such as in <em class="ltx_emph ltx_font_bold ltx_font_italic">take away the item</em>, or be split up, such as
in <em class="ltx_emph ltx_font_bold ltx_font_italic">take the item away</em>—both phrasings have identical meanings.
This phenomenon can also be seen in some German verbs, where the separable
particle can detach from its lexical core, such as with the separable verb
<em class="ltx_emph ltx_font_italic">anrufen</em> ‘to call’: <em class="ltx_emph ltx_font_bold ltx_font_italic">rufe meine Freundin an</em> ‘call my
friend’. See <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib46" title="Multiword expression processing: a survey" class="ltx_ref">8</a>]</cite> for more on this phenomenon.</p>
</div>
<div id="S4.SS2.p2" class="ltx_para">
<p class="ltx_p">Separable MWEs are particularly problematic for Apertium’s rule-based
translation. Prior to the introduction of the <span class="ltx_text ltx_font_typewriter">apertium-separable</span>
module, the individual components of both non-separable and separable
multi-word expressions were translated as individual tokens, often leading to suboptimal
translations. For example, during the English-to-Spanish translation of
<em class="ltx_emph ltx_font_bold ltx_font_italic">take the trash away</em>, the phrase’s individual components were
translated to produce <em class="ltx_emph ltx_font_italic">tomar la basura fuera</em>, which is not a phrase that
native speakers of Spanish would produce. A better solution is to
process <em class="ltx_emph ltx_font_italic">take away</em> as a single unit in order to obtain the correct
expression <em class="ltx_emph ltx_font_italic">sacar la basura</em>. Similarly, the Arpitan verbal expression
<em class="ltx_emph ltx_font_italic">tornar fâre</em> ‘to redo’ has negative forms of the type <em class="ltx_emph ltx_font_italic">tornar pas
fâre</em> which were not previously recognised nor correctly generated.</p>
</div>
<div id="S4.SS2.p3" class="ltx_para">
<p class="ltx_p"><span class="ltx_text ltx_font_typewriter">Apertium-separable</span> provides a framework to address mistranslations
arising from this sort of non-contiguous word ordering.
Section <a href="#S4.SS2.SSS1" title="4.2.1 The apertium-separable module ‣ 4.2 Processing multi-word expressions ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.2.1</span></a> describes the module and
section <a href="#S4.SS2.SSS2" title="4.2.2 Usage ‣ 4.2 Processing multi-word expressions ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.2.2</span></a> describes its usage.</p>
</div>
<section id="S4.SS2.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.2.1 </span>The apertium-separable module</h4>

<div id="S4.SS2.SSS1.p1" class="ltx_para">
<p class="ltx_p">The <span class="ltx_text ltx_font_typewriter">apertium-separable</span> module was developed by Irene Tang as part of
Google Summer of Code
2017<span id="footnote6" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">6</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">6</sup>
                <span class="ltx_tag ltx_tag_note">6</span>
                
                
                
              <a href="https://summerofcode.withgoogle.com/archive/2017/projects/4690909727817728/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2017/projects/4690909727817728/</a>.</span></span></span>
to handle both contiguous and discontiguous (or “separable”) MWEs. The module
accepts an XML-format dictionary as input, which contains a list of phrase types
and a list of mappings between MWEs and their component elements—and in the
case of non-contiguous MWEs, a specification of the possible phrase types that
might separate the elements of the MWE.
</p>
</div>
<div id="S4.SS2.SSS1.p2" class="ltx_para">
<p class="ltx_p">As an example, one phrase-type entry that the <span class="ltx_text ltx_font_typewriter">eng</span> dictionary might
include is the definition of a noun phrase (NP) as (among other patterns) any
sequence of words such that the first contains a <span class="ltx_text ltx_font_typewriter">&lt;det&gt;</span> tag, the second an
<span class="ltx_text ltx_font_typewriter">&lt;adj&gt;</span> tag, and the last a <span class="ltx_text ltx_font_typewriter">&lt;n&gt;</span> tag. The <span class="ltx_text ltx_font_typewriter">eng</span> dictionary should
then also include an entry specifying how <em class="ltx_emph ltx_font_italic">take away</em> as an MWE followed by
such a noun phrase may be mapped to its component elements. These phrase-type
and vocabulary entries work together as a framework for handling MWEs.</p>
</div>
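<div class="ltx_para">
<p class="ltx_p">How a phrase-type entry and a vocabulary entry might work together can be
sketched as follows. The sketch does not use the module's XML dictionary
format. The list <span class="ltx_text ltx_font_typewriter">NP_PATTERNS</span>, the function
<span class="ltx_text ltx_font_typewriter">join_separable</span>, and the tags <span class="ltx_text ltx_font_typewriter">vblex</span> and <span class="ltx_text ltx_font_typewriter">adv</span> are
assumptions made for illustration.</p>
</div>

```python
# Hypothetical phrase-type entry: tag sequences that count as a noun phrase.
NP_PATTERNS = [("det", "adj", "n"), ("det", "n")]


def join_separable(units, verb, particle):
    """Rewrite 'verb NP particle' as 'verb particle' followed by the NP.

    units -- list of (lemma, tag) pairs
    """
    out, i = [], 0
    while i < len(units):
        merged = False
        if units[i] == (verb, "vblex"):
            for pattern in NP_PATTERNS:
                end = i + 1 + len(pattern)
                inner = units[i + 1:end]
                if (tuple(tag for _, tag in inner) == pattern
                        and units[end:end + 1] == [(particle, "adv")]):
                    out.append((verb + " " + particle, "vblex"))
                    out.extend(inner)
                    i = end + 1
                    merged = True
                    break
        if not merged:
            out.append(units[i])
            i += 1
    return out
```

<div class="ltx_para">
<p class="ltx_p">Applied to the analysis of <em class="ltx_emph ltx_font_italic">take the trash away</em>, the sketch yields a
single verbal unit <em class="ltx_emph ltx_font_italic">take away</em> followed by the units of the noun phrase,
which can then be looked up as one entry in the bilingual dictionary.</p>
</div>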
<div id="S4.SS2.SSS1.p3" class="ltx_para">
<p class="ltx_p">The XML dictionary is compiled into a finite state transducer. As a parser feeds
the input text into the transducer one character or tag at a time, it looks out
for sequences of characters and tags that match anything in the dictionary. If a
match is found, then the parser outputs the corresponding substitution.</p>
</div>
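<div class="ltx_para">
<p class="ltx_p">The matching behaviour described above can be approximated with a trie in place
of a compiled finite-state transducer. The sketch below is an assumption-laden
simplification: it performs longest-match substitution over a sequence of
symbols (characters or tags), and the names <span class="ltx_text ltx_font_typewriter">build_trie</span> and
<span class="ltx_text ltx_font_typewriter">substitute</span> are hypothetical.</p>
</div>

```python
def build_trie(entries):
    """entries maps a non-empty tuple of input symbols to a tuple of output
    symbols. Symbols are assumed to be strings."""
    root = {}
    for source, target in entries.items():
        node = root
        for symbol in source:
            node = node.setdefault(symbol, {})
        node[None] = target  # None marks an accepting state
    return root


def substitute(symbols, trie):
    """Feed symbols through the trie one at a time, replacing the longest
    match found at each position and copying everything else unchanged."""
    out, i = [], 0
    while i < len(symbols):
        node, j, match = trie, i, None
        while j < len(symbols) and symbols[j] in node:
            node = node[symbols[j]]
            j += 1
            if None in node:
                match = (j, node[None])  # longest match so far
        if match:
            i, target = match
            out.extend(target)
        else:
            out.append(symbols[i])
            i += 1
    return out
```

<div class="ltx_para">
<p class="ltx_p">A real transducer would in addition share suffixes between entries and allow
phrase-type sub-patterns inside an entry, which this sketch omits.</p>
</div>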
<div id="S4.SS2.SSS1.p4" class="ltx_para">
<p class="ltx_p">Processors for this module may be included in two places in the Apertium RBMT
pipeline: immediately following morphological tagging and preceding lexical
transfer, and immediately following structural transfer and preceding
morphological generation. The former use allows “assembly” of source-language
MWEs for transfer, and the latter “disassembles” transferred target-language
MWEs for morphological generation.</p>
</div>
</section>
<section id="S4.SS2.SSS2" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.2.2 </span>Usage</h4>

<div id="S4.SS2.SSS2.p1" class="ltx_para">
<p class="ltx_p">This module can handle both contiguous and discontiguous multi-word expressions.
Processing seemingly simple contiguous MWEs in this way allows for
more robust bilingual dictionary entries with fairly vanilla morphological
transducers. For example, it may not make sense to have an entry for
<em class="ltx_emph ltx_font_italic">little brother</em> in an English morphological transducer that already
contains the component words, but it is useful to have an entry like this in a
bilingual dictionary with a language like Kyrgyz, which has two words for
brother that are distinguished by age relative to a
sibling. In this situation, the <span class="ltx_text ltx_font_typewriter">apertium-separable</span> module processes
the analysis of <em class="ltx_emph ltx_font_italic">little brother</em> as an adjective and a noun
(<span class="ltx_text ltx_font_typewriter">^little&lt;adj&gt;$</span> <span class="ltx_text ltx_font_typewriter">^brother&lt;n&gt;&lt;sg&gt;$</span>) and
retokenizes it as a multi-word noun (<span class="ltx_text ltx_font_typewriter">^little
brother&lt;n&gt;&lt;sg&gt;$</span>). Note that the assembly of the MWE (as described
here) would occur in the English-Kyrgyz translation direction before bilingual
dictionary lookup, and the disassembly of the MWE (the reverse) would occur in
the Kyrgyz-English translation direction before morphological generation.</p>
</div>
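<div class="ltx_para">
<p class="ltx_p">The assembly step described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the <span class="ltx_text ltx_font_typewriter">apertium-separable</span>
implementation: lexical units are modelled as (lemma, tags) pairs instead of stream
strings, and the dictionary format and the names <span class="ltx_text ltx_font_typewriter">MWE_DICTIONARY</span> and
<span class="ltx_text ltx_font_typewriter">assemble</span> are hypothetical.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only: not the apertium-separable implementation.
# A lexical unit is modelled as a (lemma, tags) pair, not as a stream string.

# Hypothetical dictionary: a sequence of (lemma, first tag) pairs mapped to
# the merged lemma and the first tag that the merged unit should carry.
MWE_DICTIONARY = {
    (("little", "adj"), ("brother", "n")): ("little brother", "n"),
}


def assemble(units):
    """Merge contiguous units that match a dictionary entry, left to right."""
    output = []
    position = 0
    while position != len(units):
        merged = None
        for pattern, (lemma, tag) in MWE_DICTIONARY.items():
            window = units[position:position + len(pattern)]
            if tuple((word, tags[0]) for word, tags in window) == pattern:
                # Keep the inflection tags of the last component (e.g. "sg").
                merged = (lemma, [tag] + window[-1][1][1:])
                position += len(pattern)
                break
        if merged is None:
            # No entry starts here: copy the unit through unchanged.
            merged = units[position]
            position += 1
        output.append(merged)
    return output


analysis = [("my", ["det", "pos"]), ("little", ["adj"]), ("brother", ["n", "sg"])]
print(assemble(analysis))
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">In this sketch the adjective and the noun are replaced by a single unit with the lemma
<em class="ltx_emph ltx_font_italic">little brother</em> and the tags of a singular noun, while the preceding determiner is
passed through unchanged.</p>
</div>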
<div id="S4.SS2.SSS2.p2" class="ltx_para">
<p class="ltx_p">The module is used extensively in the French-Catalan pair, particularly for the
verbal phrases included in the dictionaries. For example, the dictionaries specify
that <em class="ltx_emph ltx_font_italic">faire appel</em> ‘to do appeal’ should be translated as <em class="ltx_emph ltx_font_italic">apel·lar</em>
‘to appeal’. However, there are often adverbs between the verb <em class="ltx_emph ltx_font_italic">faire</em> and
the noun <em class="ltx_emph ltx_font_italic">appel</em>, for example when negated: <em class="ltx_emph ltx_font_italic">ne fait pas appel</em>
‘does not appeal’. The module is used to reorder the phrase before lexical
transfer as <em class="ltx_emph ltx_font_italic">ne fait appel pas</em> (with <em class="ltx_emph ltx_font_italic">fait appel</em> as a single lexical
unit). Since the adverb now follows the multi-word verb instead of appearing
between its components, structural transfer does not need to treat such a
sentence any differently than sentences containing single-word verbs. Similar
examples are found in the unreleased Kazakh-Kyrgyz pair.</p>
</div>
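<div class="ltx_para">
<p class="ltx_p">The reordering of <em class="ltx_emph ltx_font_italic">ne fait pas appel</em> described above can be illustrated with the
following Python sketch. It is a simplification under stated assumptions: units are
(surface form, lemma, first tag) triples, the tag names are illustrative, and the entry
format <span class="ltx_text ltx_font_typewriter">ENTRY</span> and the function <span class="ltx_text ltx_font_typewriter">reorder</span> are hypothetical rather
than part of the module.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only; the entry format and tag names are assumptions.

# Hypothetical entry: a verb lemma and a noun lemma form one unit, and a word
# carrying one of the listed tags may intervene between them.
ENTRY = {"verb": "faire", "noun": "appel", "may_intervene": {"adv"}}


def reorder(units):
    """Merge verb + noun into one unit and move the intervening word after it.

    Each unit is a (surface, lemma, first_tag) triple.
    """
    output = list(units)
    for i in range(len(output) - 2):
        verb, gap, noun = output[i], output[i + 1], output[i + 2]
        if (verb[1] == ENTRY["verb"] and noun[1] == ENTRY["noun"]
                and gap[2] in ENTRY["may_intervene"]):
            merged = (verb[0] + " " + noun[0], verb[1] + " " + noun[1], verb[2])
            output[i:i + 3] = [merged, gap]
            break
    return output


sentence = [("ne", "ne", "adv"), ("fait", "faire", "vblex"),
            ("pas", "pas", "adv"), ("appel", "appel", "n")]
print(" ".join(surface for surface, _, _ in reorder(sentence)))
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">The sketch prints <em class="ltx_emph ltx_font_italic">ne fait appel pas</em>, with <em class="ltx_emph ltx_font_italic">fait appel</em> held as a single
three-element unit whose lemma is <em class="ltx_emph ltx_font_italic">faire appel</em>.</p>
</div>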
</section>
</section>
<section id="S4.SS3" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">4.3 </span>Anaphora resolution</h3>

<div id="S4.SS3.p1" class="ltx_para">
<p class="ltx_p">The <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> module was developed by Tanmai Khanna as part of
Google Summer of Code
2019<span id="footnote7" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">7</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">7</sup>
              <span class="ltx_tag ltx_tag_note">7</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2019/projects/5434868157120512/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2019/projects/5434868157120512/</a>.</span></span></span>
to handle anaphora resolution in the Apertium pipeline. Anaphora resolution is
the process of resolving references to earlier items in the discourse. This is
necessary in a Machine Translation pipeline as languages have different ways of
using anaphors, and sometimes it is necessary to know the antecedent of an
anaphor to translate it correctly.</p>
</div>
<div id="S4.SS3.p2" class="ltx_para">
<p class="ltx_p">For example, in Catalan, the masculine singular possessive determiner is
<em class="ltx_emph ltx_font_italic">el seu</em>. Its gender and number are inflectional properties relating to
how it agrees with nouns, but its referent may be any gender or number. Hence
it could be translated to English as any of <em class="ltx_emph ltx_font_italic">his/her/its/their</em>, the
gender and number of which relate to the referent and not to a modified noun. To
pick the correct translation in English, then, it is necessary to know what
<em class="ltx_emph ltx_font_italic">el seu</em> refers to. Without a module in an Apertium translation pipeline
to do this, a default translation of the anaphor appears in the target language.
For instance, in the case of English possessive determiners, the default is
currently <em class="ltx_emph ltx_font_italic">his</em>.</p>
</div>
<div id="S4.SS3.p3" class="ltx_para">
<p class="ltx_p">While there are several statistical methods to resolve anaphors using machine
learning, Apertium is focused on supporting low-resource language pairs, which
usually don’t have enough data available for these methods to be viable. Common
rule-based approaches, on the other hand, often use parse trees
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib107" title="An algorithm for pronominal anaphora resolution" class="ltx_ref">18</a>, <a href="#bib.bib16" title="CogNIAC: high precision coreference with limited knowledge and linguistic resources" class="ltx_ref">2</a>, <a href="#bib.bib226" title="A rule-based pronoun resolution system for French" class="ltx_ref">41</a>, <a href="#bib.bib108" title="Deterministic coreference resolution based on entity-centric, precision-ranked rules" class="ltx_ref">19</a>, <a href="#bib.bib117" title="Rule-based pronominal anaphora treatment for machine translation" class="ltx_ref">21</a>, <a href="#bib.bib261" title="When annotation schemes change rules help: a configurable approach to coreference resolution beyond ontonotes" class="ltx_ref">51</a>]</cite>.
The <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> module uses a rule-based approach to anaphora
resolution which does not require any training data, nor rely on parse trees.
Based on Mitkov’s algorithm <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib135" title="Multilingual anaphora resolution" class="ltx_ref">25</a>]</cite>, it gives saliency
scores to candidate antecedents in the context (the current and previous three
sentences) based on <span class="ltx_text ltx_font_bold">saliency indicators</span>, which are syntactic or lexical
indicators that are expected to correlate to a higher or lower likelihood that a
candidate antecedent is the correct one, using positive and negative scores
respectively. For example, indefinite nouns can be given a small negative score
and proper nouns can be given a small positive score, as it has been shown
empirically that they are less or more likely to be the antecedent of anaphors,
respectively <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib135" title="Multilingual anaphora resolution" class="ltx_ref">25</a>]</cite>. After the scores of all the
indicators are applied, the candidate with the highest score, hence considered
most salient, is chosen as the antecedent. A complete example of this is
presented in section <a href="#S4.SS3.SSS2" title="4.3.2 Example Usage ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3.2</span></a>. These saliency indicators are
added in the <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> module as manually written rules. These
rules are written for and are applied based on source-language forms only.
Because of this, a ruleset can be reused for multiple translation pairs with the
same source language.</p>
</div>
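<div class="ltx_para">
<p class="ltx_p">The scoring scheme described above can be sketched as follows. This is a minimal
illustration, not the module's code: the indicator list, the score values of
<span class="ltx_text ltx_font_typewriter">+1</span> and <span class="ltx_text ltx_font_typewriter">-1</span>, the candidate representation, and the names
<span class="ltx_text ltx_font_typewriter">INDICATORS</span>, <span class="ltx_text ltx_font_typewriter">saliency</span> and <span class="ltx_text ltx_font_typewriter">resolve</span> are all assumptions made
for the example.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only: indicator names and score values are assumptions,
# not the rules shipped with any Apertium language pair.

# Each hypothetical indicator is (description, test on a candidate, score).
INDICATORS = [
    ("proper noun", lambda c: "np" in c["tags"], 1),
    ("indefinite noun", lambda c: c["indefinite"], -1),
]


def saliency(candidate):
    """Sum the scores of every indicator whose test matches the candidate."""
    return sum(score for _, test, score in INDICATORS if test(candidate))


def resolve(candidates):
    """Return the candidate with the highest total score."""
    return max(candidates, key=saliency)


candidates = [
    {"lemma": "a committee", "tags": ["n", "sg"], "indefinite": True},
    {"lemma": "Maria", "tags": ["np", "f", "sg"], "indefinite": False},
    {"lemma": "the report", "tags": ["n", "sg"], "indefinite": False},
]
for candidate in candidates:
    print(candidate["lemma"], saliency(candidate))
print("antecedent:", resolve(candidates)["lemma"])
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">With these assumed scores, the indefinite noun receives -1, the proper noun +1 and the
remaining noun 0, so the proper noun is selected as the antecedent.</p>
</div>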
<div id="S4.SS3.p4" class="ltx_para">
<p class="ltx_p">In addition to the manually written rules, the module applies one universal
indicator, Referential Distance. This indicator, which was also established empirically
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib135" title="Multilingual anaphora resolution" class="ltx_ref">25</a>]</cite>, tells the algorithm that as the distance
between the anaphor and a candidate antecedent increases, the candidate is less
likely to be the correct antecedent of the anaphor. Candidates that are further
from the anaphor are penalised as follows: candidates in the same sentence as the
anaphor receive a score of <span class="ltx_text ltx_font_typewriter">+1</span>, candidates in the preceding
sentence <span class="ltx_text ltx_font_typewriter">+0</span>, candidates in the sentence before the preceding sentence
<span class="ltx_text ltx_font_typewriter">-1</span>, and so on.</p>
</div>
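<div class="ltx_para">
<p class="ltx_p">The Referential Distance scores above follow a simple linear pattern, which can be
written as one minus the sentence distance. The sketch below assumes that sentences are
numbered from zero, that the pattern continues linearly (so three sentences back scores
-2), and that the context window is the current and previous three sentences; the
function name <span class="ltx_text ltx_font_typewriter">referential_distance_score</span> is hypothetical.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only: assumes the +1, +0, -1 pattern continues linearly.

def referential_distance_score(anaphor_sentence, candidate_sentence):
    """Return +1 for the same sentence, 0 for the previous one, and so on."""
    distance = anaphor_sentence - candidate_sentence
    # Context window: the current sentence and the previous three.
    if distance not in range(0, 4):
        raise ValueError("candidate outside the context window")
    return 1 - distance


# Anaphor in sentence 3; candidates in sentences 3, 2, 1 and 0.
for candidate_sentence in (3, 2, 1, 0):
    print(candidate_sentence, referential_distance_score(3, candidate_sentence))
```
</pre>
</div>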
<div id="S4.SS3.p5" class="ltx_para">
<p class="ltx_p">In the next few sections, some unique features of this module are discussed
(<a href="#S4.SS3.SSS1" title="4.3.1 Some unique features ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3.1</span></a>), an example highlighting the process and benefit of
having anaphora resolution in the Machine Translation pipeline is shown
(<a href="#S4.SS3.SSS2" title="4.3.2 Example Usage ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3.2</span></a>), a preliminary evaluation of the module is presented
(<a href="#S4.SS3.SSS3" title="4.3.3 Preliminary evaluation ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3.3</span></a>), and future work for this module is outlined
(<a href="#S4.SS3.SSS4" title="4.3.4 Future Work ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4.3.4</span></a>).</p>
</div>
<section id="S4.SS3.SSS1" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.3.1 </span>Some unique features</h4>

<div id="S4.SS3.SSS1.p1" class="ltx_para">
<p class="ltx_p">Unlike the original algorithm of <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib135" title="Multilingual anaphora resolution" class="ltx_ref">25</a>]</cite>, this module is
highly customisable. The linguistic patterns to be detected and the scores
to be assigned are all defined in an XML file specific to each translation
direction. These patterns help identify and rank potential antecedents, and can
include references to various types of surrounding words and even the anaphor
whose antecedent is being resolved. The translation pair developer also has the
ability to define multiple types of anaphors—such as possessive determiners,
reflexive pronouns, zero anaphors, etc.—so as to be able to write separate
rules for the resolution of each of them.
</p>
</div>
</section>
<section id="S4.SS3.SSS2" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.3.2 </span>Example Usage</h4>

<figure id="S4.T1" class="ltx_table">
<figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 1: </span>A Catalan-English translation example which highlights a use case for <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span>.</figcaption>
<table class="ltx_tabular ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_justify ltx_th ltx_th_column ltx_border_tt" style="width:160.4pt;"><span class="ltx_text ltx_wrap ltx_font_bold">Input sentence (Catalan)</span></th>
<th class="ltx_td ltx_align_justify ltx_th ltx_th_column ltx_border_tt" style="width:238.5pt;">Els grups del Parlament han mostrat aquest dimarts <span class="ltx_text ltx_font_bold">el seu</span> suport al batle d’Alaró.</th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:160.4pt;"><span class="ltx_text ltx_wrap ltx_font_bold">Reference translation (English)</span></td>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:238.5pt;">Parliamentary groups showed <span class="ltx_text ltx_font_bold">their</span> support for the mayor of Alaró on Tuesday.</td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:160.4pt;"><span class="ltx_text ltx_wrap ltx_font_bold">Apertium translation without <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> (English)</span></td>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:238.5pt;">The bands of the Parliament have shown this Tuesday <span class="ltx_text ltx_font_bold">his</span> support at the mayor of Alaró.</td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:160.4pt;"><span class="ltx_text ltx_wrap ltx_font_bold">Apertium translation with <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> (English)</span></td>
<td class="ltx_td ltx_align_justify ltx_border_t" style="width:238.5pt;">The bands of the Parliament have shown this Tuesday <span class="ltx_text ltx_font_bold">their</span> support at the mayor of Alaró.</td>
</tr>
<tr class="ltx_tr">
<td class="ltx_td ltx_align_justify ltx_border_tt" style="width:160.4pt;"></td>
<td class="ltx_td ltx_align_justify ltx_border_tt" style="width:238.5pt;"></td>
</tr>
</tbody>
</table>
</figure>
<div id="S4.SS3.SSS2.p1" class="ltx_para">
<p class="ltx_p">A sample translation which highlights the usage of <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span>
has been given in Table <a href="#S4.T1" title="Table 1 ‣ 4.3.2 Example Usage ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>. The source sentence goes through a
series of modules in the translation pipeline, as described in
Section <a href="#S2" title="2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>. The output of the lexical selection module
contains a stream of lexical units, including the morphological analysis and the
translation of each lexical unit. This is taken as the input to the
<span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> module. The lexical unit of the example anaphor,
<span class="ltx_text ltx_font_italic">el seu</span>, at this stage in the stream is as follows:</p>
</div>
<div id="S4.SS3.SSS2.p2" class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter"> ^el seu&lt;det&gt;&lt;pos&gt;&lt;m&gt;&lt;sg&gt;/his&lt;det&gt;&lt;pos&gt;&lt;m&gt;&lt;sg&gt;$ 
</pre>
</div>
<div id="S4.SS3.SSS2.p3" class="ltx_para">
<p class="ltx_p">The antecedent of the possessive determiner <span class="ltx_text ltx_font_italic">el seu</span> is <span class="ltx_text ltx_font_italic">els
grups</span> ‘the groups’, which is plural, and hence it should be translated as
<span class="ltx_text ltx_font_italic">their</span> in English and not <span class="ltx_text ltx_font_italic">his</span>. The anaphora resolution module
attempts to resolve this anaphor and identify its antecedent by applying all
rules that match the context. For instance, the <span class="ltx_text ltx_font_typewriter">First NP</span> rule gives a
positive score to the first noun of the sentence (<span class="ltx_text ltx_font_italic">grups</span>), as the first
noun of a sentence is more likely to be the antecedent of an anaphor. The
<span class="ltx_text ltx_font_typewriter">Preposition NP</span> rule gives a negative score to a noun that is part of a
prepositional phrase (<span class="ltx_text ltx_font_italic">Parlament</span>), as a noun inside a prepositional
phrase is less likely to be the antecedent of an anaphor. Both of these
tendencies have been observed empirically <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib135" title="Multilingual anaphora resolution" class="ltx_ref">25</a>]</cite>, and
have been implemented as language-specific rules.</p>
</div>
<div id="S4.SS3.SSS2.p4" class="ltx_para">
<p class="ltx_p">After application of all the rules on all candidate antecedents, the one with
the highest score is considered the most salient antecedent for the anaphor. If
the rules are successful, then the correct antecedent should have the highest
score (in this case, <span class="ltx_text ltx_font_italic">grups</span>, translated at this stage as <span class="ltx_text ltx_font_italic">bands</span>). The anaphora resolution module then
attaches this antecedent (in the target language) to the lexical unit of the
anaphor:
</p>
</div>
<div id="S4.SS3.SSS2.p5" class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
^el seu&lt;det&gt;&lt;pos&gt;&lt;m&gt;&lt;sg&gt;/his&lt;det&gt;&lt;pos&gt;&lt;m&gt;&lt;sg&gt;/band&lt;n&gt;&lt;pl&gt;$
</pre>
</div>
<div id="S4.SS3.SSS2.p6" class="ltx_para">
<p class="ltx_p">Based on the properties of the attached antecedent, the anaphor is modified
during structural transfer (<span class="ltx_text ltx_font_italic">his</span> <math id="S4.SS3.SSS2.p6.m1" class="ltx_Math" alttext="\rightarrow" display="inline"><mo>→</mo></math> <span class="ltx_text ltx_font_italic">their</span>, as the
antecedent is plural), resulting in the following lexical unit:</p>
</div>
<div id="S4.SS3.SSS2.p7" class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
^their&lt;det&gt;&lt;pos&gt;&lt;m&gt;&lt;sg&gt;$
</pre>
</div>
<div id="S4.SS3.SSS2.p8" class="ltx_para">
<p class="ltx_p">The final Apertium translation, after each lexical unit output from structural
transfer has gone through morphological generation, can be seen in
Table <a href="#S4.T1" title="Table 1 ‣ 4.3.2 Example Usage ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1</span></a>. The translation of the anaphor is fixed due to the
use of <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> in the pipeline. While the final Apertium
translation is still not ideal, other aspects of this translation may be fixed
through adjustments to other modules in the translation pipeline: the
preposition <em class="ltx_emph ltx_font_italic">for</em> instead of <em class="ltx_emph ltx_font_italic">at</em> in lexical selection, not using
the article <em class="ltx_emph ltx_font_italic">the</em> with <em class="ltx_emph ltx_font_italic">Parliament</em> in structural transfer or
lexical selection, the placement of the adjunct <em class="ltx_emph ltx_font_italic">this Tuesday</em> in
structural transfer, and <em class="ltx_emph ltx_font_italic">groups</em> for <em class="ltx_emph ltx_font_italic">bands</em> in lexical selection.</p>
</div>
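<div class="ltx_para">
<p class="ltx_p">The way the attached antecedent drives the choice of the English determiner can be
sketched as below. Real Apertium transfer rules are written in XML, so this Python
version is only an analogy: the dictionary-based lexical unit, the function
<span class="ltx_text ltx_font_typewriter">english_possessive</span> and the mapping itself are assumptions, and animacy
(needed to choose <em class="ltx_emph ltx_font_italic">its</em>) is not modelled.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only: real transfer rules are XML, and this mapping is
# an assumption made for the example. Animacy ("its") is not modelled.

def english_possessive(antecedent_tags, default="his"):
    """Pick an English possessive determiner from the antecedent's tags."""
    if "pl" in antecedent_tags:
        return "their"
    if "f" in antecedent_tags:
        return "her"
    return default


# Lexical unit after anaphora resolution, modelled as a dict instead of a
# stream string: source form, default translation, attached antecedent.
unit = {"source": "el seu", "target": "his", "antecedent": ("band", ["n", "pl"])}
unit["target"] = english_possessive(unit["antecedent"][1], default=unit["target"])
print(unit["target"])
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">Because the attached antecedent carries a plural tag, the default <em class="ltx_emph ltx_font_italic">his</em> is
replaced by <em class="ltx_emph ltx_font_italic">their</em>; without an attached plural or feminine antecedent the default
is kept.</p>
</div>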
</section>
<section id="S4.SS3.SSS3" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.3.3 </span>Preliminary evaluation</h4>

<div id="S4.SS3.SSS3.p1" class="ltx_para">
<p class="ltx_p">The <span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> module has been manually evaluated on two
language pairs—Spanish–English and Catalan–Italian—by rating the
translation of anaphors with and without the module in the pipeline. Since this
is a preliminary evaluation, only third person possessive determiners were
marked as anaphors.</p>
</div>
<div id="S4.SS3.SSS3.p2" class="ltx_para">
<p class="ltx_p">In Spanish, there is a possessive determiner <span class="ltx_text ltx_font_italic">su</span>, which can be
translated to English as <span class="ltx_text ltx_font_italic">his/her/its/their</span> depending on the gender,
number, and animacy of the antecedent. The first 1000 sentences from the Spanish
Europarl corpus were translated to English using Apertium with and without
<span class="ltx_text ltx_font_typewriter">apertium-anaphora</span> in the translation pipeline, and a basic rule-set was
used for the anaphora resolution.<span id="footnote8" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">8</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">8</sup>
                <span class="ltx_tag ltx_tag_note">8</span>
                
                
                
              The rule set used is the one
contained in the revision of
<a href="https://github.com/apertium/apertium-eng-spa/blob/anaphora-transfer/apertium-eng-spa.spa-eng.arx" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-eng-spa/blob/anaphora-transfer/apertium-eng-spa.spa-eng.arx</a>
as of the time of publication.</span></span></span> 120 sentences out of these had at least one
possessive determiner, and a manual evaluation was done to check the accuracy of
the translated anaphors in English.</p>
</div>
<div id="S4.SS3.SSS3.p3" class="ltx_para">
<p class="ltx_p">In Catalan, there is a possessive determiner <span class="ltx_text ltx_font_italic">el seu</span> which can translate
as <span class="ltx_text ltx_font_italic">il suo</span> ‘his/her/its’ or <span class="ltx_text ltx_font_italic">il loro</span> ‘their’ in Italian,
depending on the number of the antecedent. A corpus was created using articles
from Kataluna Esperantisto<span id="footnote9" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">9</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">9</sup>
                <span class="ltx_tag ltx_tag_note">9</span>
                
                
                
              The journal can be found at
<a href="https://esperanto.cat/Kataluna-Esperantisto" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://esperanto.cat/Kataluna-Esperantisto</a>.</span></span></span>, a freely available journal,
and random paragraphs were translated. A total of 108 sentences had at least
one possessive determiner, and a manual evaluation was done to check the
accuracy of the translated anaphors in Italian.</p>
</div>
<div id="S4.SS3.SSS3.p4" class="ltx_para">
<p class="ltx_p">The results of these evaluations<span id="footnote10" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">10</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">10</sup>
                <span class="ltx_tag ltx_tag_note">10</span>
                
                
                
              The complete evaluation data can be
found at
<a href="https://github.com/apertium/apertium-anaphora/tree/master/evaluation" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-anaphora/tree/master/evaluation</a>.</span></span></span> are
shown in Table <a href="#S4.T2" title="Table 2 ‣ 4.3.3 Preliminary evaluation ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">2</span></a>. Without a module for anaphora resolution, the
anaphor just translates to whatever is provided in the bilingual dictionary,
which in these pairs is the masculine singular possessive determiner.</p>
</div>
<figure id="S4.T2" class="ltx_table">
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>A preliminary evaluation of translation with and without anaphora
resolution (AR) in the pipeline. Accuracy is the percentage of anaphors
translated correctly.</figcaption>
<table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row ltx_border_t"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Number of</span></th>
<th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="2"><span class="ltx_text ltx_font_bold">Accuracy (%)</span></th>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row"><span class="ltx_text ltx_font_bold">Systems</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row"><span class="ltx_text ltx_font_bold">anaphors evaluated</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_column"><span class="ltx_text ltx_font_bold">Without AR</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_column"><span class="ltx_text ltx_font_bold">With AR</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Spanish–English</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t">120</th>
<td class="ltx_td ltx_align_left ltx_border_t">29.2</td>
<td class="ltx_td ltx_align_left ltx_border_t">54.2</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb"><span class="ltx_text ltx_font_bold">Catalan–Italian</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb">108</th>
<td class="ltx_td ltx_align_left ltx_border_bb">83.3</td>
<td class="ltx_td ltx_align_left ltx_border_bb">75.0</td>
</tr>
</tbody>
</table>
</figure>
<div id="S4.SS3.SSS3.p5" class="ltx_para">
<p class="ltx_p">For Spanish–English translation, use of the module increased the accuracy of
anaphor translation from 29.2% to 54.2%, but for Catalan–Italian it decreased
accuracy from 83.3% to 75.0%. It is important to note here that
the results without anaphora resolution for Catalan–Italian, where all targeted
anaphors by default are translated as singular, still showed an accuracy of
83.3%. This indicates that the test data was not evenly distributed in terms
of the grammatical number of the antecedents of these anaphors.</p>
</div>
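<div class="ltx_para">
<p class="ltx_p">To make the size of these changes concrete, the percentages in Table 2 can be converted
back into approximate counts. The sketch below assumes that each accuracy figure is
computed over the number of anaphors listed in the table; the resulting counts are
rounded inferences, not figures reported in the evaluation data, and the function name
<span class="ltx_text ltx_font_typewriter">implied_counts</span> is hypothetical.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative arithmetic only: counts are inferred from rounded percentages.

def implied_counts(total, accuracy_without, accuracy_with):
    """Return the approximate numbers of correct anaphors (without AR, with AR)."""
    return (round(total * accuracy_without / 100),
            round(total * accuracy_with / 100))


# (anaphors evaluated, accuracy without AR, accuracy with AR) from Table 2
results = {
    "Spanish-English": (120, 29.2, 54.2),
    "Catalan-Italian": (108, 83.3, 75.0),
}
for pair, row in results.items():
    without_ar, with_ar = implied_counts(*row)
    print(pair, without_ar, with_ar, with_ar - without_ar)
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">Under this assumption, the Spanish–English figures correspond to roughly 35 and 65
correct anaphors out of 120 (a gain of about 30), and the Catalan–Italian figures to
roughly 90 and 81 out of 108 (a loss of about 9).</p>
</div>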
<div id="S4.SS3.SSS3.p6" class="ltx_para">
<p class="ltx_p">It is also important to note that the saliency indicators and their respective
scores can be tuned to the domain of the text to get better results. In the
preliminary evaluation, the rule-set was modified after an initial look at the
results. For example, since the Spanish-English evaluation data was transcribed
speech data (Europarl), we were able to add an “impeding indicator” to
patterns that contained a proper noun followed by a comma, which attaches a
slight negative score to such patterns. These are patterns that are likely to
be the speaker addressing an interlocutor, such as <span class="ltx_text ltx_font_italic">Madam President</span>,
<span class="ltx_text ltx_font_italic">Mister Speaker</span>, etc. The interlocutor in these examples is unlikely to
be the antecedent of a third-person possessive determiner anaphor that
follows in the context.</p>
</div>
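<div class="ltx_para">
<p class="ltx_p">A pattern of this kind can be sketched as a simple scan over the lexical units. The
code below is an illustration only: the score value, the treatment of the form of
address as a single proper-noun unit, and the names <span class="ltx_text ltx_font_typewriter">IMPEDING_SCORE</span> and
<span class="ltx_text ltx_font_typewriter">vocative_penalties</span> are assumptions, and the actual rule is expressed in the
module's XML rule file rather than in Python.</p>
</div>
<div class="ltx_para">
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Illustrative sketch only: the score value and unit representation are
# assumptions made for the example.
IMPEDING_SCORE = -1  # hypothetical "slight negative score"


def vocative_penalties(units):
    """Return {index: score} for proper nouns immediately followed by a comma.

    Each unit is a (lemma, tags) pair.
    """
    penalties = {}
    for index in range(len(units) - 1):
        _, tags = units[index]
        next_lemma, _ = units[index + 1]
        if "np" in tags and next_lemma == ",":
            penalties[index] = IMPEDING_SCORE
    return penalties


units = [("Madam President", ["np"]), (",", ["cm"]), ("the", ["det"]),
         ("Commission", ["np"]), ("present", ["vblex"])]
print(vocative_penalties(units))
```
</pre>
</div>
<div class="ltx_para">
<p class="ltx_p">Only the form of address at the start of the utterance is penalised here; the second
proper noun is not followed by a comma and keeps its score.</p>
</div>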
</section>
<section id="S4.SS3.SSS4" class="ltx_subsubsection">
<h4 class="ltx_title ltx_title_subsubsection">
<span class="ltx_tag ltx_tag_subsubsection">4.3.4 </span>Future Work</h4>

<div id="S4.SS3.SSS4.p1" class="ltx_para">
<p class="ltx_p">For now, the linguistic markers used by the anaphora resolution module and their
corresponding scores need to be manually defined by language experts. The
markers provide linguistic cues for anaphora resolution and the scores are
arrived at empirically.</p>
</div>
<div id="S4.SS3.SSS4.p2" class="ltx_para">
<p class="ltx_p">If these scores could be learnt from a corpus, it would be much simpler to
build an anaphora module with reasonable accuracy. Since the scope of the rules would
be largely defined, it would require much less data to learn the scores as
compared to training a corpus-based (machine learning) model to perform anaphora
resolution from scratch.</p>
</div>
<div id="S4.SS3.SSS4.p3" class="ltx_para">
<p class="ltx_p">Another idea is to learn these scores from related languages, as the linguistic
cues for anaphora resolution shouldn’t vary much among related language pairs.
For example, the rules and scores can be learnt from Spanish, which has abundant
data, and applied to Catalan, which is a low-resource language.</p>
</div>
</section>
</section>
</section>
<section id="S5" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">5 </span>Supporting minoritised languages</h2>

<div id="S5.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib103" title="Digital language death" class="ltx_ref">17</a>]</cite> argues that many languages that are not considered
endangered lack the level of access to language technology needed to
survive (i.e., maintain intergenerational transmission) in the digital age. The author
presents evidence of a “massive die-off caused by the digital divide,” and
suggests that access to language technology is critical for the continued
survival of any currently used language.</p>
</div>
<div id="S5.p2" class="ltx_para">
<p class="ltx_p">We consider MT to be a crucial part of this access to language technology.
Specifically, MT allows speakers of a low-resource language to access resources
in other languages by translating them into their own language. Additionally,
MT enables much more efficient translation of content into low-resource
languages—for example, a small team of speakers of a low-resource language may
use MT to quickly translate Wikipedia pages from a language with large numbers
of high-quality Wikipedia pages. This, of course, requires some attention to
post-editing the results of MT, but that is often far less work than translating
the information by hand.</p>
</div>
<div id="S5.p3" class="ltx_para">
<p class="ltx_p">It must be stated that the Apertium community does not consider MT to be a
standalone solution for producing production-ready translations of texts such as marketing
materials, literature, and legal documents, although we have encountered that
perception anecdotally. Any production-ready translation requires,
at a minimum, an editor who knows the target language well, and who preferably also
has expertise in translation from the source language. In such environments,
MT is simply a solution that reduces the time investment for human translators
to produce a quality translation. It is also meant as a tool for people who do
not know the source language to make sense of material in that language. In
these ways, MT can be a useful tool for speakers of low-resource languages.</p>
</div>
<div id="S5.p4" class="ltx_para">
<p class="ltx_p">Apertium is designed for rule-based MT. In reference to using corpus-based
approaches to developing MT systems for low-resource languages,
<cite class="ltx_cite ltx_citemacro_cite">[<span class="ltx_ref ltx_missing_citation ltx_ref_self">martín-mor2017technologies</span>]</cite> states that “most minoritised languages … do not
have a sufficient number of texts in digital format, because of a lack of
digital texts, a lack of consensus on the standardisation models, etc. In those
cases, Rule-Based Machine Translation (RBMT) is especially useful, since rules
can be manually written even when languages are not fully standardised”. In
other words, what can be done with corpus-based approaches is limited when the
amount of parallel text is limited. That said, Apertium is open to leveraging
corpus-based methods as much as possible given the limitations, as outlined in
Section <a href="#S3" title="3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>. In reality, many Apertium pairs are
technically hybrid MT systems, although the level of incorporation of
corpus-based methods can vary from absolutely none to a rather large amount with
recent advancements.</p>
</div>
<section id="S5.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">5.1 </span>Released translation pairs</h3>

<figure id="S5.T3" class="ltx_table">
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 3: </span>Released translation systems per language family and sub-family.</figcaption>
<div class="ltx_inline-block ltx_align_center ltx_transformed_outer" style="width:465.6pt;height:378.611111111111px;vertical-align:-10.7pt;"><span class="ltx_transformed_inner" style="transform:translate(-7.2pt,-4.2pt) scale(0.97,0.97) ;-webkit-transform:translate(-7.2pt,-4.2pt) scale(0.97,0.97) ;-ms-transform:translate(-7.2pt,-4.2pt) scale(0.97,0.97) ;">
<table class="ltx_tabular ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row ltx_border_t"></th>
<th class="ltx_td ltx_th ltx_th_column ltx_th_row ltx_border_t"></th>
<th class="ltx_td ltx_th ltx_th_column ltx_border_t"></th>
<th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="2"><span class="ltx_text ltx_font_bold">Translation systems</span></th>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row"><span class="ltx_text ltx_font_bold">Language family</span></th>
<th class="ltx_td ltx_th ltx_th_column ltx_th_row"></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column"><span class="ltx_text ltx_font_bold">Languages</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column"><span class="ltx_text ltx_font_bold">In-family</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column"><span class="ltx_text ltx_font_bold">Out-of-family</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Afro-Asiatic</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Semitic</span></th>
<td class="ltx_td ltx_align_right ltx_border_t">2</td>
<td class="ltx_td ltx_align_right ltx_border_t">1</td>
<td class="ltx_td ltx_align_right ltx_border_t">0</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Austronesian</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Malayo-Polynesian</span></th>
<td class="ltx_td ltx_align_right">2</td>
<td class="ltx_td ltx_align_right">1</td>
<td class="ltx_td ltx_align_right">0</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Indo-European</span></th>
<th class="ltx_td ltx_th ltx_th_row"></th>
<td class="ltx_td ltx_align_right">34</td>
<td class="ltx_td ltx_align_right">44</td>
<td class="ltx_td ltx_align_right">3</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Celtic</span></th>
<td class="ltx_td ltx_align_right">2</td>
<td class="ltx_td ltx_align_right">0</td>
<td class="ltx_td ltx_align_right">2</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Germanic</span></th>
<td class="ltx_td ltx_align_right">8</td>
<td class="ltx_td ltx_align_right">7</td>
<td class="ltx_td ltx_align_right">9</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Indo-Iranian</span></th>
<td class="ltx_td ltx_align_right">2</td>
<td class="ltx_td ltx_align_right">1</td>
<td class="ltx_td ltx_align_right">0</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Romance</span></th>
<td class="ltx_td ltx_align_right">12</td>
<td class="ltx_td ltx_align_right">19</td>
<td class="ltx_td ltx_align_right">8</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Slavic</span></th>
<td class="ltx_td ltx_align_right">9</td>
<td class="ltx_td ltx_align_right">6</td>
<td class="ltx_td ltx_align_right">2</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_th ltx_th_row"></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Constructed</span></th>
<td class="ltx_td ltx_align_right">1</td>
<td class="ltx_td ltx_align_right">0</td>
<td class="ltx_td ltx_align_right">4</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Turkic</span></th>
<th class="ltx_td ltx_th ltx_th_row"></th>
<td class="ltx_td ltx_align_right">4</td>
<td class="ltx_td ltx_align_right">2</td>
<td class="ltx_td ltx_align_right">0</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Uralic</span></th>
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Finno-Ugric</span></th>
<td class="ltx_td ltx_align_right">1</td>
<td class="ltx_td ltx_align_right">0</td>
<td class="ltx_td ltx_align_right">1</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Basque</span></th>
<th class="ltx_td ltx_th ltx_th_row"></th>
<td class="ltx_td ltx_align_right">1</td>
<td class="ltx_td ltx_align_right">0</td>
<td class="ltx_td ltx_align_right">2</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb"><span class="ltx_text ltx_font_bold">Total</span></th>
<th class="ltx_td ltx_th ltx_th_row ltx_border_bb"></th>
<td class="ltx_td ltx_align_right ltx_border_bb">44</td>
<td class="ltx_td ltx_align_right ltx_border_bb">48</td>
<td class="ltx_td ltx_align_right ltx_border_bb">3</td>
</tr>
</tbody>
</table>
</span></div>
</figure>
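<div class="ltx_para">
<p class="ltx_p">The totals in Table 3 can be cross-checked with a little arithmetic. The following Python sketch is not part of Apertium; it only copies the counts from the table. It assumes, as an interpretation of the table rather than something the table states, that a pair linking two families (or two sub-families) is listed once in the out-of-family column of each side, and that the out-of-family count of a sub-family row also includes pairs that leave Indo-European altogether. Under that reading, the 48 in-family pairs and 3 out-of-family pairs add up to the 51 released pairs.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Counts copied from Table 3.
# Each row maps a name to (languages, in-family pairs, out-of-family pairs).
families = {
    "Afro-Asiatic": (2, 1, 0),
    "Austronesian": (2, 1, 0),
    "Indo-European": (34, 44, 3),
    "Turkic": (4, 2, 0),
    "Uralic": (1, 0, 1),
    "Basque": (1, 0, 2),
}
ie_subfamilies = {
    "Celtic": (2, 0, 2),
    "Germanic": (8, 7, 9),
    "Indo-Iranian": (2, 1, 0),
    "Romance": (12, 19, 8),
    "Slavic": (9, 6, 2),
    "Constructed": (1, 0, 4),
}


def column(rows, index):
    """Sum one column (0, 1 or 2) over all rows of a table."""
    return sum(row[index] for row in rows.values())


total_languages = column(families, 0)
in_family_pairs = column(families, 1)

# Assumption: a cross-family pair appears in two family rows, so halve the sum.
cross_family_pairs = column(families, 2) // 2
total_pairs = in_family_pairs + cross_family_pairs

# Pairs inside Indo-European that cross sub-family boundaries, computed two ways.
ie_in_family = families["Indo-European"][1]
ie_out_of_family = families["Indo-European"][2]
cross_subfamily_a = ie_in_family - column(ie_subfamilies, 1)
cross_subfamily_b = (column(ie_subfamilies, 2) - ie_out_of_family) // 2

print(total_languages, in_family_pairs, cross_family_pairs, total_pairs)
print(cross_subfamily_a, cross_subfamily_b)
```
</pre>
<p class="ltx_p">Both ways of counting the Indo-European pairs that cross sub-family boundaries agree, which suggests that the rows of the table are mutually consistent under the stated reading.</p>
</div>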
<figure id="S5.T4" class="ltx_table">
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 4: </span>Released translation pairs with available evaluation data. Coverage is
the percentage of tokens which receive at least one analysis from the
morphological analyser. WER (Word Error Rate), PER (Position-independent
Word Error Rate), and BLEU scores are computed against a reference
translation. A relatively low WER/PER score or a relatively high BLEU score
generally denotes better translation quality.</figcaption>
<div class="ltx_inline-block ltx_align_center ltx_transformed_outer" style="width:756.9pt;height:828.888888888888px;vertical-align:-221.0pt;"><span class="ltx_transformed_inner" style="transform:translate(-56.5pt,-44.6pt) scale(0.87,0.87) ;-webkit-transform:translate(-56.5pt,-44.6pt) scale(0.87,0.87) ;-ms-transform:translate(-56.5pt,-44.6pt) scale(0.87,0.87) ;">
<table class="ltx_tabular ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Systems</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">Coverage (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">WER (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">PER (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">BLEU (0–1)</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Aragonese–Spanish<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">a</span></sup></span></th>
<td class="ltx_td ltx_align_right ltx_border_t">94.33</td>
<td class="ltx_td ltx_align_right ltx_border_t">11.61–14.12</td>
<td class="ltx_td ltx_border_t"></td>
<td class="ltx_td ltx_align_right ltx_border_t">0.72–0.79</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Belarusian–Russian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">b</span></sup></span></th>
<td class="ltx_td ltx_align_right">84.3</td>
<td class="ltx_td ltx_align_right">25.72</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Breton–French<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">c</span></sup></span></th>
<td class="ltx_td ltx_align_right">87–90</td>
<td class="ltx_td ltx_align_right">38</td>
<td class="ltx_td ltx_align_right">22</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Catalan–Aragonese<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">w</span></sup></span></th>
<td class="ltx_td ltx_align_right">87.6–93.2</td>
<td class="ltx_td ltx_align_right">19.37</td>
<td class="ltx_td ltx_align_right">17.85</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Catalan–Italian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">d</span></sup></span></th>
<td class="ltx_td ltx_align_right">94.7</td>
<td class="ltx_td ltx_align_right">14.2</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Catalan–Romanian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">e</span></sup></span></th>
<td class="ltx_td ltx_align_right">88.7</td>
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_right">29</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Catalan–Sardinian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">f</span></sup></span></th>
<td class="ltx_td ltx_align_right">94.4</td>
<td class="ltx_td ltx_align_right">20.5</td>
<td class="ltx_td ltx_align_right">13.9</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Danish–Bokmål Norwegian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">g</span></sup></span></th>
<td class="ltx_td ltx_align_right">88.1–95.9</td>
<td class="ltx_td ltx_align_right">10.87</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Danish–Nynorsk Norwegian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">h</span></sup></span></th>
<td class="ltx_td ltx_align_right">87.3–92.7</td>
<td class="ltx_td ltx_align_right">13.64–22.64</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">French–Arpitan<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">v</span></sup></span></th>
<td class="ltx_td ltx_align_right">92.8–95.8</td>
<td class="ltx_td ltx_align_right">5.7</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">French–Occitan<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">i</span></sup></span></th>
<td class="ltx_td ltx_align_right">92.3</td>
<td class="ltx_td ltx_align_right">10.0</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Italian–Catalan<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">j</span></sup></span></th>
<td class="ltx_td ltx_align_right">91.2</td>
<td class="ltx_td ltx_align_right">15.7</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Italian–Sardinian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">k</span></sup></span></th>
<td class="ltx_td ltx_align_right">89.3–96.4</td>
<td class="ltx_td ltx_align_right">9.9</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">North Sámi–Bokmål Norwegian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">l</span></sup></span></th>
<td class="ltx_td ltx_align_right">77.52–94.72</td>
<td class="ltx_td ltx_align_right">39.68–53.31</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Nynorsk–Bokmål Norwegian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">m</span></sup></span></th>
<td class="ltx_td ltx_align_right">92.6–99.2</td>
<td class="ltx_td ltx_align_right">10.71</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Portuguese–Catalan<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">n</span></sup></span></th>
<td class="ltx_td ltx_align_right">91.4</td>
<td class="ltx_td ltx_align_right">14.0</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Romanian–Catalan<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">o</span></sup></span></th>
<td class="ltx_td ltx_align_right">86.8</td>
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_right">46</td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Russian–Belarusian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">q</span></sup></span></th>
<td class="ltx_td ltx_align_right">83.6</td>
<td class="ltx_td ltx_align_right">23.93</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Spanish–Aragonese<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">r</span></sup></span></th>
<td class="ltx_td ltx_align_right">95.22</td>
<td class="ltx_td ltx_align_right">16.83–19.37</td>
<td class="ltx_td"></td>
<td class="ltx_td ltx_align_right">0.65–0.71</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Serbo-Croatian–Macedonian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">s</span></sup></span></th>
<td class="ltx_td ltx_align_right">74.5–90.96</td>
<td class="ltx_td ltx_align_right">48.33</td>
<td class="ltx_td ltx_align_right">48.33</td>
<td class="ltx_td ltx_align_right">0.36</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Swedish–Danish<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">t</span></sup></span></th>
<td class="ltx_td ltx_align_right">83.7–88.0</td>
<td class="ltx_td ltx_align_right">31</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">Ukrainian–Russian<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">p</span></sup></span></th>
<td class="ltx_td ltx_align_right">80.9–90.0</td>
<td class="ltx_td ltx_align_right">14.74</td>
<td class="ltx_td"></td>
<td class="ltx_td"></td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb"><span class="ltx_text ltx_font_bold">Welsh–English<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">u</span></sup></span></th>
<td class="ltx_td ltx_border_bb"></td>
<td class="ltx_td ltx_align_right ltx_border_bb">53.40–64.94</td>
<td class="ltx_td ltx_align_right ltx_border_bb">27.22–34.35</td>
<td class="ltx_td ltx_align_right ltx_border_bb">0.16–0.32</td>
</tr>
</tbody>
</table>
<ul id="S5.I1" class="ltx_itemize">
<li id="S5.I1.ix1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">a</span> 
<div id="S5.I1.ix1.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib128" title="Free/open source shallow-transfer based machine translation for Spanish and Aragonese" class="ltx_ref">22</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">b</span> 
<div id="S5.I1.ix2.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Belarusian_and_Russian/Work_plan" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Belarusian_and_Russian/Work_plan</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">c</span> 
<div id="S5.I1.ix3.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib228" title="Rule-based augmentation of training data in Breton–French statistical machine translation" class="ltx_ref">48</a>, <a href="#bib.bib230" title="Rule-based Breton to French machine translation" class="ltx_ref">49</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">d</span> 
<div id="S5.I1.ix4.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix5" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">e</span> 
<div id="S5.I1.ix5.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Romanian_and_Catalan/GSOC_2018" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Romanian_and_Catalan/GSOC_2018</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix6" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">f</span> 
<div id="S5.I1.ix6.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib62" title="Una eina per a una llengua en procés d’estandardització: el traductor automàtic català–sard" class="ltx_ref">11</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix7" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">g</span> 
<div id="S5.I1.ix7.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Scandinavian_MT_project" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Scandinavian_MT_project</a>
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix8" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">h</span> 
<div id="S5.I1.ix8.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Scandinavian_MT_project" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Scandinavian_MT_project</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix9" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">i</span> 
<div id="S5.I1.ix9.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/User:Capsot/GSOC_2018_Occitan_French" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/User:Capsot/GSOC_2018_Occitan_French</a>
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix10" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">j</span> 
<div id="S5.I1.ix10.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix11" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">k</span> 
<div id="S5.I1.ix11.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib234" title="Rule-based machine translation for the Italian–Sardinian language pair" class="ltx_ref">46</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix12" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">l</span> 
<div id="S5.I1.ix12.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib223" title="Evaluating North Sámi to Norwegian assimilation rbmt" class="ltx_ref">40</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix13" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">m</span> 
<div id="S5.I1.ix13.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Scandinavian_MT_project" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Scandinavian_MT_project</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix14" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">n</span> 
<div id="S5.I1.ix14.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_final_report</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix15" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">o</span> 
<div id="S5.I1.ix15.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Romanian_and_Catalan/GSOC_2018" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Romanian_and_Catalan/GSOC_2018</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix16" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">p</span> 
<div id="S5.I1.ix16.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Russian_and_Ukrainian/Work_plan" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Russian_and_Ukrainian/Work_plan</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix17" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">q</span> 
<div id="S5.I1.ix17.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Belarusian_and_Russian/Work_plan" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Belarusian_and_Russian/Work_plan</a> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix18" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">r</span> 
<div id="S5.I1.ix18.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib128" title="Free/open source shallow-transfer based machine translation for Spanish and Aragonese" class="ltx_ref">22</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix19" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">s</span> 
<div id="S5.I1.ix19.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib163" title="A rule-based machine translation system from Serbo-Croatian to Macedonian" class="ltx_ref">28</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix20" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">t</span> 
<div id="S5.I1.ix20.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Scandinavian_MT_project" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Scandinavian_MT_project</a>
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix21" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">u</span> 
<div id="S5.I1.ix21.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib227" title="Apertium-cy – a collaboratively-developed free RBMT system for Welsh to English" class="ltx_ref">45</a>]</cite>
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix22" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">v</span> 
<div id="S5.I1.ix22.p1" class="ltx_para">
<p class="ltx_p"><a href="https://wiki.apertium.org/wiki/Hectoralos/GSOC_2020_rapport_final" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://wiki.apertium.org/wiki/Hectoralos/GSOC_2020_rapport_final</a>
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I1.ix23" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">w</span> 
<div id="S5.I1.ix23.p1" class="ltx_para">
<p class="ltx_p"><a href="http://wiki.apertium.org/wiki/Aragonese_and_Catalan/Evaluation" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://wiki.apertium.org/wiki/Aragonese_and_Catalan/Evaluation</a></p>
</div>
</li>
</ul>
</span></div>
</figure>
<div id="S5.SS1.p1" class="ltx_para">
<p class="ltx_p">As of December 2020, there are 51 translation pairs released, corresponding to
44 languages of 6 language families (Table  <a href="#S5.T3" title="Table 3 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">3</span></a>). See
Appendix <a href="#A1" title="Appendix A List of released languages and translation pairs ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">A</span></a> for the full list. The vast majority of the
languages are Indo-European, and many of these are Romance, Slavic, or Germanic
languages. The non-Indo-European languages are Afro-Asiatic (Arabic and Maltese),
Austronesian (Indonesian and Malay), Turkic (Crimean Tatar, Kazakh, Tatar and
Turkish), Uralic (North Sámi) and the isolate Basque. Table <a href="#S5.T4" title="Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a> shows
quality metrics for some of the released pairs.</p>
</div>
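<div class="ltx_para">
<p class="ltx_p">To make the metrics reported in Table 4 concrete, the following Python sketch computes coverage, WER and PER for a toy sentence. It is an illustration under stated assumptions, not the evaluation tooling used to produce the table: the <span class="ltx_text ltx_font_typewriter">lexicon</span> set stands in for a morphological analyser, all function names are hypothetical, and PER follows one common formulation (the bag-of-words mismatch count normalised by the reference length). BLEU is left out because it additionally requires n-gram precisions and a brevity penalty.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
from collections import Counter


def coverage(tokens, lexicon):
    """Percentage of tokens that receive at least one analysis.

    Assumption: a token counts as analysed if it is in the toy lexicon.
    """
    known = sum(1 for token in tokens if token in lexicon)
    return 100.0 * known / len(tokens)


def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word Error Rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def per(reference, hypothesis):
    """Position-independent Word Error Rate in percent (word order ignored)."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    matches = sum(min(count, hyp_counts[word]) for word, count in ref_counts.items())
    errors = max(len(reference), len(hypothesis)) - matches
    return 100.0 * errors / len(reference)


reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on mat the".split()  # last two words swapped
lexicon = {"the", "cat", "sat", "on"}          # "mat" is deliberately unknown

coverage_score = coverage(reference, lexicon)
wer_score = wer(reference, hypothesis)
per_score = per(reference, hypothesis)
print(round(coverage_score, 2), round(wer_score, 2), round(per_score, 2))
```
</pre>
<p class="ltx_p">Swapping the last two words of the reference costs two edits for WER but nothing for PER, since PER ignores word order. This is one reason the two scores can differ considerably for the same output.</p>
</div>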
<div id="S5.SS1.p2" class="ltx_para">
<p class="ltx_p">For the most part, translation systems are constructed between languages of the
same family. There are only three released translators between unrelated
languages: North Sámi–Norwegian (Bokmål), Basque–English, and Basque–Spanish.
Examples of translation systems developed for translation between closely
related languages include Malay–Indonesian, Maltese–Arabic, Dutch–Afrikaans
<cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib160" title="Rapid rule-based machine translation between Dutch and Afrikaans" class="ltx_ref">27</a>]</cite>, Crimean Tatar–Turkish <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib68" title="A dependency treebank for Kurmanji Kurdish" class="ltx_ref">12</a>]</cite>, and
Kazakh–Tatar <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib204" title="A free/open-source Kazakh-Tatar machine translation system" class="ltx_ref">31</a>]</cite>. Even inside subfamilies, translation
systems for Romance languages typically target another Romance language (and not
other Indo-European languages), and the same is true of Germanic into Germanic
and even South Slavic into South Slavic, West Slavic into West Slavic, and East
Slavic into East Slavic. There is a heavier density of translation pairs between
Romance languages (19 for 12 languages), between Slavic languages (6 for 9
languages), and between Scandinavian languages (5 for 5 languages or language
varieties). Two languages tend to break the close-proximity rule: English and
Esperanto, which have a significant number of connections with languages outside
their sub-family.<span id="footnote11" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">11</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">11</sup>
              <span class="ltx_tag ltx_tag_note">11</span>
              
              
              
            We consider Esperanto within a specific constructed
subfamily of Indo-European languages.</span></span></span> Despite this, there is no central
language in Apertium: there are 9 translators into both English and Spanish, 8
into Catalan, 4 into both Norwegian (Bokmål) and Esperanto, 3 into both French
and Portuguese, etc.</p>
</div>
<div id="S5.SS1.p3" class="ltx_para">
<p class="ltx_p">The initial objective of Apertium was to create free and open-source resources
for the languages of Spain. In light of the increasing breadth of the published
pairs and ongoing work leveraging the Apertium platform (see
Table <a href="#S5.T4" title="Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">4</span></a>), particularly thanks to funding from the Google Summer of
Code programme, Apertium has since become a major venue
for creating resources for minoritised and low-resource languages in Europe and
has shown potential as a language technology platform supporting languages all
around the world.</p>
</div>
<div id="S5.SS1.p4" class="ltx_para">
<p class="ltx_p">Eleven of the forty-four languages with released translators are considered
vulnerable or endangered <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib145" title="Atlas of the world’s languages in danger" class="ltx_ref">26</a>]</cite>: Aragonese, Arpitan, Asturian,
Basque, Belarusian, Breton, Crimean Tatar, North Sámi, Occitan, Sardinian, and
Welsh. Other languages hold minority status in their states, like Afrikaans,
Catalan, Galician, Silesian, and Tatar. Recent work on other under-resourced
and/or minoritised languages includes Bashqort <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib232" title="A prototype machine translation system for Tatar and Bashkir based on free/open-source components" class="ltx_ref">44</a>]</cite>,
Bengali <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib57" title="Development of a morphological analyser for Bengali" class="ltx_ref">9</a>]</cite>,
Chukchi<span id="footnote12" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">12</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">12</sup>
              <span class="ltx_tag ltx_tag_note">12</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2017/projects/4736366453719040/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2017/projects/4736366453719040/</a></span></span></span>,
Gagauz <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib18" title="Finite-state morphological analysis for Gagauz" class="ltx_ref">3</a>]</cite>,
Guarani<span id="footnote13" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">13</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">13</sup>
              <span class="ltx_tag ltx_tag_note">13</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2018/projects/5434804640153600/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2018/projects/5434804640153600/</a></span></span></span>,
Qaraqalpaq<span id="footnote14" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">14</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">14</sup>
              <span class="ltx_tag ltx_tag_note">14</span>
              
              
              
            <a href="https://www.google-melange.com/archive/gsoc/2014/orgs/apertium/projects/beknazar.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://www.google-melange.com/archive/gsoc/2014/orgs/apertium/projects/beknazar.html</a>,
<a href="https://summerofcode.withgoogle.com/archive/2019/projects/6137485212516352/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2019/projects/6137485212516352/</a>,
<a href="https://summerofcode.withgoogle.com/archive/2020/projects/4815970624864256/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2020/projects/4815970624864256/</a></span></span></span>,
Karelian <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib185" title="Workflows for kickstarting rbmt in virtually no-resource situation" class="ltx_ref">29</a>]</cite>, Kurmanji
Kurdish<span id="footnote15" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">15</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">15</sup>
              <span class="ltx_tag ltx_tag_note">15</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2016/projects/5069737520267264/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2016/projects/5069737520267264/</a></span></span></span>,
Sorani Kurdish <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib221" title="Translators without Borders develops world’s first crisis-specific machine translation for Kurdish" class="ltx_ref">39</a>]</cite>,
Lingala<span id="footnote16" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">16</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">16</sup>
              <span class="ltx_tag ltx_tag_note">16</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2019/projects/4582884889853952/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2019/projects/4582884889853952/</a></span></span></span>,
Malayalam<span id="footnote17" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">17</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">17</sup>
              <span class="ltx_tag ltx_tag_note">17</span>
              
              
              
            <a href="https://www.google-melange.com/archive/gsoc/2014/orgs/apertium/projects/aboobacker.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://www.google-melange.com/archive/gsoc/2014/orgs/apertium/projects/aboobacker.html</a></span></span></span>,
Marathi <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib190" title="Finite-state morphological analysis for Marathi" class="ltx_ref">30</a>]</cite>,
Punjabi<span id="footnote18" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">18</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">18</sup>
              <span class="ltx_tag ltx_tag_note">18</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2020/projects/6209442061746176/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2020/projects/6209442061746176/</a></span></span></span>,
Cuzco
Quechua<span id="footnote19" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">19</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">19</sup>
              <span class="ltx_tag ltx_tag_note">19</span>
              
              
              
            <a href="https://www.google-melange.com/archive/gsoc/2012/orgs/apertium/projects/pato_yap.html" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://www.google-melange.com/archive/gsoc/2012/orgs/apertium/projects/pato_yap.html</a></span></span></span>,
Lule-Saami <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib229" title="Developing prototypes for machine translation between two Sámi languages" class="ltx_ref">47</a>]</cite>, South-Saami <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib12" title="A North Saami to South Saami machine translation prototype" class="ltx_ref">1</a>, <a href="#bib.bib229" title="Developing prototypes for machine translation between two Sámi languages" class="ltx_ref">47</a>]</cite>,
Sakha<span id="footnote20" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">20</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">20</sup>
              <span class="ltx_tag ltx_tag_note">20</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2018/projects/4877442304966656/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2018/projects/4877442304966656/</a></span></span></span>,
Sicilian<span id="footnote21" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">21</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">21</sup>
              <span class="ltx_tag ltx_tag_note">21</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2016/projects/5883995808071680/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2016/projects/5883995808071680/</a></span></span></span>,
Iraqi Türkman<span id="footnote22" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">22</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">22</sup>
              <span class="ltx_tag ltx_tag_note">22</span>
              
              
              
             <a href="https://github.com/apertium/apertium-tki" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://github.com/apertium/apertium-tki</a>,
<a href="https://wiki.apertium.org/wiki/Apertium-tki" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://wiki.apertium.org/wiki/Apertium-tki</a></span></span></span>, and
Uyghur<span id="footnote23" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">23</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">23</sup>
              <span class="ltx_tag ltx_tag_note">23</span>
              
              
              
            <a href="https://summerofcode.withgoogle.com/archive/2018/projects/5988796768190464/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2018/projects/5988796768190464/</a>,
<a href="https://summerofcode.withgoogle.com/archive/2019/projects/5106764196872192/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://summerofcode.withgoogle.com/archive/2019/projects/5106764196872192/</a></span></span></span>.
In some cases, coordinated efforts are under way to develop resources for entire
language families, such as for Turkic languages <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib246" title="Free/open-source technologies for Turkic languages developed in the Apertium project" class="ltx_ref">50</a>]</cite>.</p>
</div>
</section>
<section id="S5.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">5.2 </span>Other languages and work ahead</h3>

<figure id="S5.T5" class="ltx_table">
<figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 5: </span>A selection of unreleased translation pairs with published results.
Coverage is the percentage of tokens which receive at least one analysis
from the morphological analyser. WER (Word Error Rate), PER
(Position-independent Word Error Rate), and BLEU scores are computed against
a reference translation. A relatively low WER/PER score or a relatively high
BLEU score generally denotes better translation quality.</figcaption>
<div class="ltx_inline-block ltx_align_center ltx_transformed_outer" style="width:544.0pt;height:165.277777777778px;vertical-align:-35.3pt;"><span class="ltx_transformed_inner" style="transform:translate(-20.5pt,-4.5pt) scale(0.93,0.93) ;-webkit-transform:translate(-20.5pt,-4.5pt) scale(0.93,0.93) ;-ms-transform:translate(-20.5pt,-4.5pt) scale(0.93,0.93) ;">
<table class="ltx_tabular ltx_guessed_headers ltx_align_middle">
<thead class="ltx_thead">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Systems</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">Coverage (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">WER (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">PER (%)</span></th>
<th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_t"><span class="ltx_text ltx_font_bold">BLEU (0–1)</span></th>
</tr>
</thead>
<tbody class="ltx_tbody">
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t"><span class="ltx_text ltx_font_bold">Kazakh–Turkish<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">x</span></sup></span></th>
<td class="ltx_td ltx_align_right ltx_border_t">83.42</td>
<td class="ltx_td ltx_align_right ltx_border_t">45.77</td>
<td class="ltx_td ltx_align_right ltx_border_t">41.69</td>
<td class="ltx_td ltx_align_right ltx_border_t">0.17</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">North Sámi–Finnish<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">y</span></sup></span></th>
<td class="ltx_td ltx_align_right">76.81</td>
<td class="ltx_td ltx_align_right">34.24</td>
<td class="ltx_td ltx_align_right">-</td>
<td class="ltx_td ltx_align_right">-</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row"><span class="ltx_text ltx_font_bold">North-Saami–South-Saami<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">z</span></sup></span></th>
<td class="ltx_td ltx_align_right">87.4</td>
<td class="ltx_td ltx_align_right">54.84</td>
<td class="ltx_td ltx_align_right">30.94</td>
<td class="ltx_td ltx_align_right">-</td>
</tr>
<tr class="ltx_tr">
<th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb"><span class="ltx_text ltx_font_bold">Tatar–Bashkir<sup class="ltx_sup"><span class="ltx_text ltx_font_medium">aa</span></sup></span></th>
<td class="ltx_td ltx_align_right ltx_border_bb">70.19</td>
<td class="ltx_td ltx_align_right ltx_border_bb">8.97</td>
<td class="ltx_td ltx_align_right ltx_border_bb">-</td>
<td class="ltx_td ltx_align_right ltx_border_bb">-</td>
</tr>
</tbody>
</table>
<ul id="S5.I2" class="ltx_itemize">
<li id="S5.I2.ix1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">x</span> 
<div id="S5.I2.ix1.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib19" title="Rule-based machine translation from Kazakh to Turkish" class="ltx_ref">5</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I2.ix2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">y</span> 
<div id="S5.I2.ix2.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib85" title="North-Sámi to Finnish rule-based machine translation system" class="ltx_ref">14</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I2.ix3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">z</span> 
<div id="S5.I2.ix3.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib12" title="A North Saami to South Saami machine translation prototype" class="ltx_ref">1</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
<li id="S5.I2.ix4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">aa</span> 
<div id="S5.I2.ix4.p1" class="ltx_para">
<p class="ltx_p"><cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib232" title="A prototype machine translation system for Tatar and Bashkir based on free/open-source components" class="ltx_ref">44</a>]</cite> 
<br class="ltx_break"></p>
</div>
</li>
</ul>
</span></div>
</figure>
<div id="S5.SS2.p1" class="ltx_para">
<p class="ltx_p">In Table <a href="#S5.T5" title="Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>, we show the performance of some unreleased
machine-translation systems from previous reports or publications.</p>
</div>
<div id="S5.SS2.p2" class="ltx_para">
<p class="ltx_p">The performance of these systems could improve over time through better
morphological disambiguation, more stems in the dictionaries, and new or refined
lexical and structural transfer rules.</p>
</div>
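<div class="ltx_para">
<p class="ltx_p">The metrics reported in Table 5 can be illustrated with a minimal Python sketch.
It is not the evaluation code used in the cited publications: the function names are
hypothetical, tokenisation is plain whitespace splitting, the PER formula is one common
formulation, and the <code>analyses</code> dictionary merely stands in for the output of
a morphological analyser.</p>
</div>

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


def position_independent_error_rate(reference, hypothesis):
    """Error rate that ignores word order by comparing bags of words."""
    ref, hyp = reference.split(), hypothesis.split()
    ref_counts, hyp_counts = Counter(ref), Counter(hyp)
    matches = sum(min(count, hyp_counts[word]) for word, count in ref_counts.items())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


def coverage(tokens, analyses):
    """Percentage of tokens that receive at least one analysis."""
    analysed = sum(1 for token in tokens if analyses.get(token))
    return 100.0 * analysed / len(tokens)


# Toy data (assumption): the two sentences differ only in word order.
reference = "the cat sat on the mat"
hypothesis = "the cat on the mat sat"
print(round(word_error_rate(reference, hypothesis), 2))
print(position_independent_error_rate(reference, hypothesis))
```

<div class="ltx_para">
<p class="ltx_p">On this toy pair, moving one word costs one deletion and one insertion, so the
WER is 33.33%, while the PER is 0 because both sentences contain the same words. This is
why PER is never higher than WER for the same output.</p>
</div>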
<div id="S5.SS2.p3" class="ltx_para">
<p class="ltx_p">In addition to the language pairs listed in
Table <a href="#S5.T5" title="Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">5</span></a>, there are many other pairs that are in various
stages of development but have not been systematically evaluated yet, such as
Basque–English, Cuzco Quechua–Spanish, Guaraní–Spanish,
Karelian–Finnish, Kazakh–Kyrgyz, Kazakh–Russian, Lingala–English,
Marathi–Hindi, Sorani Kurdish–Kurmanji Kurdish, Turkish–Uzbek, and
Uzbek–Qaraqalpaq, among others.</p>
</div>
</section>
</section>
<section id="S6" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">6 </span>Supplementary tools</h2>

<div id="S6.p1" class="ltx_para">
<p class="ltx_p">This section highlights supplementary tools maintained by Apertium that are
useful for developers as well as end-users. Apertium-viewer
(Section <a href="#S6.SS1" title="6.1 Apertium-viewer ‣ 6 Supplementary tools ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6.1</span></a>) is particularly useful for developers
interacting with Apertium resources. The Apertium website software
(Section <a href="#S6.SS2" title="6.2 Website software ‣ 6 Supplementary tools ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">6.2</span></a>) provides access to free and unlimited translation,
morphological analysis, and several under-development features like dictionary
lookup and a spell-checking interface, each of which can prove very useful for
end-users who do not wish to install any software locally.</p>
</div>
<section id="S6.SS1" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">6.1 </span>Apertium-viewer</h3>

<div id="S6.SS1.p1" class="ltx_para">
<p class="ltx_p">Apertium-viewer is a tool that makes it straightforward for users to view and
edit the output of the various stages of an Apertium translation. It reads a
translation pair’s “mode” configuration file, where the specific pipeline for
the translator is defined. It displays how a text changes as it cascades through
the modules, from the source to the target language. The user can edit the
text at any stage, which makes the tool useful for understanding how a
translation pair works and for debugging translations.</p>
</div>
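<div class="ltx_para">
<p class="ltx_p">The idea of inspecting and editing a cascade can be sketched in a few lines of
Python. This is an illustration only, not Apertium-viewer's implementation: the function
<code>run_pipeline</code> and the three toy stages are hypothetical placeholders and do not
correspond to real Apertium modules or to the contents of a mode file.</p>
</div>

```python
def run_pipeline(stages, text, overrides=None):
    """Run text through named stages and keep every intermediate output.

    overrides maps a stage name to replacement text that is used instead of
    that stage's real output, mimicking a manual edit made while debugging.
    """
    overrides = overrides or {}
    trace = [("input", text)]
    for name, stage in stages:
        text = overrides[name] if name in overrides else stage(text)
        trace.append((name, text))
    return trace


# Placeholder stages (assumption): simple string functions, not Apertium modules.
stages = [
    ("lowercase", str.lower),
    ("reverse-words", lambda t: " ".join(reversed(t.split()))),
    ("capitalise", str.capitalize),
]

# Unmodified run: every stage sees the previous stage's output.
for name, output in run_pipeline(stages, "Hello World"):
    print(name, "->", output)

# Edited run: the output of the first stage is replaced by hand, and the
# later stages continue from the edited text.
edited = run_pipeline(stages, "Hello World", overrides={"lowercase": "hello there"})
print(edited[-1])
```

<div class="ltx_para">
<p class="ltx_p">Keeping the full trace, rather than only the final string, is what makes it
possible to see which stage introduced an error.</p>
</div>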
</section>
<section id="S6.SS2" class="ltx_subsection">
<h3 class="ltx_title ltx_title_subsection">
<span class="ltx_tag ltx_tag_subsection">6.2 </span>Website software</h3>

<div id="S6.SS2.p1" class="ltx_para">
<p class="ltx_p">Apertium offers an open-source web API and customisable website
front-end<span id="footnote24" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">24</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">24</sup>
              <span class="ltx_tag ltx_tag_note">24</span>
              
              
              
            <a href="https://apertium.org" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://apertium.org</a></span></span></span> <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib40" title="Apertium’s web toolchain for low-resource language technology" class="ltx_ref">7</a>]</cite>.
Apart from translating text, users can provide a URL to a webpage, which will be
translated with the formatting preserved. Note that the Apertium API and website
software can also be deployed by anyone for any purpose. The software also
provides a front-end to morphological transducers, and there are a number of
beta features under development.</p>
</div>
<div id="S6.SS2.p2" class="ltx_para">
<p class="ltx_p">One of these features is multi-step translation: when no Apertium translation
pair exists between two languages, the interface can chain existing translators
through one or more pivot languages.<span id="footnote25" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">25</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">25</sup>
              <span class="ltx_tag ltx_tag_note">25</span>
              
              
              
            This feature
is enabled and available at <a href="https://beta.apertium.org/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://beta.apertium.org/</a>.</span></span></span></p>
</div>
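<div class="ltx_para">
<p class="ltx_p">One way to find such a chain is a breadth-first search over the directed graph
of available translators, which returns a route with the fewest intermediate steps. The
sketch below is an assumption about how this could be done, not a description of the
website's actual implementation; the function name and the list of pairs are
hypothetical.</p>
</div>

```python
from collections import deque


def find_translation_chain(pairs, source, target):
    """Return the shortest list of languages leading from source to target.

    pairs is an iterable of (source, target) tuples, one per translation
    direction. Returns None when no chain of translators exists.
    """
    graph = {}
    for src, dst in pairs:
        graph.setdefault(src, []).append(dst)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Illustrative directed pairs (assumption), not the actual list of releases.
pairs = [("eus", "spa"), ("spa", "cat"), ("spa", "eng"), ("cat", "oci")]
print(find_translation_chain(pairs, "eus", "oci"))
print(find_translation_chain(pairs, "oci", "eus"))
```

<div class="ltx_para">
<p class="ltx_p">Preferring the shortest chain is a deliberate choice in this sketch, since each
additional translation step can add its own errors to the output.</p>
</div>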
<div id="S6.SS2.p3" class="ltx_para">
<p class="ltx_p">Another feature under development is dictionary lookup, where a user may use the
Apertium website as an online dictionary. That is, a word is not simply
translated, but all possible translations of the word are provided. The
community hopes to include some additional features with dictionary lookup,
including automatic reverse lookups (so that a user has a better understanding
of the results), grammatical information (such as the gender of nouns or the
conjugation paradigms of verbs), and information about MWEs.</p>
</div>
<div id="S6.SS2.p4" class="ltx_para">
<p class="ltx_p">A suggestions feature allows users to suggest corrections to translations. This
is especially helpful as developers can incorporate these corrections back into
the systems.
</p>
</div>
<div id="S6.SS2.p5" class="ltx_para">
<p class="ltx_p">One last feature under development is a spell-checking interface. This feature
provides users with a simple interface to check the spelling of words in a text,
and to be offered suggestions for misspelled words. It is noteworthy that there
are no known spell checkers available for some of the languages with
dictionaries in Apertium, such as Arpitan.</p>
</div>
</section>
</section>
<section id="S7" class="ltx_section">
<h2 class="ltx_title ltx_title_section">
<span class="ltx_tag ltx_tag_section">7 </span>Conclusion</h2>

<div id="S7.p1" class="ltx_para">
<p class="ltx_p">We have presented the latest updates to Apertium, a free and open-source
platform for machine translation, with a focus on MT for low-resource languages.
These updates include approaches to hybridisation of Apertium modules with
corpus-based approaches, new modules that are available for the Apertium RBMT
pipeline, and newly released language pairs.</p>
</div>
<div id="S7.p2" class="ltx_para">
<p class="ltx_p">The new modules in the pipeline are all optional, since they may be useful for
some specific language pairs but would not significantly improve others. With
an increasing number of released language pairs, Apertium becomes a preferred
vehicle for translation to and from low-resource languages, which are not as
easily implemented using widely advocated neural approaches to MT due to
sparsity of available text, and are also not considered economical for corporate
work. In addition, the different sub-components of translation pairs can be and
are used independently to produce other types of resources for these languages,
such as electronic dictionaries, a tool for searches in electronic corpora
<cite class="ltx_cite ltx_citemacro_cite">[e.g., <a href="#bib.bib205" title="Сложности при создании текстового корпуса объемом более 400 млн токенов" class="ltx_ref">53</a>]</cite>, spell
checkers,<span id="footnote26" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">26</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">26</sup>
            <span class="ltx_tag ltx_tag_note">26</span>
            
            
            
          Including, for example,
<a href="http://grammar.corpus.tatar/index_en.php?of=search/spellchecker.php" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://grammar.corpus.tatar/index_en.php?of=search/spellchecker.php</a></span></span></span> and
tools supporting language learning and revitalisation
<cite class="ltx_cite ltx_citemacro_cite">[e.g., <a href="#bib.bib93" title="Revita: a language-learning platform at the intersection of ITS and CALL" class="ltx_ref">16</a>, <a href="#bib.bib83" title="Tools for supporting language learning for Sakha" class="ltx_ref">13</a>]</cite>.</p>
</div>
</section>
<section id="bib" class="ltx_bibliography">
<h2 class="ltx_title ltx_title_bibliography">References</h2>

<ul id="bib.L1" class="ltx_biblist">
<li id="bib.bib12" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[1]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">L. Antonsen, T. Trosterud, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A North Saami to South Saami machine translation prototype</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Lecture Notes in Artificial Intelligence</span> <span class="ltx_text ltx_bib_volume">4</span>, <span class="ltx_text ltx_bib_pages"> pp. 11–27</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.3384/nejlt.2000-1533.1642" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I2.ix3.p1" title="item z ‣ Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item z</span></a>,
<a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib16" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[2]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">B. Baldwin</span><span class="ltx_text ltx_bib_year"> (1997)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">CogNIAC: high precision coreference with limited knowledge and linguistic resources</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of a Workshop on Operational Factors in
Practical, Robust Anaphora Resolution for Unrestricted Texts</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 38–45</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="http://portal.acm.org/citation.cfm?doid=1598819.1598825" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="https://dx.doi.org/10.3115/1598819.1598825" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib18" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[3]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Bayatli, G. Karanfil, M. Gökırmak, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2018)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Finite-state morphological analysis for Gagauz</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Eleventh International Conference on
Language Resources and Evaluation (LREC 2018)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Miyazaki, Japan</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.aclweb.org/anthology/L18-1411" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib17" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[4]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Bayatli, S. Kurnaz, A. Ali, J. N. Washington, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2020)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Unsupervised weighting of transfer rules in rule-based machine translation using maximum-entropy approach</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Journal of Information Science and Engineering</span> <span class="ltx_text ltx_bib_volume">36</span> (<span class="ltx_text ltx_bib_number">2</span>), <span class="ltx_text ltx_bib_pages"> pp. 309–322</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.6688/JISE.202003%5F36%282%29.0010" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p3" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib19" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[5]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Bayatli, S. Kurnaz, I. Salimzianov, J. N. Washington, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2018)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based machine translation from Kazakh to Turkish</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">European Association for Machine Translation (EAMT)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 49–58</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I2.ix1.p1" title="item x ‣ Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item x</span></a>.
</span>
</li>
<li id="bib.bib28" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[6]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">E. Bick and T. Didriksen</span><span class="ltx_text ltx_bib_year"> (2015)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">CG-3 – beyond classical constraint grammar</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 20th Nordic Conference of Computational
Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 31–39</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text issn ltx_bib_external">ISSN 1650-3740</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p3" title="1 Introduction ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§1</span></a>,
<a href="#S2.I3.i1.p1" title="1st item ‣ 2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1st item</span></a>,
<a href="#S3.SS1.p4" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib40" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[7]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Cherivirala, S. Chiplunkar, J. Washington, and K. Unhammer</span><span class="ltx_text ltx_bib_year"> (2018)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Apertium’s web toolchain for low-resource language technology</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 53–62</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.aclweb.org/anthology/W18-2207/" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S6.SS2.p1" title="6.2 Website software ‣ 6 Supplementary tools ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§6.2</span></a>.
</span>
</li>
<li id="bib.bib46" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[8]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. Constant, G. Eryiğit, J. Monti, L. van der Plas, C. Ramisch, M. Rosner, and A. Todirascu</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Multiword expression processing: a survey</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Computational Linguistics</span> <span class="ltx_text ltx_bib_volume">43</span> (<span class="ltx_text ltx_bib_number">4</span>), <span class="ltx_text ltx_bib_pages"> pp. 837–892</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.1162/COLI%5Fa%5F00302" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS2.p1" title="4.2 Processing multi-word expressions ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.2</span></a>.
</span>
</li>
<li id="bib.bib57" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[9]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Z. Md. Faridee and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Development of a morphological analyser for Bengali</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the First International Workshop on
Free/Open-Source Rule-Based Machine Translation</span>,  <span class="ltx_text ltx_bib_editor">J.A. Pérez-Ortiz, F. Sánchez-Martínez, and F.M. Tyers (Eds.)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Alicante, Spain</span>, <span class="ltx_text ltx_bib_pages"> pp. 43–50</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib60" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[10]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. L. Forcada, M. G. Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J. A. Pérez-Ortiz, G. Ramírez-Sánchez, F. Sánchez-Martínez, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Apertium: a free/open-source platform for rule-based machine translation</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Machine Translation</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#A1.p4" title="Appendix A List of released languages and translation pairs ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">Appendix A</span></a>,
<a href="#S1.p1" title="1 Introduction ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§1</span></a>,
<a href="#S1.p2" title="1 Introduction ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§1</span></a>,
<a href="#footnote27" title="footnote 27 ‣ Appendix A List of released languages and translation pairs ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">footnote 27</span></a>.
</span>
</li>
<li id="bib.bib62" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[11]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">G. Fronteddu, H. Alòs i Font, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Una eina per a una llengua en procés d’estandardització: el traductor automàtic català–sard</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Linguamática</span> <span class="ltx_text ltx_bib_volume">9</span> (<span class="ltx_text ltx_bib_number">2</span>), <span class="ltx_text ltx_bib_pages"> pp. 3–20</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.21814/lm.9.2.255" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix6.p1" title="item f ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item f</span></a>.
</span>
</li>
<li id="bib.bib68" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[12]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. Gökırmak and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A dependency treebank for Kurmanji Kurdish</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the International Conference on Dependency Linguistics, Depling 2017</span>,
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p2" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib83" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[13]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Ivanova, A. Katinskaia, and R. Yangarber</span><span class="ltx_text ltx_bib_year"> (2019)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Tools for supporting language learning for Sakha</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 22nd Nordic Conference on Computational Linguistics</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Turku, Finland</span>, <span class="ltx_text ltx_bib_pages"> pp. 155–163</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.aclweb.org/anthology/W19-6117" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S7.p2" title="7 Conclusion ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§7</span></a>.
</span>
</li>
<li id="bib.bib85" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[14]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">R. Johnson, T. A. Pirinen, T. Puolakainen, F. Tyers, T. Trosterud, and K. Unhammer</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">North-Sámi to Finnish rule-based machine translation system</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 21st Nordic Conference on
Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 115–122</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I2.ix2.p1" title="item y ‣ Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item y</span></a>.
</span>
</li>
<li id="bib.bib87" class="ltx_bibitem ltx_bib_book">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[15]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Karlsson, A. Voutilainen, J. Heikkilä, and A. Anttila</span><span class="ltx_text ltx_bib_year"> (1995)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Constraint grammar: a language-independent system for parsing unrestricted text</span>.
</span>
<span class="ltx_bibblock"> <span class="ltx_text ltx_bib_publisher">Mouton de Gruyter</span>, <span class="ltx_text ltx_bib_place">Berlin</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p3" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib93" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[16]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Katinskaia, J. Nouri, and R. Yangarber</span><span class="ltx_text ltx_bib_year"> (2018)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Revita: a language-learning platform at the intersection of ITS and CALL</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of LREC: 11th International Conference on
Language Resources and Evaluation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Miyazaki, Japan</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S7.p2" title="7 Conclusion ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§7</span></a>.
</span>
</li>
<li id="bib.bib103" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[17]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Kornai</span><span class="ltx_text ltx_bib_year"> (2013)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Digital language death</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">PLOS ONE</span> <span class="ltx_text ltx_bib_volume">8</span> (<span class="ltx_text ltx_bib_number">10</span>), <span class="ltx_text ltx_bib_pages"> pp. 1–11</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.1371/journal.pone.0077056" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.p1" title="5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5</span></a>.
</span>
</li>
<li id="bib.bib107" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[18]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Lappin and H. J. Leass</span><span class="ltx_text ltx_bib_year"> (1994)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">An algorithm for pronominal anaphora resolution</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Computational Linguistics</span> <span class="ltx_text ltx_bib_volume">20</span> (<span class="ltx_text ltx_bib_number">4</span>), <span class="ltx_text ltx_bib_pages"> pp. 535–561</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.aclweb.org/anthology/J94-4002" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib108" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[19]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">H. Lee, A. Chang, Y. Peirsman, N. Chambers, M. Surdeanu, and D. Jurafsky</span><span class="ltx_text ltx_bib_year"> (2013)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Deterministic coreference resolution based on entity-centric, precision-ranked rules</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Computational Linguistics</span> <span class="ltx_text ltx_bib_volume">39</span> (<span class="ltx_text ltx_bib_number">4</span>).
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib116" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[20]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">K. Lindén, E. Axelson, S. Hardwick, F. A. Pirinen, and M. Silfverberg</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">HFST – framework for compiling and applying morphologies</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Systems and Frameworks for Computational Morphology</span>, <span class="ltx_text ltx_bib_pages"> pp. 67–85</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S1.p3" title="1 Introduction ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§1</span></a>.
</span>
</li>
<li id="bib.bib117" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[21]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">S. Loáiciga and E. Wehrli</span><span class="ltx_text ltx_bib_year"> (2015)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based pronominal anaphora treatment for machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Second Workshop on Discourse in Machine Translation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 86–93</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="http://aclweb.org/anthology/W15-2512" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="https://dx.doi.org/10.18653/v1/W15-2512" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib128" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[22]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">J. P. Martínez Cortés, J. O’Regan, and F. Tyers</span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Free/open source shallow-transfer based machine translation for Spanish and Aragonese</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC’12)</span>,
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix1.p1" title="item a ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item a</span></a>,
<a href="#S5.I1.ix18.p1" title="item r ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item r</span></a>.
</span>
</li>
<li id="bib.bib127" class="ltx_bibitem ltx_bib_thesis">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[23]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez</span><span class="ltx_text ltx_bib_year"> (2008)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Using unsupervised corpus-based methods to build rule-based machine translation systems</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_type">Ph.D. Thesis</span>, <span class="ltx_text ltx_bib_publisher">Universidad de Alicante</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.dlsi.ua.es//~fsanchez/pub/thesis/thesis-sin.pdf" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p2" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib124" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[24]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">M. Marting and K. B. Unhammer</span><span class="ltx_text ltx_bib_year"> (2014)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">FST trimming: ending dictionary redundancy in Apertium</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 9th Conference on Language Resources and Evaluation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 19–24</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.p18" title="2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§2</span></a>.
</span>
</li>
<li id="bib.bib135" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[25]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">R. Mitkov</span><span class="ltx_text ltx_bib_year"> (1999)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Multilingual anaphora resolution</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Machine Translation</span> <span class="ltx_text ltx_bib_volume">14</span>, <span class="ltx_text ltx_bib_pages"> pp. 281–299</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.SSS1.p1" title="4.3.1 Some unique features ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3.1</span></a>,
<a href="#S4.SS3.SSS2.p3" title="4.3.2 Example Usage ‣ 4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3.2</span></a>,
<a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>,
<a href="#S4.SS3.p4" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib145" class="ltx_bibitem ltx_bib_book">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[26]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_editor">C. Moseley (Ed.)</span><span class="ltx_text ltx_bib_year"> (2010)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Atlas of the world’s languages in danger</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_edition">3rd edition</span>,  <span class="ltx_text ltx_bib_publisher">UNESCO Publishing</span>.
</span>
<span class="ltx_bibblock">Note: <span class="ltx_text ltx_bib_note">Online version: http://www.unesco.org/languages-atlas/</span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib160" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[27]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">P. Otte and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rapid rule-based machine translation between Dutch and Afrikaans</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">EAMT 2011: proceedings of the 15th conference of the European
Association for Machine Translation, 30-31 May 2011, Leuven, Belgium</span>,
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p2" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib163" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[28]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">H. Peradin and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A rule-based machine translation system from Serbo-Croatian to Macedonian</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Third International Workshop on Free/Open-Source Rule-Based Machine Translation (FreeRBMT 2012)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 55–63</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix19.p1" title="item s ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item s</span></a>.
</span>
</li>
<li id="bib.bib185" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[29]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. A. Pirinen</span><span class="ltx_text ltx_bib_year"> (2019)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Workflows for kickstarting RBMT in virtually no-resource situation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of The 2nd Workshop on Technologies for MT of
Low Resource Languages (LoResMT 2019)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_place">Dublin, Ireland</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib190" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[30]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">V. Ravishankar and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Finite-state morphological analysis for Marathi</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of The 13th International Conference on Finite State Methods and Natural Language Processing</span>,
</span>
<span class="ltx_bibblock">Note: <span class="ltx_text ltx_bib_note">(to appear)</span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib204" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[31]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">I. Salimzyanov, J. N. Washington, and F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2013)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A free/open-source Kazakh-Tatar machine translation system</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the XIV Machine Translation Summit</span>,  <span class="ltx_text ltx_bib_editor">K. Sima’an, M.L. Forcada, D. Grasmick, H. Depraetere, and A. Way (Eds.)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 175–182</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="http://www.mt-archive.info/10/MTS-2013-Salimzyanov.pdf" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p2" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib203" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[32]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">V. M. Sánchez-Cartagena, J. A. Pérez-Ortiz, and F. Sánchez-Martínez</span><span class="ltx_text ltx_bib_year"> (2015)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Computer Speech &amp; Language</span> <span class="ltx_text ltx_bib_volume">32</span> (<span class="ltx_text ltx_bib_number">1</span>), <span class="ltx_text ltx_bib_pages"> pp. 46–90</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text issn ltx_bib_external">ISSN 0885-2308</span>,
<a href="https://www.sciencedirect.com/science/article/pii/S0885230814001028" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="https://dx.doi.org/10.1016/j.csl.2014.10.003" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p2" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib199" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[33]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez, C. Armentano-Oller, J. A. Pérez-Ortiz, and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2007)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Training part-of-speech taggers to build machine translation systems for less-resourced language pairs</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Procesamiento del Lenguaje Natural (XXIII Congreso de la Sociedad Española de Procesamiento del Lenguaje Natural, Sevilla, Spain, 10-12/09/2007)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 257–264</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p2" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib200" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[34]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2007)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Automatic induction of shallow-transfer rules for open-source machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 11th Conference on Theoretical and
Methodological Issues in Machine Translation (TMI 2007)</span>,  <span class="ltx_text ltx_bib_editor">A. Way and B. Gawronska (Eds.)</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">2007:1</span>, <span class="ltx_text ltx_bib_pages"> pp. 181–190</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-91-977095-0-7</span></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p2" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib202" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[35]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Inferring shallow-transfer machine translation rules from small parallel corpora</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Journal of Artificial Intelligence Research</span> <span class="ltx_text ltx_bib_volume">34</span>, <span class="ltx_text ltx_bib_pages"> pp. 605–635</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p2" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib82" class="ltx_bibitem ltx_bib_incollection">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[36]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez and H. Ney</span><span class="ltx_text ltx_bib_year"> (2006)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Using alignment templates to infer shallow-transfer machine translation rules</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Advances in Natural Language Processing</span>,  <span class="ltx_text ltx_bib_editor">T. Salakoski, F. Ginter, S. Pyysalo, and T. Pahikkala (Eds.)</span>,
</span>
<span class="ltx_bibblock">Vol. <span class="ltx_text ltx_bib_volume">4139</span>, <span class="ltx_text ltx_bib_pages"> pp. 756–767</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><span class="ltx_text isbn ltx_bib_external">ISBN 978-3-540-37334-6 978-3-540-37336-0</span>,
<a href="http://link.springer.com/10.1007/11816508_75" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="https://dx.doi.org/10.1007/11816508%5F75" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS3.p2" title="3.3 Structural transfer module ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.3</span></a>.
</span>
</li>
<li id="bib.bib198" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[37]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Sánchez-Martínez, J. A. Pérez-Ortiz, and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2006)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Speeding up target language driven part-of-speech tagger training for machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 5th Mexican International Conference on Artificial Intelligence, MICAI 2006</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 844–854</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p2" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib220" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[38]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">D. G. Swanson, J. N. Washington, F. M. Tyers, and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2021)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A tree-based structural transfer module for the Apertium machine translation platform</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Machine Translation</span>.
</span>
<span class="ltx_bibblock">Note: <span class="ltx_text ltx_bib_note">to appear</span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS1.p3" title="4.1 Recursive structural transfer ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.1</span></a>.
</span>
</li>
<li id="bib.bib221" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[39]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">Translators without Borders</span><span class="ltx_text ltx_bib_year"> (2016-11-30)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Translators without Borders develops world’s first crisis-specific machine translation for Kurdish</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Slator</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://slator.com/press-releases/translators-without-borders-develops-the-worlds-first-crisis-specific-machine-translation-system-for-kurdish-languages/" title="" class="ltx_ref ltx_bib_external">Link</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib223" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[40]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">T. Trosterud and K. B. Unhammer</span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Evaluating North Sámi to Norwegian assimilation RBMT</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Third International Workshop on Free/Open-Source Rule-Based Machine Translation (FreeRBMT 2012)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 13–25</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix12.p1" title="item l ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item l</span></a>.
</span>
</li>
<li id="bib.bib226" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[41]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. Trouilleux</span><span class="ltx_text ltx_bib_year"> (2002)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A rule-based pronoun resolution system for French</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">4th Discourse Anaphora and Anaphor Resolution Colloquium</span>, <span class="ltx_text ltx_bib_pages"> pp. 7</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib233" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[42]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, F. Sánchez-Martínez, and M. L. Forcada</span><span class="ltx_text ltx_bib_year"> (2014)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Unsupervised training of maximum-entropy models for lexical selection in rule-based machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 18th Annual Conference of the European Association for Machine Translation</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 145–153</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS2.p2" title="3.2 Lexical selection ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.2</span></a>.
</span>
</li>
<li id="bib.bib231" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[43]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, F. Sánchez-Martínez, M. L. Forcada, <span class="ltx_text ltx_bib_etal">et al.</span></span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Flexible finite-state lexical selection for rule-based machine translation</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S2.I6.i1.p1" title="1st item ‣ 2 Overview of the Apertium platform ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">1st item</span></a>,
<a href="#S3.SS2.p1" title="3.2 Lexical selection ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.2</span></a>.
</span>
</li>
<li id="bib.bib232" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[44]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, J. N. Washington, I. Salimzyanov, and R. Batalov</span><span class="ltx_text ltx_bib_year"> (2012)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">A prototype machine translation system for Tatar and Bashkir based on free/open-source components</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">First Workshop on Language Resources and Technologies for Turkic Languages</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 11</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I2.ix4.p1" title="item aa ‣ Table 5 ‣ 5.2 Other languages and work ahead ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item aa</span></a>,
<a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages 1 footnote 1 1 footnote 1 Springer Open Access publication. This version from pre-print latex form does not contain some changes made in the editorial process. Published version available: https://link.springer.com/article/10.1007/s10590-021-09260-6" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib227" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[45]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers and K. Donnelly</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Apertium-cy – a collaboratively-developed free RBMT system for Welsh to English</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">The Prague Bulletin of Mathematical Linguistics</span> <span class="ltx_text ltx_bib_volume">91</span>, <span class="ltx_text ltx_bib_pages"> pp. 57–66</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix21.p1" title="item u ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item u</span></a>.
</span>
</li>
<li id="bib.bib234" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[46]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, G. Fronteddu, H. Alòs i Font, and A. Martín-Mor</span><span class="ltx_text ltx_bib_year"> (2017)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based machine translation for the Italian–Sardinian language pair</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">The Prague Bulletin of Mathematical Linguistics</span>, <span class="ltx_text ltx_bib_pages"> pp. 221–232</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://dx.doi.org/10.1515/pralin-2017-0022" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix11.p1" title="item k ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item k</span></a>.
</span>
</li>
<li id="bib.bib229" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[47]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers, L. Wiechetek, and T. Trosterud</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Developing prototypes for machine translation between two Sámi languages</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 13th Annual Conference of the European Association of Machine Translation, EAMT09</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 120–128</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib228" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[48]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2009)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based augmentation of training data in Breton–French statistical machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 13th Annual Conference of the European Association of Machine Translation, EAMT09</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 213–218</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix3.p1" title="item c ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item c</span></a>.
</span>
</li>
<li id="bib.bib230" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[49]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">F. M. Tyers</span><span class="ltx_text ltx_bib_year"> (2010)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Rule-based Breton to French machine translation</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the 14th Annual Conference of the European Association of Machine Translation, EAMT10</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 174–181</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.I1.ix3.p1" title="item c ‣ Table 4 ‣ 5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">item c</span></a>.
</span>
</li>
<li id="bib.bib246" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[50]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">J. N. Washington, I. Salimzianov, F. M. Tyers, M. Gökırmak, S. Ivanova, and O. Kuyrukçu</span><span class="ltx_text ltx_bib_year"> (to appear)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Free/open-source technologies for Turkic languages developed in the Apertium project</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the Seventh International Conference on
Computer Processing of Turkic Languages (TurkLang 2019)</span>,
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S5.SS1.p4" title="5.1 Released translation pairs ‣ 5 Supporting minoritised languages ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§5.1</span></a>.
</span>
</li>
<li id="bib.bib261" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[51]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">A. Zeldes and S. Zhang</span><span class="ltx_text ltx_bib_year"> (2016)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">When annotation schemes change rules help: a configurable approach to coreference resolution beyond OntoNotes</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Proceedings of the NAACL2016 Workshop on Coreference Resolution Beyond OntoNotes (CORBON)</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 92–101</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S4.SS3.p3" title="4.3 Anaphora resolution ‣ 4 New modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§4.3</span></a>.
</span>
</li>
<li id="bib.bib262" class="ltx_bibitem ltx_bib_article">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[52]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">Y. Zhang and S. Clark</span><span class="ltx_text ltx_bib_year"> (2011)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Syntactic processing using the generalized perceptron and beam search</span>.
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_journal">Computational Linguistics</span> <span class="ltx_text ltx_bib_volume">37</span> (<span class="ltx_text ltx_bib_number">1</span>), <span class="ltx_text ltx_bib_pages"> pp. 105–151</span>.
</span>
<span class="ltx_bibblock">External Links: <span class="ltx_text ltx_bib_links"><a href="https://www.aclweb.org/anthology/J11-1005" title="" class="ltx_ref ltx_bib_external">Link</a>,
<a href="https://dx.doi.org/10.1162/coli%5Fa%5F00037" title="" class="ltx_ref doi ltx_bib_external">Document</a></span>
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S3.SS1.p3" title="3.1 Morphological disambiguation ‣ 3 Use of corpus-based approaches in Apertium modules ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§3.1</span></a>.
</span>
</li>
<li id="bib.bib205" class="ltx_bibitem ltx_bib_inproceedings">
<span class="ltx_tag ltx_bib_key ltx_role_refnum ltx_tag_bibitem">[53]</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_author">М. Р. Сайхунов, Р. Р. Хусаинов, and Т. И. Ибрагимов</span><span class="ltx_text ltx_bib_year"> (2019)</span>
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_title">Сложности при создании текстового корпуса объемом более 400 млн токенов</span>.
</span>
<span class="ltx_bibblock">In <span class="ltx_text ltx_bib_inbook">Финно-угорский мир в полиэтничном пространстве России:
культурное наследие и новые вызовы</span>,
</span>
<span class="ltx_bibblock"><span class="ltx_text ltx_bib_pages"> pp. 548–554</span>.
</span>
<span class="ltx_bibblock ltx_bib_cited">Cited by: <a href="#S7.p2" title="7 Conclusion ‣ Recent advances in Apertium, a free / open-source rule-based machine translation platform for low-resource languages" class="ltx_ref"><span class="ltx_text ltx_ref_tag">§7</span></a>.
</span>
</li>
</ul>
</section>
<section id="A1" class="ltx_appendix">
<h2 class="ltx_title ltx_title_appendix">
<span class="ltx_tag ltx_tag_appendix">Appendix A </span>List of released languages and translation pairs</h2>

<div id="A1.p1" class="ltx_para">
<p class="ltx_p">The released languages are:<span id="footnote27" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">27</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">27</sup>
            <span class="ltx_tag ltx_tag_note">27</span>
            
            
            
          An asterisk (*) indicates that the language
has been released since the previous publication <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib60" title="Apertium: a free/open-source platform for rule-based machine translation platform" class="ltx_ref">10</a>]</cite>.
Norwegian Bokmål and Norwegian Nynorsk are considered two different languages in
Apertium since there is a translator from one to the other, which is not the
case between different varieties of Catalan, Occitan and Portuguese that are
supported in Apertium.</span></span></span></p>
</div>
<div id="A1.p2" class="ltx_para">
<ul id="A1.I1" class="ltx_itemize">
<li id="A1.I1.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i1.p1" class="ltx_para">
<p class="ltx_p">Afrikaans*</p>
</div>
</li>
<li id="A1.I1.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i2.p1" class="ltx_para">
<p class="ltx_p">Arabic*</p>
</div>
</li>
<li id="A1.I1.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i3.p1" class="ltx_para">
<p class="ltx_p">Aragonese*</p>
</div>
</li>
<li id="A1.I1.i4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i4.p1" class="ltx_para">
<p class="ltx_p">Arpitan*
</p>
</div>
</li>
<li id="A1.I1.i5" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i5.p1" class="ltx_para">
<p class="ltx_p">Asturian</p>
</div>
</li>
<li id="A1.I1.i6" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i6.p1" class="ltx_para">
<p class="ltx_p">Basque</p>
</div>
</li>
<li id="A1.I1.i7" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i7.p1" class="ltx_para">
<p class="ltx_p">Belarusian*</p>
</div>
</li>
<li id="A1.I1.i8" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i8.p1" class="ltx_para">
<p class="ltx_p">Breton</p>
</div>
</li>
<li id="A1.I1.i9" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i9.p1" class="ltx_para">
<p class="ltx_p">Bulgarian</p>
</div>
</li>
<li id="A1.I1.i10" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i10.p1" class="ltx_para">
<p class="ltx_p">Catalan</p>
</div>
</li>
<li id="A1.I1.i11" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i11.p1" class="ltx_para">
<p class="ltx_p">Danish</p>
</div>
</li>
<li id="A1.I1.i12" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i12.p1" class="ltx_para">
<p class="ltx_p">Dutch*</p>
</div>
</li>
<li id="A1.I1.i13" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i13.p1" class="ltx_para">
<p class="ltx_p">English</p>
</div>
</li>
<li id="A1.I1.i14" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i14.p1" class="ltx_para">
<p class="ltx_p">Esperanto</p>
</div>
</li>
<li id="A1.I1.i15" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i15.p1" class="ltx_para">
<p class="ltx_p">French</p>
</div>
</li>
<li id="A1.I1.i16" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i16.p1" class="ltx_para">
<p class="ltx_p">Galician</p>
</div>
</li>
<li id="A1.I1.i17" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i17.p1" class="ltx_para">
<p class="ltx_p">Hindi*</p>
</div>
</li>
<li id="A1.I1.i18" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i18.p1" class="ltx_para">
<p class="ltx_p">Icelandic</p>
</div>
</li>
<li id="A1.I1.i19" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i19.p1" class="ltx_para">
<p class="ltx_p">Indonesian*</p>
</div>
</li>
<li id="A1.I1.i20" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i20.p1" class="ltx_para">
<p class="ltx_p">Italian</p>
</div>
</li>
<li id="A1.I1.i21" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i21.p1" class="ltx_para">
<p class="ltx_p">Kazakh*</p>
</div>
</li>
<li id="A1.I1.i22" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i22.p1" class="ltx_para">
<p class="ltx_p">Macedonian</p>
</div>
</li>
<li id="A1.I1.i23" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i23.p1" class="ltx_para">
<p class="ltx_p">Malaysian*</p>
</div>
</li>
<li id="A1.I1.i24" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i24.p1" class="ltx_para">
<p class="ltx_p">Maltese*</p>
</div>
</li>
<li id="A1.I1.i25" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i25.p1" class="ltx_para">
<p class="ltx_p">Norwegian Bokmål</p>
</div>
</li>
<li id="A1.I1.i26" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i26.p1" class="ltx_para">
<p class="ltx_p">Norwegian Nynorsk</p>
</div>
</li>
<li id="A1.I1.i27" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i27.p1" class="ltx_para">
<p class="ltx_p">Occitan</p>
</div>
</li>
<li id="A1.I1.i28" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i28.p1" class="ltx_para">
<p class="ltx_p">Polish*</p>
</div>
</li>
<li id="A1.I1.i29" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i29.p1" class="ltx_para">
<p class="ltx_p">Portuguese
</p>
</div>
</li>
<li id="A1.I1.i30" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i30.p1" class="ltx_para">
<p class="ltx_p">Romanian</p>
</div>
</li>
<li id="A1.I1.i31" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i31.p1" class="ltx_para">
<p class="ltx_p">Russian*</p>
</div>
</li>
<li id="A1.I1.i32" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i32.p1" class="ltx_para">
<p class="ltx_p">North Sámi*</p>
</div>
</li>
<li id="A1.I1.i33" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i33.p1" class="ltx_para">
<p class="ltx_p">Sardinian*</p>
</div>
</li>
<li id="A1.I1.i34" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i34.p1" class="ltx_para">
<p class="ltx_p">Serbo-Croatian*</p>
</div>
</li>
<li id="A1.I1.i35" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i35.p1" class="ltx_para">
<p class="ltx_p">Silesian*</p>
</div>
</li>
<li id="A1.I1.i36" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i36.p1" class="ltx_para">
<p class="ltx_p">Slovenian*</p>
</div>
</li>
<li id="A1.I1.i37" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i37.p1" class="ltx_para">
<p class="ltx_p">Spanish</p>
</div>
</li>
<li id="A1.I1.i38" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i38.p1" class="ltx_para">
<p class="ltx_p">Swedish</p>
</div>
</li>
<li id="A1.I1.i39" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i39.p1" class="ltx_para">
<p class="ltx_p">Tatar*</p>
</div>
</li>
<li id="A1.I1.i40" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i40.p1" class="ltx_para">
<p class="ltx_p">Crimean Tatar*</p>
</div>
</li>
<li id="A1.I1.i41" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i41.p1" class="ltx_para">
<p class="ltx_p">Turkish*</p>
</div>
</li>
<li id="A1.I1.i42" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i42.p1" class="ltx_para">
<p class="ltx_p">Ukrainian*</p>
</div>
</li>
<li id="A1.I1.i43" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i43.p1" class="ltx_para">
<p class="ltx_p">Urdu*</p>
</div>
</li>
<li id="A1.I1.i44" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I1.i44.p1" class="ltx_para">
<p class="ltx_p">Welsh</p>
</div>
</li>
</ul>
</div>
<div id="A1.p3" class="ltx_para ltx_noindent">
<p class="ltx_p">The released language pairs are listed below; arrows indicate the available translation directions, and an asterisk (*) marks pairs released since the previous publication:</p>
<ul id="A1.I2" class="ltx_itemize">
<li id="A1.I2.i1" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i1.p1" class="ltx_para">
<p class="ltx_p">Afrikaans ⇆ Dutch*</p>
</div>
</li>
<li id="A1.I2.i2" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i2.p1" class="ltx_para">
<p class="ltx_p">Aragonese ⇆ Catalan*</p>
</div>
</li>
<li id="A1.I2.i3" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i3.p1" class="ltx_para">
<p class="ltx_p">Aragonese ⇆ Spanish*
</p>
</div>
</li>
<li id="A1.I2.i4" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i4.p1" class="ltx_para">
<p class="ltx_p">Basque → English*</p>
</div>
</li>
<li id="A1.I2.i5" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i5.p1" class="ltx_para">
<p class="ltx_p">Basque → Spanish</p>
</div>
</li>
<li id="A1.I2.i6" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i6.p1" class="ltx_para">
<p class="ltx_p">Belarusian ⇆ Russian*</p>
</div>
</li>
<li id="A1.I2.i7" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i7.p1" class="ltx_para">
<p class="ltx_p">Breton → French</p>
</div>
</li>
<li id="A1.I2.i8" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i8.p1" class="ltx_para">
<p class="ltx_p">Bulgarian ⇆ Macedonian</p>
</div>
</li>
<li id="A1.I2.i9" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i9.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ English</p>
</div>
</li>
<li id="A1.I2.i10" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i10.p1" class="ltx_para">
<p class="ltx_p">Catalan → Esperanto</p>
</div>
</li>
<li id="A1.I2.i11" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i11.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ French</p>
</div>
</li>
<li id="A1.I2.i12" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i12.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ Italian</p>
</div>
</li>
<li id="A1.I2.i13" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i13.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ Occitan</p>
</div>
</li>
<li id="A1.I2.i14" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i14.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ Portuguese</p>
</div>
</li>
<li id="A1.I2.i15" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i15.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ Romanian*</p>
</div>
</li>
<li id="A1.I2.i16" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i16.p1" class="ltx_para">
<p class="ltx_p">Catalan → Sardinian*</p>
</div>
</li>
<li id="A1.I2.i17" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i17.p1" class="ltx_para">
<p class="ltx_p">Catalan ⇆ Spanish</p>
</div>
</li>
<li id="A1.I2.i18" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i18.p1" class="ltx_para">
<p class="ltx_p">Danish ⇆ Norwegian*</p>
</div>
</li>
<li id="A1.I2.i19" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i19.p1" class="ltx_para">
<p class="ltx_p">Danish ⇆ Swedish</p>
</div>
</li>
<li id="A1.I2.i20" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i20.p1" class="ltx_para">
<p class="ltx_p">English ⇆ Esperanto</p>
</div>
</li>
<li id="A1.I2.i21" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i21.p1" class="ltx_para">
<p class="ltx_p">English ⇆ Galician</p>
</div>
</li>
<li id="A1.I2.i22" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i22.p1" class="ltx_para">
<p class="ltx_p">English ⇆ Spanish</p>
</div>
</li>
<li id="A1.I2.i23" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i23.p1" class="ltx_para">
<p class="ltx_p">French → Arpitan*</p>
</div>
</li>
<li id="A1.I2.i24" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i24.p1" class="ltx_para">
<p class="ltx_p">French → Esperanto</p>
</div>
</li>
<li id="A1.I2.i25" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i25.p1" class="ltx_para">
<p class="ltx_p">French → Occitan*</p>
</div>
</li>
<li id="A1.I2.i26" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i26.p1" class="ltx_para">
<p class="ltx_p">French ⇆ Spanish</p>
</div>
</li>
<li id="A1.I2.i27" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i27.p1" class="ltx_para">
<p class="ltx_p">Galician ⇆ Portuguese</p>
</div>
</li>
<li id="A1.I2.i28" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i28.p1" class="ltx_para">
<p class="ltx_p">Galician ⇆ Spanish
</p>
</div>
</li>
<li id="A1.I2.i29" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i29.p1" class="ltx_para">
<p class="ltx_p">Hindi ⇆ Urdu*</p>
</div>
</li>
<li id="A1.I2.i30" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i30.p1" class="ltx_para">
<p class="ltx_p">Icelandic → English</p>
</div>
</li>
<li id="A1.I2.i31" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i31.p1" class="ltx_para">
<p class="ltx_p">Icelandic ⇆ Swedish*</p>
</div>
</li>
<li id="A1.I2.i32" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i32.p1" class="ltx_para">
<p class="ltx_p">Indonesian ⇆ Malaysian*</p>
</div>
</li>
<li id="A1.I2.i33" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i33.p1" class="ltx_para">
<p class="ltx_p">Italian → Sardinian*</p>
</div>
</li>
<li id="A1.I2.i34" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i34.p1" class="ltx_para">
<p class="ltx_p">Kazakh ⇆ Tatar*</p>
</div>
</li>
<li id="A1.I2.i35" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i35.p1" class="ltx_para">
<p class="ltx_p">Macedonian → English</p>
</div>
</li>
<li id="A1.I2.i36" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i36.p1" class="ltx_para">
<p class="ltx_p">Maltese → Arabic*</p>
</div>
</li>
<li id="A1.I2.i37" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i37.p1" class="ltx_para">
<p class="ltx_p">Norwegian ⇆ Swedish*</p>
</div>
</li>
<li id="A1.I2.i38" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i38.p1" class="ltx_para">
<p class="ltx_p">Norwegian Bokmål ⇆ Nynorsk</p>
</div>
</li>
<li id="A1.I2.i39" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i39.p1" class="ltx_para">
<p class="ltx_p">Occitan ⇆ Spanish</p>
</div>
</li>
<li id="A1.I2.i40" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i40.p1" class="ltx_para">
<p class="ltx_p">Polish → Silesian*</p>
</div>
</li>
<li id="A1.I2.i41" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i41.p1" class="ltx_para">
<p class="ltx_p">Portuguese ⇆ Spanish</p>
</div>
</li>
<li id="A1.I2.i42" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i42.p1" class="ltx_para">
<p class="ltx_p">Romanian → Spanish</p>
</div>
</li>
<li id="A1.I2.i43" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i43.p1" class="ltx_para">
<p class="ltx_p">Russian ⇆ Ukrainian*</p>
</div>
</li>
<li id="A1.I2.i44" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i44.p1" class="ltx_para">
<p class="ltx_p">North Sámi → Norwegian*</p>
</div>
</li>
<li id="A1.I2.i45" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i45.p1" class="ltx_para">
<p class="ltx_p">Serbo-Croatian → English*</p>
</div>
</li>
<li id="A1.I2.i46" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i46.p1" class="ltx_para">
<p class="ltx_p">Serbo-Croatian → Macedonian*</p>
</div>
</li>
<li id="A1.I2.i47" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i47.p1" class="ltx_para">
<p class="ltx_p">Serbo-Croatian ⇆ Slovenian*</p>
</div>
</li>
<li id="A1.I2.i48" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i48.p1" class="ltx_para">
<p class="ltx_p">Spanish → Asturian</p>
</div>
</li>
<li id="A1.I2.i49" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i49.p1" class="ltx_para">
<p class="ltx_p">Spanish → Esperanto</p>
</div>
</li>
<li id="A1.I2.i50" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i50.p1" class="ltx_para">
<p class="ltx_p">Crimean Tatar → Turkish*</p>
</div>
</li>
<li id="A1.I2.i51" class="ltx_item" style="list-style-type:none;">
<span class="ltx_tag ltx_tag_item">•</span> 
<div id="A1.I2.i51.p1" class="ltx_para">
<p class="ltx_p">Welsh → English</p>
</div>
</li>
</ul>
</div>
<div id="A1.p4" class="ltx_para">
<p class="ltx_p">Many of the pairs that had already been released at the time of the previous
publication <cite class="ltx_cite ltx_citemacro_cite">[<a href="#bib.bib60" title="Apertium: a free/open-source platform for rule-based machine translation platform" class="ltx_ref">10</a>]</cite> have since been updated. For example, the
bilingual dictionary of the Catalan-French pair has grown from 10,554 entries
at that time to 71,537 as of December 2020.</p>
</div>
</section>
</article>
</div>
</div>
</body>
</html>