
<!DOCTYPE html><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Rule-Based Language Technology: Preface</title>
<!--Generated on Thu Apr 20 14:40:36 2023 by LaTeXML (version 0.8.6) http://dlmf.nist.gov/LaTeXML/.-->

<link rel="stylesheet" href="../latexml/LaTeXML.css" type="text/css">
<link rel="stylesheet" href="../latexml/ltx-article.css" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
</head>
<body>
<div class="ltx_page_main">
<div class="ltx_page_content">
<article class="ltx_document">
<h1 class="ltx_title ltx_title_document">Rule-Based Language Technology: Preface <span id="footnote1" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup>
        <span class="ltx_tag ltx_tag_note">1</span>
        
        
        
      This is Flammie’s draft, official version may differ.</span></span></span>
This work is licensed under a Creative Commons Attribution–NonCommercial-NoDerivatives
4.0 International Licence. Licence details:
<a href="http://creativecommons.org/licenses/by-nc-nd/4.0/" title="" class="ltx_ref ltx_url ltx_font_typewriter">http://creativecommons.org/licenses/by-nc-nd/4.0/</a>.
The publisher’s version is available at:
<a href="https://dspace.ut.ee/handle/10062/89595" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://dspace.ut.ee/handle/10062/89595</a>
</h1>

<section id="Sx1" class="ltx_section">
<h2 class="ltx_title ltx_title_section">Preface</h2>

<div id="Sx1.p1" class="ltx_para">
<p class="ltx_p">When we search Google for information on rule-based language
technology, we get publications such as Why did rule-based language processing finally
fail<span id="footnote2" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup>
            <span class="ltx_tag ltx_tag_note">2</span>
            
            
            
          <a href="https://medium.com/voice-tech-podcast/why-did-rule-based-natural-language-processing-finallyfail-ff5e3eae16e8" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://medium.com/voice-tech-podcast/why-did-rule-based-natural-language-processing-finallyfail-ff5e3eae16e8</a></span></span></span>
and Rule-based information extraction is
dead!<span id="footnote3" class="ltx_note ltx_role_footnote"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup>
            <span class="ltx_tag ltx_tag_note">3</span>
            
            
            
          <a href="https://aclanthology.org/D13-1079/" title="" class="ltx_ref ltx_url ltx_font_typewriter">https://aclanthology.org/D13-1079/</a></span></span></span> The writer of the former article fails
to understand the current development phase of rule-based language technology and is
ready to state that the technology has failed. The heading of the latter article is interesting.
When we open the article, we get the full heading Rule-Based Information Extraction is
Dead! Long Live Rule-Based Information Extraction Systems! The first part of the title
gives the impression that the writer really thinks that the approach is dead, although he
means the opposite. Copying the style of the announcement made in Britain when a
king or queen has passed away may turn out to be disastrous, because people usually see
only the beginning of a long title in a listing.</p>
</div>
<div id="Sx1.p2" class="ltx_para">
<p class="ltx_p">Why have we decided to compile this book? Is it in defence of the withering
technique?</p>
</div>
<div id="Sx1.p3" class="ltx_para">
<p class="ltx_p">There are two reasons for this. First, although rule-based language technology has a
long history, much longer than that of statistical or neural language technology, there is no
publication that describes the main approaches built on this theoretical
background. Therefore, we decided to invite the key researchers using rule-based
approaches to describe their systems in detail, and in language that also lets those who
are not initiated into the systems themselves get a picture of how the systems work.
This was at least our aim. It is up to the reader to judge whether we have succeeded in it.</p>
</div>
<div id="Sx1.p4" class="ltx_para">
<p class="ltx_p">The second reason for writing the book is that we want to show that rule-based
approaches are also suited to languages with scarce resources, and that a large number of
applications can be constructed using rule-based technology.</p>
</div>
<div id="Sx1.p5" class="ltx_para">
<p class="ltx_p">We are also concerned about the current trend of investing financial resources in
technologies that automatically leave out at least 95 percent of the world’s
languages. Africa, for example, with its rapidly increasing population, is in danger of being
marginalised in terms of language technology, unless rule-based technology is taken as
the basis for developing language applications for African languages.
Rule-based technology is capable of giving prestige to local languages that
are currently under strong stress as dying languages, because statistical methods are not
suited to them due to insufficient language resources.</p>
</div>
<div id="Sx1.p6" class="ltx_para">
<p class="ltx_p">The book has two sections. The first section contains descriptions of
platforms that have been developing rule-based systems for a number of years.
Although the use of different programming languages might give the impression that the
approaches are very different, they struggle with the same problems, each searching for
solutions to them. The second section includes descriptions of specific problems. It also
contains articles on various applications for meeting the needs of the public.</p>
</div>
<div id="Sx1.p7" class="ltx_para">
<p class="ltx_p">Among the development platforms described here, two platforms constitute the main
branches of rule-based language technology.</p>
</div>
<div id="Sx1.p8" class="ltx_para">
<p class="ltx_p">The older of the two is the two-level description of languages using finite-state
methods. The system included the two-level rule component, which made it possible to
reduce the size of the lexicon, because the rules took care of the morphophonological
changes. The first implementations were published in the 1980s. The finite-state
methods were further developed at the Xerox Research Centre in France. A new rule
system was also developed as an alternative to two-level rules. The result of the work by
Xerox is known as Xerox Finite-State Tools, or xfst for short. This tool package was later
re-implemented as open source under the name foma. Xfst was further developed in the
HFST tool package, described in this book. GiellaLT, also described in this book, has
made extensive use of this technology. Salama makes use of the original implementation
of the two-level description.</p>
</div>
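<div class="ltx_para">
<p class="ltx_p">As a rough illustration of how rules can keep the lexicon small, the sketch below stores each stem only once and lets a single rule produce the surface alternation. It is only an approximation: real two-level rules are parallel constraints compiled into finite-state transducers, whereas this sketch uses an ordinary regular-expression rewrite. The stems, the boundary marker “+” and the function name <span class="ltx_text ltx_font_typewriter">to_surface</span> are assumptions made for the example and do not come from any of the systems described in this book.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
import re

# Hypothetical toy lexicon: one stem per word, no stem variants stored.
STEMS = ["fly", "try", "boy"]
PLURAL = "+s"  # "+" marks a morpheme boundary on the lexical level


def to_surface(lexical):
    """Map a lexical form such as "fly+s" to a surface form such as "flies"."""
    # Rule: y is realised as ie after a consonant and before the boundary plus s.
    form = re.sub(r"([^aeiou])y(\+s)$", r"\1ie\2", lexical)
    # The boundary marker has no surface realisation.
    return form.replace("+", "")


surface_forms = {stem: to_surface(stem + PLURAL) for stem in STEMS}
```
</pre>
<p class="ltx_p">Here <span class="ltx_text ltx_font_typewriter">surface_forms</span> maps “fly” to “flies”, “try” to “tries” and “boy” to “boys”. Without the rule, the lexicon would have to list a second stem variant for every word that behaves like “fly”.</p>
</div>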
<div id="Sx1.p9" class="ltx_para">
<p class="ltx_p">Grammatical Framework differs from the other platforms mainly in that it splits the
structure of the language into two levels. On the deep level, all languages have a common
structure. The description of individual languages takes this deep level as its starting point
and branches out into individual surface level languages. In principle, all languages form
a network, where a well-described language can be translated to any other language of the
network. A large number of languages have already been included in the system,
although many of them only to a limited extent.</p>
</div>
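<div class="ltx_para">
<p class="ltx_p">The split into a shared deep level and language-specific surface levels can be pictured with a small sketch. It is not Grammatical Framework code, and it covers only the step from a shared tree to surface strings, not parsing. The tree shape, the category names and the two word lists are assumptions made for the example.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Hypothetical shared ("deep") structure: the same tree serves every language.
tree = ("Pred", ("This", "Wine"), "Good")

# One surface description per language; words and patterns are illustrative only.
ENGLISH = {"Wine": "wine", "Good": "good", "This": "this {}", "Pred": "{} is {}"}
ITALIAN = {"Wine": "vino", "Good": "buono", "This": "questo {}", "Pred": "{} è {}"}


def linearize(node, language):
    """Turn a shared tree into a surface string of one language."""
    if isinstance(node, str):
        return language[node]
    head, *children = node
    parts = [linearize(child, language) for child in children]
    return language[head].format(*parts)


english = linearize(tree, ENGLISH)
italian = linearize(tree, ITALIAN)
```
</pre>
<p class="ltx_p">The same tree yields “this wine is good” and “questo vino è buono”. In this picture, adding a language means adding one more surface description, not one more translation pair.</p>
</div>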
<div id="Sx1.p10" class="ltx_para">
<p class="ltx_p">In addition to the two main branches of rule-based language technology, the book also
contains descriptions of platforms that are extensions or applications of the main
platforms. One of them is the GiellaLT infrastructure, developed mainly in Tromsø,
Norway. It has put emphasis on developing language tools for minority languages, which
make up the majority of the world’s languages. Instead of developing its own basic
development platforms, it makes use of platforms such as lexc and twolc in
morphological description, as well as Constraint Grammar in disambiguation and
syntactic mapping, and makes all of these available in open access. GiellaLT has also been
working on the keyboard problem, which is an obstacle in working with many minority
languages with non-standard characters. In machine translation, GiellaLT relies on the
Apertium solution, converting the analysis result to the standards required by Apertium.
In all, GiellaLT addresses various needs of minority language communities, taking the
needs of language users into focus.</p>
</div>
<div id="Sx1.p11" class="ltx_para">
<p class="ltx_p">Apertium focuses primarily on machine translation. It was originally designed for
translation between closely related languages, mainly those in Spain and Portugal, but
over the years the scope has expanded to cover a large number of very different
languages. Its technical and linguistic modules are strictly open access, which makes it
possible to use the components in other projects as well. Apertium makes use of rule-based
technology, such as HFST, LexC (also known as lexc), lexd, and the original lttoolbox.
Constraint Grammar rules can also be part of Apertium applications. Because Apertium
translation systems are based on modularity, various modules can be transferred to other
applications.</p>
</div>
<div id="Sx1.p12" class="ltx_para">
<p class="ltx_p">Constraint Grammar (CG) is a platform designed originally for morphological
disambiguation and syntactic mapping. Its open access application is described in this
book. It has been continually developed, and new features have been added to it. Because
CG provides a powerful set of possibilities for constraining rule applications, it has also
been widely used for other purposes, such as semantic disambiguation, isolation of multiword expressions, and the addition of linguistic tags to facilitate machine translation.
Many systems described in this book have included CG as part of the infrastructure. The
open access CG development environment has been developed in Denmark. The work
has also been extended to computer-assisted language learning (CALL) and machine
translation.</p>
</div>
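<div class="ltx_para">
<p class="ltx_p">The idea of constraining readings by context can be sketched as follows. The code is not a Constraint Grammar implementation. It mimics roughly what a single CG rule of the form <span class="ltx_text ltx_font_typewriter">REMOVE (V) IF (-1C (DET))</span> expresses: discard the verb reading of a word when the preceding word is unambiguously a determiner. The sentence, the tag names and the function name are assumptions made for the example.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Each token is a (wordform, readings) pair; the tags are hypothetical.
sentence = [
    ("the", {"DET"}),
    ("run", {"N", "V"}),
    ("ended", {"V"}),
]


def remove_verb_after_determiner(cohorts):
    """Drop the V reading of a token whose left neighbour is unambiguously DET."""
    result = []
    for index, (word, readings) in enumerate(cohorts):
        kept = set(readings)
        if index != 0 and cohorts[index - 1][1] == {"DET"}:
            # Never remove the last remaining reading of a token.
            if kept != {"V"}:
                kept.discard("V")
        result.append((word, kept))
    return result


disambiguated = remove_verb_after_determiner(sentence)
```
</pre>
<p class="ltx_p">After the rule has applied, “run” keeps only its noun reading, while “the” and “ended” are unchanged. A full grammar consists of many such rules, each removing or selecting readings in a carefully stated context.</p>
</div>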
<div id="Sx1.p13" class="ltx_para">
<p class="ltx_p">Salama differs from the other platforms in two respects. It does not get financial
support from public funds, and for this reason its applications cannot be made open
access. It is also mainly a one-man business. Instead of expanding the system to many
languages, it concentrates on exploring various possibilities in machine translation
between morphologically complex and linguistically different languages, such as Swahili,
English and Finnish. Salama also puts emphasis on other user applications, such as
dictionary compilation, accurate information retrieval, and language learning.</p>
</div>
<div id="Sx1.p14" class="ltx_para">
<p class="ltx_p">There are also other approaches to rule-based language technology. However, for a
number of reasons they are not included in this book. Part of the explanation is that they
concentrate on only a specific phase in computational language processing.</p>
</div>
<div id="Sx1.p15" class="ltx_para">
<p class="ltx_p">The second part of the book contains descriptions of specific problems in rule-based
language technology. They also contain solutions to problems that for a long time have
been difficult to solve. In a sense, the chapters in this section contain the latest inventions in rule-based language technology.</p>
</div>
<div id="Sx1.p16" class="ltx_para">
<p class="ltx_p">One chapter deals with solutions for handling various types of multi-word expressions.
The question of how to handle words that are not listed in the lexicon is discussed in
two chapters. There is also a description of a simplified system for writing two-level
rules, needed especially for languages with many phonological and morphological
alternations. One chapter describes experiences of working with several languages, thus
providing useful information for less experienced developers. Finally, there is a chapter
estimating the suitability of rule-based technology for African languages, which
constitute a major market for this technology.</p>
</div>
<div id="Sx1.p17" class="ltx_para">
<p class="ltx_p">The chapters of the book were peer-reviewed by at least two reviewers who have
no co-publications with the author(s).
<br class="ltx_break">28.3.2023
<br class="ltx_break">Arvi Hurskainen, Kimmo Koskenniemi, and Tommi Pirinen
<br class="ltx_break">Editors</p>
</div>
<div class="ltx_pagination ltx_role_newpage"></div>
</section>
</article>
</div>
</div>
</body>
</html>