Guidelines for editing dictionary
ilo Token has a dictionary at ./dictionary/dictionary.ts. It is like a typical dictionary, but it is made for computers to read, although it should still be fairly human-readable. The dictionary defines word-to-word (or word-to-phrase) translations of each word.
The developers of ilo Token welcome anyone to contribute to the dictionary. jan Koko, one of the developers, feels it is best left to the community. However, this doesn't mean we'll accept your obscure nimisin. We only want to implement what's commonly used.
Here are the guidelines for editing the definition list. It should feel like editing a JSON file! In fact, you don't need to learn programming in order to contribute!
We're not going to add more words unless they get added to the "common" category in lipu Linku.
The target vocabulary for ilo Token is:
- Words listed in pu.
- Words listed in nimi ku suli.
- Words used in su.
- Words listed in lipu Linku from core to common.
Once a word is added, it stays, so words that were in the "common" category but later moved to lower categories remain in ilo Token.
We recommend taking a look at the code first to get a sense of how definitions are written down. You'll see that they are defined like the following:
word:
definitions;
definitions;
another word:
definitions;
definitions;
Syntax is important; please don't forget the semicolons.
Sometimes, words are considered synonyms, like "ale" and "ali". In these cases, we merge them together:
ale, ali:
definitions;
definitions;
Each definition may contain any of these: word unit, tag, and placeholder. Consider the following.
seli:
burn(v) [object];
Here, burn is the word unit, (v) is the tag, and [object] is the placeholder. Word units and tags always come together; the tag represents what kind of word it is, usually its part of speech. Placeholders represent a place that ilo Token may fill in, although placeholders aren't used much. They are mainly used to keep the definitions as unambiguous as possible.
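To make the three pieces concrete, here is a minimal sketch of how a single definition could be split into word units, tags, and placeholders. The `Part` type and the `parseDefinition` function are hypothetical names made up for this illustration; they are not ilo Token's actual parser, and the sketch ignores backticks and comments.

```typescript
// Hypothetical types for illustration; not taken from ilo Token's source.
type Part =
  | { kind: "word unit"; word: string; tag: string }
  | { kind: "placeholder"; name: string };

// Split one definition such as "burn(v) [object]" into its parts.
function parseDefinition(definition: string): Part[] {
  const parts: Part[] = [];
  // Either "[placeholder]" or "word unit(tag)".
  // A word unit may span several words, e.g. "human being(n)".
  const pattern = /\[([^\]]+)\]|([^()\[\];]+)\(([^()]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(definition)) !== null) {
    const placeholder = match[1];
    const word = match[2];
    const tag = match[3];
    if (placeholder !== undefined) {
      parts.push({ kind: "placeholder", name: placeholder.trim() });
    } else if (word !== undefined && tag !== undefined) {
      parts.push({ kind: "word unit", word: word.trim(), tag: tag.trim() });
    }
  }
  return parts;
}
```

Under these assumptions, `parseDefinition("burn(v) [object]")` yields one word unit (`burn` with tag `v`) followed by one placeholder (`object`).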
A word unit may span multiple words:
jan:
human being(n);
Sometimes, word units are separated by a forward slash /. Its function depends on the tag, but it tends to be for defining different forms or conjugations.
ona:
they/them(personal pronoun plural);
it/it(personal pronoun singular);
A tag may contain more information, which is sometimes needed depending on the tag.
pan:
baked(adj qualifier) goods(n plural);
A definition may have multiple word units, tags, and placeholders, forming a phrase.
olin:
have(v) strong(adj opinion) emotional(adj opinion) bond(n singular) with(prep) [object];
This syntax isn't free-form; it must follow certain patterns. ilo Token isn't going to magically understand it all. Definitions may need to be rewritten or simplified in order to fit within the limitations.
"kokosila", for example, has to be written like the following; we can't add the "in an environment where Toki Pona is more appropriate" part.
kokosila:
speak(v) a(d article singular) non-Toki Pona(adj qualifier) language(n singular);
The patterns are explained further below.
A comment is a way to tell computers "just ignore what I've written here". In the dictionary, it is denoted by a hash sign #. Whatever follows # is ignored. This is useful for disabling pieces of code as well as for writing notes meant for contributors instead of computers.
You may find a couple of comments in the code.
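As a rough illustration of what "ignored" means, the sketch below removes comments before anything else is read. `stripComments` is a hypothetical helper, not part of ilo Token, and it assumes that a comment runs from # to the end of the line and that # never appears inside a definition itself (for example inside backticks).

```typescript
// Hypothetical helper: drop everything from the first "#" to the end of each line.
// Assumption: "#" never appears inside a definition (e.g. inside backticks).
function stripComments(source: string): string {
  return source
    .split("\n")
    .map((line) => {
      const hash = line.indexOf("#");
      // Keep the line unchanged when it has no comment;
      // otherwise cut the comment and any spaces before it.
      return hash === -1 ? line : line.slice(0, hash).replace(/\s+$/, "");
    })
    .join("\n");
}
```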
If a word contains a special symbol, wrap it inside backticks `. ilo Token will not include the backticks.
pu:
interact(v) with(prep) the(d article) book(n singular) titled(adj) `Toki Pona: The Language of Good`(proper n);
Use the tag (n) to define nouns. With some exceptions, you may also use this for pronouns, since pronouns tend to act like nouns.
kasi:
plant(n);
You may add determiners and adjectives before it.
sewi:
highest(adj origin) part(n);
Adjectives before nouns may not be compounded. Simply removing the "and" is a good workaround.
palisa:
long(adj size) and(c) hard(adj material) thing(n); # bad
long(adj size) hard(adj material) thing(n); # good
You may add an adjective and proper noun after the noun.
pu:
the(d article) book(n) titled(adj) `Toki Pona: The Language of Good`(proper n);
ilo Token will automatically apply conjugations, e.g. singular and plural forms, but if you wish to force a noun to be singular only or plural only, add singular or plural to the tag.
telo:
liquid(n singular);
mani:
savings(n plural);
In some cases, automatic conjugation can fail; this tends to happen with pronouns. In these cases, you may use a slash / and manually define the singular and plural forms, or simply limit the word to singular only or plural only as explained above.
ni:
this/these(n);
that/those(n);
seme:
what/what(n);
which/which(n);
Because personal pronouns have different forms when used as subject or object, we need to define them separately from nouns. Use the tag (personal pronoun) to define them.
There is no automatic conjugation. Use slashes / and define the forms in this order: singular subject, singular object, plural subject, and then plural object.
mi:
I/me/we/us(personal pronoun);
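The order of the slash-separated forms can be pictured with a small sketch. `readPersonalPronoun` and its types are hypothetical names for illustration only; the sketch also assumes that a tag containing singular or plural carries exactly two forms (subject, then object).

```typescript
// Hypothetical types for illustration; not taken from ilo Token's source.
type SubjectObject = { subject: string; object: string };
type PronounForms = {
  singular: SubjectObject | null;
  plural: SubjectObject | null;
};

// Read a slash-separated word unit according to its tag.
function readPersonalPronoun(wordUnit: string, tag: string): PronounForms {
  const forms = wordUnit.split("/").map((form) => form.trim());
  const keywords = tag.split(/\s+/);
  const singularOnly = keywords.indexOf("singular") !== -1;
  const pluralOnly = keywords.indexOf("plural") !== -1;

  if (singularOnly || pluralOnly) {
    // Only two forms are expected: subject, then object.
    if (forms.length !== 2) {
      throw new Error(`expected 2 forms but found ${forms.length}`);
    }
    const [subjectForm = "", objectForm = ""] = forms;
    const pair: SubjectObject = { subject: subjectForm, object: objectForm };
    return singularOnly
      ? { singular: pair, plural: null }
      : { singular: null, plural: pair };
  }

  // Four forms: singular subject, singular object, plural subject, plural object.
  if (forms.length !== 4) {
    throw new Error(`expected 4 forms but found ${forms.length}`);
  }
  const [s1 = "", o1 = "", s2 = "", o2 = ""] = forms;
  return {
    singular: { subject: s1, object: o1 },
    plural: { subject: s2, object: o2 },
  };
}
```

For example, `readPersonalPronoun("I/me/we/us", "personal pronoun")` pairs "I" with "me" as the singular forms and "we" with "us" as the plural forms.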
Sometimes, pronouns only have a singular form or a plural form. In these cases, include singular or plural in the tag. You'll only need to write the subject and object forms.
ona:
they/them(personal pronoun plural);
it/it(personal pronoun singular);
Just an amusing side note: to ilo Token, it is it/it, not it/its.
Remember to only consider the grammatical number and not the semantic number: they/them, while it can refer to a single person, is always grammatically plural, as it is always followed by "are" when used as a subject.
Remember to define possessives as well; these are determiners.
Use the (adj) tag to define adjectives. You'll need to classify what kind of adjective it is, which is needed for reordering chains of adjectives. Apparently, it's "Big Red Balloon" and not "Red Big Balloon".
pona:
good(adj opinion);
Here are the classifications for adjectives, listed in the order they are placed from left to right. These are based on the list found on Wikipedia.
- opinion
- size
- physical quality: particularly a visible quality, e.g. flat, circular
- age
- color
- origin: where it comes from or where it is located, e.g. "nearby object"
- material: including the property of the material, e.g. "hard object"
- qualifier: particularly a modifier of compound nouns, e.g. "transgender person"
These are just rough categories to aid in sorting adjectives and are not set in stone. If new categories are needed, please open a new issue.
Some adjectives may belong in two or more categories. In these cases, test it out. Here's an example: the "land" in "land animal" can be origin or qualifier. We'll try it with another adjective whose category lies between origin and qualifier, let's say "hard", which is material. Then we'll test it: "hard land animal" or "land hard animal". The former feels less awkward, so we can determine that "land" in "land animal" is a qualifier.
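The reordering that these categories make possible can be sketched as a simple sort. The names `ADJECTIVE_ORDER`, `Adjective`, and `orderAdjectives` are hypothetical and only illustrate the idea; ilo Token's own implementation may look quite different.

```typescript
// Categories in the order they are placed from left to right.
// Hypothetical names for illustration; not taken from ilo Token's source.
const ADJECTIVE_ORDER = [
  "opinion",
  "size",
  "physical quality",
  "age",
  "color",
  "origin",
  "material",
  "qualifier",
] as const;

type AdjectiveKind = (typeof ADJECTIVE_ORDER)[number];

interface Adjective {
  word: string;
  kind: AdjectiveKind;
}

// Return a sorted copy; the input array is left untouched.
function orderAdjectives(adjectives: readonly Adjective[]): Adjective[] {
  return adjectives
    .slice()
    .sort(
      (a, b) =>
        ADJECTIVE_ORDER.indexOf(a.kind) - ADJECTIVE_ORDER.indexOf(b.kind),
    );
}
```

With this ordering, "red" (color) and "big" (size) come out as "big red", and "land" (qualifier) and "hard" (material) come out as "hard land", matching the test described above.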
An adjective may be modified by an adverb, which is written before it.
jelo:
lime(adv) yellow(adj color);
Adjectives may be compounded using and(c). This form is currently limited: there can't be adverbs, there can't be more than 2 adjectives, and there can't be conjunctions other than "and". If lifting these limitations is needed, please open an issue and explain why.
linja:
long(adj size) and(c) flexible(adj material);
ilo Token may remove the word "and" when translating: "moku linja" becomes "long flexible food".
To ilo Token, determiners and adjectives are different classifications. Determiners act as limiters instead of modifiers.
Use the tag (d). You'll need to specify its classification:
ale, ali:
every(d distributive);
Here are the classifications of determiners:
- article, e.g. "the", "a", and "an"
- demonstrative, e.g. "that balloon"
- distributive, e.g. "every balloon" or "each balloon"
- interrogative, e.g. "which balloon"
- possessive, e.g. "my balloon"
- quantifier, e.g. "few balloons" or "many balloons"
- relative (unused)
Sometimes, a determiner limits what grammatical number the noun can be. In these cases, define this inside the tag as well, using the keyword singular or plural after the determiner classification.
ale, ali:
all(d distributive plural);
The determiner "all" forces the noun to be plural e.g. "all apples".
Remember to only consider the grammatical number:
ala:
no(d quantifier plural);
NOTE: this is a bad example; "no apple" is in fact grammatical.
To explain with an example noun: "no apples", while it refers to 0 apples, is grammatically plural by its form.
Sometimes, a determiner itself has singular and plural forms. In these cases, use a slash /. There is no automatic conjugation for this.
ni:
this/these(d demonstrative);
that/those(d demonstrative);
Numerals are technically a kind of determiner or noun, but since numbers in Toki Pona have interesting grammatical functions, numerals are defined separately. Remember that these are for exact numbers, like actual integers. For words describing a rough number, e.g. "few" or "many", use a determiner instead.
Use the tag (num).
luka:
5(num);
Numerals are hardcoded to all pu numerals: "ala", "wan", "tu", "luka", "mute", and "ale" or "ali". If other words have numeral definitions, they will simply be ignored. If any of these pu numerals doesn't have a numeral definition, an error may occur.
Defining adverbs is as easy as it can get. Use the tag (adv).
pona:
nicely(adv);
Don't add so(adv) to the word "a"; this is hardcoded instead. (adv) definitions are for content words.
Fillers are hardcoded to "a" and "n". If other words have filler definitions, they will simply be ignored. If "a" or "n" doesn't have a filler definition, an error may occur.
Use the tag (f).
"a" and "n" are the only words permitted to be elongated, like "aaa" and "nnn". You'll have to provide the different elongations of the definition in a strict pattern: only one letter can be repeated, and it must follow a consistent increasing pattern.
a:
ah/aah/aaah(f);
n:
hm/hmm/hmmm(f);
You may provide just 2 forms, but we recommend sticking to 3.
a:
ah/aah(f);
You may also provide no elongation at all; such definitions won't be used when "a" or "n" is elongated.
a:
ah(f);
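One way to read the "strict pattern" rule is that each form repeats the same single letter one more time than the form before it. The sketch below checks a list of forms under that reading; `isValidElongation` and `repeatLetter` are hypothetical helpers, and the exact rule ilo Token enforces may differ.

```typescript
// Repeat a letter `count` times without relying on newer string methods.
function repeatLetter(letter: string, count: number): string {
  return new Array(count + 1).join(letter);
}

// Hypothetical check: is there one letter of the first form such that
// form number k (starting at 0) repeats that letter k + 1 times?
function isValidElongation(forms: readonly string[]): boolean {
  const base = forms[0];
  if (base === undefined || base.length === 0) {
    return false;
  }
  for (let i = 0; i < base.length; i++) {
    const letter = base.charAt(i);
    const matchesAll = forms.every(
      (form, k) =>
        form === base.slice(0, i) + repeatLetter(letter, k + 1) + base.slice(i + 1),
    );
    if (matchesAll) {
      return true;
    }
  }
  return false;
}
```

Under this reading, `"ah/aah/aaah".split("/")` and `"hm/hmm/hmmm".split("/")` pass, a single form such as `["ah"]` passes trivially, and `["ah", "aaah"]` fails because it skips a step.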
Translating "a a a" to "hahaha" is hardcoded. You don't need to define it.
See also interjection.
Defining interjections is as easy as it can get. Use the tag (i).
mu:
bark(i);
Interjection definitions are only used when the Toki Pona word is used alone or with "a" in the sentence.
Don't use interjection definitions for the particles "a" and "n"; use filler instead. (i) definitions are for content words.
Preposition definitions are for Toki Pona prepositions, which happen to be translatable into English prepositions. Use the tag (prep). The placeholder [indirect object] is needed.
lon:
in(prep) [indirect object];
A bit of laziness on my (the developer's) part: you may define an adjective-preposition phrase as well as a nested preposition as a single preposition.
sama:
similar to(prep) [indirect object];
kepeken:
by means of(prep) [indirect object];
For example, "kili lili" can mean "part of fruit". You may write this kind of definition with the [headword] placeholder, like the following.
lili:
part(n) of(prep) [headword];
Particle definitions are for defining particles. Particles themselves are hardcoded, so these definitions are only used in dictionary mode. Use the closest English word that the particle can translate to. Use the tag (particle def).
anu:
or(particle def);
You may instead describe how the word is used. Wrap the description in square brackets [], and wrap it in backticks ` too, because square brackets are special characters used for placeholders.
a:
`[placed after something for emphasis or emotion]`(particle def);
Order matters: ilo Token will try to use the first definition and output it first, although not always. So please order the definitions from most likely to least likely.
We borrow definitions from lipu Linku, which itself avoids calques. However, we still need to avoid words that have multiple meanings that could be confused. For example, the word "cool" can translate both "lete" and "epiku", which have different meanings, so "cool" should be avoided.
ilo Token can show a very large number of outputs. To counteract this, please reduce the number of definitions where possible, and try to use words with broad meanings that align well with the Toki Pona word.
Using lipu Linku
I recommend using lipu Linku as a reference. lipu Linku is very high quality. You may borrow definitions from it. You may deviate from lipu Linku if needed.
Defining verbs is a little complicated. We care about the present, past, past participle, and present participle forms (we call the present participle "gerund"). We also care about whether the verb is transitive, as well as which preposition it uses if it is transitive.
There are two functions for this: verb and intransitiveVerb. verb is for transitive verbs.
The most basic use of verb looks like this: verb("trade(d)", "trading"). You may use parentheses or a slash, much like in nouns. This assumes the past participle form is the same as the past form.
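To show how the parenthesis and slash notation might be read, here is a minimal sketch that expands a string such as "trade(d)" or "choose/chose" into a present and a past form. `expandPresentPast` is a hypothetical helper written for this page, not the real verb function, and the fallback for strings without parentheses or a slash is purely an assumption.

```typescript
// Hypothetical helper: expand "trade(d)" or "choose/chose" into two forms.
function expandPresentPast(notation: string): { present: string; past: string } {
  // Slash form: present and past are written out in full.
  const slash = notation.indexOf("/");
  if (slash !== -1) {
    return {
      present: notation.slice(0, slash),
      past: notation.slice(slash + 1),
    };
  }
  // Parenthesis form: the part in parentheses is appended for the past form.
  const match = /^(.*)\((.*)\)$/.exec(notation);
  if (match !== null) {
    const stem = match[1] ?? "";
    const suffix = match[2] ?? "";
    return { present: stem, past: stem + suffix };
  }
  // Assumption: with neither notation, both forms are identical.
  return { present: notation, past: notation };
}
```

Under this reading, "trade(d)" gives "trade" and "traded", while "choose/chose" gives "choose" and "chose".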
If you need to define a different past participle form, you'll have to use the more complex form:
verb({
presentPast: "choose/chose",
pastParticiple: "chosen",
gerund: "choosing",
})
You may also use the complex form to define which preposition to use if the original text has an object.
verb({
presentPast: "communicate(d)",
gerund: "communicating",
usePreposition: "about",
})
Don't get confused: "communicate" in this case isn't actually transitive in English, but we consider it transitive because the original word "toki" can take an object. So we're bending the definition of "transitive"... or let's just say we're being careful with words: we call it verb, not transitiveVerb.
Defining an intransitive verb is very similar to defining a transitive verb, except that you can't define a preposition.
intransitiveVerb("walk(ed)", "walking")
// or
intransitiveVerb({
presentPast: "fly/flew",
pastParticiple: "flown",
gerund: "flying",
})