This file describes the structure of the verb and noun data in multiple formats (CSV, SQL, XML, etc.).
Field | Type | Description | وصف |
vocalized | String | vocalized word | الكلمة مشكولة |
unvocalized | String | unvocalized word | الكلمة غير مشكولة |
root | String | root of the verb | جذر الفعل |
normalized | String | normalized form of verb (Hamzat are unified) | الفعل منمّط، الهمزات والألفات موحّدة |
stamped | String | normalized verb without affixation letters | بصمة الفعل، حذف كل حروف الزيادة |
future_type | String | the future mark: the vowel of the middle radical in the future (imperfect) tense, used only for triliteral verbs | حركة عين الفعل الثلاثي في المضارع |
triliteral | Boolean | the verb is triliteral (3 letters) or not | الفعل ثلاثي/غير ثلاثي |
transitive | Boolean | transitive or not | فعل متعدي/ لازم |
double_trans | Boolean | has double transitivity for two objects | متعدي لمفعولين |
think_trans | Boolean | the verb is transitive to a human (rational) object | متعدي للعاقل |
unthink_trans | Boolean | the verb is transitive to a non-human object | متعدي لغير العاقل |
reflexive_trans | Boolean | pronominal verb | فعل من أفعال القلوب |
past | Boolean | can be conjugated in past tense | يتصرف في الماضي |
future | Boolean | can be conjugated in present and future tense | يتصرف في المضارع |
imperative | Boolean | can be conjugated in imperative | يتصرف في الأمر |
passive | Boolean | can be conjugated in passive voice | يتصرف في المبني للمجهول |
future_moode | Boolean | can be conjugated in the future moods (jussive, subjunctive) | يتصرف في المضارع المجزوم أو المنصوب |
confirmed | Boolean | can be conjugated in confirmed tenses | يتصرف في المؤكد |
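As a rough illustration of how the table above maps onto the CSV format, the sketch below reads rows into dictionaries and converts the boolean columns. The tab delimiter, the presence of a header row, and the 0/1 encoding in CSV are assumptions (0/1 is what the SQL and XML forms use); `parse_verb_rows` and the sample row are hypothetical and not taken from the dictionary files.

```python
import csv
import io

# Boolean columns from the verb table; assumed to be stored as 0/1.
BOOL_FIELDS = (
    "triliteral", "transitive", "double_trans", "think_trans",
    "unthink_trans", "reflexive_trans", "past", "future",
    "imperative", "passive", "future_moode", "confirmed",
)


def parse_verb_rows(text, delimiter="\t"):
    """Yield one dict per verb row, converting 0/1 flags to bool.

    Assumes a header row naming the fields; adjust the delimiter
    to match the actual file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        for name in BOOL_FIELDS:
            if name in row:
                row[name] = (row[name] or "").strip() == "1"
        yield row


# Hypothetical sample with only a few of the columns.
sample = (
    "vocalized\tunvocalized\troot\ttriliteral\ttransitive\tpast\n"
    "ضَرَبَ\tضرب\tضرب\t1\t1\t1\n"
)
verbs = list(parse_verb_rows(sample))
```

String columns are left untouched, so the vocalized form keeps its diacritics.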
The features can be grouped as follows:
Word: the vocalized form of the word, with full diacritics, e.g. "ضَرَبَ" [to hit]
Basic verb features:
- root of the verb
- transitive or intransitive
- triliteral or not: length of the verb lemma
- future type: the mark used in the future tense, only for triliteral verbs.
Features used for search and lookup:
Unvocalized: the word without diacritics, e.g. "ضرب"
Normalized form: Hamzat and Alefat are unified so that other word forms can be found, e.g. the normalized form of "سأل" is "سءل".
Stamped form:
This feature is used to find the lemma from a stem despite letter variants.
The stamp is generated by removing the letters that can serve as affixation letters (prefix, infix, suffix), such as (ALEF, YEH, WAW, ALEF_MAKSURA, HAMZA, ALEF_HAMZA_ABOVE, WAW_HAMZA, YEH_HAMZA, ALEF_MADDA, SHADDA).
The following verbs all generate the stamp "كتب", which helps to find similar verbs from an inflected verb:
- كَتَبَ يَكْتُبُ
- اِكْتَأَبَ يَكْتَئِبُ
- اِكْتَبَى يَكْتَبِي
- كَتَّبَ يُكَتِّبُ
- كاتَبَ يُكَاتِبُ
- أَكْتَبَ يُكْتِبُ
The following verbs all generate the stamp "رم", which helps to find similar verbs from an inflected verb: رَامَ أَرَمَ رَأَمَ رَئِمَ راءَمَ أَرْأَمَ رَمَأَ أَرْمَأَ رَمَّ رَمَّمَ أَرَمَّ رَمَى رامَى أَرْمَى رَوَّمَ رَيَّمَ وَرِمَ وَرَّمَ أَوْرَمَ
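The stamp rule described above can be sketched in a few lines of Python. This is a minimal sketch, not the dictionary's own implementation: it strips the diacritics and then removes only the letters named in the text. The function name `stamp` and the exact character set are assumptions. Note that plain removal would give "رمم" for رَمَّمَ, so the "رم" example suggests the real stamp applies an extra rule for doubled letters that is not shown here.

```python
import re

# Diacritics (tashkeel): FATHATAN..SUKUN, a range that includes SHADDA (U+0651).
TASHKEEL = re.compile("[\u064B-\u0652]")

# Letters named in the text as possible affixation letters.
AFFIX_LETTERS = re.compile(
    "["
    "\u0627"  # ALEF
    "\u064A"  # YEH
    "\u0648"  # WAW
    "\u0649"  # ALEF_MAKSURA
    "\u0621"  # HAMZA
    "\u0623"  # ALEF_HAMZA_ABOVE
    "\u0624"  # WAW_HAMZA
    "\u0626"  # YEH_HAMZA
    "\u0622"  # ALEF_MADDA
    "]"
)


def stamp(word):
    """Return the word without diacritics and without affixation letters."""
    bare = TASHKEEL.sub("", word)
    return AFFIX_LETTERS.sub("", bare)


# Past and future forms from the "كتب" list above.
forms = [
    "كَتَبَ", "يَكْتُبُ", "اِكْتَأَبَ", "يَكْتَئِبُ",
    "اِكْتَبَى", "يَكْتَبِي", "كَتَّبَ", "كاتَبَ", "أَكْتَبَ",
]
stamps = {stamp(form) for form in forms}
```

All forms in `forms` collapse to a single stamp, which is what makes the stamp usable as a lookup key.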
Verb conjugation features
- Tenses accepted when conjugating the verb (boolean features): past, future, imperative, passive voice, future moods, confirmed mood
Syntax and semantic affixes:
Advanced features about the syntactic affixes that can be attached to the verb:
think_trans: the verb accepts a human attached pronoun such as (هم، هن، نا، ني)
for example: the verb "تنفّس" [to breathe] does not accept a human as object.
unthink_trans: the verb accepts a non-human attached pronoun such as (ها)
for example: the verb "تنفّس" accepts a non-human object (تنفّس الغاز)
reflexive_trans: the verb accepts a reflexive attached pronoun such as (نا، ني)
for example: the verb "ضرب" accepts a reflexive object (ضربت نفسي، ضربتني)
double_trans: the verb is doubly transitive; it takes two objects and can accept two attached pronouns, like:
CREATE TABLE verbs ( id int unique,
vocalized varchar(30) not null,
unvocalized varchar(30) not null,
root varchar(30),
normalized varchar(30) not null,
stamped varchar(30) not null,
future_type varchar(5),
triliteral tinyint(1) default 0,
transitive tinyint(1) default 0,
double_trans tinyint(1) default 0,
think_trans tinyint(1) default 0,
unthink_trans tinyint(1) default 0,
reflexive_trans tinyint(1) default 0,
past tinyint(1) default 0,
future tinyint(1) default 0,
imperative tinyint(1) default 0,
passive tinyint(1) default 0,
future_moode tinyint(1) default 0,
confirmed tinyint(1) default 0
);
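To show how the `stamped` column supports lookup, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. It uses only a subset of the columns of the `verbs` table, and the three rows are illustrative values written for this example, not entries copied from the dictionary. The index name `verbs_stamped` and the helper `verbs_by_stamp` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE verbs (
        id int unique,
        vocalized varchar(30) not null,
        unvocalized varchar(30) not null,
        normalized varchar(30) not null,
        stamped varchar(30) not null
    )
    """
)
# An index on the stamp keeps candidate lookup cheap on a large table.
conn.execute("CREATE INDEX verbs_stamped ON verbs (stamped)")

# Illustrative rows only.
sample_rows = [
    (1, "كَتَبَ", "كتب", "كتب", "كتب"),
    (2, "كاتَبَ", "كاتب", "كاتب", "كتب"),
    (3, "اِكْتَأَبَ", "اكتأب", "اكتءب", "كتب"),
]
conn.executemany("INSERT INTO verbs VALUES (?, ?, ?, ?, ?)", sample_rows)


def verbs_by_stamp(connection, stamp):
    """Return the vocalized forms of all verbs sharing the given stamp."""
    cursor = connection.execute(
        "SELECT vocalized FROM verbs WHERE stamped = ? ORDER BY id", (stamp,)
    )
    return [row[0] for row in cursor]


matches = verbs_by_stamp(conn, "كتب")
```

A caller would typically compute the stamp of an inflected stem, fetch the candidates this way, and then filter them with the `normalized` or `unvocalized` columns.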
<?xml version='1.0' encoding='utf8'?>
<verb future_type='كسرة'
reflexive_trans='0' >
<tenses past='0'
Field | Description | وصف |
vocalized | vocalized word | الكلمة مشكولة |
unvocalized | unvocalized word | غير مشكولة |
wordtype | word type (subject noun, object noun, intensive form, ...) | نوع الكلمة (اسم فاعل، اسم مفعول، صيغة مبالغة..) |
root | word root | جذر الكلمة |
wazn | word pattern or template | وزن الكلمة |
normalized | normalized form of noun (Hamzat are unified) | الاسم منمّط، الهمزات والألفات موحدة الأشكال |
stamped | normalized noun without affixation letters | بصمة الاسم، حروف الزيادة محذوفة |
category | word category | صنف الكلمة أو قسمها الفرعي |
original | original verb or noun (masdar) | مصدر الكلمة فعل او اسم |
mankous | the word is mankous (ends with Yeh) | اسم منقوص |
defined | whether the word is defined | معرفة |
gender | the gender of the word | نوع أو جنس الكلمة |
feminin | the feminine form of the word | مؤنث الكلمة |
masculin | the masculine form of the word | مذكر الكلمة |
number | whether the word is singular, dual or plural | عدد مفرد/مثنى/جمع |
single | the singular form of the word | مفرد الكلمة |
dualable | accepts the dual suffix | يقبل التثنية |
feminable | accepts Teh_marbuta | يقبل تاء التأنيث |
masculin_plural | accepts the sound masculine plural | يقبل جمع المذكر السالم |
feminin_plural | accepts the sound feminine plural | يقبل جمع المؤنث السالم |
broken_plural | the irregular (broken) plural, if any | جموع تكسيره إن وجدت |
mamnou3_sarf | does not accept tanwin | ممنوع من الصرف |
relative | relative noun, formed with Yeh | منسوب بالياء |
w_suffix | accepts the Waw suffix of the sound masculine plural in annexation | يقبل اللاحقة ـو الخاصة بجمع المذكر السالم عند إضافته إلى ما بعده |
hm_suffix | accepts the Heh+Meem suffix | يقبل اللاحقة ـهم |
kal_prefix | accepts the Kaf+Alef+Lam prefix | يقبل السابقة كالـ |
ha_suffix | accepts the Heh suffix | يقبل اللاحقة ـه |
k_prefix | accepts preposition prefixes without the definite article "AL" | يقبل سابقة الجر دون ال التعريف |
annex | accepts annexation to the following word | يقبل الإضافة إلى ما بعده مثل المقيمي الصلاة |
definition | word description | شرح الكلمة |
note | notes about the dictionary entry. | ملاحظات على المدخل في القاموس |
The features can be grouped as follows:
Word: the vocalized form of the word, with full diacritics, e.g. "ضَرْبَ" [hit]
Basic word features:
- root of the word
- word type, with category as a subtype
- root and wazn (pattern) جذر ووزن
- lemma (original)
- gender (مذكر، مؤنث: masculine, feminine)
- number (عدد: مفرد، مثنى، جمع: singular, dual, plural)
- its singular, if the word is plural
- its feminine, if the word is masculine
- its irregular plural, if any
- whether the noun is defined (originally defined, like proper nouns)
Features used for search and lookup:
Unvocalized: the word without diacritics, e.g. "ضرب"
Normalized form: Hamzat and Alefat are unified so that other word forms can be found, e.g. the normalized form of "سؤال" is "سءال".
Stamped form:
This feature is used to find the lemma from a stem despite letter variants.
The stamp is generated by removing the letters that can serve as affixation letters (prefix, infix, suffix), such as (ALEF, YEH, WAW, ALEF_MAKSURA, HAMZA, ALEF_HAMZA_ABOVE, WAW_HAMZA, YEH_HAMZA, ALEF_MADDA, SHADDA).
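The unvocalized and normalized forms listed above can be approximated as follows. This is a sketch under stated assumptions: the text only says that Hamzat and Alefat are unified, so the exact mapping below (every seated hamza becomes a bare HAMZA, and ALEF_MADDA becomes HAMZA followed by ALEF) is a guess that happens to reproduce the "سأل" to "سءل" example quoted in the verb section. The function names are hypothetical.

```python
import re

# Diacritics (tashkeel): FATHATAN..SUKUN, including SHADDA.
TASHKEEL = re.compile("[\u064B-\u0652]")

# Assumed unification table for hamza and alef variants.
HAMZA_FORMS = str.maketrans({
    "\u0623": "\u0621",        # ALEF_HAMZA_ABOVE -> HAMZA
    "\u0625": "\u0621",        # ALEF_HAMZA_BELOW -> HAMZA
    "\u0624": "\u0621",        # WAW_HAMZA -> HAMZA
    "\u0626": "\u0621",        # YEH_HAMZA -> HAMZA
    "\u0622": "\u0621\u0627",  # ALEF_MADDA -> HAMZA + ALEF (assumption)
})


def unvocalize(word):
    """Strip all diacritics from the word."""
    return TASHKEEL.sub("", word)


def normalize(word):
    """Strip diacritics, then unify the hamza and alef variants."""
    return unvocalize(word).translate(HAMZA_FORMS)


bare = unvocalize("ضَرْبَ")
unified = normalize("سَأَلَ")
```

Because `normalize` calls `unvocalize` first, it can be applied directly to vocalized dictionary entries.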
Noun inflection:
A noun can accept affixes or cases such as:
- dualable: accepts the dual suffix يقبل التثنية
- feminable: accepts Teh_marbuta يقبل تاء التأنيث
- masculin_plural: accepts the sound masculine plural يقبل جمع المذكر السالم
- feminin_plural: accepts the sound feminine plural يقبل جمع المؤنث السالم
- mamnou3_sarf: does not accept tanwin ممنوع من الصرف
- w_suffix: accepts the Waw suffix of the sound masculine plural in annexation يقبل اللاحقة ـو الخاصة بجمع المذكر السالم عند إضافته إلى ما بعده
- hm_suffix: accepts the Heh+Meem suffix يقبل اللاحقة ـهم
- kal_prefix: accepts the Kaf+Alef+Lam prefix يقبل السابقة كالـ
- ha_suffix: accepts the Heh suffix يقبل اللاحقة ـه
- k_prefix: accepts preposition prefixes without the definite article "AL" يقبل سابقة الجر دون ال التعريف
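One way a consumer might use the inflection flags listed above is to turn them into the set of affixes a noun entry accepts. The sketch below is only an illustration: `NounFlags`, `AFFIXES` and `accepted_affixes` are hypothetical names, the dual and masculine plural are shown in one case form only, and `k_prefix` and `mamnou3_sarf` are left out because the text does not name a single surface affix for them.

```python
from dataclasses import dataclass


@dataclass
class NounFlags:
    """Subset of the boolean noun features; all default to False."""
    dualable: bool = False
    feminable: bool = False
    masculin_plural: bool = False
    feminin_plural: bool = False
    w_suffix: bool = False
    hm_suffix: bool = False
    kal_prefix: bool = False
    ha_suffix: bool = False


# flag name -> (affix kind, surface form)
AFFIXES = {
    "dualable": ("suffix", "ان"),         # dual, one case form only
    "feminable": ("suffix", "ة"),         # Teh_marbuta
    "masculin_plural": ("suffix", "ون"),  # sound masculine plural, one case form only
    "feminin_plural": ("suffix", "ات"),   # sound feminine plural
    "w_suffix": ("suffix", "و"),          # Waw of the masculine plural in annexation
    "hm_suffix": ("suffix", "هم"),        # Heh+Meem
    "kal_prefix": ("prefix", "كال"),      # Kaf+Alef+Lam
    "ha_suffix": ("suffix", "ه"),         # Heh
}


def accepted_affixes(flags):
    """List (flag, kind, affix) for every flag that is set."""
    return [
        (name, kind, affix)
        for name, (kind, affix) in AFFIXES.items()
        if getattr(flags, name)
    ]


# Hypothetical entry that accepts the dual, Teh_marbuta and the "كالـ" prefix.
flags = NounFlags(dualable=True, feminable=True, kal_prefix=True)
accepted = accepted_affixes(flags)
```

Keeping the flag-to-affix table separate from the record type makes it easy to add the remaining flags once their surface forms are decided.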
CREATE TABLE `nouns` (
`id` int(11) unique,
`vocalized` varchar(30) DEFAULT NULL,
`unvocalized` varchar(30) DEFAULT NULL,
`normalized` varchar(30) DEFAULT NULL,
`stamp` varchar(30) DEFAULT NULL,
`wordtype` varchar(30) DEFAULT NULL,
`root` varchar(10) DEFAULT NULL,
`wazn` varchar(30) DEFAULT NULL,
`category` varchar(30) DEFAULT NULL,
`original` varchar(30) DEFAULT NULL,
`gender` varchar(30) DEFAULT NULL,
`feminin` varchar(30) DEFAULT NULL,
`masculin` varchar(30) DEFAULT NULL,
`number` varchar(30) DEFAULT NULL,
`single` varchar(30) DEFAULT NULL,
`broken_plural` varchar(30) DEFAULT NULL,
`defined` tinyint(1) DEFAULT 0,
`mankous` tinyint(1) DEFAULT 0,
`feminable` tinyint(1) DEFAULT 0,
`dualable` tinyint(1) DEFAULT 0,
`masculin_plural` tinyint(1) DEFAULT 0,
`feminin_plural` tinyint(1) DEFAULT 0,
`mamnou3_sarf` tinyint(1) DEFAULT 0,
`relative` tinyint(1) DEFAULT 0,
`w_suffix` tinyint(1) DEFAULT 0,
`hm_suffix` tinyint(1) DEFAULT 0,
`kal_prefix` tinyint(1) DEFAULT 0,
`ha_suffix` tinyint(1) DEFAULT 0,
`k_prefix` tinyint(1) DEFAULT 0,
`annex` tinyint(1) DEFAULT 0,
`definition` text,
`note` text
) ;
<noun id='60000'>
<wordtype>اسم فاعل</wordtype>
<definition>". ""تَرَكَ ابْناً بَارّاً"" : صَادِقاً وَصَالِحاً وَمُحْسِناً. ""اِبْنُكَ البارُّ يُحِبُّكَ"</definition>