pattern.de
The pattern.de module contains a fast part-of-speech tagger for German (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for German verb conjugation and noun singularization & pluralization.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.
German nouns and adjectives inflect according to gender. The gender() function predicts the gender (MALE, FEMALE, NEUTRAL) of a given noun with about 75% accuracy:
>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print(gender('Katze'))
FEMALE
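The documentation does not say how gender() reaches its 75% accuracy. As a rough illustration of why German noun gender is partly predictable, the sketch below guesses gender from common suffixes (-ung, -heit, -keit, -schaft → feminine; -chen, -lein → neuter; -ling → masculine). guess_gender() is a hypothetical rule-of-thumb heuristic, not pattern.de's implementation, and the constant values are placeholders.

```python
# Hypothetical suffix heuristic for German noun gender.
# NOT pattern.de's gender(); the constant values are placeholders.
MALE, FEMALE, NEUTRAL = "m", "f", "n"

# Common rules of thumb from German grammar; each one has exceptions.
SUFFIX_RULES = [
    ("ung", FEMALE), ("heit", FEMALE), ("keit", FEMALE), ("schaft", FEMALE),
    ("ion", FEMALE), ("tät", FEMALE), ("e", FEMALE),
    ("chen", NEUTRAL), ("lein", NEUTRAL), ("ment", NEUTRAL),
    ("ling", MALE), ("ismus", MALE),
]

def guess_gender(noun, default=MALE):
    """Guess MALE, FEMALE or NEUTRAL from the noun's ending."""
    word = noun.lower()
    # Check longer suffixes first, so a specific rule wins over the broad "-e" rule.
    for suffix, gender in sorted(SUFFIX_RULES, key=lambda rule: -len(rule[0])):
        if word.endswith(suffix):
            return gender
    # No rule matched: fall back to the default guess.
    return default

print(guess_gender("Katze"))    # f
print(guess_gender("Mädchen"))  # n
```

A statistical model can learn which endings are reliable and how often, which a fixed rule list like this cannot.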
The article() function returns the article (INDEFINITE or DEFINITE) inflected by gender and role (SUBJECT, OBJECT, INDIRECT or PROPERTY). In the following example, role=OBJECT means that the article is used in front of a noun that is the object of the sentence, as in Ich sehe die Katze (I see the cat: what do I see? → the cat).
>>> from pattern.de import article, DEFINITE, FEMALE, OBJECT
>>> print(article('Katze', DEFINITE, gender=FEMALE, role=OBJECT))
die
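Assuming the four roles correspond to the four German cases (SUBJECT = nominative, OBJECT = accusative, INDIRECT = dative, PROPERTY = genitive), choosing an article amounts to a lookup in the standard singular article table. article_sketch() and the constant values below are hypothetical stand-ins, not pattern's API, and they cover singular nouns only.

```python
# Hypothetical lookup table for German articles (singular only).
# Assumption: the roles map onto nominative, accusative, dative and genitive.
MALE, FEMALE, NEUTRAL = "m", "f", "n"
SUBJECT, OBJECT, INDIRECT, PROPERTY = "nominative", "accusative", "dative", "genitive"
DEFINITE, INDEFINITE = "definite", "indefinite"

ARTICLES = {
    DEFINITE: {
        MALE:    {SUBJECT: "der", OBJECT: "den", INDIRECT: "dem", PROPERTY: "des"},
        FEMALE:  {SUBJECT: "die", OBJECT: "die", INDIRECT: "der", PROPERTY: "der"},
        NEUTRAL: {SUBJECT: "das", OBJECT: "das", INDIRECT: "dem", PROPERTY: "des"},
    },
    INDEFINITE: {
        MALE:    {SUBJECT: "ein",  OBJECT: "einen", INDIRECT: "einem", PROPERTY: "eines"},
        FEMALE:  {SUBJECT: "eine", OBJECT: "eine",  INDIRECT: "einer", PROPERTY: "einer"},
        NEUTRAL: {SUBJECT: "ein",  OBJECT: "ein",   INDIRECT: "einem", PROPERTY: "eines"},
    },
}

def article_sketch(function=INDEFINITE, gender=MALE, role=SUBJECT):
    """Return the singular article for the given type, gender and role."""
    return ARTICLES[function][gender][role]

print(article_sketch(DEFINITE, FEMALE, OBJECT))  # die
```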
For German nouns there are singularize() and pluralize(). The implementation uses a statistical approach with 84% accuracy for singularization and 72% for pluralization.
>>> from pattern.de import singularize, pluralize
>>> print(singularize('Katzen'))
Katze
>>> print(pluralize('Katze'))
Katzen
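For contrast with the statistical approach, here is a hypothetical hand-written pluralizer that covers a few common German plural patterns. pluralize_sketch() is not pattern.de's pluralize(); its rules have many exceptions (umlaut plurals such as Baum → Bäume are not handled), which is one reason a data-driven approach helps.

```python
def pluralize_sketch(noun):
    """Hypothetical rule-based German pluralizer (not pattern.de's pluralize())."""
    if noun.endswith("erin"):
        # Feminine -in nouns double the n: Lehrerin -> Lehrerinnen
        return noun + "nen"
    if noun.endswith(("ung", "heit", "keit", "schaft", "ion")):
        # Feminine abstract suffixes take -en: Zeitung -> Zeitungen
        return noun + "en"
    if noun.endswith("e"):
        # Many nouns in -e add -n: Katze -> Katzen
        return noun + "n"
    if noun.endswith(("chen", "lein", "er", "el")):
        # Diminutives and many -er/-el nouns stay unchanged: Lehrer -> Lehrer
        return noun
    # Default guess: add -e (Hund -> Hunde); umlaut plurals are ignored.
    return noun + "e"

print(pluralize_sketch("Katze"))  # Katzen
```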
For German verbs there are conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 2,000 common German verbs. For unknown verbs, it falls back to a rule-based approach with an accuracy of about 87%.
>>> from pattern.de import conjugate
>>> from pattern.de import INFINITIVE, PRESENT, SG, SUBJUNCTIVE
>>>
>>> print(conjugate('war', INFINITIVE))
sein
>>> print(conjugate('war', PRESENT, 1, SG, mood=SUBJUNCTIVE))
sei
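The rule-based fallback for unknown verbs is not described in detail. As an illustration under stated assumptions, the hypothetical conjugate_weak_present() below conjugates regular (weak) verbs ending in -en in the present indicative, including two common spelling rules: an inserted -e- after stems in -t/-d (du arbeitest) and a merged -s- after stems in -s/-ß/-z/-x (du tanzt). It is not pattern.de's algorithm and gives wrong forms for irregular verbs such as sein.

```python
# Present-tense indicative endings for regular (weak) verbs.
# "sg"/"pl" are placeholders for pattern's SG/PL constants.
ENDINGS = {
    (1, "sg"): "e",  (2, "sg"): "st", (3, "sg"): "t",
    (1, "pl"): "en", (2, "pl"): "t",  (3, "pl"): "en",
}

def conjugate_weak_present(infinitive, person, number):
    """Conjugate a regular -en verb in the present indicative (hypothetical helper)."""
    if not infinitive.endswith("en"):
        raise ValueError("only infinitives in -en are handled: %r" % infinitive)
    stem = infinitive[:-2]
    ending = ENDINGS[(person, number)]
    if stem.endswith(("t", "d")) and ending in ("st", "t"):
        # Insert -e- so the ending stays audible: du arbeitest, er findet
        ending = "e" + ending
    elif stem.endswith(("s", "ß", "z", "x")) and ending == "st":
        # The -s- of -st merges with the stem: du tanzt, du reist
        ending = "t"
    return stem + ending

print(conjugate_weak_present("arbeiten", 2, "sg"))  # arbeitest
```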
German verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the IMPERATIVE and SUBJUNCTIVE moods. The conjugate() function takes the following optional parameters:
Tense | Person | Number | Mood | Aspect | Alias | Example |
INFINITIVE | None | None | None | None | "inf" | sein |
PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | ich __bin__ |
PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | du __bist__ |
PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | er __ist__ |
PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | wir __sind__ |
PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | ihr __seid__ |
PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | sie __sind__ |
PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | seiend |
PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | sei |
PRESENT | 1 | PL | IMPERATIVE | IMPERFECTIVE | "1pl!" | seien |
PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | seid |
PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | ich __sei__ |
PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | du __seiest__ |
PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | er __sei__ |
PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | wir __seien__ |
PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | ihr __seiet__ |
PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | sie __seien__ |
PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | ich __war__ |
PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | du __warst__ |
PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | er __war__ |
PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | wir __waren__ |
PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | ihr __wart__ |
PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | sie __waren__ |
PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | gewesen |
PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | ich __wäre__ |
PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | du __wärest__ |
PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | er __wäre__ |
PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | wir __wären__ |
PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | ihr __wäret__ |
PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | sie __wären__ |
Instead of the optional parameters, a single short alias, PARTICIPLE or PAST+PARTICIPLE can also be given. With no parameters, the infinitive form of the verb is returned.
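The aliases in the table follow a regular scheme: a person digit, then sg or pl (sgp or ppl for the past tense), then an optional ! for the imperative or ? for the subjunctive. The hypothetical expand_alias() below decodes that scheme into a (tense, person, number, mood, aspect) tuple. The grammar is inferred from the table, and plain strings stand in for pattern's constants.

```python
import re

# Alias grammar inferred from the table:
#   <person 1-3><sg|pl (present) or sgp|ppl (past)>[! imperative | ? subjunctive]
_ALIAS = re.compile(r"^([123])(sgp|ppl|sg|pl)([!?]?)$")

# Aliases without a person/number part.
_SPECIAL = {
    "inf":   ("infinitive", None, None, None, None),
    "part":  ("present", None, None, "indicative", "progressive"),
    "ppart": ("past", None, None, "indicative", "progressive"),
}

def expand_alias(alias):
    """Expand an alias into a (tense, person, number, mood, aspect) tuple."""
    if alias in _SPECIAL:
        return _SPECIAL[alias]
    m = _ALIAS.match(alias)
    if m is None:
        raise ValueError("unknown alias: %r" % alias)
    person, code, marker = m.groups()
    tense = "past" if code in ("sgp", "ppl") else "present"
    number = "sg" if code.startswith("sg") else "pl"
    mood = {"!": "imperative", "?": "subjunctive", "": "indicative"}[marker]
    return (tense, int(person), number, mood, "imperfective")

print(expand_alias("2sgp?"))  # ('past', 2, 'sg', 'subjunctive', 'imperfective')
```

The regular expression is more permissive than the table: it also accepts combinations the table does not list, such as 3sg!.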
German adjectives inflect with an -e, -em, -en, -er or -es suffix (e.g., neugierig → die neugierige Katze) depending on gender and role. You can get the base form with the predicative() function, or the inflected form with attributive(). For predicative(), a statistical approach is used with an accuracy of 98%. For attributive(), you supply the gender (MALE, FEMALE, NEUTRAL) and role (SUBJECT, OBJECT, INDIRECT, PROPERTY), and optionally the preceding article.
>>> from pattern.de import attributive, predicative
>>> from pattern.de import MALE, FEMALE, SUBJECT, OBJECT, INDIRECT
>>>
>>> print(predicative('neugierige'))
neugierig
>>> print(attributive('neugierig', gender=FEMALE))
neugierige
>>> print(attributive('neugierig', gender=FEMALE, role=OBJECT))
neugierige
>>> print(attributive('neugierig', gender=FEMALE, role=INDIRECT, article="die"))
neugierigen
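After a definite article, German adjectives take the so-called weak endings: -e in the nominative singular and in the feminine and neuter accusative, -en everywhere else. The hypothetical helpers below encode that one declension table and invert it by naive suffix stripping. They are not pattern's attributive() and predicative(): the strong and mixed declensions are not covered, and the stripping step is rule-based rather than statistical, so it mangles base forms that happen to end in -e or -er (leise, sauer).

```python
# Weak adjective endings (after a definite article such as der/die/das).
# "m"/"f"/"n" and the case names are placeholders for pattern's constants.
WEAK_ENDINGS = {
    "nominative": {"m": "e",  "f": "e",  "n": "e"},
    "accusative": {"m": "en", "f": "e",  "n": "e"},
    "dative":     {"m": "en", "f": "en", "n": "en"},
    "genitive":   {"m": "en", "f": "en", "n": "en"},
}
# Two-letter suffixes are tried before the bare "-e".
SUFFIXES = ("em", "en", "er", "es", "e")

def attributive_weak(adjective, gender, case):
    """Add the weak ending for the given gender and case."""
    return adjective + WEAK_ENDINGS[case][gender]

def predicative_sketch(adjective):
    """Strip one inflectional suffix (naive: breaks on base forms like 'leise')."""
    for suffix in SUFFIXES:
        if adjective.endswith(suffix):
            return adjective[: -len(suffix)]
    return adjective

print(attributive_weak("neugierig", "f", "dative"))  # neugierigen
print(predicative_sketch("neugierige"))              # neugierig
```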
For parsing there are parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.
>>> from pattern.de import parse, split
>>>
>>> s = parse('Die Katze liegt auf der Matte.')
>>> for sentence in split(s):
...     print(sentence)
Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')
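Each token in the parse() output is a slash-separated word/part-of-speech/chunk/prepositional-noun-phrase string, with chunk tags in B-/I- (begin/inside) notation and O for "outside". split() already turns this into Text objects; the two hypothetical helpers below only illustrate the format by decoding it by hand.

```python
def read_tagged(s):
    """Split 'word/POS/CHUNK/PNP ...' into (word, pos, chunk, pnp) tuples."""
    tokens = []
    for token in s.split():
        # Split from the right, so a word containing "/" stays intact
        # (assumes the three tags themselves never contain "/").
        word, pos, chunk, pnp = token.rsplit("/", 3)
        tokens.append((word, pos, chunk, pnp))
    return tokens

def group_chunks(tokens):
    """Group words into (chunk_type, [words]) using B-/I- chunk prefixes."""
    groups = []
    current = None  # type of the chunk being built, or None outside a chunk
    for word, pos, chunk, pnp in tokens:
        if chunk == "O":
            current = None
            continue
        prefix, kind = chunk.split("-", 1)
        if prefix == "B" or kind != current:
            groups.append((kind, [word]))
            current = kind
        else:
            groups[-1][1].append(word)
    return groups

tagged = ("Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O auf/IN/B-PP/B-PNP "
          "der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O")
print(group_chunks(read_tagged(tagged)))
# [('NP', ['Die', 'Katze']), ('VP', ['liegt']), ('PP', ['auf']), ('NP', ['der', 'Matte'])]
```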
The parser is built on Gerold Schneider & Martin Volk's German language model, with an accuracy of around 85%. The original STTS tagset is mapped to the Penn Treebank tagset. If you need to work with the original tags, you can call parse() with the optional parameter tagset="STTS".
Reference: Schneider, G. & Volk, M. (1998). Adding manual constraints and lexical look-up to a Brill-tagger for German. Proceedings of ESSLLI-98.
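The STTS-to-Penn-Treebank conversion can be pictured as a dictionary lookup. The handful of pairs below were chosen to agree with the tags in the parse() example (Die/DT, Katze/NN, liegt/VB, auf/IN); STTS_TO_PENN and to_penn() are hypothetical, and the full table pattern.de uses may differ. Tags without an entry are passed through unchanged.

```python
# A few illustrative STTS -> Penn Treebank pairs (hypothetical, partial).
STTS_TO_PENN = {
    "ART": "DT",     # article
    "NN": "NN",      # common noun
    "NE": "NNP",     # proper noun
    "ADJA": "JJ",    # attributive adjective
    "ADJD": "JJ",    # predicative/adverbial adjective
    "ADV": "RB",     # adverb
    "APPR": "IN",    # preposition
    "KON": "CC",     # coordinating conjunction
    "PPER": "PRP",   # personal pronoun
    "VVFIN": "VB",   # finite full verb
    "$.": ".",       # sentence-final punctuation
}

def to_penn(tagged):
    """Map (word, stts_tag) pairs to (word, penn_tag), keeping unknown tags."""
    return [(word, STTS_TO_PENN.get(tag, tag)) for word, tag in tagged]

print(to_penn([("Die", "ART"), ("Katze", "NN")]))  # [('Die', 'DT'), ('Katze', 'NN')]
```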
There is no sentiment() function for German yet.
Note: We did a test by automatically assigning polarity scores (-1.0 to +1.0) to adjectives translated from English, but this approach only reached 35% accuracy.
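The approach described in the note can be sketched as: look up each German word's English translation and average the polarity scores found in an English lexicon. Both dictionaries below are hypothetical toy data, and naive_sentiment() is not a pattern function. The sketch also exposes some obvious weaknesses of the idea: it ignores negation (nicht gut), skips inflected forms (guten is not listed, only base forms are), and cannot tell word senses apart.

```python
# Hypothetical toy data: an English polarity lexicon and a German -> English
# adjective dictionary. Neither comes from pattern.
EN_POLARITY = {"good": 0.7, "bad": -0.7, "beautiful": 0.85, "boring": -1.0}
DE_TO_EN = {"gut": "good", "schlecht": "bad", "schön": "beautiful", "langweilig": "boring"}

def naive_sentiment(words):
    """Average the English polarity of the German words we can translate."""
    scores = [EN_POLARITY[DE_TO_EN[w]] for w in words
              if w in DE_TO_EN and DE_TO_EN[w] in EN_POLARITY]
    # No known adjective: report neutral polarity.
    return sum(scores) / len(scores) if scores else 0.0

print(naive_sentiment("Der Film ist nicht gut".split()))  # 0.7 (negation ignored)
```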