pattern.de

The pattern.de module contains a fast part-of-speech tagger for German (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for German verb conjugation and noun singularization & pluralization.

Documentation

The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.

Gender

German nouns and adjectives inflect according to gender. The gender() function predicts the gender (MALE, FEMALE, NEUTRAL) of a given noun with about 75% accuracy:

>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print gender('Katze')

FEMALE

Article

The article() function returns the article (INDEFINITE or DEFINITE) inflected by gender and role (SUBJECT, OBJECT, INDIRECT or PROPERTY). In the following example, role=OBJECT means that the article is used in front of a noun that is the object of the sentence, as in: Ich sehe die Katze (I see the cat – what do I see? → the cat).

>>> from pattern.de import article, DEFINITE, FEMALE, OBJECT
>>> print article('Katze', DEFINITE, gender=FEMALE, role=OBJECT)

die
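The gender and role lookup behind this call can be pictured as a small table. The sketch below lists the standard singular German definite articles; it is not pattern.de's implementation. The function name definite_article() and the string values of the constants are hypothetical. It also assumes that INDIRECT corresponds to the dative case and PROPERTY to the genitive.

```python
# Hypothetical stand-ins for pattern.de's constants (plain strings here).
MALE, FEMALE, NEUTRAL = "m", "f", "n"
# Assumed case mapping: SUBJECT=nominative, OBJECT=accusative,
# INDIRECT=dative, PROPERTY=genitive.
SUBJECT, OBJECT, INDIRECT, PROPERTY = "nom", "acc", "dat", "gen"

# Standard German definite articles (singular only), keyed by (gender, role).
DEFINITE_ARTICLES = {
    (MALE, SUBJECT): "der", (FEMALE, SUBJECT): "die", (NEUTRAL, SUBJECT): "das",
    (MALE, OBJECT): "den", (FEMALE, OBJECT): "die", (NEUTRAL, OBJECT): "das",
    (MALE, INDIRECT): "dem", (FEMALE, INDIRECT): "der", (NEUTRAL, INDIRECT): "dem",
    (MALE, PROPERTY): "des", (FEMALE, PROPERTY): "der", (NEUTRAL, PROPERTY): "des",
}

def definite_article(gender, role=SUBJECT):
    """Look up the singular definite article for a gender and role."""
    return DEFINITE_ARTICLES[(gender, role)]

print(definite_article(FEMALE, OBJECT))  # die (Ich sehe die Katze)
print(definite_article(MALE, OBJECT))    # den
```

Only the masculine article changes between SUBJECT and OBJECT, which is why the feminine example above returns die in both roles.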

Noun singularization & pluralization

For German nouns there are singularize() and pluralize(). The implementation uses a statistical approach with 84% accuracy for singularization and 72% for pluralization.

>>> from pattern.de import singularize, pluralize
>>> print singularize('Katzen')
>>> print pluralize('Katze')

Katze
Katzen

Verb conjugation

For German verbs there are conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 2,000 common German verbs. For verbs not in the lexicon, the module falls back to a rule-based approach with an accuracy of about 87%.

>>> from pattern.de import conjugate
>>> from pattern.de import INFINITIVE, PRESENT, SG, SUBJUNCTIVE
>>>  
>>> print conjugate('war', INFINITIVE)
>>> print conjugate('war', PRESENT, 1, SG, mood=SUBJUNCTIVE) 

sein
sei

German verbs have more tenses than English verbs. In particular, the plural differs for each person and there are additional forms for the IMPERATIVE and SUBJUNCTIVE mood. The conjugate() function takes the following optional parameters:

Tense	Person	Number	Mood	Aspect	Alias	Example
INFINITIVE	None	None	None	None	"inf"	sein
PRESENT	1	SG	INDICATIVE	IMPERFECTIVE	"1sg"	ich __bin__
PRESENT	2	SG	INDICATIVE	IMPERFECTIVE	"2sg"	du __bist__
PRESENT	3	SG	INDICATIVE	IMPERFECTIVE	"3sg"	er __ist__
PRESENT	1	PL	INDICATIVE	IMPERFECTIVE	"1pl"	wir __sind__
PRESENT	2	PL	INDICATIVE	IMPERFECTIVE	"2pl"	ihr __seid__
PRESENT	3	PL	INDICATIVE	IMPERFECTIVE	"3pl"	sie __sind__
PRESENT	None	None	INDICATIVE	PROGRESSIVE	"part"	seiend

PRESENT	2	SG	IMPERATIVE	IMPERFECTIVE	"2sg!"	sei
PRESENT	1	PL	IMPERATIVE	IMPERFECTIVE	"1pl!"	seien
PRESENT	2	PL	IMPERATIVE	IMPERFECTIVE	"2pl!"	seid

PRESENT	1	SG	SUBJUNCTIVE	IMPERFECTIVE	"1sg?"	ich __sei__
PRESENT	2	SG	SUBJUNCTIVE	IMPERFECTIVE	"2sg?"	du __seiest__
PRESENT	3	SG	SUBJUNCTIVE	IMPERFECTIVE	"3sg?"	er __sei__
PRESENT	1	PL	SUBJUNCTIVE	IMPERFECTIVE	"1pl?"	wir __seien__
PRESENT	2	PL	SUBJUNCTIVE	IMPERFECTIVE	"2pl?"	ihr __seiet__
PRESENT	3	PL	SUBJUNCTIVE	IMPERFECTIVE	"3pl?"	sie __seien__

PAST	1	SG	INDICATIVE	IMPERFECTIVE	"1sgp"	ich __war__
PAST	2	SG	INDICATIVE	IMPERFECTIVE	"2sgp"	du __warst__
PAST	3	SG	INDICATIVE	IMPERFECTIVE	"3sgp"	er __war__
PAST	1	PL	INDICATIVE	IMPERFECTIVE	"1ppl"	wir __waren__
PAST	2	PL	INDICATIVE	IMPERFECTIVE	"2ppl"	ihr __wart__
PAST	3	PL	INDICATIVE	IMPERFECTIVE	"3ppl"	sie __waren__
PAST	None	None	INDICATIVE	PROGRESSIVE	"ppart"	gewesen

PAST	1	SG	SUBJUNCTIVE	IMPERFECTIVE	"1sgp?"	ich __wäre__
PAST	2	SG	SUBJUNCTIVE	IMPERFECTIVE	"2sgp?"	du __wärest__
PAST	3	SG	SUBJUNCTIVE	IMPERFECTIVE	"3sgp?"	er __wäre__
PAST	1	PL	SUBJUNCTIVE	IMPERFECTIVE	"1ppl?"	wir __wären__
PAST	2	PL	SUBJUNCTIVE	IMPERFECTIVE	"2ppl?"	ihr __wäret__
PAST	3	PL	SUBJUNCTIVE	IMPERFECTIVE	"3ppl?"	sie __wären__

Instead of the optional parameters, you can pass a single short alias, or PARTICIPLE or PAST+PARTICIPLE. With no parameters, the infinitive form of the verb is returned.
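The aliases in the table follow a regular naming scheme, which the sketch below reconstructs for the finite forms. It is only an illustration of how the aliases line up with the parameters, not how pattern.de stores them. The constants are plain-string stand-ins and build_aliases() is a hypothetical helper.

```python
# Plain-string stand-ins for pattern.de's constants (the real values may differ).
PRESENT, PAST = "present", "past"
SG, PL = "singular", "plural"
INDICATIVE, IMPERATIVE, SUBJUNCTIVE = "indicative", "imperative", "subjunctive"

def build_aliases():
    """Return {alias: (tense, person, number, mood)} for the finite forms in the table."""
    aliases = {}
    for person in (1, 2, 3):
        for number, n in ((SG, "sg"), (PL, "pl")):
            present = "%d%s" % (person, n)  # e.g. "1sg", "3pl"
            # Past aliases: "p" follows "sg" but precedes "pl" ("1sgp", "1ppl").
            past = "%dsgp" % person if number == SG else "%dppl" % person
            aliases[present] = (PRESENT, person, number, INDICATIVE)
            aliases[present + "?"] = (PRESENT, person, number, SUBJUNCTIVE)
            aliases[past] = (PAST, person, number, INDICATIVE)
            aliases[past + "?"] = (PAST, person, number, SUBJUNCTIVE)
    # The table lists only three imperative forms: 2sg, 1pl and 2pl.
    for person, number, n in ((2, SG, "sg"), (1, PL, "pl"), (2, PL, "pl")):
        aliases["%d%s!" % (person, n)] = (PRESENT, person, number, IMPERATIVE)
    return aliases

ALIASES = build_aliases()
print(ALIASES["1sg?"])  # ('present', 1, 'singular', 'subjunctive')
print(ALIASES["3ppl"])  # ('past', 3, 'plural', 'indicative')
```

A trailing "?" marks the subjunctive and a trailing "!" the imperative. The non-finite aliases "inf", "part" and "ppart" are left out of the sketch because they carry no person or number.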

Attributive & predicative adjectives

German adjectives inflect with an -e, -em , -en, -er, or -es suffix (e.g., neugierig → die neugierige Katze) depending on gender and role. You can get the base form with the predicative() function, or vice versa with attributive(). For predicative, a statistical approach is used with an accuracy of 98%. For attributive, you need to supply gender (MALE, FEMALE, NEUTRAL) and role (SUBJECT, OBJECT, INDIRECT, PROPERTY).

>>> from pattern.de import attributive, predicative
>>> from pattern.de import MALE, FEMALE, SUBJECT, OBJECT, INDIRECT
>>>   
>>> print predicative('neugierige') 
>>> print attributive('neugierig', gender=FEMALE)
>>> print attributive('neugierig', gender=FEMALE, role=OBJECT)
>>> print attributive('neugierig', gender=FEMALE, role=INDIRECT, article="die")

neugierig
neugierige 
neugierige 
neugierigen

Parser

For parsing there are parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.

>>> from pattern.de import parse, split
>>>  
>>> s = parse('Die Katze liegt auf der Matte.')
>>> for sentence in split(s):
>>>     print sentence 

Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O'
         'auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')
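Each token in the output above is a slash-separated record. The sketch below reads such a string without using pattern.de at all, assuming the four fields are word, part-of-speech tag, chunk tag and prepositional-phrase tag, as in the example. The helper read_tokens() is hypothetical; for real work the Text and Sentence objects returned by split() are the better interface.

```python
def read_tokens(tagged):
    """Split a tagged string into (word, tag, chunk, pnp) tuples."""
    tokens = []
    for token in tagged.split():
        # Split from the right so the last three fields are always the tags,
        # even for the "././O/O" punctuation token.
        word, tag, chunk, pnp = token.rsplit("/", 3)
        tokens.append((word, tag, chunk, pnp))
    return tokens

# The tagged sentence from the example above, as one string.
tagged = ("Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O "
          "auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O")

# Collect every word whose part-of-speech tag marks it as a noun.
nouns = [word for word, tag, chunk, pnp in read_tokens(tagged) if tag.startswith("NN")]
print(nouns)  # ['Katze', 'Matte']
```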

The parser is built on Gerold Schneider & Martin Volk's German language model. The accuracy is around 85%. The original STTS tagset is mapped to the Penn Treebank tagset. If you need to work with the original tags, you can call parse() with the optional parameter tagset="STTS".

Reference: Schneider, G. & Volk, M. (1998).
Adding manual constraints and lexical look-up to a Brill-tagger for German. Proceedings of ESSLLI-98.

Sentiment analysis

There's no sentiment() function for German yet.

Note: We did a test by automatically assigning scores (-1.0 → +1.0) to adjectives translated from English, but this approach only had 35% accuracy.
