Copyright (C) 2011-2017 mailto:[email protected]
parse-english is a minimum viable English parser implemented in Lex/Yacc. It parses, in parallel, every interpretation of an English sentence that the grammar accepts, and generates an abstract syntax tree for each successful parse. The algorithm is completely deterministic. No training data is required.
See old version here: NatLang
input:
the quick brown fox jumps over the lazy dog.
output:
cd ./demo/0_parse-english_full_nlp
./demo.sh "the quick brown fox jumps over the lazy dog"
Switch | Description |
---|---|
-e SENTENCE | input sentence |
-l | Lisp mode |
-g | graph mode (slow for deep trees) |
-d | dot mode |
-x | extract ontology mode |
-q | quiet mode |
-m | memory debug |
-n | indent lisp |
Unix tools and 3rd party components (accessible from $PATH):
gcc flex bison
- Parallel reentrant parsing
- Lisp / graph / dot output (multiple trees)
- Present tense
- Progressive tense
- Future tense
- Past tense
- Past perfect tense
- Passive voice
- Questions
- Conditionals
- Imperative mood
- Comparisons
- Hard-coded grammar and vocabulary.
- A brute-force algorithm tries every supported interpretation of a sentence. This is slow for long sentences.
- BNF rules are suitable for specifying constituent-based phrase structure grammars, but are a poor fit for expressing non-local dependencies.
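
To illustrate why the brute-force approach is slow for long sentences, here is a minimal sketch of enumerating every part-of-speech assignment for a sentence. It is not taken from the parse-english sources: the `Lexicon` type, the `enumerate_taggings` function, and the tag names in the usage note are hypothetical, and the real parser may organize its search differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy lexicon: each word maps to the parts of speech it may take.
using Lexicon = std::map<std::string, std::vector<std::string>>;

// Enumerate every tag sequence for the sentence, i.e. the Cartesian product
// of the per-word tag choices. Returns an empty result if any word is
// missing from the lexicon.
std::vector<std::vector<std::string>> enumerate_taggings(
    const Lexicon& lexicon, const std::vector<std::string>& words)
{
    // Start with a single empty tagging and extend it one word at a time.
    std::vector<std::vector<std::string>> results(1);
    for (const std::string& word : words) {
        auto it = lexicon.find(word);
        if (it == lexicon.end()) {
            return {};  // unknown word: no interpretation is possible
        }
        std::vector<std::vector<std::string>> extended;
        for (const auto& partial : results) {
            for (const std::string& tag : it->second) {
                extended.push_back(partial);     // copy the tagging so far
                extended.back().push_back(tag);  // append this word's tag
            }
        }
        results = std::move(extended);
    }
    return results;
}
```

With a lexicon in which "the" has one tag and "fly" and "flies" each have two, the sentence "the fly flies" yields 1 × 2 × 2 = 4 candidate taggings. The count is the product of the per-word ambiguities, so it grows multiplicatively with sentence length, and each candidate would still have to be checked against the grammar.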
target | action |
---|---|
all | make binaries |
test | all + run tests |
pure | test + use valgrind to check for memory leaks |
dot | test + generate .png graph for tests |
lint | use cppcheck to perform static analysis on .cpp files |
doc | use doxygen to generate documentation |
xml | test + generate .xml for tests |
import | test + use ticpp to serialize-to/deserialize-from xml |
clean | remove all intermediate files |
- "Part-of-speech tagging"
- http://en.wikipedia.org/wiki/Part-of-speech_tagging
- "Princeton WordNet"
- http://wordnet.princeton.edu/
- "Syntactic Theory: A Unified Approach"
- ISBN: 0340706104
- "Enju - A fast, accurate, and deep parser for English"
- http://www.nactem.ac.uk/enju/
Natural Language Processing, English parser, Yacc, BNF