A Java version of Hazm (Python library for digesting Persian text)
- Text cleaning
- Sentence and word tokenizer
- Word lemmatizer
- POS tagger
- Dependency parser
- Corpus readers for Hamshahri and Bijankhan
- You can download pre-trained tagger and parser models for Persian and put them in the core/src/main/resources folder of your project.
This module is built with Maven.
To build a single JAR file with all dependencies, run:
mvn clean compile assembly:single
To use this project as a library in other Maven projects, install it into your local repository:
mvn clean install
To run the JAR and see the help:
java -jar jhazm-jar-with-dependencies.jar
For example, to run POS tagging on the bundled sample file, use:
java -jar jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt
Or to run on any other file:
java -jar jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt -i input.txt
Or on some piece of text:
java -jar jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt -t "سلام من خوب هستم!"
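If you would rather launch the command-line tool from another Java program than from a shell, one option is to start the JAR with ProcessBuilder. This is only a minimal sketch and not part of jhazm: the class JHazmCli and its methods are hypothetical names, it assumes the assembled JAR sits in the working directory, and it uses only the -a, -o, -i and -t options shown above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of jhazm) that wraps the CLI shown above.
public class JHazmCli {
    // Assumption: the assembled JAR is in the current working directory.
    static final String JAR = "jhazm-jar-with-dependencies.jar";

    // Common prefix: java -jar <JAR> -a partOfSpeechTagging -o <outputFile>
    private static List<String> baseCommand(String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(JAR);
        cmd.add("-a");
        cmd.add("partOfSpeechTagging");
        cmd.add("-o");
        cmd.add(outputFile);
        return cmd;
    }

    // Command for tagging the contents of an input file (-i).
    static List<String> posTagFileCommand(String inputFile, String outputFile) {
        List<String> cmd = baseCommand(outputFile);
        cmd.add("-i");
        cmd.add(inputFile);
        return cmd;
    }

    // Command for tagging a piece of text passed on the command line (-t).
    static List<String> posTagTextCommand(String text, String outputFile) {
        List<String> cmd = baseCommand(outputFile);
        cmd.add("-t");
        cmd.add(text);
        return cmd;
    }

    // Starts the process, forwards its console output, and returns its exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: JHazmCli <input file> <output file>");
            return;
        }
        int exitCode = run(posTagFileCommand(args[0], args[1]));
        System.out.println("jhazm exited with code " + exitCode);
    }
}
```

Passing the arguments as a list avoids shell quoting issues, which matters when the -t text contains spaces or punctuation.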
Good Luck!