1 parent e91e821, commit 1329d7a
Showing 1 changed file with 60 additions and 1 deletion.
@@ -1 +1,60 @@
# BengaliNLP
# Bengali Natural Language Processing (BengaliNLP)

[![PyPI version](https://img.shields.io/pypi/v/bengalinlp)](https://pypi.org/project/bengalinlp/)
[![Downloads](https://static.pepy.tech/badge/bengalinlp)](https://pepy.tech/project/bengalinlp)

BengaliNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, **tag Bengali parts of speech**, **recognize Bengali named entities**, and **clean Bangla text**.
## Features

- Tokenization
  - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
  - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
  - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
  - [Word2vec embedding](./docs/README.md#bengali-word2vec)
  - [Fasttext embedding](./docs/README.md#bengali-fasttext)
  - [GloVe embedding](./docs/README.md#bengali-glove-word-vectors)
  - [Doc2vec document embedding](./docs/README.md#document-embedding)
- Part-of-speech tagging
  - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named Entity Recognition
  - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
  - Letters, vowels, punctuation, stopwords
## Installation

### PIP installer

```
pip install bengalinlp
```

**Or upgrade**

```
pip install -U bengalinlp
```
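After installing or upgrading, you may want to confirm what actually landed in your environment. The sketch below is not part of BengaliNLP. It uses only the standard library's `importlib.metadata`, and the helper names `installed_version` and `python_is_supported` are hypothetical. The version set is copied from the supported Python versions listed in this section.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Copied from the supported-version list in this README (an assumption of
# this sketch, not something the package itself exposes).
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info):
    """Check an interpreter version against the versions listed above."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    print("bengalinlp:", installed_version("bengalinlp") or "not installed")
    print("supported Python:", python_is_supported())
```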
- Python: 3.8, 3.9, 3.10, 3.11
- OS: Linux, Windows, macOS
### Build from source

```
git clone https://github.com/banglawiki/bengalinlp.git
cd bengalinlp
python setup.py install
```
## Sample Usage

```py
from bengalinlp import BasicTokenizer

# The tokenizer instance is callable: pass it a string, get back a list of tokens.
tokenizer = BasicTokenizer()

raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
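To make the shape of that output concrete, here is a minimal standalone sketch of one way such a split could be produced. It is not BengaliNLP's implementation. The function name `sketch_tokenize` and the punctuation set are assumptions made for illustration, and the library's `BasicTokenizer` may handle more cases.

```python
import re

# Assumed punctuation set for this sketch; the library's own list may differ.
PUNCTUATION = "।॥,;:!?()\"'"

_PUNCT_CLASS = "".join(re.escape(ch) for ch in PUNCTUATION)

# A token is either a single punctuation mark or a run of characters that are
# neither whitespace nor punctuation. Python's \w is avoided on purpose: it
# does not match combining marks such as Bengali vowel signs, so \w+ would
# break words apart.
_TOKEN_RE = re.compile(rf"[{_PUNCT_CLASS}]|[^\s{_PUNCT_CLASS}]+")


def sketch_tokenize(text):
    """Split text on whitespace and peel punctuation off as separate tokens."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(sketch_tokenize("আমি বাংলায় গান গাই।"))
```

On the sample sentence this yields five tokens: the four words followed by the danda (`।`) as its own token.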