Predict the named entities present in a file using spaCy. spaCy is a powerful, user-friendly, open-source Natural Language Processing library in Python.
The text to be processed is extracted from documents using textract. The results (named entities and some surrounding context) are then saved in an Excel file.
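The flow described above (extract text, find entities, save them with context) could be sketched roughly as follows. This is a minimal illustration and not necessarily how the `entity_extractor` package is implemented: the function names `context_window`, `extract_entities` and `save_rows`, the 30-character context width, and the output filename are all assumptions made for the example.

```python
def context_window(text, start, end, width=30):
    """Return the entity text with up to `width` characters on each side."""
    left = max(0, start - width)
    right = min(len(text), end + width)
    return text[left:right].replace("\n", " ").strip()


def extract_entities(path, model="en_core_web_md"):
    """Read a document with textract and return one row per named entity."""
    # Imported lazily so that context_window has no third-party dependencies.
    import spacy
    import textract

    text = textract.process(path).decode("utf-8")  # textract returns bytes
    doc = spacy.load(model)(text)
    return [
        {
            "entity": ent.text,
            "label": ent.label_,
            "context": context_window(text, ent.start_char, ent.end_char),
        }
        for ent in doc.ents
    ]


def save_rows(rows, out_path="entities.xlsx"):
    """Write the rows to an Excel file (pandas uses openpyxl for .xlsx)."""
    import pandas as pd

    pd.DataFrame(rows).to_excel(out_path, index=False)
```

With the packages installed, something like `save_rows(extract_entities("report.pdf"))` would write one row per entity, where `report.pdf` is a placeholder for your own document.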
- Download the files, and set up a virtual environment:

    git clone https://github.com/Tim-Abwao/named-entity-extractor.git
    cd named-entity-extractor
    python3 -m venv venv
    source venv/bin/activate
- Install the required packages:

    pip install -U pip
    pip install openpyxl pandas spacy textract
    python -m spacy download en_core_web_md
- Start the app:

    python -m entity_extractor
A tkinter GUI (demonstrated above) should pop up to help you navigate to and select a document to process.
NOTE: For help with tkinter-related issues, please see TkDocs.
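If the app fails to start, a quick check that the required modules can be found may help narrow things down. The sketch below is not part of the repository. The module list is taken from the install commands above, plus `tkinter` and the `en_core_web_md` model package, and the function name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Names taken from the install commands; tkinter ships with most Python
# builds but is sometimes packaged separately by the operating system.
REQUIRED = ["openpyxl", "pandas", "spacy", "textract", "tkinter", "en_core_web_md"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated virtual environment so that it inspects the same interpreter the app uses.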