Text2Graph: Autonomous Knowledge Graph Construction Framework

Text2Graph is a Python-based framework for the autonomous construction of domain-specific Knowledge Graphs (KGs) from unstructured text. The system transforms text into a labeled property graph representation using NLP and semantic processing tools, and integrates with Neo4j for graph storage and querying.

Features

Automated NLP pipeline for entity extraction, relation extraction, and semantic enrichment.
Integration with Neo4j for graph-based storage and querying.
Support for large-scale Knowledge Graphs with rich domain-specific entity typing, event tagging, and temporal relationships.
Extensible and schema-free architecture.

Requirements

System Requirements

Ubuntu Linux (or Windows Subsystem for Linux (WSL))
Neo4j 4.4 with APOC plugin
Docker for external NLP services
Python 3.8+

Python Dependencies

Ensure Python 3.8+ is installed. You can install the required dependencies using the following instructions:

Clone the repository

git clone https://github.com/neostrange/text2graph.git
cd text2graph

Install spaCy Models

pip install spacy
python -m spacy download en_core_web_trf
python -m spacy download en_core_web_lg

Additional SpaCy Dependency
```
pip install spacy-dbpedia-spotlight
```

Install WordNet (nltk) 3.1

pip install nltk
python -c "import nltk; nltk.download('wordnet')"

Other Python Dependencies

The modules cgitb, distutils, json, tokenize, and textwrap are part of the standard library in Python 3.8 and are not installed with pip. The remaining packages are:

pip install requests configparser GPUtil neo4j
pip install py2neo==2021.2.3

SpaCy Transformers (Version Fix)

If you encounter issues with spaCy's transformers, you may need to downgrade:
```
pip install spacy-transformers==1.1.6
```
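Before moving on, it can help to confirm that the packages installed above are importable. The following is a minimal sketch, not part of the repository; the import names in `REQUIRED_MODULES` are assumptions based on the pip package names listed in this section.

```python
import importlib.util

# Import names assumed from the pip packages installed above;
# adjust the list if your environment differs.
REQUIRED_MODULES = [
    "spacy",
    "spacy_transformers",
    "spacy_dbpedia_spotlight",
    "nltk",
    "py2neo",
    "neo4j",
    "GPUtil",
    "requests",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only confirms that each module can be found. It does not verify versions or that the spaCy models were downloaded.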

Neo4j Setup

Install Neo4j 4.4: Follow the Neo4j installation instructions for your system.
Enable APOC Plugin:
- Install the APOC plugin via the Neo4j plugin manager.
- Create an apoc.conf file in the Neo4j configuration folder (typically located at /var/lib/neo4j/conf/), and add the following:
```
apoc.import.file.enabled=true
```
Copy Dataset Files:
- Copy the files from the dataset folder in this repository to the import folder of Neo4j (usually located at /var/lib/neo4j/import/).
Restart Neo4j:
- Restart the Neo4j service to apply the changes:
```
sudo systemctl restart neo4j
```
Neo4j Configuration for WSL:
- If running on WSL, enable the Neo4j default listen address in the Neo4j configuration (/etc/neo4j/neo4j.conf):
```
dbms.default_listen_address=0.0.0.0
```
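The two configuration settings above are easy to mistype, so a small check can be useful before restarting Neo4j. This is a minimal sketch under the assumption that both files use plain `key=value` lines; the helper names `read_settings` and `setting_matches` are hypothetical and not part of Text2Graph.

```python
from pathlib import Path


def read_settings(path):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def setting_matches(path, key, expected):
    """Return True if the file exists and sets `key` to `expected`."""
    path = Path(path)
    return path.is_file() and read_settings(path).get(key) == expected
```

For example, `setting_matches("/var/lib/neo4j/conf/apoc.conf", "apoc.import.file.enabled", "true")` checks the APOC setting, and `setting_matches("/etc/neo4j/neo4j.conf", "dbms.default_listen_address", "0.0.0.0")` checks the WSL listen address, using the paths given above.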

Docker Services

The pipeline relies on external NLP services. Ensure the following Docker containers are running:

Coreference Resolution:
- Build from this repository: Spacy-Coref-Docker
- Expose on localhost:9999:
```
docker run -p 9999:9999 neostrange/spacy-experimental-coref
```
Event Tagging:
- Build from this repository: TTK Docker
- Expose on localhost:5050:
```
docker run -p 5050:5050 neostrange/ttk
```
Temporal Expression Tagging:
- Use the HeidelTime WebService: HeidelTime-WebService-Docker
- Expose on localhost:5000:
```
docker run -p 5000:5000 neostrange/heideltime
```
Word Sense Disambiguation:
- For Word Sense Disambiguation, text2graph uses AMuSE-WSD.
- Follow the setup instructions provided by AMuSE-WSD:
- Download the Docker image for AMuSE-WSD
- Expose on localhost:81:
```
docker run -p 81:81 amuse/amuse-wsd
```
Semantic Role Labeling:
- text2graph uses AllenNLP to perform semantic role labeling. Set up the container by following the AllenNLP Docker instructions for AllenNLP SRL.
- Expose on localhost:8000:
```
docker run -p 8000:8000 allennlp/allennlp
```

Make sure all Docker services are running before initiating the text2graph pipeline to ensure full functionality for entity enrichment, event tagging, and temporal expressions.
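One way to confirm the containers are reachable is to probe the host ports used in the `docker run -p` commands above. The sketch below uses only the standard library; the function names are hypothetical, and an open port shows only that something is listening, not that the service behind it is healthy.

```python
import socket

# Host ports taken from the `docker run -p` commands above.
SERVICES = {
    "Coreference resolution": 9999,
    "Event tagging (TTK)": 5050,
    "Temporal expression tagging (HeidelTime)": 5000,
    "Word sense disambiguation (AMuSE-WSD)": 81,
    "Semantic role labeling (AllenNLP)": 8000,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        status = "OK" if ok else "NOT REACHABLE"
        print(f"{status:<14}{name}")
```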

WSL Specific Setup

If you're running the project on WSL (Windows Subsystem for Linux), you may need to configure the firewall:

Add WSL to Windows Firewall:

Run the following command in PowerShell (as Administrator):

New-NetFirewallRule -DisplayName "WSL" -Direction Inbound -InterfaceAlias "vEthernet (WSL)" -Action Allow

Restart Your System:
- After applying the firewall rule, restart your computer.

Usage Instructions

Set up the Python Path (optional):

If you need to work with nested directories, you can add the current working directory to the Python path:

export PYTHONPATH="$(pwd):$PYTHONPATH"

Run the Pipeline

The Text2Graph pipeline is a modular system designed to efficiently generate Knowledge Graphs from textual data. It consists of several distinct phases, each focusing on specific NLP tasks. Let's walk through how to run the pipeline and explore each phase:

Phase 1: Basic Linguistic Analysis (python3 GraphBasedNLP.py --input /path/to/text/documents)

Function: This phase performs the foundational tasks of Natural Language Processing (NLP) on the input text documents.
Input: You can specify the path to your text documents using the --input argument. If no argument is provided, the script will load text data files by default from the data/dataset folder within the Text2Graph repository. Currently, this folder contains pre-loaded files from the MEANTIME corpus for your convenience.

Running Phase 1:

Open a terminal window and navigate to the directory containing the GraphBasedNLP.py script within your Text2Graph installation.
(Optional) If you have your own text documents, execute the script with the --input argument followed by the path to your data directory:
```
python3 GraphBasedNLP.py --input /path/to/your/text/documents
```
If you'd like to use the pre-loaded MEANTIME corpus data, simply run the script without any arguments:
```
python3 GraphBasedNLP.py
```
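The `--input` behavior described above (an optional path that falls back to the bundled dataset) can be illustrated with `argparse`. This is a sketch of the general pattern only; it is an assumption about how such an option could be handled, not the actual code in GraphBasedNLP.py, and `parse_args` and `list_documents` are hypothetical names.

```python
import argparse
from pathlib import Path

# Default folder named in the text above; the real script may resolve it differently.
DEFAULT_INPUT_DIR = Path("data") / "dataset"


def parse_args(argv=None):
    """Parse --input, falling back to the bundled dataset folder."""
    parser = argparse.ArgumentParser(description="Hypothetical Phase 1 entry point")
    parser.add_argument(
        "--input",
        type=Path,
        default=DEFAULT_INPUT_DIR,
        help="directory containing the text documents",
    )
    return parser.parse_args(argv)


def list_documents(input_dir):
    """Return the files directly inside `input_dir`, sorted by name."""
    return sorted(p for p in Path(input_dir).iterdir() if p.is_file())
```

Calling `parse_args([])` yields the default folder, while `parse_args(["--input", "/path/to/your/text/documents"])` yields the supplied path.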

Phase 2: Refinement Phase

Function: This phase focuses on refining the extracted information from Phase 1. It establishes connections between different linguistic elements and ensures consistency within the data.
Input: The output from Phase 1 (typically stored in a Neo4j database) serves as the input for this phase.

Running Phase 2:

Ensure Phase 1 has completed successfully.
Navigate to the directory containing the RefinementPhase.py script.
Execute the script:
```
python3 RefinementPhase.py
```

Phase 3: Temporal Enrichment

Function: This phase enriches the Knowledge Graph with temporal information. It involves identifying and tagging time expressions and event triggers within the text data.
Input: The refined data from Phase 2 is used as input for this phase.

Running Phase 3:

Ensure Phases 1 and 2 have completed successfully.
Navigate to the directory containing the TemporalPhase.py script.
Execute the script:
```
python3 TemporalPhase.py
```

Phase 4: Event Enrichment

Function: This phase focuses on enriching event information within the Knowledge Graph. It establishes links between identified events and entities, as well as other events, based on the linguistic elements present in the graph.
Input: The temporally enriched data from Phase 3 is used as input for this phase.

Running Phase 4:

Ensure Phases 1, 2 and 3 have completed successfully.
Navigate to the directory containing the EventEnrichmentPhase.py script.
Execute the script:
```
python3 EventEnrichmentPhase.py
```

Phase 5: TLink Recognition

Function: This phase aims to identify Temporal Links (TLinks) within the Knowledge Graph. TLinks describe temporal relationships between events, such as "before," "after," or "during".
Input: The event-enriched data from Phase 4 serves as input for TLink recognition.

Running Phase 5:

Ensure all the previous steps have been completed.
Navigate to the directory containing the TlinksRecognizer.py script.
Execute the script:
```
python3 TlinksRecognizer.py
```
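Because each phase depends on the previous one, the five scripts can also be chained from a small driver that stops at the first failure. The sketch below is not shipped with Text2Graph; it assumes all five scripts sit in the current working directory, and `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
import sys

# Script names as given in the phase descriptions above.
PHASE_SCRIPTS = [
    "GraphBasedNLP.py",
    "RefinementPhase.py",
    "TemporalPhase.py",
    "EventEnrichmentPhase.py",
    "TlinksRecognizer.py",
]


def build_commands(input_dir=None, python=sys.executable):
    """Return one command per phase; only Phase 1 receives --input."""
    commands = []
    for index, script in enumerate(PHASE_SCRIPTS):
        command = [python, script]
        if index == 0 and input_dir is not None:
            command += ["--input", str(input_dir)]
        commands.append(command)
    return commands


def run_pipeline(input_dir=None, runner=subprocess.run):
    """Run the phases in order; check=True raises on the first failing phase."""
    for command in build_commands(input_dir):
        print("Running:", " ".join(command))
        runner(command, check=True)
```

Calling `run_pipeline()` uses the default dataset, and `run_pipeline("/path/to/your/text/documents")` passes a custom folder to Phase 1. The `runner` parameter exists so the sequencing can be exercised without launching the real scripts.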

Neo4j Interaction

Note: While the REST endpoints powered by FastAPI are not yet implemented, you can still interact with the generated Knowledge Graph directly through the Neo4j Browser or Neo4j Bloom.

These tools provide a user-friendly interface for exploring and querying the graph data. You can execute Cypher queries to retrieve specific information or visualize the graph structure.
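The graph can also be queried from Python with the official `neo4j` driver listed in the dependencies. The sketch below deliberately avoids assuming any Text2Graph node labels: it runs a schema-agnostic Cypher query that counts nodes per label combination. The function names are hypothetical, and the connection details in the usage note are placeholders.

```python
# Schema-agnostic Cypher: count nodes per label combination.
LABEL_COUNT_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, count(*) AS total "
    "ORDER BY total DESC"
)


def fetch_label_counts(uri, user, password):
    """Run the query and return a list of (labels, total) pairs."""
    # Imported here so this module still loads when the driver is not installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(LABEL_COUNT_QUERY)
            return [(record["labels"], record["total"]) for record in result]
    finally:
        driver.close()


def format_counts(rows):
    """Render (labels, total) pairs as aligned text lines."""
    lines = []
    for labels, total in rows:
        name = ":".join(labels) or "(no label)"
        lines.append(f"{total:>8}  {name}")
    return lines
```

A call such as `fetch_label_counts("bolt://localhost:7687", "neo4j", "<password>")` assumes Neo4j's default Bolt port and your own credentials. The same Cypher string can be pasted into the Neo4j Browser to see which labels the pipeline produced.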

Screenshots

Publications

A. Hur, N. Janjua, and M. Ahmed, "A Survey on State-of-the-art Techniques for Knowledge Graphs Construction and Challenges ahead," 2021 IEEE Fourth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Laguna Hills, CA, USA, 2021, pp. 99-103, doi: 10.1109/AIKE52691.2021.00021.
A. Hur, N. Janjua, and M. Ahmed, "Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP," Expert Systems with Applications, vol. 239, 2024, 122269, doi: 10.1016/j.eswa.2023.122269.
A. Hur and N. Janjua, "Constructing Domain-Specific Knowledge Graphs From Text: A Case Study on Subprime Mortgage Crisis," Special Issue on Knowledge Graph Construction, Semantic Web Journal (by IOS Press), 2024 (Under Review).

Contributing to Text2Graph

We welcome contributions to Text2Graph! If you encounter any bugs, have feature requests, or wish to contribute new functionality, please submit a pull request or open an issue on our GitHub repository. Your contributions help us enhance Text2Graph and make it more valuable for the community.

Thank you for your support!

License

This project is licensed under the MIT License. See the LICENSE file for details.
