Create graphlet.nlp.ie module for information extraction as part of property graph construction #11

Open
rjurney opened this issue Sep 7, 2022 · 0 comments

Use of graphlet.etl Schema Models

We can use graphlet.etl's Pandera Schema Models to define the entities and relations we are extracting.

About graphlet.etl

The graphlet.etl module helps construct enterprise knowledge graphs as property graphs via Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT), using Pandera Schema Models on top of PySpark and Dask. These models define the node and edge types of a heterogeneous information network (HIN), with semi-structured data as node and edge properties, in one central place that other features, such as entity resolution, can refer to.

The classes EntitySchema, NodeSchema and EdgeSchema can be subclassed to define the types of entities and relations to be extracted.
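As a rough sketch of the subclassing idea, the following uses plain dataclasses as stand-ins for the graphlet.etl base classes. The names EntitySketch, NodeSketch, EdgeSketch, PersonNode, OrganizationNode and WorksForEdge, their fields, and the sample values are all hypothetical; the real classes are Pandera Schema Models and their fields may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EntitySketch:
    """Hypothetical stand-in for a shared base: an ID, a type and free-form properties."""
    entity_id: str
    entity_type: str = "entity"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeSketch(EntitySketch):
    """Hypothetical stand-in for a node base class."""


@dataclass
class EdgeSketch(EntitySketch):
    """Hypothetical stand-in for an edge base class: adds source and target node IDs."""
    src: str = ""
    dst: str = ""


@dataclass
class PersonNode(NodeSketch):
    """An extracted person entity (illustrative)."""
    entity_type: str = "person"
    name: str = ""


@dataclass
class OrganizationNode(NodeSketch):
    """An extracted organization entity (illustrative)."""
    entity_type: str = "organization"
    name: str = ""


@dataclass
class WorksForEdge(EdgeSketch):
    """An extracted relation between a person and an organization (illustrative)."""
    entity_type: str = "works_for"


# Illustrative records, as an information extraction step might emit them.
person = PersonNode(entity_id="n1", name="Alice")
org = OrganizationNode(entity_id="n2", name="Acme")
relation = WorksForEdge(entity_id="e1", src=person.entity_id, dst=org.entity_id)
```

Each extracted relation type gets its own subclass, so downstream steps can dispatch on the class or on entity_type.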

Use of FlairNLP

FlairNLP is a widely used project for Named Entity Recognition (NER) and relation extraction. Flair makes it easy to stack embeddings of different types, for example character and word embeddings, in a single Flair model.
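To illustrate what stacking means, here is a toy sketch in plain Python that concatenates a word-level vector with a character-level vector for each token. It does not use Flair's own classes; the lookup table, the character features and all function names are hypothetical, and real embeddings are learned rather than hand-written.

```python
from typing import Dict, List

# Toy word-level lookup table (hypothetical values, 3 dimensions).
WORD_VECTORS: Dict[str, List[float]] = {
    "berlin": [0.9, 0.1, 0.0],
    "visited": [0.0, 0.8, 0.2],
}
UNKNOWN_WORD: List[float] = [0.0, 0.0, 0.0]


def word_embedding(token: str) -> List[float]:
    """Look up the lowercased token, falling back to a zero vector."""
    return WORD_VECTORS.get(token.lower(), UNKNOWN_WORD)


def character_embedding(token: str) -> List[float]:
    """Toy character-level features: starts with a capital, contains a digit."""
    return [
        1.0 if token[:1].isupper() else 0.0,
        1.0 if any(ch.isdigit() for ch in token) else 0.0,
    ]


def stacked_embedding(token: str) -> List[float]:
    """Stack by concatenation: output size is the sum of the input sizes."""
    return word_embedding(token) + character_embedding(token)


vector = stacked_embedding("Berlin")
```

The character-level part still carries signal for out-of-vocabulary tokens, which is one reason stacking the two kinds of embeddings helps with NER.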

See the following tutorials:

Features

We need to define the minimum features required to support the integration of these two libraries.
Using Flair and transfer learning to perform NER and relation extraction makes these tasks primarily a labeling problem. Platforms like Snorkel and skweak are helpful for generating labels programmatically.
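A minimal sketch of programmatic labeling, assuming candidate entity mentions are already available as strings. It does not use the Snorkel or skweak APIs: the labeling functions, the label names and the plain majority vote are illustrative, and those libraries offer more sophisticated ways to aggregate noisy votes.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns this when it has no opinion
LabelingFunction = Callable[[str], Optional[str]]

# Hypothetical cue lists, for illustration only.
ORG_SUFFIXES = ("Inc", "Ltd", "GmbH")
PERSON_TITLES = ("Dr", "Mr", "Ms")


def lf_org_suffix(mention: str) -> Optional[str]:
    """Vote ORG when the last word is a company suffix."""
    words = mention.split()
    if words and words[-1].rstrip(".") in ORG_SUFFIXES:
        return "ORG"
    return ABSTAIN


def lf_person_title(mention: str) -> Optional[str]:
    """Vote PER when the first word is a personal title."""
    words = mention.split()
    if words and words[0].rstrip(".") in PERSON_TITLES:
        return "PER"
    return ABSTAIN


LABELING_FUNCTIONS: List[LabelingFunction] = [lf_org_suffix, lf_person_title]


def weak_label(mention: str) -> Optional[str]:
    """Combine non-abstaining votes by majority; ties go to the earliest vote."""
    votes = [lf(mention) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


labels = [weak_label(m) for m in ["Acme Inc.", "Dr. Smith", "the river"]]
```

Mentions on which every function abstains stay unlabeled and can be left out of the training set or routed to manual annotation.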
