
Knowledge-Graph-Hub/neat-ml


Network Embedding All the Things (NEAT)


NEAT is a flexible pipeline for:

  • Parsing a graph serialization
  • Generating node and edge embeddings
  • Training classifiers for link prediction and label expansion
  • Making predictions
  • Creating well-formatted output and metrics for the predictions
  • Doing all of the above reproducibly, with cloud compute (or locally, if preferred)

Quick Start

pip install neat-ml
neat run --config neat_quickstart.yaml  # neat_quickstart.yaml is provided in the NEAT repository

NEAT will write graph embeddings to a new quickstart_output directory.

Requirements

This pipeline has grape as a major dependency.

Methods from TensorFlow and scikit-learn are supported, but these packages are not installed as dependencies, to avoid version conflicts.

If you wish to use these methods, install versions of TensorFlow, scikit-learn, CUDA, and cuDNN that are compatible with your system and with each other before installing NEAT.

On Linux, TensorFlow may be easiest to install with conda:

wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O anaconda.sh
bash ./anaconda.sh -b   # batch mode: installs to $HOME/anaconda3 without prompting
echo "export PATH=\$PATH:$HOME/anaconda3/bin" >> $HOME/.bashrc
export PATH="$PATH:$HOME/anaconda3/bin"   # also make conda available in the current shell
conda init
conda install cudnn
conda install tensorflow
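Because these packages are optional, it can help to confirm they are importable before choosing TensorFlow- or scikit-learn-based methods in your config. A minimal sketch using only the standard library; the function name `optional_backends` is hypothetical and not part of NEAT:

```python
import importlib.util


def optional_backends() -> dict:
    """Report which optional ML packages can be imported in this environment."""
    # Map package names to the module names used at import time
    # (scikit-learn is imported as "sklearn").
    modules = {"tensorflow": "tensorflow", "scikit-learn": "sklearn"}
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, available in optional_backends().items():
        print(f"{package}: {'found' if available else 'not installed'}")
```

If TensorFlow is found, `tf.config.list_physical_devices("GPU")` lists the GPUs it can see, which is a quick way to confirm that CUDA and cuDNN are being picked up.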

Installation

pip install neat-ml

Running NEAT

neat run --config tests/resources/test.yaml # example
neat run --config [your yaml]

The pipeline is driven by a YAML file (e.g. tests/resources/test.yaml) containing all parameters needed to run it. The contents and expected values for this file are defined by the neat-ml-schema.

The file includes machine learning hyperparameters as well as file paths, such as where to write output results. To specify paths to node and edge files:

GraphDataConfiguration:
  graph:
    directed: False
    node_path: path/to/nodes.tsv
    edge_path: path/to/edges.tsv
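The node and edge files are tab-separated. As an illustration only, the sketch below writes a tiny hypothetical graph using the standard `csv` module. The column names (`id`, `category`, `name` for nodes; `subject`, `predicate`, `object` for edges) and the Biolink-style values are assumptions based on KGX TSV conventions; check the neat-ml-schema and grape documentation for the columns your NEAT version actually requires.

```python
import csv
from pathlib import Path

# Assumed column layout following KGX TSV conventions (not confirmed for NEAT).
NODE_COLUMNS = ["id", "category", "name"]
EDGE_COLUMNS = ["subject", "predicate", "object"]


def write_tsv(path: Path, columns: list, rows: list) -> None:
    """Write rows (dicts keyed by column name) as a tab-separated file with a header row."""
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


def write_toy_graph(directory: Path) -> tuple:
    """Write a hypothetical three-node graph and return (node_path, edge_path)."""
    directory.mkdir(parents=True, exist_ok=True)
    node_path = directory / "nodes.tsv"
    edge_path = directory / "edges.tsv"
    write_tsv(node_path, NODE_COLUMNS, [
        {"id": "EX:1", "category": "biolink:Gene", "name": "gene one"},
        {"id": "EX:2", "category": "biolink:Gene", "name": "gene two"},
        {"id": "EX:3", "category": "biolink:Disease", "name": "disease three"},
    ])
    write_tsv(edge_path, EDGE_COLUMNS, [
        {"subject": "EX:1", "predicate": "biolink:interacts_with", "object": "EX:2"},
        {"subject": "EX:2", "predicate": "biolink:related_to", "object": "EX:3"},
    ])
    return node_path, edge_path
```

Calling `write_toy_graph(Path("path/to"))` would produce files matching the `node_path` and `edge_path` values in the config above.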

If the graph data is in a compressed file and/or a remote location (e.g., on KG-Hub), one or more URLs may be specified in the source_data parameter:

GraphDataConfiguration:
  source_data:
    files:
      - path: https://kg-hub.berkeleybop.io/kg-obo/bfo/2019-08-26/bfo_kgx_tsv.tar.gz
        desc: "This is BFO, your favorite basic formal ontology, now in graph form."
      - path: https://someremoteurl.com/graph2.tar.gz
        desc: "This is some other graph - it may be useful."
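NEAT fetches and unpacks `source_data` itself; the sketch below is not its implementation. It only illustrates, with the standard library, what downloading and extracting one of these `.tar.gz` archives involves, which can be handy for inspecting the node and edge files before a run. The function names `fetch` and `extract_tar_gz` are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir, skipping the download if the file already exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract_tar_gz(archive: Path, dest_dir: Path) -> list:
    """Extract a .tar.gz archive into dest_dir and return the names of the files extracted."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    root = dest_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Basic safety check: no links, and no paths that resolve outside dest_dir.
            target = (root / member.name).resolve()
            if member.issym() or member.islnk() or (target != root and root not in target.parents):
                raise ValueError(f"refusing to extract unsafe member: {member.name}")
        # Use the stdlib "data" extraction filter where this Python version provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(root, members=members, **extra)
    return sorted(m.name for m in members if m.isfile())
```

For example, `extract_tar_gz(fetch("https://kg-hub.berkeleybop.io/kg-obo/bfo/2019-08-26/bfo_kgx_tsv.tar.gz", Path("downloads")), Path("bfo"))` would download the BFO archive listed above and unpack it into `bfo/`.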

A diagram explaining the design a bit is here.

If you are uploading results to AWS S3, configure your AWS credentials first; see the AWS documentation on configuring credentials.
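Before starting a run that uploads to S3, a quick pre-flight check can save a failed upload at the end. A minimal sketch, assuming credentials are supplied either through the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables or through the shared credentials file that `aws configure` writes (`~/.aws/credentials`, or the path in `AWS_SHARED_CREDENTIALS_FILE`). It covers only these two of the sources AWS SDKs such as boto3 consult, and the function name is hypothetical:

```python
import configparser
import os
from pathlib import Path


def aws_credentials_configured(profile: str = "default", env=None, credentials_file=None) -> bool:
    """Return True if AWS keys are set in the environment or in the shared credentials file."""
    env = os.environ if env is None else env
    # Source 1: environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return True
    # Source 2: the shared credentials file (INI format, one section per profile).
    path = Path(
        credentials_file
        or env.get("AWS_SHARED_CREDENTIALS_FILE")
        or Path.home() / ".aws" / "credentials"
    )
    if not path.is_file():
        return False
    parser = configparser.ConfigParser()
    parser.read(path)
    # has_option returns False if the profile section does not exist.
    return (parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key"))
```

This checks only that keys are present, not that they are valid or allowed to write to the target bucket.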

Credits

Developed by Deepak Unni, Justin Reese, J. Harry Caufield, and Harshad Hegde.
