Skip to content

fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel for sparse graphs.

License

Notifications You must be signed in to change notification settings

pjhartout/fastwlk

Repository files navigation

FastWLK

https://codecov.io/gh/pjhartout/fastwlk/branch/main/graph/badge.svg?token=U054MJONED

Quick Links

Documentation

Installation

Usage

Contributing

What does fastwlk do?

fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel. It manages to outperform current state-of-the-art implementations on sparse graphs by implementing a number of improvements compared to vanilla implementations:

  1. It parallelizes the execution of Weisfeiler-Lehman hash computations since each graph's hash can be computed independently prior to computing the kernel.
  2. It parallelizes the computation of similarity of graphs in RKHS by computing batches of the inner products independently.
  3. When comparing graphs, lots of computations are spent processing positions/hashes that do not actually overlap between Weisfeiler-Lehman histograms. As such, we manually loop over the overlapping keys, outperforming numpy dot product-based implementations on collections of sparse graphs.

This implementation works best when graphs have relatively few connections compared to the number of possible connections and are reasonably dissimilar from one another. If you are not sure the graphs you are using are either sparse or dissimilar enough, try to benchmark this package with others out there using this script.

How fast is fastwlk?

Running the benchmark script in examples/benchmark.py shows that for the graphs in data/graphs.pkl, we get an approximately 80% speed improvement over other implementations like grakel. The example dataset contains 2-nn graphs extracted from 100 random proteins from the human proteome from the AlphaFold EBI database.

To see how much faster this implementation is for your use case:

$ git clone git://github.com/pjhartout/fastwlk
$ poetry install
$ poetry run python examples/benchmark.py

You will need to swap out the provided graphs.pkl with a pickled iterable of graphs from the database you are interested in.

About

fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel for sparse graphs.

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published