YuxingLu613/HMKG-Human-Metabolome-Knowledge-Graph

HMKG: Human Metabolome Knowledge Graph

Introduction

The Human Metabolome Database (HMDB) is the largest metabolome database in the world. We provide a toolkit that transforms HMDB into a knowledge graph, named HMKG, which helps researchers explore the relations among metabolites and gain a more intuitive and comprehensive understanding of metabolic processes. We also encourage researchers to apply deep learning and representation learning techniques to HMKG, bringing these methods into metabolomics and advancing AI-driven research in the field.

Goal and Philosophy

  • Fill the gap left by the absence of a KG in metabolomics.
  • Explore the relationships between metabolites.
  • Promote the development of metabolomics research.

Content

.
├── README.md
├── build_graph.py
├── cal_simlarity.ipynb
├── convert_xml.py
├── data
│   ├── hmdb_metabolities.xml (download it yourself)
│   ├── hmdb_metabolities.json
│   └── selected_metabolities.csv
├── main.py
├── output
│   └── HMDB_embedding.json
├── requirements.txt
├── select_metabolites.py
└── utils.py

Requirements

py2neo==4.3.0

pandas==1.4.4

xmltodict==0.12.0

Usage

To generate HMKG, first download the "All Metabolites" data file from https://hmdb.ca/downloads and store it in the ./data directory (the commands below assume it is saved as ./data/hmdb_metabolities.xml).

Then run the command below.

python main.py -XML_DATA_PATH ./data/hmdb_metabolities.xml \
	       -JSON_DATA_PATH ./data/hmdb_metabolities.json \
	       -CREATE_GRAPH True
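The XML-to-JSON step implied by the two path arguments can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in convert_xml.py: the inline sample is a heavily simplified record, and the namespace, the helper names (local, iter_metabolites) and the flat field handling are assumptions made for the example.

```python
import io
import json
import xml.etree.ElementTree as ET

# Simplified sample for illustration; real HMDB records carry many more
# (and nested) fields. The namespace below is an assumption.
SAMPLE_XML = """<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB0000001</accession>
    <name>1-Methylhistidine</name>
  </metabolite>
  <metabolite>
    <accession>HMDB0000002</accession>
    <name>1,3-Diaminopropane</name>
  </metabolite>
</hmdb>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_metabolites(source):
    """Yield one dict per <metabolite>, streaming instead of loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) == "metabolite":
            # Only direct children with text are kept in this sketch.
            yield {local(child.tag): (child.text or "").strip() for child in elem}
            elem.clear()  # drop the finished element's children to save memory


records = list(iter_metabolites(io.BytesIO(SAMPLE_XML.encode("utf-8"))))
print(json.dumps(records, indent=2))
```

Streaming with iterparse matters here because the full HMDB export is large; replacing the BytesIO object with an open file handle would process a downloaded file the same way.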

To generate a subgraph restricted to specific metabolites, list them in ./data/select_metabolities.csv and add the -SELECT_METABOLITIES argument:

python main.py -XML_DATA_PATH ./data/hmdb_metabolities.xml \
	       -JSON_DATA_PATH ./data/hmdb_metabolities.json \
	       -CREATE_GRAPH True \
	       -SELECT_METABOLITIES ./data/select_metabolities.csv
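Conceptually, the subgraph option filters the metabolite records down to the listed ones before the graph is built. The sketch below shows that idea only; the CSV layout (one HMDB accession per row, first column, no header) and the function names are assumptions, not the behaviour of select_metabolites.py.

```python
import csv
import io


def load_selected_ids(csv_file):
    """Read accessions from the first column, skipping blank rows."""
    return {row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()}


def select_records(records, selected_ids):
    """Keep only the records whose accession is in the selection."""
    return [r for r in records if r.get("accession") in selected_ids]


# Hypothetical in-memory stand-ins for the CSV file and the parsed records.
selection_csv = io.StringIO("HMDB0000001\nHMDB0000005\n")
all_records = [
    {"accession": "HMDB0000001", "name": "1-Methylhistidine"},
    {"accession": "HMDB0000002", "name": "1,3-Diaminopropane"},
]

selected = select_records(all_records, load_selected_ids(selection_csv))
print([r["accession"] for r in selected])
```

Accessions listed in the CSV but absent from the data (HMDB0000005 in this example) are silently ignored in this sketch; a real pipeline might prefer to report them.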

To generate triples for further research or downstream tasks, add the -CREATE_TRIPLE argument:

python main.py -XML_DATA_PATH ./data/hmdb_metabolities.xml \
	       -JSON_DATA_PATH ./data/hmdb_metabolities.json \
	       -CREATE_GRAPH True \
	       -CREATE_TRIPLE True
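A triple is a (head, relation, tail) statement, with the metabolite as head. The following sketch shows how one record could be flattened into triples and serialized as tab-separated lines, a format commonly accepted by KG embedding tools. The record fields (pathways, diseases) and relation names are hypothetical and do not reflect the 17 relations HMKG actually uses.

```python
def record_to_triples(record, relation_fields):
    """Emit one (head, relation, tail) triple per value of each mapped field."""
    head = record["accession"]
    triples = []
    for field, relation in relation_fields.items():
        for tail in record.get(field, []):
            triples.append((head, relation, tail))
    return triples


# Hypothetical mapping from record fields to relation names.
RELATION_FIELDS = {"pathways": "has_pathway", "diseases": "has_disease"}

# Hypothetical record; the values are placeholders, not HMDB data.
record = {
    "accession": "HMDB0000001",
    "pathways": ["pathway_A", "pathway_B"],
    "diseases": ["disease_X"],
}

triples = record_to_triples(record, RELATION_FIELDS)
lines = ["\t".join(triple) for triple in triples]
print("\n".join(lines))
```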

More functions are under development.

KG Embedding

We recommend using the generated triples to train KG embeddings and obtain representation vectors for the metabolites.

Some recommended repositories are listed below:


Here we apply the KG embedding methods from the https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding repository and obtain TransE embeddings for a subgraph containing 474 metabolites.

The statistics of the subgraph are as follows:

| Metabolites | Nodes | Relations | Triples |
| --- | --- | --- | --- |
| 474 | 93,231 | 17 | 191,479 |

The TransE metrics are as follows:

| MRR | MR | HITS@1 | HITS@3 | HITS@10 |
| --- | --- | --- | --- | --- |
| 0.2715 | 8598.2 | 0.1841 | 0.3104 | 0.4628 |
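For readers unfamiliar with these numbers, the sketch below shows the standard definitions: TransE scores a triple by how close h + r lies to t, and the link-prediction metrics summarize the rank of the correct entity among all candidates. It uses the L2 form of the distance and a toy list of ranks; the norm, margin and evaluation protocol behind the results above are not stated here, so treat this as an illustration of the definitions only.

```python
import math


def transe_score(h, r, t):
    """Negative L2 distance ||h + r - t||; a higher score means a more plausible triple."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, MR and HITS@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / rank for rank in ranks) / n,  # mean reciprocal rank
        "MR": sum(ranks) / n,                          # mean rank
    }
    for k in ks:
        metrics[f"HITS@{k}"] = sum(rank <= k for rank in ranks) / n
    return metrics


# Toy example: the correct entity was ranked 1st, 2nd, 4th and 20th.
metrics = ranking_metrics([1, 2, 4, 20])
print(metrics)

# A triple that satisfies h + r = t exactly gets the best possible score, 0.
print(transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0]))
```

Because MR averages raw ranks, a few very poorly ranked triples can inflate it considerably, which is how a large MR can coexist with a moderate MRR.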

To calculate the similarity between the embeddings of two metabolites, open cal_similarity.ipynb and enter their two HMDB IDs.
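As a rough idea of what such a comparison involves, here is a minimal cosine-similarity sketch. It assumes the embeddings are available as a mapping from HMDB ID to a list of floats and that cosine similarity is the measure of interest; neither the actual layout of output/HMDB_embedding.json nor the measure used in the notebook is specified here, and the vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed layout: HMDB ID -> vector); not real HMKG output.
embeddings = {
    "HMDB0000001": [1.0, 0.0],
    "HMDB0000002": [1.0, 1.0],
}

sim = cosine_similarity(embeddings["HMDB0000001"], embeddings["HMDB0000002"])
print(f"{sim:.4f}")  # 0.7071, i.e. 1/sqrt(2) for these two toy vectors
```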

KG Embedding Results

TBA

Statistics

TBA

Citation

TBA
