multiple connected components in mondo ontology #495

Open
kmanpearl opened this issue Jul 31, 2024 · 0 comments

I expected the processed MONDO ontology to have a single connected component, but it does not. I am not sure whether I am misunderstanding the processing steps, missing a function call or argument in my code (shown below), or hitting a bug.

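from obnb.data import MondoDiseaseOntology

# Load the processed MONDO ontology from a local data directory
root = '../data/obnb/FullyRedundant'
dat = MondoDiseaseOntology(root=root)
g = dat.data

# Count connected components, ignoring edge direction
undirected = g.to_undirected_sparse_graph()
len(undirected.connected_components())
# 3136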

Only two of these components contain more than one node: one with ~23k nodes and one with 41 nodes. The remaining ~3k components are isolated nodes with no edges in the ontology.
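For reference, this is roughly how I summarized the component sizes. It is a minimal sketch in plain Python on a toy graph, not obnb's API: the helper `connected_components`, the node names, and the edge list are all hypothetical stand-ins for the real ontology.

```python
from collections import Counter, defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components (as sets) of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from every node not yet assigned to a component
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


# Toy data (hypothetical): one 3-node component, one 2-node component,
# and one isolated node with no edges.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

comps = connected_components(nodes, edges)
size_histogram = Counter(len(c) for c in comps)  # component size -> count
isolated = sorted(next(iter(c)) for c in comps if len(c) == 1)

print(len(comps), dict(size_histogram), isolated)
```

On the real ontology, the same size histogram is what separates the two multi-node components from the isolated nodes.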

This caused a further problem: the term MONDO:0006560 has gene annotations in obnb but no ontology edges, so when node embeddings are built from an edge list it is not treated as part of the ontology. I had to manually remove this term from the gene set collection before I could use my net2onto method with MONDO.
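The manual workaround amounts to something like the following. This is only a sketch under assumptions: the gene set collection is represented here as a plain dict from term ID to gene set, and the helper names, term IDs, and gene names are hypothetical, not obnb objects.

```python
def terms_missing_from_edges(annotations, edges):
    """Return annotated terms that never appear in the edge list."""
    in_edges = {node for edge in edges for node in edge}
    return sorted(set(annotations) - in_edges)


def drop_terms(annotations, terms):
    """Return a copy of the annotations without the given terms."""
    to_drop = set(terms)
    return {t: genes for t, genes in annotations.items() if t not in to_drop}


# Toy data (hypothetical): term_f is annotated but has no ontology edges.
edges = [("term_a", "term_b"), ("term_b", "term_c")]
annotations = {
    "term_a": {"GENE1", "GENE2"},
    "term_c": {"GENE3"},
    "term_f": {"GENE4"},
}

missing = terms_missing_from_edges(annotations, edges)
cleaned = drop_terms(annotations, missing)

print(missing, sorted(cleaned))
```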

If this is simply a feature that has not been implemented, could we add an option to filter an ontology down to its largest connected component? If it is a bug, could it be fixed? And if I am just missing something in my code, please let me know the proper way to process the ontology.
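To make the request concrete, here is a rough sketch of the filtering behavior I have in mind, written in plain Python on a toy graph. It is not a proposal for the actual implementation, and `largest_component` plus the toy node and edge names are hypothetical.

```python
from collections import defaultdict


def largest_component(nodes, edges):
    """Return the node set of the largest connected component (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search to collect everything reachable from `start`
        comp, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for nb in adj[cur]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


# Toy data (hypothetical): a 3-node component, a 2-node component,
# and one isolated node.
nodes = ["term_a", "term_b", "term_c", "term_d", "term_e", "term_f"]
edges = [("term_a", "term_b"), ("term_b", "term_c"), ("term_d", "term_e")]

keep = largest_component(nodes, edges)
kept_nodes = [n for n in nodes if n in keep]
kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]

print(kept_nodes, kept_edges)
```

Restricting the gene annotations to the kept nodes at the same time would keep the edge list and the gene set collection consistent.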
