I expected the processed Mondo ontology to have only one connected component, but this is not the case. I am not sure whether this is because I misunderstand the processing steps, because my code (shown below) is missing a function call or argument, or because of a bug.
```python
from obnb.data import MondoDiseaseOntology
root = '../data/obnb/FullyRedundant'
dat = MondoDiseaseOntology(root=root)
g = dat.data
undirected = g.to_undirected_sparse_graph()
len(undirected.connected_components())
# 3136
```
There are two non-trivial connected components, one with ~23k nodes and one with 41 nodes. The remaining ~3k components are single terms with no edges in the ontology.
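To check the component breakdown independently of obnb, a minimal union-find sketch over a plain edge list can count components and separate edgeless (singleton) terms from non-trivial ones. The `nodes` and `edges` values below are hypothetical toy data standing in for the ontology's term IDs, and `connected_component_sizes` is an illustrative helper, not part of obnb's API.

```python
from collections import Counter


def connected_component_sizes(nodes, edges):
    """Return component sizes (largest first) for an undirected graph.

    `nodes` is an iterable of node IDs, so terms with no edges are counted too;
    `edges` is an iterable of (u, v) pairs whose direction is ignored.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Group every node by its root and count the group sizes.
    sizes = Counter(find(n) for n in parent)
    return sorted(sizes.values(), reverse=True)


# Hypothetical toy ontology: one 3-term component, one 2-term component,
# and one term with no edges at all.
nodes = {"A", "B", "C", "D", "E", "F"}
edges = [("A", "B"), ("B", "C"), ("D", "E")]
sizes = connected_component_sizes(nodes, edges)
n_isolated = sum(1 for s in sizes if s == 1)
print(sizes, n_isolated)  # [3, 2, 1] 1
```

Because every node is seeded into `parent` before any edges are merged, edgeless terms show up as components of size 1, which matches the ~3k singletons in the count above.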
This caused a further problem: the term MONDO:0006560 has gene annotations in obnb but no ontology edges, so when an edge list is used to create node embeddings, the term is not considered part of the ontology. I had to remove this term from the gene set collection manually before I could use my net2onto method with Mondo.
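The manual workaround can be sketched as a filter that keeps only gene sets whose term appears in at least one ontology edge. This assumes the gene set collection can be represented as a plain dict from term ID to gene IDs. That is a hypothetical structure, not obnb's own collection class, and `drop_terms_without_edges` is an illustrative helper. The edge and gene IDs are placeholders, not real Mondo content.

```python
def drop_terms_without_edges(gene_sets, edges):
    """Keep only gene sets whose term appears in at least one ontology edge.

    `gene_sets` maps term ID -> set of gene IDs (hypothetical structure);
    `edges` is an iterable of (child, parent) term-ID pairs.
    Returns the kept gene sets and a sorted list of dropped term IDs.
    """
    terms_with_edges = {term for edge in edges for term in edge}
    kept = {t: genes for t, genes in gene_sets.items() if t in terms_with_edges}
    dropped = sorted(set(gene_sets) - set(kept))
    return kept, dropped


# Placeholder data: MONDO:0006560 is annotated but absent from the edge list.
edges = [("TERM:child", "TERM:root")]
gene_sets = {
    "TERM:child": {"g1", "g2"},
    "MONDO:0006560": {"g3"},
}
kept, dropped = drop_terms_without_edges(gene_sets, edges)
print(sorted(kept), dropped)  # ['TERM:child'] ['MONDO:0006560']
```

Returning the dropped IDs alongside the filtered collection makes it easy to log which annotated terms were discarded, rather than silently losing them.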
If I have misunderstood and this is not an implemented feature, could we add an option that filters ontologies down to their largest connected component? If it is a bug, could it be fixed? And if I am simply missing something in my code, please let me know the proper way to process the ontology.
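The requested filter could look roughly like the following breadth-first search sketch, which keeps only the largest weakly connected component of an edge list. `restrict_to_largest_component` is a hypothetical helper name, not an existing obnb function. It ignores edge direction when computing connectivity but keeps each retained edge's original orientation. Ties between equally large components go to whichever is found first.

```python
from collections import defaultdict, deque


def restrict_to_largest_component(edges):
    """Return (nodes, edges) of the largest weakly connected component.

    `edges` is a list of (u, v) pairs. Direction is ignored for connectivity,
    but kept edges retain their original (u, v) orientation. Terms that appear
    in no edge can never be part of the result.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        # Breadth-first search collects one full component.
        comp = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in comp:
                    comp.add(nbr)
                    queue.append(nbr)
        seen |= comp
        if len(comp) > len(best):
            best = comp

    # Both endpoints of an edge share a component, so checking `u` suffices.
    kept_edges = [(u, v) for u, v in edges if u in best]
    return best, kept_edges


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
nodes, kept = restrict_to_largest_component(edges)
print(sorted(nodes), kept)
```

Applied to an edge list like the one described above, this would drop both the 41-node component and every edgeless term. Any gene sets attached to dropped terms would then need the same filtering as in the previous workaround.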