Reduce query ids by only using bound nodes for pfocr figure search #211

Merged: 2 commits, Aug 22, 2024

Conversation

tokebe (Member) commented Aug 21, 2024

This doesn't address all relevancy problems caused by creative-mode interactions: pfocr augmentation happens at the template level, and therefore does not explicitly use the bound nodes of the final inferred response. However, it should make a given figure slightly more relevant to the template it came from, which in turn makes it slightly more relevant to the overall inferred result.

Most importantly, because only the bound nodes are used, fewer IDs are sent when querying pfocr, which should decrease overall pfocr search time. The full traverseResultforNodes output is still used in scoring and for matchedQueries, so the score should still be slightly higher quality.
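A minimal sketch of the split described here, assuming a heavily simplified result shape. The names `SketchResult`, `boundNodeIds`, `scoringNodeIds`, and `overlapScore` are hypothetical and are not the actual biothings_explorer code; the score is only an illustrative overlap ratio. The point is that the pfocr query uses the smaller set of IDs bound directly to query nodes, while scoring can still draw on the full traversal.

```typescript
// Hypothetical, simplified shape; real TRAPI results are much richer.
interface SketchResult {
  // QNode ID -> CURIEs bound to that QNode in this result
  nodeBindings: { [qNodeId: string]: string[] };
  // Every CURIE reached by a full traversal of the result
  // (assumed here to be a superset of the bound nodes)
  traversedNodes: string[];
}

// IDs sent to pfocr: only nodes bound directly to a QNode.
function boundNodeIds(result: SketchResult): Set<string> {
  const ids = new Set<string>();
  for (const qNodeId of Object.keys(result.nodeBindings)) {
    const curies = result.nodeBindings[qNodeId] ?? [];
    for (const curie of curies) {
      ids.add(curie);
    }
  }
  return ids;
}

// IDs used for scoring: the full traversal.
function scoringNodeIds(result: SketchResult): Set<string> {
  return new Set<string>(result.traversedNodes);
}

// Illustrative score: fraction of a figure's genes found in the result.
function overlapScore(figureGenes: string[], resultIds: Set<string>): number {
  if (figureGenes.length === 0) {
    return 0;
  }
  const matched = figureGenes.filter((gene) => resultIds.has(gene)).length;
  return matched / figureGenes.length;
}
```

With this split, the number of IDs in the pfocr query grows only with the number of bound nodes, not with the size of the whole traversed result.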

colleenXu (Contributor) commented Aug 21, 2024

Related to previous work in biothings/biothings_explorer#837; this is probably trying to reduce the amount of time spent on pfocr result-augmentation queries?

Also maybe related to biothings/biothings_explorer#847 (comment)

colleenXu (Contributor) commented Aug 21, 2024

@tokebe

I see something odd in pfocr-test-creative-increases-ADRB2.json.zip.
In the pfocr sections of the first few results, I found some matched CURIEs that aren't in the KG at all. I suspect they come from the result nodes' IDs in other namespaces being mistakenly treated as NCBIGene IDs:

  • NCBIGene:285 (ANGPT2 in the linked figure)
    • However, the result does have the intermediate ADRB1 (NCBIGene:153), which has the equivalent ID HGNC:285
  • NCBIGene:5816 (PVALB in the linked figure)
    • However, the result does have the chemical (-)-adrenaline ("CHEBI:28918"), which has the equivalent ID PUBCHEM.COMPOUND:5816
  • NCBIGene:64689 (GORASP1 in the linked figure)
    • However, the result does have the chemical D-glucose ("CHEBI:4167"), which has the equivalent ID PUBCHEM.COMPOUND:64689
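The pattern in the list above is what one would expect if CURIE prefixes were dropped before the namespace was checked. A minimal sketch contrasting that suspected failure mode with a prefix-aware filter; the function names `naiveGeneIds` and `ncbiGeneIds` are hypothetical, and this is a guess at the mechanism rather than the project's actual code:

```typescript
// Suspected failure mode (hypothetical): strip the prefix from every
// equivalent ID, so "HGNC:285" becomes "285" and can no longer be told
// apart from an NCBIGene ID.
function naiveGeneIds(equivalentCuries: string[]): string[] {
  return equivalentCuries.map((curie) => curie.slice(curie.indexOf(":") + 1));
}

// Prefix-aware variant: keep only CURIEs in the NCBIGene namespace,
// then strip the prefix.
function ncbiGeneIds(equivalentCuries: string[]): string[] {
  const prefix = "NCBIGene:";
  return equivalentCuries
    .filter((curie) => curie.indexOf(prefix) === 0)
    .map((curie) => curie.slice(prefix.length));
}

// Equivalent IDs for ADRB1, as given in the list above.
const adrb1Ids = ["NCBIGene:153", "HGNC:285"];
console.log(naiveGeneIds(adrb1Ids)); // ["153", "285"]; "285" would match ANGPT2
console.log(ncbiGeneIds(adrb1Ids)); // ["153"]
```

The same filter would also drop PUBCHEM.COMPOUND:5816 and PUBCHEM.COMPOUND:64689 for the two chemical nodes, since neither is in the NCBIGene namespace.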

I also see that in most cases, matchedCuries doesn't include the input gene NCBIGene:154 (ADRB2), but I think you said this was okay?
(My understanding is that the query process was adjusted, but PFOCR result augmentation still runs at the template level, where there are intermediate QNodes for intermediate genes to be bound to. So figures can match based ONLY on those intermediate genes.)

tokebe (Member, Author) commented Aug 22, 2024

Fixed.

@tokebe merged commit f90197c into main on Aug 22, 2024
0 of 2 checks passed
@tokebe added the On CI, On CI -> Test, and On Test labels and removed the On CI and On CI -> Test labels on Aug 22, 2024
@colleenXu added the On Test -> Prod label and removed the On Test label on Aug 28, 2024
@tokebe deleted the pfocr-changes branch on September 3, 2024 14:13