Reduce query ids by only using bound nodes for pfocr figure search #211

tokebe · 2024-08-21T20:46:26Z

This doesn't address all relevancy problems caused by creative mode interactions (pfocr augmentation happens at the template level, and therefore doesn't explicitly grab the bound nodes in the final inferred response). However, it should ensure slightly higher relevancy of a given figure to the template it came from, in turn making the figure slightly more relevant to the overall inferred result.

Most importantly, because only the bound nodes are used, fewer ids are being used to query pfocr, which should decrease overall pfocr search time. Full traverseResultforNodes is still used in scoring and for matchedQueries, so the score should still be slightly higher-quality.
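As a rough illustration of the idea (not the code in this PR), collecting only the bound node ids from a result could look like the sketch below. The `Result` and `NodeBinding` shapes are simplified, hypothetical stand-ins modeled on TRAPI-style `node_bindings`, and `getBoundNodeIds` is a made-up helper name.

```typescript
// Hypothetical, simplified shapes; not the actual biothings_explorer types.
interface NodeBinding {
  id: string; // CURIE bound to a query node, e.g. "NCBIGene:154"
}

interface Result {
  // Map of query-node key -> nodes bound to it in this result.
  node_bindings: Record<string, NodeBinding[]>;
}

// Collect the unique CURIEs directly bound to query nodes,
// instead of every node reachable by traversing the result.
function getBoundNodeIds(result: Result): Set<string> {
  const ids = new Set<string>();
  for (const qNodeKey of Object.keys(result.node_bindings)) {
    for (const binding of result.node_bindings[qNodeKey]) {
      ids.add(binding.id);
    }
  }
  return ids;
}

// Example result for illustration only.
const exampleResult: Result = {
  node_bindings: {
    n0: [{ id: "NCBIGene:154" }],
    n1: [{ id: "NCBIGene:153" }],
    n2: [{ id: "CHEBI:28918" }],
  },
};

const boundIds = getBoundNodeIds(exampleResult);
```

The smaller id set is what would be sent to pfocr; scoring can still use the fuller node set separately, as described above.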

colleenXu · 2024-08-21T21:04:21Z

Related to previous work in biothings/biothings_explorer#837. Is this probably trying to reduce the amount of time spent on pfocr result-augment queries?

Also maybe related to biothings/biothings_explorer#847 (comment)

colleenXu · 2024-08-21T22:01:40Z

@tokebe

I see something odd in pfocr-test-creative-increases-ADRB2.json.zip.
I found some matched curies in the first few results' pfocr sections that aren't in the KG at all. I suspect that they come from the result nodes' IDs in other namespaces (mistakenly treated as NCBIGene IDs):

NCBIGene:285 (ANGPT2 in the linked figure)
- VS the result does have the intermediate ADRB1 (NCBIGene:153), which has the equivalent ID HGNC:285
NCBIGene:5816 (PVALB in the linked figure)
- VS the result does have the chem (-)-adrenaline ("CHEBI:28918"), which has the equivalent ID PUBCHEM.COMPOUND:5816
NCBIGene:64689 (GORASP1 in the linked figure)
- VS the result does have the chem D-glucose ("CHEBI:4167"), which has the equivalent ID PUBCHEM.COMPOUND:64689
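The suspected collision in the examples above can be sketched as follows. This is a hypothetical reconstruction, not the actual code: `stripAllPrefixes` and `keepSupported` are made-up helpers, and the assumption that only the `NCBIGene` prefix is supported is mine.

```typescript
// Assumption: only NCBIGene ids are supported for pfocr lookup.
const SUPPORTED_PREFIXES = ["NCBIGene"];

// Return the local part of a CURIE, e.g. "HGNC:285" -> "285".
function localId(curie: string): string {
  return curie.slice(curie.indexOf(":") + 1);
}

// Problem pattern: strip the prefix from every equivalent id and
// treat whatever remains as if it were a gene id.
function stripAllPrefixes(curies: string[]): string[] {
  return curies.map(localId);
}

// Safer pattern: drop ids from unsupported namespaces before stripping.
function keepSupported(curies: string[]): string[] {
  return curies
    .filter((curie) => SUPPORTED_PREFIXES.indexOf(curie.split(":")[0]) !== -1)
    .map(localId);
}

// Equivalent ids taken from the examples in this comment.
const equivalentIds = ["NCBIGene:153", "HGNC:285", "PUBCHEM.COMPOUND:5816"];

const naive = stripAllPrefixes(equivalentIds); // ["153", "285", "5816"]
const filtered = keepSupported(equivalentIds); // ["153"]
```

With the naive version, "285" and "5816" would be looked up as gene ids, which would explain figures matching on ANGPT2 and PVALB even though neither gene is in the KG.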

I also see that in most cases, matchedCuries don't include the input Gene NCBIGene:154 (ADRB2)...but I think you said this was okay?
(My understanding is that the query process was adjusted - but PFOCR result-augmentation still runs on a template-level, where there are intermediate QNodes for intermediate genes to be bound to. So figures can match based ONLY on those intermediate genes)

tokebe · 2024-08-22T14:55:51Z

Fixed.

chore: reduce query ids by only using bound nodes

79e7175

fix: only keep supported prefixes

5a527a5

tokebe merged commit f90197c into main Aug 22, 2024
0 of 2 checks passed

colleenXu added the "On Test -> Prod" label and removed the "On Test" (Related changes are deployed to Test server) label Aug 28, 2024

tokebe mentioned this pull request Aug 28, 2024

PFOCR result-augmentation: adding chem/disease support biothings/biothings_explorer#847

Open

tokebe deleted the pfocr-changes branch September 3, 2024 14:13
