Query caching #37

Open
bschilder opened this issue Aug 22, 2024 · 2 comments

@bschilder
Contributor

One thing to consider is adding query caching.

If the exact same query is run more than once within some time frame (or, say, within a single R session), it might be desirable to cache the result so that repeat runs are faster.

For example, the first time this runs it takes ~8 seconds. If cached, it could be instantaneous to run it a second time.

monarch <- monarch_engine()

alz_diseases <- monarch |>
	fetch_nodes(query_ids = "MONDO:0004975") |>
	expand(predicates = "biolink:subclass_of", direction = "in", transitive = TRUE)

That said, we would need some way of tracking exactly how the query was constructed. If anything was modified (input IDs, arguments, global options, etc.), we would need to detect this automatically and rerun the query from scratch.
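
One way to detect such changes is to derive the cache key from everything that can influence the result. The sketch below is only an illustration in base R: `query_cache_key()` and its arguments are hypothetical names, not part of the package, and it assumes the query text, parameters, and relevant options can all be serialised.

```r
# Hypothetical helper (not part of the package): turn everything that can
# change a query's result into a single key.
query_cache_key <- function(query, params = list(), opts = list()) {
  # Sort named lists so that argument order alone does not change the key.
  sort_named <- function(x) if (is.null(names(x))) x else x[order(names(x))]

  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)

  # Serialise the inputs uncompressed so the bytes depend only on the content.
  saveRDS(
    list(query = query, params = sort_named(params), opts = sort_named(opts)),
    tmp,
    compress = FALSE
  )

  # MD5 of the serialised inputs; any change to them yields a different key.
  unname(tools::md5sum(tmp))
}

key_a <- query_cache_key("fetch_nodes", list(query_ids = "MONDO:0004975"))
key_b <- query_cache_key("fetch_nodes", list(query_ids = "MONDO:0004975"))
key_c <- query_cache_key("fetch_nodes", list(query_ids = "HP:0000001"))
```

Here `key_a` and `key_b` match because the inputs are identical, while `key_c` differs because the ID changed, so a lookup under `key_c` would miss and trigger a fresh query.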

Speaking of global options, if we do implement caching it would be good to have a way of turning it off globally (by setting an option) or locally (by wrapping code in a function that forces a fresh query only for the wrapped code, e.g. nocache({id="HP:00001"; fun1(id)})).
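
A minimal base-R sketch of both switches follows. The option name `monarchr.cache` and the helper `caching_enabled()` are assumptions made up for illustration; `nocache()` is the wrapper name proposed above, and nothing here exists in the package yet.

```r
# Hypothetical option name; TRUE (caching on) is assumed as the default.
caching_enabled <- function() {
  isTRUE(getOption("monarchr.cache", TRUE))
}

# Local switch: turn caching off only while `expr` is evaluated.
nocache <- function(expr) {
  old <- options(monarchr.cache = FALSE)
  # Restore the previous setting even if `expr` throws an error.
  on.exit(options(old), add = TRUE)
  # `expr` is a lazy promise, so it is evaluated here, with caching disabled.
  force(expr)
}

before <- caching_enabled()           # caching on by default
inside <- nocache(caching_enabled())  # caching off inside the wrapper
after  <- caching_enabled()           # previous setting restored
```

The global switch is then just `options(monarchr.cache = FALSE)`, and any query function that consults `caching_enabled()` would respect both forms.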

@oneilsh
Collaborator

oneilsh commented Aug 27, 2024

Good call - yeah, the amount of time to cache is an open question. Perhaps that (and disabling caching) is something that can be set at the engine level via the preferences feature? I think I like the within-a-single-R-session option - safe and should be easy to implement. I could also see longer-term caching (e.g. 2 weeks), with the obvious caveat of not fetching fresh data from the graph - which might cause an issue if we cache a query and then a later query pulls updated info that conflicts somehow.

I'm thinking the memoise package applied to the lower level Neo4j functions (cypher_query.neo4j_engine and cypher_query_df.neo4j_engine) would be the right place, with some logic for checking if caching is disabled.
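
As a rough illustration of that idea, the sketch below memoises a stand-in function rather than the real Neo4j methods. `slow_query()`, `run_query()`, and the option name `monarchr.cache` are hypothetical; only `memoise()` itself comes from the memoise package.

```r
library(memoise)

# Stand-in for a low-level query function such as cypher_query(). It only
# simulates latency and counts calls; it does not contact a database.
calls <- 0
slow_query <- function(query, ...) {
  calls <<- calls + 1
  Sys.sleep(0.2)
  paste("result for", query)
}

# memoise() keeps results in memory by default, so the cache lasts only as
# long as the R session.
cached_query <- memoise(slow_query)

# Check the (hypothetical) option before deciding whether to use the cache.
run_query <- function(query, ...) {
  if (isTRUE(getOption("monarchr.cache", TRUE))) {
    cached_query(query, ...)
  } else {
    slow_query(query, ...)
  }
}

first  <- run_query("MATCH (n) RETURN n LIMIT 1")  # calls slow_query()
second <- run_query("MATCH (n) RETURN n LIMIT 1")  # served from the cache
```

For the longer-term option, `memoise()` also accepts a `cache` argument, and a disk cache with an expiry, for example `cachem::cache_disk(max_age = 14 * 24 * 3600)`, might cover the two-week case, subject to the stale-data caveat above.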

@bschilder
Contributor Author

> I'm thinking the memoise package applied to the lower level Neo4j functions (cypher_query.neo4j_engine and cypher_query_df.neo4j_engine) would be the right place, with some logic for checking if caching is disabled.

That makes sense to me; tying the caching closer to the Neo4j queries is probably the best way to go.
