Partial caching #336

Open
jimmymathews opened this issue Jul 10, 2024 · 0 comments
Labels
feature New feature

Comments

@jimmymathews
Collaborator

As predicted, application performance on datasets with a large number of samples has dropped under the new architecture (parallelized, stateless workers).
Because that same work greatly reduced the size of the binary payload representing each sample's data, it should now be feasible for the worker pods to cache intermediate data, which would reduce internal bandwidth usage, the number of database connections, and so on.

There are two levels possible:

  1. Cache only the per-sample payloads fetched from the database, in memory in each pod, provided that the payload is one of the relatively small ones. The cache can use an LRU eviction policy with a liberal size limit, say roughly 100MB.
  2. In addition, cache in memory in each pod, for each metric type requested, the metric-specific data structure that is built just before computation.
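
To make level 1 concrete, here is a minimal sketch of a size-bounded LRU cache, assuming payloads are `bytes` and that a pod fetches them through some loader function. The class name `ByteBoundedLRUCache`, the `load` callback, and the `max_item_bytes` threshold are all hypothetical, not existing names in the codebase. Level 2 could reuse the same structure with `(sample, metric)` tuples as keys, given a suitable size estimate for the metric-specific structures.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class ByteBoundedLRUCache:
    """LRU cache limited by total payload size in bytes (hypothetical helper)."""

    def __init__(self, max_bytes: int = 100 * 1024 * 1024,
                 max_item_bytes: Optional[int] = None) -> None:
        self.max_bytes = max_bytes
        # Payloads larger than this are never cached (only the "relatively small ones").
        self.max_item_bytes = max_bytes if max_item_bytes is None else max_item_bytes
        self._items: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self._total_bytes = 0

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def get(self, key: Hashable, load: Callable[[Hashable], bytes]) -> bytes:
        """Return the cached payload, or call load(key) (e.g. a database fetch)."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        payload = load(key)
        self._put(key, payload)
        return payload

    def _put(self, key: Hashable, payload: bytes) -> None:
        size = len(payload)
        if size > self.max_item_bytes:
            return  # too large to be worth caching
        self._items[key] = payload
        self._total_bytes += size
        # Evict least recently used entries until back under the size limit.
        while self._total_bytes > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self._total_bytes -= len(evicted)
```

A worker would call something like `cache.get(sample_id, fetch_payload_from_database)`, where `fetch_payload_from_database` stands in for whatever the pod uses today to retrieve a sample's payload.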

In either case, a pod that has cached some sample (case 1), or some sample preloaded for a metric (case 2), should preferentially take from the queue the jobs for those cached samples, since it can perform them faster than the other pods can.
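
The preferential job selection could look roughly like the sketch below. It assumes a pod can inspect the pending jobs in order before claiming one; the `Job` type and `pick_job` function are hypothetical, and how jobs are actually claimed from the real queue (locking, races between pods) is not addressed here.

```python
from dataclasses import dataclass
from typing import Collection, Hashable, Optional, Sequence


@dataclass(frozen=True)
class Job:
    """Hypothetical job record: one metric computation for one sample."""
    sample_id: str
    metric: str


def pick_job(pending: Sequence[Job], cached_keys: Collection[Hashable],
             per_metric: bool = False) -> Optional[Job]:
    """Prefer the oldest job this pod has cached data for, else the oldest job.

    With per_metric=False the cache keys are sample IDs (case 1); with
    per_metric=True they are (sample_id, metric) tuples (case 2).
    """
    for job in pending:
        key = (job.sample_id, job.metric) if per_metric else job.sample_id
        if key in cached_keys:
            return job
    # No cached match: fall back to the head of the queue so no job starves.
    return pending[0] if pending else None
```

Falling back to the oldest pending job when nothing matches keeps pods busy and preserves the current behavior for samples that no pod has cached yet.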

I think this strategy can replace the previously considered one, which was to restore the monolithic per-metric-type, data-preloaded containers only for small datasets. I think it is better to have a single design than to try to support too many different computation pipelines.

@jimmymathews added the feature label Jul 10, 2024