Chunking for Bayes: looking for feedback #1950
mike-lawrence started this conversation in General
Replies: 0 comments
When doing Bayesian inference we typically have a model with a number of array-representable variables, e.g.:
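The original example didn't survive extraction; as a hedged sketch, the variable names (`mu`, `beta`, `sigma`) and shapes below are purely illustrative, with the chain and sample dimensions placed last as described later in the post:

```python
import numpy as np

# Hypothetical model variables; names and shapes are illustrative only.
n_chains, n_samples = 4, 1000

variables = {
    "mu": np.empty((n_chains, n_samples)),           # scalar parameter
    "beta": np.empty((10, n_chains, n_samples)),     # length-10 vector parameter
    "sigma": np.empty((3, 3, n_chains, n_samples)),  # 3x3 matrix parameter
}
```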
and inference yields "samples" of the model variables, akin to those produced by the following pseudo-code:
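The referenced pseudo-code is also missing; a minimal stand-in, assuming a single scalar variable `mu` (hypothetical) and using a random draw in place of an actual MCMC step, might look like:

```python
import numpy as np

n_chains, n_samples = 4, 1000
mu = np.empty((n_chains, n_samples))  # hypothetical scalar variable

# Inference fills the array one draw at a time, so iteration is
# slowest over the chain and sample dimensions (the last two axes):
rng = np.random.default_rng(0)
for chain in range(n_chains):
    for sample in range(n_samples):
        mu[chain, sample] = rng.normal()  # stand-in for one MCMC draw
```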
I'm currently conflicted about how best to specify chunking for this scenario. The data are generated such that iteration is slowest over the two final dimensions (chain and sample), which might suggest a small chunk size along those dimensions. Yet the typical computations performed on the results reduce across samples, and possibly across both chains and samples, i.e.:
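The computation example is likewise missing; as an illustrative sketch (array name and shapes made up), the typical reads reduce over the trailing axes:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 1000))  # (chain, sample); hypothetical draws

# Typical post-inference reads reduce across the sample axis,
# or across both chain and sample axes at once:
per_chain_mean = mu.mean(axis=-1)      # one value per chain
overall_mean = mu.mean(axis=(-2, -1))  # pooled across chains and samples
```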
So chunking to accommodate how the data fill the arrays yields poor performance for the likely later read patterns. Yet chunking that optimizes for those read patterns means a lot of waiting for enough samples to accrue between write operations, which isn't a huge deal but does impede my aspirations of doing compute on intermediate results.
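One way to quantify the tension is to count the chunks touched by a full read under each scheme; the array shape, chunk shapes, and helper below are made up for illustration:

```python
import math

# Hypothetical (param_dim, chain, sample) array:
shape = (10, 4, 1000)

write_friendly = (10, 1, 1)    # tiny chunks along chain/sample: each new
                               # draw fills a whole chunk, so no waiting
read_friendly = (10, 4, 1000)  # full chain/sample extent per chunk: a
                               # reduction over samples reads one chunk,
                               # but a chunk only completes once all
                               # samples for it have accrued

def chunks_touched(shape, chunks):
    """Number of chunks a read of the full array crosses."""
    n = 1
    for s, c in zip(shape, chunks):
        n *= math.ceil(s / c)
    return n
```

Under these made-up numbers, a read over all chains and samples crosses 4000 write-friendly chunks but only one read-friendly chunk, which is the trade-off in a nutshell.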
Does this seem like an unavoidable trade-off?