Chunking for Bayes: looking for feedback #1950
mike-lawrence started this conversation in General
Replies: 0 comments
When doing Bayesian inference we typically have a model with a number of array-representable variables, e.g.:
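The original example didn't survive extraction; as a hedged sketch, the variable names (`mu`, `beta`, `sigma`) and shapes below are purely illustrative, with the chain and sample dimensions placed last as described later in the post:

```python
import numpy as np

# Hypothetical model variables; names and shapes are illustrative only.
n_chains, n_samples = 4, 1000

variables = {
    "mu": np.empty((n_chains, n_samples)),           # scalar parameter
    "beta": np.empty((10, n_chains, n_samples)),     # length-10 vector parameter
    "sigma": np.empty((3, 3, n_chains, n_samples)),  # 3x3 matrix parameter
}
```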
and inference yields "samples" of the model variables, akin to those produced by the following pseudo-code:
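The referenced pseudo-code is also missing; a minimal stand-in, assuming a single scalar variable `mu` (hypothetical) and using a random draw in place of an actual MCMC step, might look like:

```python
import numpy as np

n_chains, n_samples = 4, 1000
mu = np.empty((n_chains, n_samples))  # hypothetical scalar variable

# Inference fills the array one draw at a time, so iteration is
# slowest over the chain and sample dimensions (the last two axes):
rng = np.random.default_rng(0)
for chain in range(n_chains):
    for sample in range(n_samples):
        mu[chain, sample] = rng.normal()  # stand-in for one MCMC draw
```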
I'm currently conflicted about how best to specify chunking for this scenario. The data are generated such that iteration is slowest over the two final dimensions (chain and sample), which might suggest a small chunk size along those dimensions. Yet the typical computations performed on the results reduce across samples, and possibly across both chains and samples, i.e.:
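The computation example is likewise missing; as an illustrative sketch (array name and shapes made up), the typical reads reduce over the trailing axes:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 1000))  # (chain, sample); hypothetical draws

# Typical post-inference reads reduce across the sample axis,
# or across both chain and sample axes at once:
per_chain_mean = mu.mean(axis=-1)      # one value per chain
overall_mean = mu.mean(axis=(-2, -1))  # pooled across chains and samples
```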
So chunking to accommodate how the data fill the arrays yields poor performance for the likely later read patterns. Yet chunking that optimizes for those read patterns means a lot of waiting for enough samples to accrue between write operations, which isn't a huge deal but does impede my aspirations of doing compute on intermediate results.
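One way to quantify the tension is to count the chunks touched by a full read under each scheme; the array shape, chunk shapes, and helper below are made up for illustration:

```python
import math

# Hypothetical (param_dim, chain, sample) array:
shape = (10, 4, 1000)

write_friendly = (10, 1, 1)    # tiny chunks along chain/sample: each new
                               # draw fills a whole chunk, so no waiting
read_friendly = (10, 4, 1000)  # full chain/sample extent per chunk: a
                               # reduction over samples reads one chunk,
                               # but a chunk only completes once all
                               # samples for it have accrued

def chunks_touched(shape, chunks):
    """Number of chunks a read of the full array crosses."""
    n = 1
    for s, c in zip(shape, chunks):
        n *= math.ceil(s / c)
    return n
```

Under these made-up numbers, a read over all chains and samples crosses 4000 write-friendly chunks but only one read-friendly chunk, which is the trade-off in a nutshell.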
Does this seem like an unavoidable trade-off?