
Rollback Support

yogeshnachnani edited this page Feb 23, 2016 · 4 revisions

Consider a resource allocator that needs to allocate N resources in batches of size B. If each batch can be executed independently, it makes sense to run the batches in parallel using Flux tasks.

This flow can be modelled using Flux primitives as follows:

num_batches := ceil(N / B)
for batch_num in 1..num_batches
    new Task() {
        execute() {
            allocate(batch_num)
        }
    }
end

In Flux, each task can be executed independently on any of the available worker nodes. Thus, each task can fail independently, either due to a semantic error (failure to allocate resources) or due to a runtime failure (the task timed out). If some tasks succeed and others fail, it can be difficult to bring the system back to a stable state.
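To make the partial-failure problem concrete, here is a minimal Python sketch. It does not use Flux's API: `allocate`, `run_batches`, and the `failing_batches` parameter are hypothetical names, and a local thread pool stands in for Flux worker nodes. It only illustrates that independent batches can leave the system half-allocated.

```python
from concurrent.futures import ThreadPoolExecutor


def allocate(batch_num, failing_batches):
    """Hypothetical allocator: fails for the batches listed in failing_batches."""
    if batch_num in failing_batches:
        raise RuntimeError(f"allocation failed for batch {batch_num}")
    return batch_num


def run_batches(num_batches, failing_batches):
    """Run every batch in parallel and report which succeeded and which failed."""
    allocated, failed = [], []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(allocate, batch_num, failing_batches)
            for batch_num in range(num_batches)
        ]
        for batch_num, future in enumerate(futures):
            # exception() waits for the task and returns None on success.
            if future.exception() is None:
                allocated.append(batch_num)
            else:
                failed.append(batch_num)
    return allocated, failed


allocated, failed = run_batches(num_batches=5, failing_batches={1, 3})
print(allocated, failed)  # prints: [0, 2, 4] [1, 3]
```

Batches 0, 2 and 4 hold resources while 1 and 3 do not, which is exactly the kind of mixed state that needs cleanup.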

As a convenience, users can define a rollback() method for each task to handle any cleanup activities. Flux would automatically trigger rollbacks in case of an unhandled or unexpected error.

Additionally, users can use the Flux context to save any local state that may be needed for a rollback:

num_batches := ceil(N / B)
for batch_num in 1..num_batches
    new Task() {
        execute() {
            save batch_num to flux context
            allocate(batch_num)
        }
        rollback() {
            retrieve batch_num from flux context
            de_allocate(batch_num)
        }  
    }
end
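
The execute/rollback pairing above can be sketched in plain Python. This is an illustration under stated assumptions, not Flux's actual API: `AllocateBatchTask`, `run_with_rollback`, and the per-task `context` dictionary (a stand-in for the Flux context) are hypothetical. The runner executes tasks sequentially to keep the output deterministic, and it assumes that an error in any task rolls back every task that has started. The text above does not specify which tasks Flux rolls back.

```python
class AllocateBatchTask:
    """One batch of work with a compensating rollback (hypothetical, not Flux's API)."""

    def __init__(self, batch_num, allocated, fail=False):
        self.batch_num = batch_num
        self.allocated = allocated  # shared set of allocated batch numbers
        self.fail = fail            # simulate a failure for this batch
        self.context = {}           # stand-in for the Flux context

    def execute(self):
        # Save what rollback will need before doing any work.
        self.context["batch_num"] = self.batch_num
        if self.fail:
            raise RuntimeError(f"could not allocate batch {self.batch_num}")
        self.allocated.add(self.batch_num)

    def rollback(self):
        # Read the batch number back from the context, then undo the allocation.
        batch_num = self.context.get("batch_num")
        if batch_num is not None:
            self.allocated.discard(batch_num)


def run_with_rollback(tasks):
    """Execute tasks in order; on the first error, roll back every started task."""
    started = []
    try:
        for task in tasks:
            started.append(task)
            task.execute()
    except Exception:
        # Assumption of this sketch: undo in reverse order of execution.
        for task in reversed(started):
            task.rollback()
        return False
    return True


allocated = set()
tasks = [AllocateBatchTask(n, allocated, fail=(n == 2)) for n in range(4)]
ok = run_with_rollback(tasks)
print(ok, sorted(allocated))  # prints: False []
```

Because rollback() reads the batch number from the saved context rather than from the live task state, it is safe to call even when execute() failed partway through. The rollback itself is written to be idempotent (discarding a batch that was never allocated is a no-op).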