This PR builds on the work in the initial PR to move business rules to celery, along with info learned deploying this.

Avoid filling the task queue with orchestration tasks and starving the workers.
===============================================================================

In the previous system there were about 3 layers of tasks that orchestrated other tasks by using the .replace() API in each task. Unfortunately, it was possible for celery workers to become full of orchestration tasks, leaving no room for the business rule tasks at the bottom to actually run.

This PR attempts two mitigations:

1. Use celery workflows instead of .replace()

   This PR builds a celery workflow in check_workbasket using celery constructs such as chain and group. In theory, since most of the work is done ahead of time, the system should have more awareness of the task structure, avoiding the issue of starvation.

2. Cancel existing workbasket checks when a new check is requested.

   When check_workbasket is started, it will attempt to revoke existing check_workbasket tasks for the same workbasket.

Treat intermediate data structures as ephemeral
===============================================

A celery task may execute at any time: right now, or when a system comes up tomorrow. Based on this assumption, models such as TrackedModelCheck (which stores the result of a business rule check on a TrackedModel) are no longer passed to celery tasks by ID. Instead, all the information needed to recreate the data is passed to the celery task, which means the system will still work even if developers delete these while it is running.

Reduce layers in business rule checking
=======================================

BusinessRuleChecker and LinkedModelsBusinessRuleChecker are now the only checkers. These now take BusinessRule instances instead of being subclassed for each business rule.
While more parameters are passed when rules are checked, a conceptual layer has been removed, and the simplification is reflected in around 20 lines of code being removed from checks.py.

Celery flower is now much easier to read
========================================

Due to the changes above, the output in celery flower should correspond more closely to a user's intentions - ids of models.

Content Checksums
=================

Result caching now validates using checksums of the content, which should reduce the amount of checking the system needs to do.

When a workbasket has been published, its content could invalidate some content in other unpublished workbaskets. By associating business rule checks with checksums of a model's content, any models that do not clash can be skipped.

Model checksums (generated by `.content_hash()`) are not currently stored in the database (though it may be desirable to store them on TrackedModels, as it would provide a mechanism to address any content in the system).

The checksumming scheme is a combination of the type and a sha256 of the fields in `.copyable_fields` (which should represent the fields a user can edit, but not fields such as pk).

Blake3 was tested, as it provides a fast hashing algorithm, but in practice it didn't provide much of a speedup over sha256.

PK ranges
=========

Occasionally workbaskets with many items may need to be checked (the initial workbasket has 9 million items).

Based on the observation that the ID column of the contained TrackedModels is mostly contiguous, the system allows passing sequences of contiguous TrackedModels specified by tuples of (first_pk, last_pk).

This is relatively compact, suitable for passing over the network with celery, and readable in Celery flower.

This also enables chunking of tasks - further enabled by specifying a maximum number of items in each tuple.

On TrackedModelQueryset, `.as_pk_intervals` and `.from_pk_intervals` are provided to go to and from this format.
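
The (first_pk, last_pk) format can be illustrated with a small standalone sketch. This is not the code from this PR: the two functions below are hypothetical plain-Python stand-ins that borrow the names `as_pk_intervals` and `from_pk_intervals`, and the `chunk_size` parameter is an assumed name for the maximum number of items per tuple. The real queryset methods presumably work against the database rather than on in-memory lists.

```python
from typing import Iterable, Iterator, Optional, Tuple

PkInterval = Tuple[int, int]  # inclusive (first_pk, last_pk)


def as_pk_intervals(
    pks: Iterable[int], chunk_size: Optional[int] = None
) -> Iterator[PkInterval]:
    """Collapse primary keys into inclusive (first_pk, last_pk) tuples.

    Hypothetical sketch: chunk_size (an assumed parameter name) caps how
    many pks a single tuple may cover, which allows tasks to be chunked.
    """
    first = last = None
    for pk in sorted(set(pks)):
        if first is None:
            first = last = pk
            continue
        contiguous = pk == last + 1
        full = chunk_size is not None and (last - first + 1) >= chunk_size
        if contiguous and not full:
            last = pk  # extend the current interval
        else:
            yield (first, last)  # close it and start a new one
            first = last = pk
    if first is not None:
        yield (first, last)


def from_pk_intervals(intervals: Iterable[PkInterval]) -> Iterator[int]:
    """Expand inclusive (first_pk, last_pk) tuples back into pks."""
    for first_pk, last_pk in intervals:
        yield from range(first_pk, last_pk + 1)


pks = [1, 2, 3, 7, 8, 10]
intervals = list(as_pk_intervals(pks))
print(intervals)  # [(1, 3), (7, 8), (10, 10)]
print(list(as_pk_intervals([1, 2, 3, 4, 5], chunk_size=2)))  # [(1, 2), (3, 4), (5, 5)]
```

Six pks become three small tuples here. With mostly contiguous IDs, as observed above, the savings grow with the size of the workbasket, and each tuple stays readable as a task argument.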

Greets
======

This PR adapts changes and builds on the hard work done in the initial work to check the business rules with celery. Thanks to Simon Worthington and the hard work of the other devs on the project.