ADR - content hashing caching strategy for business rules. #650

stuaxo · 2022-08-11T11:09:06Z

When a workbasket is published it may clash with any of the unpublished workbaskets - however there is a class of business rules whose results should still be valid if the content of the model they reference hasn't changed.

This ADR proposes a content checksum mechanism for models. The checksum can be saved and later used to check whether a model has changed.

The content checksum only covers user-editable fields.

paulpepper-trade · 2022-08-11T11:38:20Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.
+
+This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.


I'm uncertain whether there are fields that can be mutated but are not user editable. If there is a difference, and editable fields are a subset of mutable fields, then would mutable be a preferable term here?
Tiny typo "... check if to versions".

paulpepper-trade · 2022-08-11T14:34:18Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.
+
+
+The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future.


Attached == cached?

Saved to its own field I guess.

paulpepper-trade · 2022-08-11T14:45:52Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence.
+
+
+TrackedModel will gain an API named get_content_hash() that returns a sha256 hash.


I suppose it only matters that it's a sufficiently unique hash.

paulpepper-trade · 2022-08-11T14:50:23Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.
+
+When a workbasket is published, the validity of all other unpublished workbaskets must be verified - with this system, most of the business rule checks would remain valid and the only the checksums need to be verified.


... only the checksums need to be verified.

May be clearer to say "... only the content hash needs to be verified."?
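A minimal sketch of what "only the content hash needs to be verified" could look like in practice. Everything here is an assumption for illustration: `StoredCheck`, `is_still_valid`, the rule name and the field names are hypothetical and are not taken from the ADR or the TAP codebase.

```python
import hashlib
from dataclasses import dataclass


def content_hash(model_label: str, fields: dict) -> str:
    """Hash a model label plus its user-editable field values (hypothetical helper)."""
    payload = model_label + "|" + "|".join(
        f"{name}={fields[name]!r}" for name in sorted(fields)
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class StoredCheck:
    """A saved business rule result, tagged with the hash of the content it checked."""
    rule: str
    passed: bool
    content_hash: str


def is_still_valid(stored: StoredCheck, model_label: str, current_fields: dict) -> bool:
    """The stored result can be reused only if the content hash is unchanged."""
    return stored.content_hash == content_hash(model_label, current_fields)


# Result saved when the check was first run (rule name is made up).
original = {"description": "Live animals"}
stored = StoredCheck(
    rule="EXAMPLE_RULE",
    passed=True,
    content_hash=content_hash("commodities.GoodsNomenclature", original),
)

unchanged = is_still_valid(stored, "commodities.GoodsNomenclature", original)
edited = is_still_valid(
    stored, "commodities.GoodsNomenclature", {"description": "Live horses"}
)
```

Here `unchanged` is true and `edited` is false, so only the second case would need the rule to be run again. This only covers rules whose outcome depends solely on the referenced model's own content.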

paulpepper-trade · 2022-08-12T10:47:28Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+Context
+-------
+
+After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.


We talked about what can invalidate a check after it has completed. We know that, after a check has been run, newly published tariff data can cause some types of check to become invalid - an ME32 check against a Measure, for instance. So hashing doesn't help avoid the need to run a check again in those circumstances.
It would be worth pointing out that caveat here.

Yep, this is really important - I've updated the PR description to cover this, and will update the ADR to reflect it (date validity checkers will get their own ADR).

gabelton

Really cool stuff! I would love an example involving a current business rule, if at all possible, just so that I can visualise it / walk it through in my head, but looks great apart from that. Please excuse the pedantic spelling comments...

gabelton · 2022-08-15T14:59:33Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.
+
+Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.


Is it possible to give an example where two different types produce the same hash? It's probably trivial, but I have Monday brain
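As a hedged illustration of the question: Python's built-in `hash()` returns the same value for equal values of different types, and a hash built only from field values would likewise be identical for two different model classes holding the same data. The classes and helper functions below are hypothetical stand-ins, not TAP code.

```python
import hashlib

# Built-in hashing: equal values of different types share a hash.
builtin_collision = hash(1) == hash(1.0) == hash(True)


class Footnote:
    """Hypothetical stand-in for one tracked model type."""

    def __init__(self, description):
        self.description = description


class Regulation:
    """Hypothetical stand-in for a different tracked model type."""

    def __init__(self, description):
        self.description = description


def naive_hashable_string(obj, fields):
    """Field values only: the type of obj is not represented."""
    return "|".join(f"{name}={getattr(obj, name)!r}" for name in fields)


def typed_hashable_string(obj, fields):
    """Prefix the module and class name, as the ADR text describes."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}|" + naive_hashable_string(obj, fields)


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = Footnote("Same text")
b = Regulation("Same text")

naive_equal = sha256_hex(naive_hashable_string(a, ["description"])) == sha256_hex(
    naive_hashable_string(b, ["description"])
)
typed_equal = sha256_hex(typed_hashable_string(a, ["description"])) == sha256_hex(
    typed_hashable_string(b, ["description"])
)
```

With field values alone the two digests match (`naive_equal` is true). Once the module and class name are part of the string they differ (`typed_equal` is false).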

gabelton · 2022-08-15T15:05:09Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence.
+
+
+TrackedModel will gain an API named get_content_hash() that returns a sha256 hash.


Is it possible to be more specific than "API"? Would it be a class_method? As I read it currently, it sounds a bit like the hash will be attached to the tracked model, which contradicts what's written on L28. I know that's not the intent, but it would be good to clarify.

Good point, I guess I meant a method - e.g. .content_hash(), vs something we save forever in a field.
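A rough sketch of the "method rather than stored field" idea, assuming a mixin that computes the hash on demand. `ContentHashMixin`, the `Measure` dataclass and its fields are hypothetical and stand in for TrackedModel. `mutable_fields` plays the role the ADR gives to the proxy for `copyable_fields`.

```python
import hashlib
from dataclasses import dataclass


class ContentHashMixin:
    # Names of the user-editable fields to hash (assumed to be set per model).
    mutable_fields = ()

    def hashable_string(self) -> str:
        """Module and class name first, then each mutable field and its value."""
        cls = type(self)
        parts = [f"{cls.__module__}.{cls.__qualname__}"]
        parts += [f"{name}={getattr(self, name)!r}" for name in self.mutable_fields]
        return "\n".join(parts)

    def content_hash(self) -> str:
        """Computed on each call; nothing is saved on the instance or in a field."""
        return hashlib.sha256(self.hashable_string().encode("utf-8")).hexdigest()


@dataclass
class Measure(ContentHashMixin):
    pk: int
    duty_sentence: str

    # Not annotated, so the dataclass machinery does not treat it as a field.
    mutable_fields = ("duty_sentence",)


first = Measure(pk=1, duty_sentence="10%")
same_content = Measure(pk=2, duty_sentence="10%")
changed = Measure(pk=3, duty_sentence="12%")
```

`first.content_hash()` and `same_content.content_hash()` agree because `pk` is not in `mutable_fields`, while `changed.content_hash()` differs.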

gabelton · 2022-08-15T15:09:09Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+Consequences
+------------
+
+This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.


Will part of this work involve creating a list of rules to which this kind of caching will be applicable?

Yes, though it doesn't need to be exhaustive, as it can be extended later if needed.

gabelton · 2022-08-15T15:10:19Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future.
+
+This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.


--> mechanism
--> equivalent

gabelton · 2022-08-15T15:11:17Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent.
+
+In the simplest sense this means hashing fields that exclude the PK.


What's meant by "fields that exclude the PK" ?
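One possible reading, sketched below under assumptions: every field except the primary key goes into the hash, so two rows that differ only in `pk` compare as equivalent. The dictionaries and field names are invented for illustration.

```python
import hashlib


def hash_without_pk(row: dict) -> str:
    """Drop the primary key, then hash the remaining fields in a stable order."""
    content = {name: value for name, value in row.items() if name != "pk"}
    payload = "|".join(f"{name}={content[name]!r}" for name in sorted(content))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


version_1 = {"pk": 101, "sid": 7, "description": "Live animals"}
version_2 = {"pk": 202, "sid": 7, "description": "Live animals"}
version_3 = {"pk": 303, "sid": 7, "description": "Live horses"}

same = hash_without_pk(version_1) == hash_without_pk(version_2)
different = hash_without_pk(version_1) != hash_without_pk(version_3)
```

Both `same` and `different` are true: the first two versions share their content despite having different primary keys, and the third has edited content.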

gabelton · 2022-08-15T15:11:58Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.
+
+Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.


--> necessary

gabelton · 2022-08-15T15:12:17Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+
+TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point.
+
+Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.


Is it worth adding a link to docs on Python's default hashing?

gabelton · 2022-08-15T15:12:35Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name.
+
+
+The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future.


gabelton · 2022-08-15T15:13:25Z

docs/source/adr/0016-content-checksums-for-tracked-models.rst

+Consequences
+------------
+
+This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change.


--> becoming

ADR on adding content checksums to TrackedModels.

9a5d21b

stuaxo force-pushed the adr-trackedmodel-content-checksums-for-business-rules branch from 7fc822e to 9a5d21b Compare August 11, 2022 11:10

paulpepper-trade approved these changes Aug 11, 2022

paulpepper-trade reviewed Aug 12, 2022

stuaxo changed the title ~~ADR on adding content checksums to TrackedModels.~~ ADR - content hashing caching strategy for business rules. Aug 12, 2022

gabelton approved these changes Aug 15, 2022
