Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative implementation for complex number support #5614

Closed

Conversation

janssenhenning
Copy link
Contributor

@janssenhenning janssenhenning commented Aug 6, 2022

Also see #5561

In this PR the serialization isn't done on the low level for each backend but on the NodeAttributes/NodeExtras class. Added a function deserialize_value, which is the counterpart of clean_value, which is inserted into the attribute retrieval when the node is stored

@sphuber Support for this could also be added on the BackendNode and the corresponding extras mixin but I didn't see a clean way to change this without renaming/changing the abstractmethods needed when implementing a backend. In this case you could add something like set/get_raw_attribute as the abstractmethods (values are always assumed json-serializable when the node is stored) and set/get_attribute would add the serialization.
atm the BackendNode of psql_dos also still calls clean_value, since this behaviour is explicitly tested on the BackendNode directly in many places.

Problems/TODOs:

  • Support for complex attributes as filters in the QueryBuilder is really messy, since it doesn't have a nice layer to insert serialization of the filters in a backend agnostic way. Maybe something like ✨ NEW: Add orm.Entity.fields interface for QueryBuilder #5088 would be useful for this
  • Support for complex numbers in Jsonable. It would work in principle but the initial serialization json.loads(json.dumps(dictionary)) would need to be modified. Simply wrapping dictionary with clean_value does not work, since the class wants to support NaN/inf by converting them into strings

Alternative implementation of aiidateam#5561

These are stored as objects of the form
`{'__complex__': True, 'real': x.real, 'imag': x.imag}` with the
`__complex__` key being used as the identifier for detecting these
objects for deserializations.

For serialization/deserialization we use clean_value and a new function
deserialize_value. `clean_value` is pulled out of BackendNode and is
now done on the level of NodeAttributes and NodeExtras. The
deserialization is added to the get/get_many and all routines of these
classe

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:
  - What are the performance implications of the added deserialization?
    (Since clean_value is already called I think there would be no impact
     from that side)
  - Jsonable class should handle complex numbers. Setting attributes
    work, but the initial check json.loads(json.dumps(...)) fails
For the psql_dos backend the value in filters must be cleaned beforehand
since otherwise the cast to JSONB will fail since the complex
serialization is not registered in the backend
@janssenhenning janssenhenning force-pushed the feature/complex-data-v2 branch from 0f84acb to 9f746ec Compare November 21, 2022 20:39
@sphuber
Copy link
Contributor

sphuber commented Nov 21, 2022

Thanks @janssenhenning , apologies for the radio silence on this. Now that we have the hackathon, I think it would be good to have a final discussion about this (maybe in person tomorrow would be good, with those that are interested) and then make a decision on how to best implement this.

I went over the issue and your PR again, and I will write a few thoughts down here that I had that I would like to discuss.

Current situation

  • There only is "serialization" of data that should be stored on a Node as attributes and extras, which is provided by the aiida.orm.implementation.utils.clean_value function. The concept is that the clean_value returns input data such that if it makes a round-trip to being stored as JSON, when it is retrieved, it would be unaltered.
  • The clean_value is currently only called by the PsqlDosBackend storage backend implementation. It calls it only when attributes and extras of stored nodes are mutated and just before a new node is stored. This is done for efficiency purposes, such that attributes and extras are not cleaned multiple times before being stored indefinitely.

The concept

  • Until recently, the data storage in AiiDA was hardcoded, but recently it has become pluginnable and in principle, plugin packages can implement their own data storage backend. I say in principle, because it is not sure whether the abstract interface aiida.orm.implementation.storage_backend.StorageBackend is sufficiently abstracted to completely replace all functionality. Most probable candidate for things to break is the QueryBuilder. That being said, there are already storage backend plugins that simply use the PsqlDosBackend for the database implementation but then extend/replace the file repository.
  • The main promise of AiiDA's data storage is that it consists of two categories of data:
    • JSON-serializable data that can be queried
    • Binary blobs
  • It seems logical that the connection between the front-end ORM and the backend data storage should then only ever communicate in these data types. A storage backend implementation that implements the abstract interface then knows that it only ever has to deal with binary blobs or JSON data.
  • If this could be a hard-contract guaranteed by aiida-core it would make it easier to implement consistent storage backends.

What is missing

  • Given the above, I think it would be good to move the responsibility of serializing/deserializing JSON data from the storage backend implementation to the abstract backend interface.
  • The advantage is that the behavior would be consistent across backend implemenations
  • The disadvantage is that it is probably a bit more restrictive and might limit the implementation to implement the most efficient method (see for example how PsqlDosBackend only serializes just before storing a node, or when mutating stored nodes)
  • @janssenhenning already touched on this on how to potentially do this. We could chose to implement the serialization/deserialization directly on the getters and setters of the BackendNode and BackendEntityExtrasMixin. However, this would make the abstract interface more concrete and require implementations to call super, with the risk of forgetting this and the serialization not being called.
  • Alternative is to keep the getters/setters abstract, and provide a concrete serialization and deserialization methods on the abstract interface that implementation can call. This has the same problem though where an implementation can fail to call the serializer/deserializer. Still I think having the serialization/deserialization in one place in aiida-core would be a good thing.

All in all, I think changing this might make sense, and if we do it, we should do it soon. Currently there are no storage backend plugins out there that I know off (other than my own), so we can now still easily change this, but that might change.

@giovannipizzi @chrisjsewell do you have time to discuss this tomorrow in person?

@janssenhenning
Copy link
Contributor Author

@sphuber Thanks for the detailed answer. Would be happy to discuss today how we could get this into aiida-core 👍

@sphuber
Copy link
Contributor

sphuber commented Jun 7, 2024

See #5561 (comment) for rationale for closing

@sphuber sphuber closed this Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants