Roundtrip JSON serialization/deserialization #17

Closed
arpit15 opened this issue Sep 13, 2024 · 9 comments · Fixed by #20
arpit15 commented Sep 13, 2024

Thanks for creating this essential missing piece for pydantic. I want to serialize a bunch of classes containing ndarrays and deserialize them such that after deserialization the class attributes are ndarrays again. However, according to the examples, that is not the supported behavior.
I am wondering if there is a way to get this behavior using numpydantic.

import numpy as np
from numpydantic import NDArray
from pydantic import BaseModel

class MyModel(BaseModel):
    array: NDArray

myobj = MyModel(array=[1.0, 2.0])
json_s = myobj.model_dump_json()
loaded_obj = MyModel.model_validate_json(json_s)
assert isinstance(loaded_obj.array, np.ndarray)
sneakers-the-rat commented Sep 13, 2024

Glad you are having fun :)

Yes. I am surprised that the validation logic is different when parsing JSON, and I consider the fact that this doesn't work a bug. Finishing something up for the night and will return tomorrow.

edit: it looks like there will have to be a change upstream in pydantic; I will raise an issue with them tomorrow with more details.

sneakers-the-rat added the bug label Sep 13, 2024
arpit15 commented Sep 13, 2024

Thanks for the quick response. Looking forward to reading the issue on pydantic. Hopefully there are other workarounds, or a quick solution that can be merged into pydantic.

arpit15 commented Sep 13, 2024

An easy hack that works for me is to add an Annotated type over NDArray:

from typing import Annotated
from numpydantic import NDArray as _NDArray
from pydantic import BaseModel, AfterValidator
import numpy as np

NDArray = Annotated[_NDArray, AfterValidator(np.array)]

class MyModel(BaseModel):
    array: NDArray

myobj = MyModel(array=[1.0, 2.0])
json_s = myobj.model_dump_json()
loaded_obj = MyModel.model_validate_json(json_s)
assert isinstance(loaded_obj.array, np.ndarray)

sneakers-the-rat commented

Ha, yes :) that should work, though we lose the ability to use the model with other array backends (shape and dtype validation should still work).

The basic problem is that when parsing JSON, pydantic-core uses only the JSON schema, not the Python validators. The JSON schema is correct for an n-dimensional array in JSON (a list of lists, parametrized according to the shape and dtype constraints), so it validates, but we need a way to hook into the coercion parts of the array interfaces at the end of JSON parsing. I'm going to look into whether there's a way to chain a validator for just the JSON validation, and if not we might have to do some more monkeypatching.
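To make the problem concrete, here is a minimal sketch (not numpydantic's actual implementation) of a custom type built on `json_or_python_schema`. When input arrives via `model_validate_json`, pydantic-core takes the JSON branch, so the coercion in the Python branch never runs and the roundtrip yields a plain list:

```python
import numpy as np
from pydantic import BaseModel
from pydantic_core import core_schema

class CoercingArray:
    """Toy annotation type illustrating the json/python split."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler):
        return core_schema.json_or_python_schema(
            # JSON branch: validates a list of floats, never builds an ndarray
            json_schema=core_schema.list_schema(core_schema.float_schema()),
            # Python branch: coerces input to an ndarray
            python_schema=core_schema.no_info_plain_validator_function(np.asarray),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda a: np.asarray(a).tolist()
            ),
        )

class MyModel(BaseModel):
    array: CoercingArray

m = MyModel(array=[1.0, 2.0])                          # Python path -> ndarray
r = MyModel.model_validate_json(m.model_dump_json())   # JSON path -> plain list
print(type(m.array).__name__, type(r.array).__name__)
```

This reproduces the bug in miniature: the serialized form is fine, but revalidating from JSON skips the `np.asarray` coercion entirely.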

arpit15 commented Sep 13, 2024

Makes sense! I will be happy to test out your changes. Let me know if I can help in any way.

@sneakers-the-rat sneakers-the-rat changed the title Regarding deserialization ndarray Roundtrip JSON serialization/deserialization Sep 17, 2024
sneakers-the-rat commented Sep 18, 2024

Just letting you know I've figured this out and will issue a patch tonight or tomorrow <3. It's simpler than I thought: we just need to change the way we're generating the JSON schema on the NDArray class (which we will soon rewrite anyway to make a proper generic, but that's another issue).

edit: for more info - I had misunderstood how json_or_python_schema worked. Since __get_pydantic_core_schema__ receives the _source_type but __get_pydantic_json_schema__ doesn't, we generated the JSON schema in the former because that's when we have the shape and dtype values. But json_or_python_schema is what makes pydantic use the JSON schema when revalidating from JSON. If we instead generate the JSON schema in __get_pydantic_json_schema__, pydantic uses the Python validator, which roundtrips correctly.
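The shape of the fix described above can be sketched like this (a hypothetical minimal type, not numpydantic's actual code): keep a single Python validator in the core schema so JSON input is coerced too, and produce the list-of-numbers schema only in `__get_pydantic_json_schema__`:

```python
import numpy as np
from pydantic import BaseModel
from pydantic_core import core_schema

class RoundtripArray:
    """Toy annotation type: one validator for both Python and JSON input."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler):
        # No json_or_python_schema: the same coercing validator runs for
        # Python objects and for parsed JSON alike.
        return core_schema.no_info_plain_validator_function(
            np.asarray,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda a: np.asarray(a).tolist()
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(cls, schema, handler):
        # The JSON *schema* is still advertised for documentation/tooling,
        # but pydantic-core no longer validates against it directly.
        return {"type": "array", "items": {"type": "number"}}

class MyModel(BaseModel):
    array: RoundtripArray

m = MyModel(array=[1.0, 2.0])
r = MyModel.model_validate_json(m.model_dump_json())
```

With this arrangement `r.array` comes back as an ndarray, which is the roundtrip behavior the issue asks for.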

arpit15 commented Sep 18, 2024

@sneakers-the-rat thanks for the explanation. After your explanation, I kind of understand __get_pydantic_core_schema__ and __get_pydantic_json_schema__. It would be great to have this handled by your lib. I am wondering if there is a planned release cycle for numpydantic.

sneakers-the-rat commented Sep 18, 2024

No planned release cycle; I just fix bugs as they come up and make enhancements as requested at this point, but I am using semver and will issue appropriate deprecation warnings for breaking changes. The next major planned version, 2.0.0, will replace the basic NDArray type with a proper Generic using TypeVarTuple while keeping current behavior, and will move away from nptyping, with full removal in 3.0.0, so there will be plenty of warning.

I'll make this patch shortly.

Making a note with a checklist item to

sneakers-the-rat commented

for ur consideration: #20
docs: https://numpydantic.readthedocs.io/en/dump_json/serialization.html
