Pydantic v2 Overhaul [DNM yet] #321
base: master
Conversation
…testing so very much a WIP still.
…and only use json on numpy arrays for now.
…ires python 3.7.1+, but apparently pinning python to `^3.7` which is supposed to be equivalent to `>=3.7, <4.0` doesn't let the stack resolve, so pinning python to `^3.7.1` was the correct solution, even though it should have resolved...
…ther. I moved the commented block in case we needed to change it later.
…te, so split it up the hard way since there doesn't appear to be any other way to access the name. This is the single most frustrating bug to track down since the error is thrown in pydantic but doesn't backtrace through the pydantic.__getattr__ -> `AutoPydanticDocGenerator` -> `__get__` -> `doc_formatter` -> `annotation.__name__` chain because all the exceptions are caught the whole way up. GYA!
e32570b might be the most frustrating little bug I have ever fixed because we and pydantic do such a good job of trapping bugs all the way up the stack, and the actual issue was Python less than 3.10.
This all looks very good to me. (Though basemodels.py is beyond my understanding.) I think I've picked up enough to migrate psi4's pydantic usage.
I'm not sure what a good plan for next steps is. Merge to a pydantic-next branch on the MolSSI repo while the rest of the stack builds out? Conda packages off qcarchive?
@@ -13,7 +13,6 @@ jobs:
       fail-fast: true
       matrix:
         python-version: ["3.7", "3.9", "3.11"]
-        pydantic-version: ["1", "2"]
Might want to leave this line in for now to help the CI lanes' "Required" labels cope.
@@ -65,18 +84,24 @@ def parse_raw(cls, data: Union[bytes, str], *, encoding: Optional[str] = None) -
         raise TypeError("Input is neither str nor bytes, please specify an encoding.")

     if encoding.endswith(("json", "javascript", "pickle")):
-        return super().parse_raw(data, content_type=encoding)
+        # return super().parse_raw(data, content_type=encoding)
+        return cls.model_validate_json(data)
Should `model_validate_json` work on javascript or pickle? (Not that I know who uses javascript or pickle with this.)
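A stdlib sketch of the concern raised here (the function and names are hypothetical, not the PR's code): `model_validate_json` only parses JSON text, so a pickle-encoded payload would need its own branch rather than riding through the same call.

```python
import json
import pickle


def parse_payload(data: bytes, encoding: str):
    """Hypothetical stand-in for parse_raw's encoding dispatch."""
    if encoding.endswith(("json", "javascript")):
        # Both are JSON text, which a JSON-native validator can cover.
        return json.loads(data)
    if encoding.endswith("pickle"):
        # Pickle is a binary format, not JSON, so it needs a separate path.
        return pickle.loads(data)
    raise TypeError(f"Unsupported encoding: {encoding!r}")
```

For example, `parse_payload(b'{"a": 1}', "json")` parses the JSON, while a pickled list round-trips through the pickle branch.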
qcelemental/util/autodocs.py
Outdated
I switched over to autodoc-pydantic in Spring 2022 (or thereabouts), so I should check whether this is still in use at all. Native Sphinx on pydantic is awful, so DGAS had something better, but autodoc-pydantic is far better still.
…ping and pre-Python 3.9 in tandem with typing_extensions. Annotated source information: beartype/beartype#42. Annotated checks against instances of _GenericAlias to see if you can support Data[type_info], e.g. NDArray[int] (in Python typing since 3.9, and kind of as _GenericAlias before that). Prior to 3.9, NumPy implemented their own version of _GenericAlias which isn't Python's _GenericAlias, so the types are not considered "Generic" when Annotated[NDArray, Metadata][type_info] does its thing. So. This fix does the heavy lifting to re-cast the NDArray type with _GenericAlias from Python typing. I've tried to reuse as much data from NDArray as I possibly can and still use np.ndarray (which is not np.typing.NDArray) to still correctly type hint np.ndarrays.
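The `Annotated` mechanics at issue can be sketched with stdlib types alone (using `list` as a stand-in for `np.ndarray`; `ShapeMeta` is a hypothetical metadata class, not qcelemental's actual one). Frameworks recover both the base type and the metadata with `typing.get_args`, which is where non-standard generic aliases like NumPy's pre-3.9 one break the chain.

```python
from typing import Annotated, get_args


class ShapeMeta:
    """Hypothetical metadata carried alongside the array type."""

    def __init__(self, dtype: str):
        self.dtype = dtype


# Annotated[BaseType, metadata] keeps both pieces side by side;
# consumers introspect them with get_args rather than isinstance.
FloatArray = Annotated[list, ShapeMeta("float64")]

base_type, meta = get_args(FloatArray)
# base_type is list; meta carries the dtype hint
```

This is only the introspection half; the PR's fix additionally re-wraps the NumPy type in Python's own `_GenericAlias` so subscripting like `Annotated[NDArray, Metadata][type_info]` works pre-3.9.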
I have just spent 2 whole days chasing down the two most subtle interactions ever: one was in the autodoc code (a pre-Python 3.10 issue with annotations), and one was with Annotated + numpy.typing.NDArray in pre-Python 3.9 and how Annotated does instance checking against Python's native types. I'll be able to cycle back around to your comments now @loriab.
Thanks for all your work on this so far, @Lnaden. I've updated through psi4 with the pydantic v2 API and hit a couple of serialization bugs/changes that I hope you can either fix or advise on. (1) The
(2) optking is using
…lization wrappers together. Who knew? Also fixed an issue where the default `model_dump` wasn't calling the correct settings.
@loriab Thanks for testing the downstream things for me to find some of these bugs.

(1) That was indeed a bug, two separate ones at that. I had updated the

(2) Not sure on what exactly to do here because I don't understand your expected outputs. Can you tell me what the function was expecting type-wise and what it gets out in this v2 version? Explicit outputs would be super helpful. I'm almost certain I can fix this on my end, I just need more details.

I know that one difference in Pydantic between v1 and v2 that might impact this is in JSON encoding. In v1, the

Thanks again for testing downstream things, this is a huge help in finding subtle interactions like the one from your (1)
Thanks very much! (1b) Still on the Datum part: now it gets through serializing all the real ndarrays, but it balks on the complex ones (freq for transition states). If you change dict -> json in
(1c) This next one (still on Datum) I don't need, and I don't know if it's a behavior change; I just thought I'd run it by you since you've got a feel for serialization. It doesn't look like one can recreate the model once serialized to JSON because arrays end up as strings. (in test_datum.py)
Should have resolved 1b and 1c now.

(1b) Since serialization doesn't chain anymore, I have to handle all cases in one function instead of letting the type detection handle it. Complex NumPy arrays are now cast to list, and then the complex values are also cast to list. So this should be handled.

(1c) The output of JSON strings is now correctly a list instead of a string-ified list. Technically speaking, your code wouldn't work on
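One common convention for getting complex numbers through JSON is real/imag pairs — a hypothetical stdlib sketch of the "complex values cast to list" idea, not necessarily the exact casting the PR uses:

```python
import json


def complex_to_jsonable(values):
    """Cast complex numbers to [real, imag] pairs (a hypothetical convention)."""
    return [[v.real, v.imag] for v in values]


# e.g. imaginary frequencies from a transition-state calculation
freqs = [complex(0.0, 153.2), complex(512.7, 0.0)]
encoded = json.dumps(complex_to_jsonable(freqs))
```

The key point is that `json.dumps` raises `TypeError` on a raw `complex`, so some list-shaped representation has to be chosen before encoding.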
1b definitely resolved, thanks! (1d) What's your view on NumPy floats? I can easily solve this in my code by casting to float, but as-is, one can create a Datum with a NumPy float but it won't json-ize.
I think that's something I can fix on my end; it should be casting to float correctly, dunno why it's not. Probably has to do with the "automatically inferred output type of the serialization function based on the type hint" and me not being generous enough.
(1d) Fixed this by forcing the NumPy scalars to be cast to Python types. Wheeeeeeeeeeeeeee, edge cases. This had nothing to do with the type-hint return idea, unfortunately.
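The scalar fix can be sketched without NumPy installed: NumPy scalars (`np.float64`, `np.int64`, ...) expose `.item()`, which returns the plain Python equivalent that `json` understands. `FakeScalar` below is a stand-in so the sketch stays stdlib-only; the cast itself is the point.

```python
import json


class FakeScalar:
    """Stand-in for a NumPy scalar (e.g. np.float64) to keep this stdlib-only."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # Mirrors np.float64.item(): return the plain Python scalar.
        return self._value


def to_jsonable(value):
    # Force NumPy-like scalars down to Python types before JSON encoding;
    # plain ints/floats pass straight through.
    return value.item() if hasattr(value, "item") else value


encoded = json.dumps(to_jsonable(FakeScalar(1.5)))  # succeeds once cast
```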
Confirmed (1d) fixed! Thanks, @Lnaden
Nice strategy with the deprecation warning on
Thanks. I tried to preserve the API on our side as much as possible while keeping the deprecation warning. I can remove the warning on our side and just let people use both. There really shouldn't be any harm in it, so I can cut it if it's obnoxious.
I think it's good to have the warning. It makes for a nice warning hunt :-) I agree on allowing both for the foreseeable future. It would have been vexing to hunt all the

And they're not obnoxious -- pytest collects them very nicely.
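The shim pattern being discussed can be sketched like this (hypothetical class and values, not the PR's exact code): keep the v1-style method as an alias, emit a `DeprecationWarning`, and delegate to the v2 method so both spellings keep working.

```python
import warnings


class CompatModel:
    """Hypothetical model keeping a v1-style dict() alias for API compatibility."""

    def model_dump(self):
        # v2-style serialization entry point.
        return {"value": 1}

    def dict(self):
        # v1-style alias: warn, then delegate. stacklevel=2 points the
        # warning at the caller's line, which is what makes the hunt easy.
        warnings.warn(
            "dict() is deprecated; use model_dump() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.model_dump()
```

Test runners like pytest collect these warnings in their summary, so downstream callers of the old spelling surface without breaking.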
…d as a function first before calling tolist
…ich calls different logic overall than to_jsonable_python for some reason I don't understand
Is this effort still active? Having QCElemental with Pydantic v2 support would be great; however, a smooth transition from the v1 to the v2 API would be appreciated. Currently, QCElemental relies on the v1 API provided in Pydantic v2. Instead of directly switching to the v2 API and removing support for v1, would it be possible to have both available depending on which Pydantic API is available? An example of this could be the approach taken in the beanie package.
It's still active, and it works (except for one line with optking), and downstream v2-proper PRs are largely ready for QCEngine and Psi4. It's frustrating that pydantic hasn't provided a per-model upgrade scheme, because as is, whole software stacks need to be migrated together. Last I heard, OpenFF was having trouble with their v2 migration. Any updates, @mattwthompson? Thanks for the beanie example. I think that'd work pretty cleanly for QCEngine and maybe Psi4. QCElemental might need some completely separate files for v1/v2 since it accesses low-level pydantic stuff more. But I agree something needs to be done -- I'm translating new v2 models back to v1 just so I can actually use them in the stack.
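A minimal sketch of the dual-support idea (an assumption of how such a guard might look, in the spirit of beanie's approach; `pydantic.VERSION` exists in both major versions):

```python
import importlib.util

# Feature-detect which pydantic generation is installed, without
# failing outright when pydantic is absent entirely.
if importlib.util.find_spec("pydantic") is not None:
    import pydantic

    PYDANTIC_V2 = int(pydantic.VERSION.split(".")[0]) >= 2
else:
    PYDANTIC_V2 = None  # pydantic not installed

# Downstream modules could then branch on the flag, e.g.:
#   if PYDANTIC_V2: use model_validate / model_dump
#   else:           use parse_obj / dict
```

The harder part, as noted above for QCElemental, is that code touching low-level pydantic internals can't be bridged by a flag alone and may need parallel v1/v2 modules.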
Our "proper" v2 migration is tied up in a huge mess of things and stalled, the crux of the thing being getting our custom https://github.com/openforcefield/openff-models/blob/main/openff/models/types.py#L20-L37
Thanks for the update, @mattwthompson . After I get through these next papers I'll try a dual v1/v2 version of qcel. But I bet one wants a triple version for complete compatibility -- v1, v2 as v1, v2. |
# Encoders, to be deprecated at some point
ndarray_encoder = {np.ndarray: lambda v: v.flatten().tolist()}
Is this used in these models? I see similar code elsewhere (https://github.com/MolSSI/QCElemental/pull/321/files#diff-451348f65352b6379290ba87c966a55901f8fd840a77da1f09bcaf8fbdcc7485L46-L49) but not using this variable
That might be leftover from when I was doing the initial refactor pass. They changed the way they do encoders for custom stuff to such a degree that I might have added that, then had to change the internals later on and never cycled back to this line.
Basically see the entire "types" file where I had to add a bunch of new validation logic: https://github.com/MolSSI/QCElemental/pull/321/files#diff-47aa66e9bfa663f2f9b398bd05273dad682d226378e90ba94d633f0bbf0a7885R1-R96
Description
This PR upgrades the entire QCElemental package to use the Pydantic v2 system, entirely removing any and all calls to the v1 API.
This PR also preserves QCElemental's API as best as it can, throwing deprecation warnings in places where we may want to upgrade, which can be a point of discussion.
All tests pass locally; however, this should NOT be merged until a formal discussion can be had. Given the massive change this PR introduces, and considering how many packages depend on this repo, I want everyone to be sure about a full-stack upgrade before this is merged. This is a medium-priority, but no-rush, PR (@loriab and @bennybp specifically).
Happy to talk and explain any and everything.
Changelog description
Updates to Pydantic v2, rendering v1 incompatible.
Status