Integrate with numpydantic #7
base: main
Conversation
Allow specifying dtype and array shapes

The dtype is broken down to Python basic types.

TODO: test, and probably implement support for all the nifty tools of pydantic et al., such as specifying value ranges.
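For illustration, a hypothetical sketch of what "broken down to Python basic types" could mean; `basic_type` is an invented helper, not part of this PR:

```python
import numpy as np

def basic_type(dtype: np.dtype) -> type:
    """Reduce a numpy dtype to the corresponding Python builtin type."""
    # .item() converts a numpy scalar to the matching Python scalar.
    return type(np.zeros(1, dtype=dtype)[0].item())

print(basic_type(np.dtype(np.float32)))  # <class 'float'>
print(basic_type(np.dtype(np.int16)))    # <class 'int'>
```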
Hm, it seems that older Python doesn't like this approach to type variables... Not sure how to fix this.

Apparently only Python 3.12 supports the current syntax.
Inherit from `Generic[...]`
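For context, a minimal sketch of the difference, using a hypothetical `QuantityModel` class: the PEP 695 type-parameter syntax only parses on Python 3.12+, while inheriting from `Generic` works on older versions as well:

```python
from typing import Generic, TypeVar

# Python 3.12+ only (PEP 695) -- a SyntaxError on older interpreters:
# class QuantityModel[M]:
#     magnitude: M

# Portable spelling: inherit from Generic with an explicit TypeVar.
M = TypeVar("M")

class QuantityModel(Generic[M]):
    magnitude: M
```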
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##              main        #7    +/-   ##
===========================================
- Coverage   100.00%    95.34%   -4.66%
===========================================
  Files            1         1
  Lines           45       129      +84
  Branches         4        15      +11
===========================================
+ Hits            45       123      +78
- Misses           0         3       +3
- Partials         0         3       +3
```

☔ View full report in Codecov by Sentry.
Strings cannot be passed easily as type variables, they become `ForwardRef`
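To illustrate the `ForwardRef` behavior with a hypothetical `WithUnit` class: the typing machinery wraps string parameters in `ForwardRef` instead of keeping them as literal values:

```python
from typing import Generic, TypeVar, get_args

U = TypeVar("U")

class WithUnit(Generic[U]):
    pass

# The string is not preserved as a value; typing turns it into a ForwardRef.
print(get_args(WithUnit["degC"]))  # (ForwardRef('degC'),)
```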
Remove conversion to float
* Reference quantity defines base unit for schema, allows for more flexibility
* Interoperability with both pydantic-style and numpydantic-style numeric types for individual field values

Types like numpydantic.dtype.Float still TBD, don't work with `Generic`
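A rough sketch of the reference-quantity idea, assuming pint (the variable names are hypothetical): the reference pins the base unit the schema expects, and any commensurable value can be normalized to it before validation:

```python
import pint

ureg = pint.UnitRegistry()

# The reference quantity fixes the schema's base unit...
reference = 1.0 * ureg.meter

# ...so compatible units are converted to it before validation.
value = 250 * ureg.centimeter
magnitude_in_base = value.to(reference.units).magnitude
print(magnitude_in_base)  # 2.5
```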
This starts looking pretty neat. The machinery can easily be spun out to a separate package. Maybe … or …
...and add test for array that would have caught this
...or part of pint? hgrecco/pint#1166 (comment)
@uellue Hey, thanks for the ping over in hgrecco/pint#1166, I'm glad to see it! It's definitely a slog to take something from "working" to "pleasant" in terms of proper type checking in the IDE, etc. See p2p-ld/numpydantic#10 where @sneakers-the-rat identifies the following (@sneakers-the-rat please let me know if you'd prefer I not ping you out of the blue like this):

So @uellue, if you're hitting against some crufty type expressivity issues, you're not alone 😅
please let me know if i can help, or if you'd like additional features in numpydantic, i have a good deal of fun working on it. adding additional interfaces/dtypes/etc. is relatively straightforward to do, but the docs are probably incomplete on that. so again just let me know how i can help!
Instead, use inheritance to specify the different variants. Make type checkers happier, I hope!

Drive-by: flake8 fixes -- the way vscode runs it seems to have changed.
@sneakers-the-rat Thank you for your work on numpydantic and for offering to help! Your remark about making type checkers happier was a good inspiration; using inheritance is a bit more boilerplate but seems to work better now. As specific items that can likely be improved, I have the following:
Our next steps here would be to test the approach internally in LiberTEM first to smooth it out. After that it is probably a good time to think about spinning it out. We "just" have to address a more fundamental open item with the metadata schema itself before we can use it more widely.
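A minimal sketch of the inheritance approach described above (class names hypothetical): rather than parameterizing a single generic alias with strings, each variant is a real subclass that static type checkers can resolve:

```python
from typing import Generic, TypeVar

M = TypeVar("M")

class BaseQuantity(Generic[M]):
    """Shared machinery for all quantity variants."""
    magnitude: M
    unit: str

# Each variant is an ordinary subclass, so type checkers see concrete
# types instead of opaque, string-parameterized aliases.
class LengthQuantity(BaseQuantity[float]):
    unit = "meter"

class CountQuantity(BaseQuantity[int]):
    unit = "count"
```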
I'm working on better serialization now, roundtripping to/from json is the next thing on the agenda (see …)

help me understand the goal here: the dtypes module has some historical crust from nptyping (those Capitalized types), but i've been trying to make a path away from that, so those compound/generic types like …
again see docs above and try
for that, following numpy's lead, just use
yes, this is the major goal of v2. the string specification is a legacy of … The labels are not necessary, but yes, the goal is to make the syntax for that something like …
though as you can see there are some tradeoffs re: specificity and brevity and so feedback on the forms you would prefer would be useful :)
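As a sketch of the json roundtrip goal mentioned at the top of this comment (assumed usage via pydantic's standard dump/validate methods; per the comment, this is still being worked on, so the model name and exact behavior here are illustrative):

```python
import numpy as np
from numpydantic import NDArray
from pydantic import BaseModel

class Image(BaseModel):
    array: NDArray  # bare NDArray: any shape, any dtype

img = Image(array=np.zeros((2, 2)))
# Roundtrip: serialize to JSON, then validate back into a model.
restored = Image.model_validate_json(img.model_dump_json())
```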
Thank you for the pointers!
When I try to build this model (edit: newer version, plain NDArray):

```python
from typing_extensions import Annotated
from pydantic import FiniteFloat, BaseModel
from numpydantic import Shape, NDArray

class T(BaseModel):
    t: NDArray[Shape['2, 2'], FiniteFloat]
```

I get this error:

```
Cell In[4], line 1
----> 1 class T(BaseModel):
      2     t: NDArray[Shape['2, 2'], FiniteFloat]

Cell In[4], line 2, in T()
      1 class T(BaseModel):
----> 2     t: NDArray[Shape['2, 2'], FiniteFloat]

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:76, in NDArrayMeta._get_item(cls, item)
     74 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
     75     cls._check_item(item)
---> 76     shape, dtype = cls._get_from_tuple(item)
     77     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:118, in NDArrayMeta._get_from_tuple(cls, item)
    115 def _get_from_tuple(cls, item: Tuple[Any, ...]) -> Tuple[Shape, DType]:
    116     # Return the Shape Expression and DType from a tuple.
    117     shape = cls._get_shape(item[0])
--> 118     dtype = cls._get_dtype(item[1])
    119     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/ndarray.py:128, in NDArrayMeta._get_dtype(cls, dtype_candidate)
    126 elif cls._is_literal_like(dtype_candidate):  # pragma: no cover
    127     structure_expression = dtype_candidate.__args__[0]
--> 128     dtype = Structure[structure_expression]
    129     check_type_names(dtype, dtype_per_name)
    130 elif isinstance(dtype_candidate, tuple):  # pragma: no cover

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:216, in ContainerMeta._get_item(cls, item)
    214 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
    215     if not isinstance(item, str):
--> 216         raise InvalidArgumentsError(
    217             f"Unexpected argument of type {type(item)}, expecting a string."
    218         )
    220     if item in cls._known_expressions:
    221         # No need to do costly validations and normalizations if it has been done
    222         # before.
    223         return (item,)

InvalidArgumentsError: Unexpected argument of type <class 'type'>, expecting a string.
```

Wouldn't it be great if one could use the pydantic constrained types also in numpydantic like this? Essentially, one could map the check for a single item onto the whole array. :-)
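For illustration, a sketch (not numpydantic API) of what "mapping the per-item check onto the whole array" could look like, applying a pydantic `TypeAdapter` to each element:

```python
import numpy as np
from pydantic import FiniteFloat, TypeAdapter

item_adapter = TypeAdapter(FiniteFloat)
arr = np.array([[1.0, 2.0], [3.0, float("nan")]])

# Validate every element against the scalar constraint;
# raises a ValidationError when it reaches the NaN.
for value in arr.flat:
    item_adapter.validate_python(float(value))
```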
I can't even pass it as an argument for the type:
Hm, Python
Yes, that would be great! Looking much cleaner.
Absolutely. I'll open an issue and see if there's a performant way to do this for the other non-numpy interfaces (e.g. for hdf5 arrays, loading the whole array to check this could be costly), but imo being able to do it at all is more important than it being fast. I'll open an issue shortly.
Yes, we need better abstraction around types than we have atm. Most of that was inherited from nptyping; vendoring it in to start removing these problems is somewhat recent. I went with tuples as a first pass because the typing api is a bit of a pain, but the intention was to support them too. I'll raise another issue here.
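A sketch of the two spellings: the tuple form is what the comment describes as the first pass, while the `Union` form is the stated intention (assumed not to work yet at this point):

```python
from typing import Union

import numpy as np
from numpydantic import NDArray, Shape

# Tuple form: any of the listed dtypes validates.
TupleFloats = NDArray[Shape["*, *"], (np.float32, np.float64)]

# Union form: the intended equivalent.
UnionFloats = NDArray[Shape["*, *"], Union[np.float32, np.float64]]
```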
There's a little bit of an intuition impedance mismatch between python and numpy, especially when it comes to specifying "any float type" for multiple array backends. I went with a permissive definition where "float" validates against any float type, and one can specify float64 to make it more specific, but if that's confusing I agree the Float compound type should work. I have a bit of time to work on this in an hour or two. Will raise these as issues and try to resolve.
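The distinction, sketched under the permissive behavior described above:

```python
import numpy as np
from numpydantic import NDArray, Shape

# Permissive: plain `float` accepts any floating-point dtype
# (float16/float32/float64, regardless of backend).
AnyFloat = NDArray[Shape["2, 2"], float]

# Specific: a concrete numpy dtype narrows validation to exactly that type.
OnlyFloat64 = NDArray[Shape["2, 2"], np.float64]
```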
Union types in
working on docs (sorry it's so scratchy atm)
@sneakers-the-rat Awesome, thanks a lot! When trying to use the new release I ran into an issue where …

Code to get the JSON schema: …

JSON schema that is generated: …

Code to make it work, drop the …

Could it be that the …
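For reference, since the original snippets above are collapsed, the general pydantic pattern for getting a JSON schema out of a plain type (an assumed minimal stand-in, not the original code):

```python
from pydantic import FiniteFloat, TypeAdapter

# TypeAdapter wraps an arbitrary type and can emit its JSON schema,
# which can then be embedded in a larger, hand-built schema.
print(TypeAdapter(FiniteFloat).json_schema())
```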
i am having a good chuckle because you have certainly gotten outside the dotted lines of the public API, but i am very curious what you're up to here! is there a target example you could point me to for the kind of array that should be validated here? I just ran into this problem and that's one of the major reasons for 1.6, to separate the core schema and the json schema generation, and needing to call into dunder methods makes me think there's something big missing in the docs or implementation here
Well, the general purpose is to generate a schema for `pint.Quantity`. In particular, it should also work in JSON form: encoding it as JSON and also validating it as JSON as far as possible, for interoperability and serialization. For the JSON representation my idea is a tuple of "whatever the magnitude is" and a literal string for the units. So in order to get a schema for Pint quantities where the magnitude is an array, I try to get a schema for the array part from numpydantic and insert it into my tuple schema. What would be the "proper" way to do that? For the basic types in pydantic I can use …

```python
na = numpydantic.NDArray[numpydantic.Shape['2 x, 2 y'], numpydantic.dtype.Float64]
pydantic.TypeAdapter(na)
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:270, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    269 try:
--> 270     self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
    271     self._validator = _getattr_no_parents(self._type, '__pydantic_validator__')

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:112, in _getattr_no_parents(obj, attribute)
    111 else:
--> 112     raise AttributeError(attribute)

AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
Cell In[7], line 1
----> 1 pydantic.TypeAdapter(na)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:257, in TypeAdapter.__init__(self, type, config, _parent_depth, module)
    252 if not self._defer_build():
    253     # Immediately initialize the core schema, validator and serializer
    254     with self._with_frame_depth(1):  # +1 frame depth for this __init__
    255         # Model itself may be using deferred building. For backward compatibility we don't rebuild model mocks
    256         # here as part of __init__ even though TypeAdapter itself is not using deferred building.
--> 257         self._init_core_attrs(rebuild_mocks=False)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:135, in _frame_depth.<locals>.wrapper.<locals>.wrapped(self, *args, **kwargs)
    132 @wraps(func)
    133 def wrapped(self: TypeAdapterT, *args: P.args, **kwargs: P.kwargs) -> R:
    134     with self._with_frame_depth(depth + 1):  # depth + 1 for the wrapper function
--> 135         return func(self, *args, **kwargs)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:282, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    275 core_config = config_wrapper.core_config(None)
    277 self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
    278 self._validator = create_schema_validator(
    279     schema=self._core_schema,
    280     schema_type=self._type,
    281     schema_type_module=self._module_name,
--> 282     schema_type_name=str(self._type),
    283     schema_kind='TypeAdapter',
    284     config=core_config,
    285     plugin_settings=config_wrapper.plugin_settings,
    286 )
    287 self._serializer = SchemaSerializer(self._core_schema, core_config)
    289 if rebuild_mocks and isinstance(self._core_schema, _mock_val_ser.MockCoreSchema):

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:101, in NDArrayMeta.__str__(cls)
     97 def __str__(cls) -> str:
     98     shape, dtype = cls.__args__
     99     return (
    100         f"{cls.__name__}[{cls._shape_expression_to_str(shape)}, "
--> 101         f"{cls._dtype_to_str(dtype)}]"
    102     )

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/ndarray.py:155, in NDArrayMeta._dtype_to_str(cls, dtype)
    153 elif isinstance(dtype, tuple):
    154     result = ", ".join([str(dt) for dt in dtype])
--> 155     return result

UnboundLocalError: cannot access local variable 'result' where it is not associated with a value
```
...and on the Python side I try to get hold of a validation function for that part and call it in my own validation function.
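A sketch of the JSON representation described above, i.e. a tuple of the magnitude and a literal unit string (the helper name is made up for illustration):

```python
import json

import pint

ureg = pint.UnitRegistry()

def quantity_to_json(q: pint.Quantity) -> str:
    """Encode a pint.Quantity as [magnitude, unit_string]."""
    mag = q.magnitude
    # numpy array magnitudes become nested lists for JSON.
    mag = mag.tolist() if hasattr(mag, "tolist") else mag
    return json.dumps([mag, str(q.units)])

q = ureg.Quantity([[1.0, 2.0], [3.0, 4.0]], "nm")
print(quantity_to_json(q))  # [[[1.0, 2.0], [3.0, 4.0]], "nanometer"]
```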