
Integrate with numpydantic #7

Open · wants to merge 13 commits into base: main
Conversation

@uellue (Member) commented Sep 13, 2024

Allow specifying dtype and array shapes

The dtype is broken down to Python basic types.

TODO test and probably implement support for all the nifty tools of pydantic et al, specifying value ranges etc
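The idea of breaking a dtype down to Python basic types can be sketched roughly as follows; `basic_type` is a hypothetical helper for illustration, not necessarily the PR's actual implementation:

```python
import numpy as np

# Hypothetical sketch: map a NumPy dtype to the corresponding Python
# basic type via its kind character ('f' float, 'i'/'u' int,
# 'b' bool, 'c' complex).
def basic_type(dtype) -> type:
    kind_map = {"f": float, "i": int, "u": int, "b": bool, "c": complex}
    return kind_map[np.dtype(dtype).kind]

assert basic_type(np.float32) is float
assert basic_type(np.int64) is int
assert basic_type(np.complex128) is complex
```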

@uellue (Member Author) commented Sep 13, 2024

Hm, it seems that older Python doesn't like this approach to type variables... Not sure how to fix this.

@uellue (Member Author) commented Sep 13, 2024

Apparently only Python 3.12 supports the current syntax

Inherit from `Generic[...]`
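For reference, a minimal sketch of the difference between the 3.12-only type parameter syntax and the backwards-compatible `Generic` form (class and attribute names are illustrative):

```python
from typing import Generic, TypeVar

# Python 3.12+ (PEP 695) would allow:
#     class Quantity[T]: ...
# The backwards-compatible spelling inherits from Generic:
T = TypeVar("T")

class Quantity(Generic[T]):
    def __init__(self, magnitude: T) -> None:
        self.magnitude = magnitude

# Subscripting works the same way in both spellings:
q = Quantity[float](1.5)
assert q.magnitude == 1.5
```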
@codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 94.87179% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.34%. Comparing base (c096d53) to head (5c15230).

Files with missing lines              Patch %    Lines
src/libertem_schema/__init__.py       94.87%     3 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main       #7      +/-   ##
===========================================
- Coverage   100.00%   95.34%   -4.66%     
===========================================
  Files            1        1              
  Lines           45      129      +84     
  Branches         4       15      +11     
===========================================
+ Hits            45      123      +78     
- Misses           0        3       +3     
- Partials         0        3       +3     


Strings cannot be passed easily as type variables, they become `ForwardRef`
Remove conversion to float
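The `ForwardRef` behavior mentioned in the commit above can be reproduced with plain `typing` (illustrative sketch; `Box` is a made-up class):

```python
from typing import ForwardRef, Generic, TypeVar

T = TypeVar("T")

class Box(Generic[T]):
    pass

# Subscripting a generic class with a string does not keep the string:
# typing wraps it into a ForwardRef to be resolved later.
param = Box["SomeType"].__args__[0]
assert isinstance(param, ForwardRef)
```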
* Reference quantity defines base unit for schema, allows for more flexibility
* Interoperability with both pydantic-style and numpydantic-style numeric types for individual field values

Types like numpydantic.dtype.Float still TBD, don't work with `Generic`
@uellue (Member Author) commented Sep 18, 2024

This starts looking pretty neat. The machinery can easily be spun out to a separate package. Maybe pintdantic?

@uellue (Member Author) commented Sep 18, 2024

... or pint-schema?

...and add test for array that would have caught this
@uellue (Member Author) commented Sep 18, 2024

...or part of pint? hgrecco/pint#1166 (comment)

@blakeNaccarato commented

@uellue Hey thanks for the ping over in hgrecco/pint#1166, I'm glad to see numpydantic was a good lead! I'm wrangling with typing expressivity limitations myself over in one of my projects, as well.

It's definitely a slog to take something from "working" to "pleasant" in terms of proper type checking in the IDE, etc.

See p2p-ld/numpydantic#10 where @sneakers-the-rat identifies the following (@sneakers-the-rat please let me know if you'd prefer I not ping you out of the blue like this):

> Fixing the linter errors seems like a first, low hanging fruit priority, and we may need to wait to python 3.14 for the functional type annotations to hit, but we should figure out a solid strategy for type checkers.

So @uellue if you're hitting against some crufty type expressivity issues, you're not alone 😅

@sneakers-the-rat commented

please let me know if i can help, or if you'd like additional features in numpydantic, i have a good deal of fun working on it. adding additional interfaces/dtypes/etc. is relatively straightforward to do, but the docs are probably incomplete on that. so again just let me know how i can help!

Instead, use inheritance to specify the different variants. Make type checkers more happy, I hope!

Drive-by: flake8 -- the way VS Code runs it seems to have changed
@uellue (Member Author) commented Sep 20, 2024

@sneakers-the-rat Thank you for your work on numpydantic and for offering to help!

Your remark about making type checkers happier was a good inspiration; using inheritance is a bit more boilerplate but seems to work better now.

As specific items that can likely be improved I have the following:

Our next steps here would be to test the approach internally in LiberTEM first to smooth it out. After that it is probably a good time to think about spinning it out. We "just" have to address a more fundamental open item with the metadata schema itself before we can use it more widely.

@sneakers-the-rat commented Sep 20, 2024

> How to handle complex numbers on the JSON side? Is that already working in numpydantic, am I missing something? For wave optics, which we are doing, complex numbers are important.

I'm working on better serialization now, roundtripping to/from json is the next thing on the agenda (see #17) and i'll make sure it works with complex numbers.
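One common JSON convention for complex numbers is a `[real, imag]` pair. A minimal round-trip sketch of that idea (an illustrative convention, not numpydantic's actual serialization format):

```python
import json

# Represent a complex number as a [real, imag] pair for JSON,
# since JSON has no native complex type.
def complex_to_json(z: complex) -> list[float]:
    return [z.real, z.imag]

def complex_from_json(pair: list[float]) -> complex:
    return complex(pair[0], pair[1])

z = 1.0 + 2.0j
roundtripped = complex_from_json(json.loads(json.dumps(complex_to_json(z))))
assert roundtripped == z
```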

> Interoperability between the numpydantic/NumPy dtypes and the annotated native types used by pydantic.

help me understand the goal here: the dtypes module has some historical crust from nptyping (those Capitalized types), but i've been trying to make a path away from that, so those compound/generic types like Float are just tuples of types that are checked as something like `dtype in Float`. There's a small bit of documentation here but i need to expand it: https://numpydantic.readthedocs.io/en/latest/api/dtype.html

> The generic numpydantic types like numpydantic.dtype.Float don't work at all at the moment.

again see docs above and try dtype in Float. I should be clearer about those.
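The `dtype in Float` membership check described above can be illustrated with a hypothetical reconstruction of such a compound dtype (the exact members of numpydantic's `Float` may differ):

```python
import numpy as np

# Hypothetical reconstruction of a compound dtype like numpydantic's
# Float: a plain tuple of concrete floating point types, checked with
# membership (`in`) rather than subscripting.
Float = (np.float16, np.float32, np.float64)

assert np.float32 in Float
assert np.int32 not in Float
```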

> It would be nice to specify "some kind of floating point".

for that, following numpy's lead, just use float <3

In [1]: from numpydantic import NDArray, Shape

In [2]: from typing import Any

In [3]: import numpy as np

In [4]: NDArray[Any, float]([1.0,2.0])
Out[4]: array([1., 2.])

In [5]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float16))
Out[5]: array([1., 2.], dtype=float16)

In [6]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float32))
Out[6]: array([1., 2.], dtype=float32)

In [7]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float64))
Out[7]: array([1., 2.])

> I get syntax error in forward annotation '2 x, 2 y' from flake8 for the shape specification. Can that be solved somehow?

yes, this is the major goal of v2. the string specification is a legacy of nptyping, which i used as a way to bridge from the major thing that existed in the space at the time, but has a bunch of really undesirable problems like this. TypeVarTuple is in python 3.12 and that along with the Generics system should solve a lot of these problems, but it will take a bit of work to excavate these out.

the labels are not necessary but yes the goal is to make the syntax for that something like

NDArray[(2, 2)]
NDArray[Shape[2,2]]
NDArray[(2,2), Any]

though as you can see there are some tradeoffs re: specificity and brevity and so feedback on the forms you would prefer would be useful :)

@uellue (Member Author) commented Sep 20, 2024

Thank you for the pointers!

> Interoperability between the numpydantic/NumPy dtypes and the annotated native types used by pydantic.

> help me understand the goal here,

When I try to build this model (edit: newer version plain NDArray):

from typing_extensions import Annotated
from pydantic import FiniteFloat, BaseModel
from numpydantic import Shape, NDArray

class T(BaseModel):
    t: NDArray[Shape['2, 2'], FiniteFloat]

I get this error:

Cell In[4], line 1
----> 1 class T(BaseModel):
      2     t: NDArray[Shape['2, 2'], FiniteFloat]

Cell In[4], line 2, in T()
      1 class T(BaseModel):
----> 2     t: NDArray[Shape['2, 2'], FiniteFloat]

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:76, in NDArrayMeta._get_item(cls, item)
     74 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
     75     cls._check_item(item)
---> 76     shape, dtype = cls._get_from_tuple(item)
     77     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:118, in NDArrayMeta._get_from_tuple(cls, item)
    115 def _get_from_tuple(cls, item: Tuple[Any, ...]) -> Tuple[Shape, DType]:
    116     # Return the Shape Expression and DType from a tuple.
    117     shape = cls._get_shape(item[0])
--> 118     dtype = cls._get_dtype(item[1])
    119     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/ndarray.py:128, in NDArrayMeta._get_dtype(cls, dtype_candidate)
    126 elif cls._is_literal_like(dtype_candidate):  # pragma: no cover
    127     structure_expression = dtype_candidate.__args__[0]
--> 128     dtype = Structure[structure_expression]
    129     check_type_names(dtype, dtype_per_name)
    130 elif isinstance(dtype_candidate, tuple):  # pragma: no cover

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:216, in ContainerMeta._get_item(cls, item)
    214 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
    215     if not isinstance(item, str):
--> 216         raise InvalidArgumentsError(
    217             f"Unexpected argument of type {type(item)}, expecting a string."
    218         )
    220     if item in cls._known_expressions:
    221         # No need to do costly validations and normalizations if it has been done
    222         # before.
    223         return (item,)

InvalidArgumentsError: Unexpected argument of type <class 'type'>, expecting a string.

Wouldn't it be great if one could use the pydantic constrained types in numpydantic like this, too? Essentially, one could map the check for a single item over the whole array. :-)
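The "map a per-item check over the whole array" idea can be sketched with pydantic's `TypeAdapter` (an illustrative sketch of the concept, not numpydantic's actual mechanism; `validate_elements` is a made-up helper):

```python
import numpy as np
from pydantic import FiniteFloat, TypeAdapter, ValidationError

# Build a validator for a single constrained value once...
adapter = TypeAdapter(FiniteFloat)

# ...then apply it to every element of an array.
def validate_elements(arr: np.ndarray) -> np.ndarray:
    for x in arr.flat:
        adapter.validate_python(float(x))
    return arr

validate_elements(np.array([[1.0, 2.0], [3.0, 4.0]]))  # passes

try:
    validate_elements(np.array([1.0, np.inf]))
except ValidationError:
    print("rejected non-finite element")
```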

> again see docs above and try dtype in Float. I should be clearer about those.

I can't even pass it as an argument for the type: TypeError: Too many arguments for <class 'libertem_schema.Length'>; actual 5, expected 1. Is it meant to be used as an argument for a generic type, and if yes, how do I do that? Maybe a Union could be better than a tuple for that?

> for that, following numpy's lead, just use float <3

Hm, Python float should usually mean specifically np.float64 since it is the double type. That's why I thought numpydantic.dtype.Float could be a neat way to specify "any floating point type".
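The point about `float` mapping to the double type can be checked directly in NumPy:

```python
import numpy as np

# In NumPy, the builtin `float` corresponds to the C double, i.e. float64:
assert np.dtype(float) == np.dtype(np.float64)

# A float32 array is therefore not "float" in this strict sense:
a = np.array([1.0], dtype=np.float32)
assert a.dtype != np.dtype(float)
```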

> (...) the goal is to make the syntax for that something like (...)

Yes, that would be great! Looking much cleaner.

@sneakers-the-rat commented

> Wouldn't it be great if one could use the pydantic constrained types also in numpydantic like this?

Absolutely. I'll open an issue and see if there's a performant way to do this for the other non-numpy interfaces (e.g. for hdf5 arrays, loading the whole array to check this could be costly), but imo being able to do it at all is more important than it being fast. I'll open an issue shortly.

> Maybe a Union could be better than a tuple for that?

Yes, we need better abstraction around types than we have atm. Most of that was inherited from nptyping; the work to start removing these problems is somewhat recent. I went with tuples as a first pass because the typing API is a bit of a pain, but the intention was to support Unions too. I'll raise another issue here.

> Hm, Python float should usually mean specifically np.float64 since it is the double type. That's why I thought numpydantic.dtype.Float could be a neat way to specify "any floating point type".

There's a little bit of an intuition impedance mismatch between Python and NumPy, especially when it comes to specifying "any float type" for multiple array backends. I went with a permissive definition where "float" validates against any float type and one can specify float64 to make it more specific, but if that's confusing I agree the Float compound type should work.

I have a bit of time to work on this in an hour or two. Will raise these as issues and try to resolve

@sneakers-the-rat commented

Union types in 1.6.1:

working on docs (sorry it's so scratchy atm)

@uellue (Member Author) commented Sep 24, 2024

@sneakers-the-rat Awesome, thanks a lot!

When trying to use the new release I ran into an issue where pydantic complains about a forbidden extra field.

Code to get the JSON schema:

https://github.com/uellue/LiberTEM-schema/blob/426cfaee9093c1687af1688feaf73211cef45b01/src/libertem_schema/__init__.py#L157-L165

JSON schema that is generated

{'dtype': 'numpy.float64',
 'items_schema': {'items_schema': {'ge': -1.7976931348623157e+308,
                                   'le': 1.7976931348623157e+308,
                                   'type': 'float'},
                  'max_length': 2,
                  'metadata': {'name': 'y'},
                  'min_length': 2,
                  'type': 'list'},
 'max_length': 2,
 'metadata': {'name': 'x'},
 'min_length': 2,
 'type': 'list'}

Code to make it work, drop the 'dtype' key if present:

https://github.com/uellue/LiberTEM-schema/blob/426cfaee9093c1687af1688feaf73211cef45b01/src/libertem_schema/__init__.py#L194C1-L196C47

Could it be that the 'dtype' key should rather be in the 'metadata' sub-dictionary?

@sneakers-the-rat commented

i am giving a good chuckle because you have certainly gotten outside the dotted lines of the public API, but i am very curious what you're up to here!

is there a target example you could point me to for the kind of array that should be validated here? I just ran into this problem and that's one of the major reasons for 1.6, to separate the core schema and the json schema generation, and needing to call in to dunder methods makes me think there's something big missing in the docs or implementation here

@uellue (Member Author) commented Sep 24, 2024

> i am giving a good chuckle because you have certainly gotten outside the dotted lines of the public API, but i am very curious what you're up to here!

> is there a target example you could point me to for the kind of array that should be validated here? I just ran into this problem and that's one of the major reasons for 1.6, to separate the core schema and the json schema generation, and needing to call in to dunder methods makes me think there's something big missing in the docs or implementation here

Well, the general purpose is to generate a schema for `pint.Quantity`. In particular, it should also work in JSON form: encoding it as JSON and also validating it as JSON as far as possible, for interoperability and serialization. For the JSON representation my idea is a tuple of "whatever the magnitude is" and a literal string for the units. So in order to get a schema for Pint quantities where the magnitude is an array, I try to get a schema for the array part from numpydantic and insert it into my tuple schema.
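The "tuple of magnitude and a literal unit string" idea can be sketched with a plain pydantic model; the model and field names here are hypothetical illustrations, not the actual LiberTEM-schema code:

```python
import json
from typing import Literal, Tuple

from pydantic import BaseModel

# Hypothetical sketch of the (magnitude, unit string) JSON form:
# the magnitude is a float here, but could be an array schema instead.
class Length(BaseModel):
    quantity: Tuple[float, Literal["meter"]]

m = Length(quantity=(1.5, "meter"))
# The tuple serializes to a JSON array of [magnitude, unit]:
assert json.loads(m.model_dump_json()) == {"quantity": [1.5, "meter"]}
```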

What would be the "proper" way to do that? For the basic types in pydantic I can use TypeAdapter, but that doesn't seem to work with numpydantic:

na = numpydantic.NDArray[numpydantic.Shape['2 x, 2 y'], numpydantic.dtype.Float64]
pydantic.TypeAdapter(na)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:270, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    269 try:
--> 270     self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
    271     self._validator = _getattr_no_parents(self._type, '__pydantic_validator__')

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:112, in _getattr_no_parents(obj, attribute)
    111 else:
--> 112     raise AttributeError(attribute)

AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
Cell In[7], line 1
----> 1 pydantic.TypeAdapter(na)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:257, in TypeAdapter.__init__(self, type, config, _parent_depth, module)
    252 if not self._defer_build():
    253     # Immediately initialize the core schema, validator and serializer
    254     with self._with_frame_depth(1):  # +1 frame depth for this __init__
    255         # Model itself may be using deferred building. For backward compatibility we don't rebuild model mocks
    256         # here as part of __init__ even though TypeAdapter itself is not using deferred building.
--> 257         self._init_core_attrs(rebuild_mocks=False)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:135, in _frame_depth.<locals>.wrapper.<locals>.wrapped(self, *args, **kwargs)
    132 @wraps(func)
    133 def wrapped(self: TypeAdapterT, *args: P.args, **kwargs: P.kwargs) -> R:
    134     with self._with_frame_depth(depth + 1):  # depth + 1 for the wrapper function
--> 135         return func(self, *args, **kwargs)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:282, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    275     core_config = config_wrapper.core_config(None)
    277     self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
    278     self._validator = create_schema_validator(
    279         schema=self._core_schema,
    280         schema_type=self._type,
    281         schema_type_module=self._module_name,
--> 282         schema_type_name=str(self._type),
    283         schema_kind='TypeAdapter',
    284         config=core_config,
    285         plugin_settings=config_wrapper.plugin_settings,
    286     )
    287     self._serializer = SchemaSerializer(self._core_schema, core_config)
    289 if rebuild_mocks and isinstance(self._core_schema, _mock_val_ser.MockCoreSchema):

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:101, in NDArrayMeta.__str__(cls)
     97 def __str__(cls) -> str:
     98     shape, dtype = cls.__args__
     99     return (
    100         f"{cls.__name__}[{cls._shape_expression_to_str(shape)}, "
--> 101         f"{cls._dtype_to_str(dtype)}]"
    102     )

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/ndarray.py:155, in NDArrayMeta._dtype_to_str(cls, dtype)
    153 elif isinstance(dtype, tuple):
    154     result = ", ".join([str(dt) for dt in dtype])
--> 155 return result

UnboundLocalError: cannot access local variable 'result' where it is not associated with a value

@uellue (Member Author) commented Sep 24, 2024

...and on the Python side I try to get hold of a validation function for that part and call it in my own validation function.
