
Integrate with numpydantic #7

Open · wants to merge 13 commits into base: main
Conversation

@uellue (Member) commented Sep 13, 2024

Allow specifying dtype and array shapes

The dtype is broken down to Python basic types.

TODO test and probably implement support for all the nifty tools of pydantic et al, specifying value ranges etc
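The idea of breaking a dtype down to Python basic types can be sketched roughly as follows; `basic_type` is a hypothetical helper for illustration, not necessarily the PR's actual implementation:

```python
import numpy as np

# Hypothetical sketch: map a NumPy dtype to the corresponding Python
# basic type via its kind character ('f' float, 'i'/'u' int,
# 'b' bool, 'c' complex).
def basic_type(dtype) -> type:
    kind_map = {"f": float, "i": int, "u": int, "b": bool, "c": complex}
    return kind_map[np.dtype(dtype).kind]

assert basic_type(np.float32) is float
assert basic_type(np.int64) is int
assert basic_type(np.complex128) is complex
```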

@uellue (Member Author) commented Sep 13, 2024

Hm, it seems that older Python doesn't like this approach to type variables... Not sure how to fix this.

@uellue (Member Author) commented Sep 13, 2024

Apparently only Python 3.12 supports the current syntax

Inherit from `Generic[...]`
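For reference, a minimal sketch of the difference between the 3.12-only type parameter syntax and the backwards-compatible `Generic` form (class and attribute names are illustrative):

```python
from typing import Generic, TypeVar

# Python 3.12+ (PEP 695) would allow:
#     class Quantity[T]: ...
# The backwards-compatible spelling inherits from Generic:
T = TypeVar("T")

class Quantity(Generic[T]):
    def __init__(self, magnitude: T) -> None:
        self.magnitude = magnitude

# Subscripting works the same way in both spellings:
q = Quantity[float](1.5)
assert q.magnitude == 1.5
```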
@codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 94.87179% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.34%. Comparing base (c096d53) to head (5c15230).

Files with missing lines              Patch %    Lines
src/libertem_schema/__init__.py       94.87%     3 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main       #7      +/-   ##
===========================================
- Coverage   100.00%   95.34%   -4.66%     
===========================================
  Files            1        1              
  Lines           45      129      +84     
  Branches         4       15      +11     
===========================================
+ Hits            45      123      +78     
- Misses           0        3       +3     
- Partials         0        3       +3     


Strings cannot be passed easily as type variables, they become `ForwardRef`
Remove conversion to float
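The `ForwardRef` behavior mentioned in the commit above can be reproduced with plain `typing` (illustrative sketch; `Box` is a made-up class):

```python
from typing import ForwardRef, Generic, TypeVar

T = TypeVar("T")

class Box(Generic[T]):
    pass

# Subscripting a generic class with a string does not keep the string:
# typing wraps it into a ForwardRef to be resolved later.
param = Box["SomeType"].__args__[0]
assert isinstance(param, ForwardRef)
```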
* Reference quantity defines base unit for schema, allows for more flexibility
* Interoperability with both pydantic-style and numpydantic-style numeric types for individual field values

Types like numpydantic.dtype.Float still TBD, don't work with `Generic`
@uellue (Member Author) commented Sep 18, 2024

This starts looking pretty neat. The machinery can easily be spun out to a separate package. Maybe pintdantic?

@uellue (Member Author) commented Sep 18, 2024

... or pint-schema?

...and add test for array that would have caught this
@uellue (Member Author) commented Sep 18, 2024

...or part of pint? hgrecco/pint#1166 (comment)

@blakeNaccarato commented

@uellue Hey thanks for the ping over in hgrecco/pint#1166, I'm glad to see numpydantic was a good lead! I'm wrangling with typing expressivity limitations myself over in one of my projects, as well.

It's definitely a slog to take something from "working" to "pleasant" in terms of proper type checking in the IDE, etc.

See p2p-ld/numpydantic#10 where @sneakers-the-rat identifies the following (@sneakers-the-rat please let me know if you'd prefer I not ping you out of the blue like this):

> Fixing the linter errors seems like a first, low hanging fruit priority, and we may need to wait to python 3.14 for the functional type annotations to hit, but we should figure out a solid strategy for type checkers.

So @uellue if you're hitting against some crufty type expressivity issues, you're not alone 😅

@sneakers-the-rat commented

please let me know if i can help, or if you'd like additional features in numpydantic, i have a good deal of fun working on it. adding additional interfaces/dtypes/etc. is relatively straightforward to do, but the docs are probably incomplete on that. so again just let me know how i can help!

Instead, use inheritance to specify the different variants. Make type checkers more happy, I hope!

Drive-by: flake8 -- the way VS Code runs it seems to have changed
@uellue (Member Author) commented Sep 20, 2024

@sneakers-the-rat Thank you for your work on numpydantic and for offering to help!

Your remark about making type checkers happier was a good inspiration; using inheritance is a bit more boilerplate but seems to work better now.

As specific items that can likely be improved I have the following:

Our next steps here would be to test the approach internally in LiberTEM first to smooth it out. After that it is probably a good time to think about spinning it out. We "just" have to address a more fundamental open item with the metadata schema itself before we can use it more widely.

@sneakers-the-rat commented Sep 20, 2024

> How to handle complex numbers on the JSON side? Is that already working in numpydantic, am I missing something? For wave optics, which we are doing, complex numbers are important.

I'm working on better serialization now, roundtripping to/from json is the next thing on the agenda (see #17) and i'll make sure it works with complex numbers.
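One common JSON convention for complex numbers is a `[real, imag]` pair. A minimal round-trip sketch of that idea (an illustrative convention, not numpydantic's actual serialization format):

```python
import json

# Represent a complex number as a [real, imag] pair for JSON,
# since JSON has no native complex type.
def complex_to_json(z: complex) -> list[float]:
    return [z.real, z.imag]

def complex_from_json(pair: list[float]) -> complex:
    return complex(pair[0], pair[1])

z = 1.0 + 2.0j
roundtripped = complex_from_json(json.loads(json.dumps(complex_to_json(z))))
assert roundtripped == z
```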

> Interoperability between the numpydantic/NumPy dtypes and the annotated native types used by pydantic.

help me understand the goal here: the dtypes module has some historical crust from nptyping (those Capitalized types), but i've been trying to make a path away from that, so those compound/generic types like Float are just tuples of types that are checked as something like `dtype in Float`. There's a small bit of documentation here but i need to expand it: https://numpydantic.readthedocs.io/en/latest/api/dtype.html

> The generic numpydantic types like numpydantic.dtype.Float don't work at all at the moment.

again see docs above and try dtype in Float. I should be clearer about those.
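The `dtype in Float` membership check described above can be illustrated with a hypothetical reconstruction of such a compound dtype (the exact members of numpydantic's `Float` may differ):

```python
import numpy as np

# Hypothetical reconstruction of a compound dtype like numpydantic's
# Float: a plain tuple of concrete floating point types, checked with
# membership (`in`) rather than subscripting.
Float = (np.float16, np.float32, np.float64)

assert np.float32 in Float
assert np.int32 not in Float
```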

> It would be nice to specify "some kind of floating point".

for that, following numpy's lead, just use float <3

In [1]: from numpydantic import NDArray, Shape

In [2]: from typing import Any

In [3]: import numpy as np

In [4]: NDArray[Any, float]([1.0,2.0])
Out[4]: array([1., 2.])

In [5]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float16))
Out[5]: array([1., 2.], dtype=float16)

In [6]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float32))
Out[6]: array([1., 2.], dtype=float32)

In [7]: NDArray[Any, float](np.array([1.0,2.0], dtype=np.float64))
Out[7]: array([1., 2.])

> I get syntax error in forward annotation '2 x, 2 y' from flake8 for the shape specification. Can that be solved somehow?

yes, this is the major goal of v2. the string specification is a legacy of nptyping, which i used as a way to bridge from the major thing that existed in the space at the time, but has a bunch of really undesirable problems like this. TypeVarTuple is in python 3.12 and that along with the Generics system should solve a lot of these problems, but it will take a bit of work to excavate these out.

the labels are not necessary but yes the goal is to make the syntax for that something like

NDArray[(2, 2)]
NDArray[Shape[2,2]]
NDArray[(2,2), Any]

though as you can see there are some tradeoffs re: specificity and brevity and so feedback on the forms you would prefer would be useful :)

@uellue (Member Author) commented Sep 20, 2024

Thank you for the pointers!

> Interoperability between the numpydantic/NumPy dtypes and the annotated native types used by pydantic.

> help me understand the goal here,

When I try to build this model (edit: newer version plain NDArray):

from typing_extensions import Annotated
from pydantic import FiniteFloat, BaseModel
from numpydantic import Shape, NDArray

class T(BaseModel):
    t: NDArray[Shape['2, 2'], FiniteFloat]

I get this error:

Cell In[4], line 1
----> 1 class T(BaseModel):
      2     t: NDArray[Shape['2, 2'], FiniteFloat]

Cell In[4], line 2, in T()
      1 class T(BaseModel):
----> 2     t: NDArray[Shape['2, 2'], FiniteFloat]

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:76, in NDArrayMeta._get_item(cls, item)
     74 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
     75     cls._check_item(item)
---> 76     shape, dtype = cls._get_from_tuple(item)
     77     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:118, in NDArrayMeta._get_from_tuple(cls, item)
    115 def _get_from_tuple(cls, item: Tuple[Any, ...]) -> Tuple[Shape, DType]:
    116     # Return the Shape Expression and DType from a tuple.
    117     shape = cls._get_shape(item[0])
--> 118     dtype = cls._get_dtype(item[1])
    119     return shape, dtype

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/ndarray.py:128, in NDArrayMeta._get_dtype(cls, dtype_candidate)
    126 elif cls._is_literal_like(dtype_candidate):  # pragma: no cover
    127     structure_expression = dtype_candidate.__args__[0]
--> 128     dtype = Structure[structure_expression]
    129     check_type_names(dtype, dtype_per_name)
    130 elif isinstance(dtype_candidate, tuple):  # pragma: no cover

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:150, in SubscriptableMeta.__getitem__(cls, item)
    147 if getattr(cls, "_parameterized", False):
    148     raise NPTypingError(f"Type nptyping.{cls} is already parameterized.")
--> 150 args = cls._get_item(item)
    151 additional_values = cls._get_additional_values(item)
    152 assert hasattr(cls, "__args__"), "A SubscriptableMeta must have __args__."

File ~/miniconda3/envs/ltschema312/lib/python3.12/site-packages/numpydantic/vendor/nptyping/base_meta_classes.py:216, in ContainerMeta._get_item(cls, item)
    214 def _get_item(cls, item: Any) -> Tuple[Any, ...]:
    215     if not isinstance(item, str):
--> 216         raise InvalidArgumentsError(
    217             f"Unexpected argument of type {type(item)}, expecting a string."
    218         )
    220     if item in cls._known_expressions:
    221         # No need to do costly validations and normalizations if it has been done
    222         # before.
    223         return (item,)

InvalidArgumentsError: Unexpected argument of type <class 'type'>, expecting a string.

Wouldn't it be great if one could use the pydantic constrained types in numpydantic like this, too? Essentially, one could map the check for a single item over the whole array. :-)
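The "map a per-item check over the whole array" idea can be sketched with pydantic's `TypeAdapter` (an illustrative sketch of the concept, not numpydantic's actual mechanism; `validate_elements` is a made-up helper):

```python
import numpy as np
from pydantic import FiniteFloat, TypeAdapter, ValidationError

# Build a validator for a single constrained value once...
adapter = TypeAdapter(FiniteFloat)

# ...then apply it to every element of an array.
def validate_elements(arr: np.ndarray) -> np.ndarray:
    for x in arr.flat:
        adapter.validate_python(float(x))
    return arr

validate_elements(np.array([[1.0, 2.0], [3.0, 4.0]]))  # passes

try:
    validate_elements(np.array([1.0, np.inf]))
except ValidationError:
    print("rejected non-finite element")
```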

> again see docs above and try dtype in Float. I should be clearer about those.

I can't even pass it as an argument for the type: TypeError: Too many arguments for <class 'libertem_schema.Length'>; actual 5, expected 1. Is it meant to be used as an argument for a generic type, and if yes, how do I do that? Maybe a Union could be better than a tuple for that?

> for that, following numpy's lead, just use float <3

Hm, Python float should usually mean specifically np.float64 since it is the double type. That's why I thought numpydantic.dtype.Float could be a neat way to specify "any floating point type".
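The point about `float` mapping to the double type can be checked directly in NumPy:

```python
import numpy as np

# In NumPy, the builtin `float` corresponds to the C double, i.e. float64:
assert np.dtype(float) == np.dtype(np.float64)

# A float32 array is therefore not "float" in this strict sense:
a = np.array([1.0], dtype=np.float32)
assert a.dtype != np.dtype(float)
```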

> (...) the goal is to make the syntax for that something like (...)

Yes, that would be great! Looking much cleaner.

@sneakers-the-rat commented

> Wouldn't it be great if one could use the pydantic constrained types also in numpydantic like this?

Absolutely. I'll open an issue and see if there's a performant way to do this for the other non-numpy interfaces (e.g. for hdf5 arrays, loading the whole array to check this could be costly), but imo being able to do it at all is more important than it being fast. I'll open an issue shortly.

> Maybe a Union could be better than a tuple for that?

Yes, we need better abstraction around types than we have atm. Most of that was inherited from nptyping; the work to start removing these problems is somewhat recent. I went with tuples as a first pass because the typing API is a bit of a pain, but the intention was to support Unions too. I'll raise another issue here.

> Hm, Python float should usually mean specifically np.float64 since it is the double type. That's why I thought numpydantic.dtype.Float could be a neat way to specify "any floating point type".

There's a little bit of an intuition impedance mismatch between Python and NumPy, especially when it comes to specifying "any float type" for multiple array backends. I went with a permissive definition where "float" validates against any float type and one can specify float64 to make it more specific, but if that's confusing I agree the Float compound type should work.

I have a bit of time to work on this in an hour or two. Will raise these as issues and try to resolve

@sneakers-the-rat commented

Union types in 1.6.1:

working on docs (sorry it's so scratchy atm)

@uellue (Member Author) commented Sep 24, 2024

@sneakers-the-rat Awesome, thanks a lot!

When trying to use the new release I ran into an issue where pydantic complains about a forbidden extra field.

Code to get the JSON schema:

https://github.com/uellue/LiberTEM-schema/blob/426cfaee9093c1687af1688feaf73211cef45b01/src/libertem_schema/__init__.py#L157-L165

JSON schema that is generated

{'dtype': 'numpy.float64',
 'items_schema': {'items_schema': {'ge': -1.7976931348623157e+308,
                                   'le': 1.7976931348623157e+308,
                                   'type': 'float'},
                  'max_length': 2,
                  'metadata': {'name': 'y'},
                  'min_length': 2,
                  'type': 'list'},
 'max_length': 2,
 'metadata': {'name': 'x'},
 'min_length': 2,
 'type': 'list'}

Code to make it work, drop the 'dtype' key if present:

https://github.com/uellue/LiberTEM-schema/blob/426cfaee9093c1687af1688feaf73211cef45b01/src/libertem_schema/__init__.py#L194C1-L196C47

Could it be that the 'dtype' key should rather be in the 'metadata' sub-dictionary?

@sneakers-the-rat commented

i am giving a good chuckle because you have certainly gotten outside the dotted lines of the public API, but i am very curious what you're up to here!

is there a target example you could point me to for the kind of array that should be validated here? I just ran into this problem and that's one of the major reasons for 1.6, to separate the core schema and the json schema generation, and needing to call in to dunder methods makes me think there's something big missing in the docs or implementation here

@uellue (Member Author) commented Sep 24, 2024

> i am giving a good chuckle because you have certainly gotten outside the dotted lines of the public API, but i am very curious what you're up to here!

> is there a target example you could point me to for the kind of array that should be validated here? I just ran into this problem and that's one of the major reasons for 1.6, to separate the core schema and the json schema generation, and needing to call in to dunder methods makes me think there's something big missing in the docs or implementation here

Well, the general purpose is to generate a schema for `pint.Quantity`. In particular, it should also work in JSON form: encoding it as JSON and also validating it as JSON as far as possible, for interoperability and serialization. For the JSON representation my idea is a tuple of "whatever the magnitude is" and a literal string for the units. So in order to get a schema for Pint quantities where the magnitude is an array, I try to get a schema for the array part from numpydantic and insert it into my tuple schema.
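The "tuple of magnitude and a literal unit string" idea can be sketched with a plain pydantic model; the model and field names here are hypothetical illustrations, not the actual LiberTEM-schema code:

```python
import json
from typing import Literal, Tuple

from pydantic import BaseModel

# Hypothetical sketch of the (magnitude, unit string) JSON form:
# the magnitude is a float here, but could be an array schema instead.
class Length(BaseModel):
    quantity: Tuple[float, Literal["meter"]]

m = Length(quantity=(1.5, "meter"))
# The tuple serializes to a JSON array of [magnitude, unit]:
assert json.loads(m.model_dump_json()) == {"quantity": [1.5, "meter"]}
```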

What would be the "proper" way to do that? For the basic types in pydantic I can use TypeAdapter, but that doesn't seem to work with numpydantic:

na = numpydantic.NDArray[numpydantic.Shape['2 x, 2 y'], numpydantic.dtype.Float64]
pydantic.TypeAdapter(na)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:270, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    269 try:
--> 270     self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
    271     self._validator = _getattr_no_parents(self._type, '__pydantic_validator__')

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:112, in _getattr_no_parents(obj, attribute)
    111 else:
--> 112     raise AttributeError(attribute)

AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
Cell In[7], line 1
----> 1 pydantic.TypeAdapter(na)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:257, in TypeAdapter.__init__(self, type, config, _parent_depth, module)
    252 if not self._defer_build():
    253     # Immediately initialize the core schema, validator and serializer
    254     with self._with_frame_depth(1):  # +1 frame depth for this __init__
    255         # Model itself may be using deferred building. For backward compatibility we don't rebuild model mocks
    256         # here as part of __init__ even though TypeAdapter itself is not using deferred building.
--> 257         self._init_core_attrs(rebuild_mocks=False)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:135, in _frame_depth.<locals>.wrapper.<locals>.wrapped(self, *args, **kwargs)
    132 @wraps(func)
    133 def wrapped(self: TypeAdapterT, *args: P.args, **kwargs: P.kwargs) -> R:
    134     with self._with_frame_depth(depth + 1):  # depth + 1 for the wrapper function
--> 135         return func(self, *args, **kwargs)

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/pydantic/type_adapter.py:282, in TypeAdapter._init_core_attrs(self, rebuild_mocks)
    275     core_config = config_wrapper.core_config(None)
    277     self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
    278     self._validator = create_schema_validator(
    279         schema=self._core_schema,
    280         schema_type=self._type,
    281         schema_type_module=self._module_name,
--> 282         schema_type_name=str(self._type),
    283         schema_kind='TypeAdapter',
    284         config=core_config,
    285         plugin_settings=config_wrapper.plugin_settings,
    286     )
    287     self._serializer = SchemaSerializer(self._core_schema, core_config)
    289 if rebuild_mocks and isinstance(self._core_schema, _mock_val_ser.MockCoreSchema):

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/vendor/nptyping/ndarray.py:101, in NDArrayMeta.__str__(cls)
     97 def __str__(cls) -> str:
     98     shape, dtype = cls.__args__
     99     return (
    100         f"{cls.__name__}[{cls._shape_expression_to_str(shape)}, "
--> 101         f"{cls._dtype_to_str(dtype)}]"
    102     )

File ~/miniforge3/envs/ltschema312-2/lib/python3.12/site-packages/numpydantic/ndarray.py:155, in NDArrayMeta._dtype_to_str(cls, dtype)
    153 elif isinstance(dtype, tuple):
    154     result = ", ".join([str(dt) for dt in dtype])
--> 155 return result

UnboundLocalError: cannot access local variable 'result' where it is not associated with a value

@uellue (Member Author) commented Sep 24, 2024

...and on the Python side I try to get hold of a validation function for that part and call it in my own validation function.
