Pydantic side of reusing validators and serializers #10246
base: main
CodSpeed Performance Report: merging #10246 will improve performance by 21.97%.
pydantic/json_schema.py
def nested_model_schema(self, schema: core_schema.NestedModelSchema) -> JsonSchemaValue:
    new_schema = cast('type[BaseModel]', schema['model']).__pydantic_core_schema__
    json_schema = self.model_schema(new_schema)
    return json_schema
This is not the correct way of handling `nested-model` schemas when generating JSON schemas.
Could you elaborate more on what next steps we might take here?
Wrote about this in the PR description
def foo() -> tuple[CoreSchema, SchemaValidator, SchemaSerializer]:
    if obj.__pydantic_core_schema__ is None or isinstance(obj.__pydantic_core_schema__, MockCoreSchema):
        obj.model_rebuild()
Is there a distinction between calling `model_rebuild` vs. calling `rebuild()` on the actual mock objects?
Regardless, we have to build the validator/serializer/schema when they're mocks; otherwise our model would not be able to be validated or serialized, because its schema would be incomplete in cases such as the following:
from typing import List
from pydantic import BaseModel

class A(BaseModel):
    field_a: List['B']

class B(BaseModel):
    field_b: A

# validate an instance of `B` before `A` gets a chance to be rebuilt
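To make the ordering problem concrete, here is a toy model of "rebuild the reused validator if it is still a mock", in plain Python. `MockValidator`, `Model`, and `reused_validator` are hypothetical names invented for this sketch, not pydantic internals; real models build validators from core schemas, not from lambdas.

```python
from typing import Any, Callable, Dict


class MockValidator:
    """Hypothetical placeholder for a validator that could not be built yet,
    e.g. because a forward reference was still unresolved."""


class Model:
    """Hypothetical model record: a name, a builder, and a (possibly mock) validator."""

    def __init__(self, name: str, build: Callable[[], Callable[[Any], Any]]) -> None:
        self.name = name
        self._build = build
        self.validator: Any = MockValidator()  # starts out as a mock
        self.build_count = 0

    def rebuild(self) -> None:
        self.validator = self._build()
        self.build_count += 1


def reused_validator(model: Model) -> Callable[[Any], Any]:
    # Rebuild on demand if the validator we want to reuse is still a mock.
    # The isinstance check matters: `x is None or MockValidator` would always
    # be truthy, because a class object is truthy.
    if model.validator is None or isinstance(model.validator, MockValidator):
        model.rebuild()
    return model.validator


models: Dict[str, Model] = {}
# `A.field_a: List['B']` and `B.field_b: A`, reduced to dict-shaped validators.
models["A"] = Model("A", lambda: lambda data: {
    "field_a": [reused_validator(models["B"])(item) for item in data["field_a"]],
})
models["B"] = Model("B", lambda: lambda data: {
    "field_b": reused_validator(models["A"])(data["field_b"]),
})

# Validate a `B` before `A` was ever rebuilt: reusing A's validator forces the rebuild.
result = reused_validator(models["B"])({"field_b": {"field_a": []}})
print(result, models["A"].build_count)  # {'field_b': {'field_a': []}} 1
```

Each validator is built at most once, on first use, no matter which model is validated first.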
I wonder if we could just skip this optimization for cases like this to avoid calling `model_rebuild` from core. @davidhewitt might know.
For future readers: cc pydantic/pydantic-core#1414 (comment)
ref: boxy/validator_serializer_reuse
path: pydantic-core
Cool, we should probably run this against pydantic-core `main` at some point.
definitely seems like the kind of thing that could make sense to do before a release
General questions :)
One thing that would be super helpful here is if you could write up an issue (probably on pydantic-core) detailing the high-level ideas behind these PRs: what change are you making? Why does it help or matter? What challenges are notable? What are the most important next steps?
# validating `MyModel` requires building `MyNestedModel` as its validator/serializer is reused.
assert isinstance(MyNestedModel.__pydantic_validator__, SchemaValidator)
assert isinstance(MyNestedModel.__pydantic_serializer__, SchemaSerializer)
assert generate_schema_calls.count == 2, 'Should not build duplicated core schemas'
Why the removal of the `defer_build` check?
Validating/serializing `MyModel` reuses the validator/serializer from `MyNestedModel`, so if they are mocks we automatically rebuild them for use in validating `MyModel`. This means that even though we defer the build of `MyNestedModel`, it gets built anyway later on.
This is a good example of a case where this PR can reduce the amount of work we do. `MyModel` always required the core schema of `MyNestedModel` in order to validate, but previously that schema was generated twice: once inline in `MyModel`, and once separately when rebuilding `MyNestedModel` later. Now the schema is generated only once and reused between `MyModel` and `MyNestedModel`.
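A toy version of the counting argument above, assuming a builder that either inlines nested model schemas (the old strategy) or generates each model's schema once and refers to it. `ToySchemaBuilder` and its schema dicts are hypothetical and are not pydantic's actual schema generation code.

```python
from typing import Any, Dict, List


class ToySchemaBuilder:
    """Hypothetical builder that counts how many model core schemas it generates.

    `fields` maps a model name to the names of the nested models it references
    (assumed acyclic to keep the toy short).
    """

    def __init__(self, fields: Dict[str, List[str]], reuse: bool) -> None:
        self.fields = fields
        self.reuse = reuse
        self.built: Dict[str, Dict[str, Any]] = {}
        self.calls = 0

    def model_schema(self, name: str) -> Dict[str, Any]:
        if self.reuse and name in self.built:
            return self.built[name]  # already generated: reuse it
        self.calls += 1
        schema = {
            "type": "model",
            "name": name,
            "fields": [self.field_schema(nested) for nested in self.fields[name]],
        }
        if self.reuse:
            self.built[name] = schema
        return schema

    def field_schema(self, nested: str) -> Dict[str, Any]:
        if self.reuse:
            self.model_schema(nested)  # make sure the nested schema exists, once
            return {"type": "nested-model", "model": nested}
        return self.model_schema(nested)  # old strategy: inline a fresh copy


def count_calls(reuse: bool) -> int:
    builder = ToySchemaBuilder({"MyModel": ["MyNestedModel"], "MyNestedModel": []}, reuse)
    builder.model_schema("MyModel")        # MyModel needs MyNestedModel's schema
    builder.model_schema("MyNestedModel")  # the deferred MyNestedModel is built later
    return builder.calls


print(count_calls(reuse=False), count_calls(reuse=True))  # 3 2
```

In the inlining case `MyNestedModel` is generated twice; with reuse, the later build of `MyNestedModel` finds the schema already generated.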
cc @Viicos, perhaps you're going to pick up this work after your current feature work.
Change Summary
Python side of pydantic/pydantic-core#1414. This implements a new core schema variant for reusing schemas, validators, and serializers, which gives some sizeable performance wins right now and opens up lots of future possibilities for more.
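Conceptually, the new variant lets a parent schema point at a model whose schema (and validator/serializer) is already built, instead of inlining a copy. Below is a minimal sketch of that idea using simplified schema dicts. Only the `"model"` key mirrors what `nested_model_schema` reads in the diff; `FakeModel`, `nested_model`, and `resolve` are hypothetical.

```python
from typing import Any, Dict


class FakeModel:
    """Hypothetical stand-in for a model class carrying its own prebuilt core
    schema, loosely mirroring `__pydantic_core_schema__`."""

    def __init__(self, name: str, core_schema: Dict[str, Any]) -> None:
        self.name = name
        self.core_schema = core_schema


def nested_model(model: FakeModel) -> Dict[str, Any]:
    # Simplified `nested-model` entry: it only points at the model.
    return {"type": "nested-model", "model": model}


def resolve(schema: Dict[str, Any]) -> Dict[str, Any]:
    # Follow the reference to the already-built schema instead of using an inlined copy.
    if schema["type"] == "nested-model":
        return schema["model"].core_schema
    return schema


inner = FakeModel("Inner", {"type": "model", "fields": {"x": {"type": "int"}}})
parent_a = FakeModel("ParentA", {"type": "model", "fields": {"inner": nested_model(inner)}})
parent_b = FakeModel("ParentB", {"type": "model", "fields": {"inner": nested_model(inner)}})

shared_a = resolve(parent_a.core_schema["fields"]["inner"])
shared_b = resolve(parent_b.core_schema["fields"]["inner"])
```

With inlining, each parent would carry its own copy of `Inner`'s schema. With the reference, both parents resolve to the same object, so whatever is built from it only needs to be built once.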
Future work
We do not use the new `nested-model` schema in all applicable circumstances, which leaves performance on the table. Below I've outlined what those cases are and why I did so:
- … `nested-model`, as the `apply_discriminators` logic requires being able to recurse into the models, which is not possible with `nested-model`.
- … `nested-model` for the usage of the inner model, as if we reused the schema for it via `__pydantic_core_schema__` for a union choice, then once again the `apply_discriminators` logic would not be able to recurse into the model's fields.
- … `nested-model` to refer to generic schemas, I do not believe there to be any problem with using `nested-model` here conceptually. However, in practice this hits #10279 (`KeyError` happening during schema cleaning with generic models), so we fall back to the old generation strategy when encountering usages of generic schemas.
- … `BaseModel`, e.g.: … `nested-model` for the field's schema, this caused some issues (that I can't quite remember, though it ought to be simple to disable the exception and see what breaks).
- … `nested-model` should be changed before anything is landed here)

I would expect that removing these special cases is a good opportunity for performance improvements. My assumption would be that allowing `nested-model` in union choices, using `nested-model` to refer to generic schemas, and reusing schemas/validators/serializers on dataclasses and typed dicts would be the most important wins.
I have opened issue #10394 to track this future work.
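The reason discriminator handling needs to recurse into models can be sketched with a toy `apply_discriminator` over simplified schema dicts. The shapes and names here are hypothetical and are not pydantic's `apply_discriminators`. The toy has to read each choice's literal tag field, and an opaque `nested-model` reference does not expose that field.

```python
from typing import Any, Dict, List, Optional


def literal_tags(choice: Dict[str, Any], discriminator: str) -> Optional[List[str]]:
    """Collect the literal values of `discriminator` in one union choice.

    Only an inlined model schema exposes its fields here; a `nested-model`
    entry is just a reference, so the tag field cannot be found.
    """
    if choice["type"] == "model":
        field = choice["fields"].get(discriminator)
        if field is not None and field["type"] == "literal":
            return list(field["expected"])
    return None


def apply_discriminator(union: Dict[str, Any], discriminator: str) -> Dict[str, Any]:
    mapping: Dict[str, Dict[str, Any]] = {}
    for choice in union["choices"]:
        tags = literal_tags(choice, discriminator)
        if tags is None:
            return union  # cannot see into this choice: leave the plain union
        for tag in tags:
            mapping[tag] = choice
    return {"type": "tagged-union", "discriminator": discriminator, "choices": mapping}


cat = {"type": "model", "fields": {"kind": {"type": "literal", "expected": ["cat"]}}}
dog = {"type": "model", "fields": {"kind": {"type": "literal", "expected": ["dog"]}}}
dog_ref = {"type": "nested-model", "model": "Dog"}  # opaque reference to Dog

inline = apply_discriminator({"type": "union", "choices": [cat, dog]}, "kind")
referenced = apply_discriminator({"type": "union", "choices": [cat, dog_ref]}, "kind")
```

Here `inline` becomes a tagged union, while `referenced` is left as a plain union. Allowing `nested-model` in union choices would require the discriminator logic to follow the reference to the model's own schema.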
Before merging
Most of the test failures at this point are due to incorrect handling of JSON schema generation. When the current implementation encounters a `nested-model` schema, it recurses into the model as if the core schema were inlined. This is incorrect when encountering cycles, as it has no cycle check, and it also does not create any `$defs` in the JSON schema.
The solution here would be to search for any `nested-model` in the core schema and create a `$def` in the JSON schema for the referenced model. This would allow us to correctly handle mutually recursive model definitions.
Remove the commit that changes CI to build against `pydantic_core/boxy/validator_serializer_reuse`
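Here is a minimal sketch of the proposed approach. When the walker meets a `nested-model` entry, it emits a `$ref`. It generates the referenced model's definition into `$defs` only once, reserving the slot before recursing so that mutually recursive models terminate. The schema shapes and names (`Model`, `to_json_schema`, `generate`) are hypothetical simplifications, not pydantic's `GenerateJsonSchema`. Pydantic also treats the top-level schema and `$defs` naming differently.

```python
from typing import Any, Dict


class Model:
    """Hypothetical model: a name plus simplified field schemas."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: Dict[str, Dict[str, Any]] = {}


def to_json_schema(schema: Dict[str, Any], defs: Dict[str, Any]) -> Dict[str, Any]:
    kind = schema["type"]
    if kind == "nested-model":
        model = schema["model"]
        if model.name not in defs:
            # Reserve the slot before recursing, so a cycle back to this model
            # hits the `in defs` check and emits a $ref instead of looping.
            defs[model.name] = {}
            defs[model.name] = model_json_schema(model, defs)
        return {"$ref": f"#/$defs/{model.name}"}
    if kind == "list":
        return {"type": "array", "items": to_json_schema(schema["items"], defs)}
    if kind == "int":
        return {"type": "integer"}
    raise ValueError(f"unsupported schema type: {kind}")


def model_json_schema(model: Model, defs: Dict[str, Any]) -> Dict[str, Any]:
    properties = {name: to_json_schema(field, defs) for name, field in model.fields.items()}
    return {"type": "object", "title": model.name, "properties": properties,
            "required": list(model.fields)}


def generate(model: Model) -> Dict[str, Any]:
    defs: Dict[str, Any] = {}
    top = to_json_schema({"type": "nested-model", "model": model}, defs)
    return {**top, "$defs": defs}


# Mirrors the mutually recursive A/B example from the review thread.
a, b = Model("A"), Model("B")
a.fields = {"field_a": {"type": "list", "items": {"type": "nested-model", "model": b}}}
b.fields = {"field_b": {"type": "nested-model", "model": a}}
result = generate(b)
```

Each model ends up with exactly one definition, and the cycle between `A` and `B` is expressed through `$ref`s rather than unbounded recursion.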
Related issue number
Checklist