Pydantic side of reusing validators and serializers #10246

BoxyUwU · 2024-08-27T16:25:58Z

Change Summary

Python side of pydantic/pydantic-core#1414. This implements a new core schema variant for reusing schemas, validators, and serializers, which gives some sizeable performance wins right now and opens up many possibilities for further wins in the future.

Future work

We do not use the new nested-model schema in all applicable circumstances which leaves performance on the table. Below I've outlined what those cases are and why I did so:

When generating schema for union choices we fall back to the old schema generation strategy and do not use nested-model, as the apply_discriminators logic requires being able to recurse into the models, which is not possible with nested-model.
When generating schema for root models we do not use nested-model for the inner model: if we reused its schema via __pydantic_core_schema__ for a union choice, the apply_discriminators logic would once again be unable to recurse into the model's fields.
We also do not use nested-model to refer to generic schemas. I do not believe there is any conceptual problem with using nested-model here. However, in practice this hits #10279 ("KeyError happening during schema cleaning with generic models"), so we fall back to the old generation strategy when encountering usages of generic schemas.
The last exception is when we have a field typed as BaseModel, e.g.:
```
class MyModel(BaseModel):
    field: BaseModel
```
We do not use a nested-model for the field's schema, as this caused some issues (that I can't quite remember, though it ought to be simple to disable the exception and see what breaks).
The schema is currently only used for models, but it should be capable of supporting arbitrary schema/validator/serializer reuse, which would let us reuse these things for dataclasses and typed dicts. I avoided implementing this right now since it seemed like a lot of extra work. (The name nested-model should be changed before anything lands here, though.)

I would expect that removing these special cases is a good opportunity for performance improvements. My assumption would be that allowing nested-model in union choices, using nested-model to refer to generic schemas, and reusing schema/validators/serializers on dataclasses/typed dicts, would be the most important wins.

I have opened issue #10394 to track this future work.

Before merging

Most of the test failures at this point are due to incorrect handling of JSON schema generation. When the current implementation encounters a nested-model schema, it recurses into the model as if the core schema were inlined. This is incorrect when encountering cycles, as it has no cycle check, and it also does not create any $defs in the JSON schema.

The solution here would be to search for any nested-model in the core schema and create a $def in the json schema for the referenced model. This would allow us to correctly handle mutually recursive model definitions.
Remove the commit that changes CI to build against `pydantic_core/boxy/validator_serializer_reuse`
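The `$defs` idea described above can be illustrated with a small standalone sketch. It does not use pydantic's core schema or `GenerateJsonSchema` at all: the `MODELS` mapping, `json_schema_for`, and the field layout are hypothetical stand-ins, chosen only to show how registering one definition per model before descending into it lets mutually recursive models terminate.

```python
from typing import Any

# Hypothetical toy "models": each maps field names to either a JSON type name
# or the name of another model. This is NOT pydantic's core schema format.
MODELS: dict[str, dict[str, str]] = {
    "A": {"name": "string", "b": "B"},
    "B": {"count": "integer", "a": "A"},  # A and B refer to each other
}


def json_schema_for(root: str) -> dict[str, Any]:
    """Build a JSON schema for `root`, emitting one `$defs` entry per model."""
    defs: dict[str, Any] = {}

    def visit(model_name: str) -> dict[str, str]:
        # Register the definition before descending, so that a cycle back to
        # this model finds the entry and stops instead of recursing forever.
        if model_name not in defs:
            defs[model_name] = {}  # placeholder, filled in below
            properties = {
                field: visit(target) if target in MODELS else {"type": target}
                for field, target in MODELS[model_name].items()
            }
            defs[model_name] = {"type": "object", "properties": properties}
        # Every use of a model is a reference, never an inlined copy.
        return {"$ref": f"#/$defs/{model_name}"}

    ref = visit(root)
    return {**ref, "$defs": defs}


schema = json_schema_for("A")
```

Here `schema` refers to `A` through `#/$defs/A`, and the `a` field of `B` points back to the same definition rather than expanding it again. How this would map onto the real JSON schema generator is left open by the description above.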

Related issue number

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

cloudflare-workers-and-pages · 2024-08-28T10:28:43Z

Deploying pydantic-docs with Cloudflare Pages

Latest commit:	`f2de5d5`
Status:	✅ Deploy successful!
Preview URL:	https://a1b64343.pydantic-docs.pages.dev
Branch Preview URL:	https://boxy-validator-serializer-re.pydantic-docs.pages.dev

codspeed-hq · 2024-08-28T14:36:11Z

CodSpeed Performance Report

Merging #10246 will improve performances by 21.97%

Comparing `boxy/validator_serializer_reuse` (f2de5d5) with `main` (8b6d5fc)

Summary

⚡ 3 improvements
✅ 44 untouched benchmarks
🆕 1 new benchmarks
⁉️ 1 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `main` | `boxy/validator_serializer_reuse` | Change |
|---|---|---|---|---|
| ⚡ | `test_fastapi_startup_perf` | 254 ms | 208.3 ms | +21.97% |
| ⚡ | `test_fastapi_startup_perf` | 31 ms | 28.3 ms | +9.84% |
| ⚡ | `test_construct_schema` | 4.9 ms | 4.6 ms | +5.99% |
| ⁉️ | `test_nested_model_schema_generation` | 3.6 ms | N/A | N/A |
| 🆕 | `test_nested_schema_generation` | N/A | 3.6 ms | N/A |

BoxyUwU · 2024-09-11T13:03:55Z

pydantic/json_schema.py

+    def nested_model_schema(self, schema: core_schema.NestedModelSchema) -> JsonSchemaValue:
+        new_schema = cast('type[BaseModel]', schema['model']).__pydantic_core_schema__
+        json_schema = self.model_schema(new_schema)
+        return json_schema
+


This is not the correct way of handling nested-model schemas when generating json schemas

Could you elaborate more on what next steps we might take here?

Wrote about this in the PR description

BoxyUwU · 2024-09-11T13:10:45Z

pydantic/_internal/_generate_schema.py

+
+        def foo() -> tuple[CoreSchema, SchemaValidator, SchemaSerializer]:
+            if obj.__pydantic_core_schema__ is None or MockCoreSchema:
+                obj.model_rebuild()


Is there a distinction between calling model_rebuild vs calling rebuild() on the actual Mock objects?

Regardless, we have to build the validator/serializer/schema when they're mocks, as otherwise our model could not be validated/serialized/etc. because its schema would be incomplete in cases such as the following:

    class A(BaseModel):
        field_a: List['B']

    class B(BaseModel):
        field_b: A

    # validate an instance of `B` before `A` gets a chance to be rebuilt

I wonder if we could just skip this optimization for cases like this to avoid calling model_rebuild from core. @davidhewitt might know.

For future readers: cc pydantic/pydantic-core#1414 (comment)
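To make the "build it when it is still a mock" behaviour discussed in this thread concrete, here is a minimal standalone sketch of build-on-first-use. Everything in it is hypothetical: `ToyModel`, `rebuild`, and `lazy_validator` are illustrative stand-ins, with `None` playing the role of a mock validator, and none of it is the PR's actual code.

```python
from typing import Callable, Optional

Validator = Callable[[dict], dict]


class ToyModel:
    """Hypothetical stand-in for a model class; not pydantic's BaseModel."""

    def __init__(self, name: str) -> None:
        self.name = name
        # `None` plays the role of a mock (not yet built) validator.
        self.validator: Optional[Validator] = None
        self.rebuild_count = 0

    def rebuild(self) -> None:
        # Stand-in for a rebuild step: produce a real validator.
        self.rebuild_count += 1
        self.validator = lambda data: dict(data)


def lazy_validator(model: ToyModel) -> Callable[[], Validator]:
    """Return a getter that builds the validator only when first needed."""

    def get() -> Validator:
        if model.validator is None:
            model.rebuild()
        assert model.validator is not None
        return model.validator

    return get


a = ToyModel("A")
get_a_validator = lazy_validator(a)   # nothing is built yet
first = get_a_validator()({"x": 1})   # first use triggers the rebuild
second = get_a_validator()({"x": 2})  # later uses reuse the built validator
```

In this toy version the rebuild happens exactly once, at the first validation that needs it. Whether that rebuild should be triggered from core at all is the question raised in the thread.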

sydney-runkle · 2024-09-11T15:35:49Z

.github/workflows/codspeed.yml

+          ref: boxy/validator_serializer_reuse
+          path: pydantic-core


Cool, we should probably do this against pydantic-core main sometimes.

definitely seems like the kind of thing that could make sense to do before a release

sydney-runkle

General questions :)

One thing that would be super helpful here is if you could write up an issue (on pydantic-core probably) detailing the high level ideas associated with these PRs - what change are you making? Why does it help / matter? What challenges are notable? What are the most important next steps?

sydney-runkle · 2024-09-11T15:42:55Z

pydantic/_internal/_generate_schema.py

+
+        def foo() -> tuple[CoreSchema, SchemaValidator, SchemaSerializer]:
+            if obj.__pydantic_core_schema__ is None or MockCoreSchema:
+                obj.model_rebuild()


I wonder if we could just skip this optimization for cases like this to avoid calling model_rebuild from core. @davidhewitt might know.

sydney-runkle · 2024-09-11T15:43:58Z

pydantic/json_schema.py

+    def nested_model_schema(self, schema: core_schema.NestedModelSchema) -> JsonSchemaValue:
+        new_schema = cast('type[BaseModel]', schema['model']).__pydantic_core_schema__
+        json_schema = self.model_schema(new_schema)
+        return json_schema
+


Could you elaborate more on what next steps we might take here?

sydney-runkle · 2024-09-11T15:44:32Z

tests/test_config.py

+    # validating `MyModel` requires building `MyNestedModel` as its validator/serializer is reused.
+    assert isinstance(MyNestedModel.__pydantic_validator__, SchemaValidator)
+    assert isinstance(MyNestedModel.__pydantic_serializer__, SchemaSerializer)
+    assert generate_schema_calls.count == 2, 'Should not build duplicated core schemas'


Why the removal of the defer_build check?

Validating/Serializing MyModel reuses the validator/serializer from MyNestedModel, so if they are mocks then we automatically rebuild them for use in validation of MyModel. This means that even though we defer the build of MyNestedModel it gets built anyway later on.

This is a good example of a case where this PR can reduce the amount of work we do. MyModel always required the core schema of MyNestedModel in order to validate, but previously it was generated twice: once inline in MyModel and once separately when rebuilding MyNestedModel later. Now the schema is only generated once and reused between MyModel and MyNestedModel.
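The difference in the number of schema builds can be illustrated with a toy counter. This is only a sketch under simplifying assumptions: `ToySchemaBuilder`, `count_builds`, and the `MODELS` mapping are hypothetical and do not reflect how pydantic's schema generation is implemented. The model names are borrowed from the test above.

```python
# Hypothetical toy layout: MyModel has one field whose type is MyNestedModel.
MODELS = {
    "MyNestedModel": {"x": "int"},
    "MyModel": {"nested": "MyNestedModel"},
}


class ToySchemaBuilder:
    """Counts schema builds, with or without reuse of already built schemas."""

    def __init__(self, reuse: bool) -> None:
        self.reuse = reuse
        self.calls = 0
        self.cache: dict[str, dict] = {}

    def build(self, model: str) -> dict:
        if self.reuse and model in self.cache:
            return self.cache[model]  # reuse instead of building again
        self.calls += 1
        schema = {
            field: self.build(target) if target in MODELS else target
            for field, target in MODELS[model].items()
        }
        if self.reuse:
            self.cache[model] = schema
        return schema


def count_builds(reuse: bool) -> int:
    builder = ToySchemaBuilder(reuse)
    builder.build("MyModel")        # needs the schema of MyNestedModel
    builder.build("MyNestedModel")  # later build of the nested model itself
    return builder.calls


inlined_calls = count_builds(reuse=False)  # 3: the nested schema is built twice
reused_calls = count_builds(reuse=True)    # 2: one build per model
```

With reuse enabled the toy builder performs two builds, one per model, which mirrors the `generate_schema_calls.count == 2` assertion in the test.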

sydney-runkle · 2024-09-16T15:04:09Z

cc @Viicos, perhaps going to pick up this work after your current feature progress

github-actions bot added the relnotes-fix label (used for bugfixes) Aug 27, 2024

davidhewitt force-pushed the boxy/validator_serializer_reuse branch 5 times, most recently from 33c12f5 to 84c1e55 Compare August 28, 2024 14:26

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 84c1e55 to 34e2488 Compare August 28, 2024 15:12

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 0085c9d to 3777d9b Compare September 2, 2024 15:41

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 3777d9b to 359dac5 Compare September 9, 2024 15:55

BoxyUwU mentioned this pull request Sep 11, 2024

Introduce a schema variant to reuse Validators, Serializers and CoreSchema pydantic/pydantic-core#1414

Open

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 359dac5 to 363c7a0 Compare September 11, 2024 14:17

sydney-runkle reviewed Sep 11, 2024

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 363c7a0 to 9ec9032 Compare September 11, 2024 16:18

BoxyUwU mentioned this pull request Sep 12, 2024

Future work for reusing core schema/validators/serializers #10394

Open

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 9ec9032 to ab9b6c6 Compare September 12, 2024 14:52

davidhewitt and others added 2 commits September 12, 2024 15:56

override pydantic-core from git

cb8d101

WIP

f2de5d5

BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from ab9b6c6 to f2de5d5 Compare September 12, 2024 14:56
