Optimize internally tagged enums -- do not use internal buffer if tag is the first field #1922

Mingun · 2020-11-03T20:00:26Z

This change also has a positive side effect: if the tag is the first field, the negative effects from #1183 are eliminated, because buffering is not used.
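
To illustrate the idea (not the actual serde code), here is a simplified, std-only model of the fast path. All names below (`Node`, `parse_node`, the `"tag"` key) are hypothetical stand-ins, and the input is modelled as a plain slice of key/value pairs rather than a real `MapAccess`. Entries are buffered only while the tag has not been seen yet; when the tag comes first, nothing is buffered and the remaining entries are consumed directly from the input.

```rust
// Simplified model only: none of these names come from serde itself.
#[derive(Debug, PartialEq)]
enum Node {
    Unit,
    Struct { name: String },
}

/// Parses a node from key/value pairs and reports how many entries
/// had to be buffered before the tag was found.
fn parse_node(entries: &[(&str, &str)]) -> Result<(Node, usize), String> {
    let mut buffered: Vec<(&str, &str)> = Vec::new();
    let mut rest = entries.iter().copied();

    // Scan until the tag is found; everything seen before it must be buffered.
    let tag = loop {
        match rest.next() {
            Some(("tag", value)) => break value,
            Some(entry) => buffered.push(entry),
            None => return Err("missing field `tag`".to_string()),
        }
    };
    let buffered_len = buffered.len();

    // Replay buffered entries (if any), then stream the rest straight from the input.
    let mut fields = buffered.into_iter().chain(rest);
    let node = match tag {
        "Unit" => Node::Unit,
        "Struct" => {
            let name = fields
                .find(|(key, _)| *key == "name")
                .map(|(_, value)| value.to_string())
                .ok_or_else(|| "missing field `name`".to_string())?;
            Node::Struct { name }
        }
        other => return Err(format!("unknown variant `{}`", other)),
    };
    Ok((node, buffered_len))
}

fn main() {
    // Tag first: the buffer stays empty.
    println!("{:?}", parse_node(&[("tag", "Struct"), ("name", "root")]));
    // Tag last: the preceding entry has to be buffered.
    println!("{:?}", parse_node(&[("name", "root"), ("tag", "Struct")]));
}
```

The second element of the returned tuple is there only to make the difference between the two input orders observable.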

@RReverser, feel free to try to run your benchmarks against this branch

dtolnay

The discussion in #1495 focuses on whether this would be worth it for the cost in compile time. The biggest thing wrong with internally tagged enums is not when the deserializer is slow but that they take significantly long to compile. Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.

Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?

Mingun · 2021-02-28T10:17:30Z

I cannot agree with the premise of that question. For me, runtime performance is much more important than compile-time performance. Software is written not for the pleasure of developers, but to solve customer problems.

Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?

I will try to study how to do performance measurements, but any guidance is welcome

RReverser · 2021-02-28T21:29:40Z

Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.

That sounds odd tbh. Compile times are affecting only developers, while runtime affects every user of the library / application, which is way more impactful. Why choose compile-time over runtime perf here when we don't do that at any other levels of development (e.g. opt-level = 0 vs opt-level = 2 etc.)?

Mingun · 2021-03-06T16:36:19Z

I did some research, and here are the results. I created a library project with 1000 types and measured its compilation time.

I noticed a small increase in compilation time, about 0.01 s per type (or 7-30%). I think that is an acceptable price for better runtime performance.

Test code and raw data

serde-perf.zip

I created a library Cargo project with the following lib.rs content:

use serde::{Deserialize};
macro_rules! generate {
  ($(#[$counter:meta])*) => {
    $(
      const _: () = {
        #[$counter]
        #[derive(Deserialize)]
        #[serde(tag = "tag")]
        enum Node {
          Unit,
          Struct {
            name: String,
            list: Vec<Node>,
          },
          // Uncomment for "big enum" tests
          /*
          Newtype1(std::collections::HashMap<String, String>),
          Newtype2(String),
          Newtype3(u32),
          Newtype4(f32),
          Unit1,
          Unit2,
          Unit3,
          Unit4,
          Struct1 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct2 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct3 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct4 { f1: String, f2: u32, f3: bool, f4: f64 },// */
        }
      };
    )*
  };
}
// Expanded manually for "expand" tests
generate!(
  /// ...
  /// 1000 lines
  /// ...
);

Tests were run with the command:

cargo +nightly build -Ztimings

Test PC

OS version: Windows_NT x64 10.0.18363
CPUs: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 1992)

Summary table

Small enum (2 variants), 1000 types

Derived both Serialize and Deserialize, types generated with generate! macro

Deserialize + Serialize	master (`c261015`) (100%)	PR (`a169bee`)	Diff
all	47.37 = (46.96 + 47.10 + 48.06)/3	58.67 = (59.27 + 59.17 + 57.56)/3	+11.29 (+24%)
codegen	3.28 = (46.96-43.95 + 47.10-43.92 + 48.06-44.42)/3	4.81 = (59.27-53.93 + 59.17-53.85 + 57.56-53.80)/3	+1.53 (+47%)

Derived only Deserialize, types generated with generate! macro

Deserialize	master (`c261015`) (100%)	PR (`a169bee`)	Diff
all	39.25 = (39.16 + 39.66 + 38.94)/3	50.08 = (49.70 + 51.62 + 48.93)/3	+10.83 (+28%)
codegen	3.62 = (39.16-35.31 + 39.66-35.28 + 38.94-36.31)/3	4.53 = (49.70-46.17 + 51.62-45.29 + 48.93-45.21)/3	+0.91 (+25%)

Derived only Deserialize, types written manually

Deserialize + expanded	master (`c261015`) (100%)	PR (`a169bee`)	Diff
all	38.76 = (38.38 + 39.67 + 38.23)/3	50.36 = (50.34 + 50.59 + 50.14)/3	+11.60 (+30%)
codegen	3.24 = (38.38-35.41 + 39.67-35.65 + 38.23-35.50)/3	5.67 = (50.34-44.34 + 50.59-44.74 + 50.14-45.02)/3	+2.42 (+75%)

Big enum (14 variants), 1000 types

Derived only Deserialize, types generated with generate! macro

Deserialize	master (`c261015`) (100%)	PR (`a169bee`)	Diff
all	241.04 = (236.94 + 239.35 + 246.84)/3	257.07 = (257.75 + 258.76 + 254.70)/3	+16.03 (+7%)
codegen	40.90 = (236.94-197.11 + 239.35-199.61 + 246.84-203.70)/3	46.63 = (257.75-211.25 + 258.76-210.17 + 254.70-209.91)/3	+5.72 (+14%)

Derived only Deserialize, types written manually

Deserialize + expanded	master (`c261015`) (100%)	PR (`a169bee`)	Diff
all	238.41 = (238.49 + 236.58 + 240.16)/3	254.86 = (258.93 + 254.75 + 250.90)/3	+16.45 (+7%)
codegen	39.70 = (238.49-198.46 + 236.58-199.66 + 240.16-198.02)/3	45.05 = (258.93-209.91 + 254.75-211.19 + 250.90-208.34)/3	+5.35 (+13%)
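
For anyone who wants to re-check the figures, the arithmetic behind each row can be sketched as follows. The helper names `mean` and `percent_change` are made up for this illustration; the inputs are the three "all" timings from the small-enum, Deserialize-only table above.

```rust
/// Arithmetic mean of the measured wall-clock times, in seconds.
fn mean(runs: &[f64]) -> f64 {
    runs.iter().sum::<f64>() / runs.len() as f64
}

/// Relative change of `candidate` against `baseline`, in percent.
fn percent_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // "all" row, small enum, only Deserialize derived.
    let master = mean(&[39.16, 39.66, 38.94]);
    let pr = mean(&[49.70, 51.62, 48.93]);
    println!(
        "master {:.2} s, PR {:.2} s, diff {:+.2} s ({:+.0}%)",
        master,
        pr,
        pr - master,
        percent_change(master, pr)
    );
}
```

The codegen rows use the same two helpers, with each run first reduced to the difference between the total time and the time before codegen.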

pickfire · 2021-03-26T17:40:26Z

@Mingun What about runtime performance improvements?

Mingun · 2021-03-26T18:26:50Z

I didn't measure it, maybe I should

jarredholman · 2022-07-26T15:47:36Z

Would this also fix the incorrect error messages caused by the internal buffer? #1621

Mingun · 2022-07-26T16:36:12Z

For the optimized case yes, it should

Mingun

@dtolnay, @oli-obk, this PR is ready for review again.

Mingun · 2024-08-24T16:47:24Z

serde/src/de/value.rs

+    fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error>
+    where
+        V: de::Visitor<'de>,
+    {
+        // Covered by tests/test_enum_internally_tagged.rs
+        //      newtype_unit
+        visitor.visit_unit()
+    }
+
+    fn deserialize_unit_struct<V>(
+        self,
+        _name: &'static str,
+        visitor: V,
+    ) -> Result<V::Value, Self::Error>
+    where
+        V: de::Visitor<'de>,
+    {
+        // Covered by tests/test_enum_internally_tagged.rs
+        //      newtype_unit_struct
+        self.deserialize_unit(visitor)
+    }
+
+    fn deserialize_newtype_struct<V>(self, _name: &str, visitor: V) -> Result<V::Value, Self::Error>
+    where
+        V: de::Visitor<'de>,
+    {
+        visitor.visit_newtype_struct(self)
+    }
+


I'm not sure whether we should change the behavior of SeqAccessDeserializer and MapAccessDeserializer or introduce new private deserializers. On the one hand, those deserializers were created to support various serde attributes. On the other hand, this is technically a breaking change, because those types are public.

Mingun · 2024-08-25T16:54:40Z

I realized that the old Visitor::visit helper method is actually a handwritten DeserializeSeed implementation. So the derive now generates a DeserializeSeed implementation, and all the optimization logic now lives in normal code.

Mingun · 2024-08-30T19:33:47Z

As I remember, the changes in this PR depend on the changes in #2445.

As I already said, if you wish, it is possible to leave MapAccessDeserializer and SeqAccessDeserializer untouched and introduce new private deserializers instead.

@dtolnay, @oli-obk, please share your opinion: should I make these changes?

oli-obk · 2024-09-03T13:35:01Z

I didn't measure it, maybe I should

Did this happen?

Mingun · 2024-09-03T14:17:30Z

No. Any suggestions for the benchmark are welcome.

RReverser · 2024-09-03T15:02:23Z

No. Any suggestions for the benchmark are welcome.

It's pretty ancient by now, but in the original issue I referenced binast/binjs-ref@22103b9 where I did a manual implementation of this optimisation just for the types we used.

For testing, I checked out a commit right before that. binast/binjs-ref@53bd87a

The numbers before this PR:

test bench_parsing_reuse_parser       ... bench:  88,088,720 ns/iter (+/- 12,560,451)

The numbers with this PR applied:

test bench_parsing_reuse_parser       ... bench:  66,515,390 ns/iter (+/- 7,634,431)

Note that this is far from a pure JSON benchmark: it uses an external Node.js process to parse JS and produce JSON, and only then parses the output using serde-json. In that context, the -25% perf improvement is even more impressive.

It should be easy to save the JSON output and do a pure serde-json benchmark instead (in my original commit I suggested that this showed a 2x improvement, which seems realistic), but perhaps someone has more modern examples.

Anything touching JS AST represented as JSON (e.g. Deserialize for ESTree Program from https://swc.rs/) should work.

RReverser · 2024-09-03T16:08:20Z

serde/src/private/de.rs

        where
            S: SeqAccess<'de>,
        {
-            Ok(())
+            match tri!(seq.next_element()) {


This behaves quite differently from IgnoredAny.visit_map. I think the behaviour should be consistent, as in, iterate over the entire sequence and ignore its values instead of erroring out on a non-empty sequence.

I tried that initially, but that failed other tests and is in general not what you want. A unit / unit struct is represented in a sequence as nothing, so we need to ensure that the sequence is empty. This is consistent with the normal behavior, where struct deserialization from a sequence expects an exact number of values, and with the fact that a flattened unit / unit struct is considered equal to a struct without fields.

Actually, unit_variant_with_unknown_fields is the test that fails if the whole sequence is consumed here.

serde/test_suite/tests/test_enum_internally_tagged.rs

Lines 1447 to 1456 in 3aca38d

// Unknown elements are not allowed in sequences
assert_de_tokens_error::<InternallyTagged>(
    &[
        Token::Seq { len: None },
        Token::Str("Unit"), // tag
        Token::I32(0),
        Token::SeqEnd,
    ],
    "invalid length 1, expected 0 elements in sequence",
);

The unit / unit struct represented in sequence as nothing

Hm but "nothing" should be pretty different conceptually from "ignored any". I'd expect a custom check just for the nothing case, whereas ignored any should be able to consume anything thrown at it silently.

This code tries to read something, doesn't matter what. We expect an empty sequence, so if it contains some element, we fail.

Nevermind, I'm sleepy - I thought you're changing how IgnoredAny works everywhere. I've expanded the context of the diff and I see this is a change on this one specific visitor.

Please disregard my original comment 🤦‍♂️

Although I now wonder if visit_map should be changed to check length as well.

By default, maps in serde allow unknown keys, and when a unit is flattened, all keys become unknown. But you're right: in the case of #[serde(deny_unknown_fields)] we should return an error if the map is not empty. That's an idea for another PR!
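
The empty-sequence check discussed in this thread can be sketched without serde as follows. This is only an approximation under stated assumptions: `expect_empty_seq` is a hypothetical helper, a plain iterator stands in for `SeqAccess`, and the error text is copied from the expected message of the unit_variant_with_unknown_fields test quoted above.

```rust
/// Hypothetical helper: a unit variant carries no data, so after the tag
/// has been read the rest of the sequence must be empty.
fn expect_empty_seq<I: Iterator>(mut seq: I) -> Result<(), String> {
    match seq.next() {
        // Nothing left after the tag: the unit variant is complete.
        None => Ok(()),
        // Any further element is an error; its value does not matter.
        Some(_) => Err("invalid length 1, expected 0 elements in sequence".to_string()),
    }
}

fn main() {
    // Sequence contained only the tag.
    println!("{:?}", expect_empty_seq(Vec::<i32>::new().into_iter()));
    // Sequence contained the tag plus one unexpected element.
    println!("{:?}", expect_empty_seq(vec![0].into_iter()));
}
```

Only the first extra element is inspected, which matches the "read something, doesn't matter what" description given earlier in the thread.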

Mingun · 2024-09-15T09:50:47Z

@oli-obk, how is your review progressing?

Mingun · 2024-10-06T06:47:04Z

@dtolnay, @oli-obk, any feedback? Also on my other PRs, please.

oli-obk · 2024-10-06T06:48:55Z

Please refrain from pinging, it's already one of the only two PRs in my self review/assigned list


When an intermediate buffer is used, we can just ignore the data, because it was already read from the original deserializer into the buffer, and the check for emptiness was performed elsewhere. Now we read directly from the original deserializer and must ensure that the sequence is empty ourselves. Fixed (1): unit_variant_with_unknown_fields


Deserializer methods are only hints, which a deserializer is not obliged to follow. Both TaggedContentVisitor and InternallyTaggedUnitVisitor accept only visit_map and visit_seq, and that is what the derived implementation of Deserialize does for structs. Therefore it is fine to call deserialize_map here, as the derived deserialize implementation already does. Because those structs are officially not public, they are used only by the derive macro.

Mingun force-pushed the optimize-internal-tagged-enums branch from 84c311d to 94e15ef Compare November 3, 2020 20:06

Mingun changed the title ~~Optimize internal tagged enums -- do not use internal buffer if tag is the first field~~ Optimize internally tagged enums -- do not use internal buffer if tag is the first field Nov 3, 2020

Mingun force-pushed the optimize-internal-tagged-enums branch from 94e15ef to 5106111 Compare February 23, 2021 18:34

dtolnay requested changes Feb 28, 2021

Mingun requested a review from dtolnay March 6, 2021 17:10

Mingun force-pushed the optimize-internal-tagged-enums branch from a169bee to da641b1 Compare March 10, 2022 16:33

dtolnay force-pushed the master branch from 58c82f1 to d208762 Compare September 3, 2022 04:16

Mingun mentioned this pull request May 7, 2023

Add ability to deserialize enums from SeqAccessDeserializer #2445

Open

Mingun mentioned this pull request Aug 12, 2023

Exhaustive internally tagged tests + support of internally tagged enums in non self-describing formats #2569

Draft

Mingun mentioned this pull request Aug 6, 2024

Deserializing to variant vector fields fails tafia/quick-xml#288

Open

Mingun force-pushed the optimize-internal-tagged-enums branch from da641b1 to c22590d Compare August 24, 2024 15:54

Mingun commented Aug 24, 2024

Mingun force-pushed the optimize-internal-tagged-enums branch from c22590d to 8cd44cf Compare August 25, 2024 16:49

oli-obk self-assigned this Sep 3, 2024

oli-obk self-requested a review September 3, 2024 15:11

RReverser reviewed Sep 3, 2024

Mingun mentioned this pull request Sep 26, 2024

Cannot Deserialize a Serializable Enum tafia/quick-xml#808

Closed

Mingun force-pushed the optimize-internal-tagged-enums branch from 8cd44cf to 132dc81 Compare October 21, 2024 20:31

Mingun and others added 14 commits October 25, 2024 21:02

Generate final match inside DeserializeSeed implementation

108f1a9

(review this commit with "ignore whitespace changes" option on)

Move DeserializeSeed implementation above

70df1d8

Produce final result from TaggedContentVisitor

20c790f

Do not buffer content of the internally tagged enums when deserialize…

4ce376d

… from sequence failures (2): newtype_unit_struct unit_variant_with_unknown_fields

Allow to deserialize unit and unit structs from SeqAccessDeserializer

7de9499

failures (1): unit_variant_with_unknown_fields Fixed (1): newtype_unit_struct

Allow to deserialize newtype structs from SeqAccessDeserializer

960e58a

Extract first iteration - just copy body of loop

01af1f8

Replace if let by match

f246a57

Do not buffer content of the internally tagged enums if tag is the fi…

001f92d

…rst field failures (3): newtype_newtype newtype_unit newtype_unit_struct

Allow to deserialize unit, unit structs and newtype structs from MapA…

46a2fd6

…ccessDeserializer Fixed (3): newtype_newtype newtype_unit newtype_unit_struct

Do not create vector when tag is the first field

60bcfb8

(review this commit with "ignore whitespace changes" option on)

TagOrContent unnecessary public, remove pub

23f953d

Mingun force-pushed the optimize-internal-tagged-enums branch from 132dc81 to 92c1d80 Compare October 25, 2024 16:21
