
Optimize internally tagged enums -- do not use internal buffer if tag is the first field #1922

Open
wants to merge 14 commits into
base: master

Conversation

Mingun
Contributor

@Mingun Mingun commented Nov 3, 2020

Fixes #1495

That change also has one positive side effect: if the tag is the first field, the negative effects from #1183 are eliminated, because buffering is not used.

@RReverser, feel free to try to run your benchmarks against this branch
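The difference this PR exploits can be sketched in plain, std-only Rust (a conceptual illustration with hypothetical names, not serde's actual implementation): when the tag is the first field, the variant is known immediately and the remaining fields are read in one pass; otherwise every field must be buffered until the tag is found.

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Unit,
    Struct { name: String },
}

/// Deserializes from an ordered list of (key, value) pairs, standing in
/// for a self-describing format such as JSON.
fn deserialize(fields: &[(&str, &str)]) -> Result<Node, String> {
    // Helper shared by both paths: build the variant once the tag is known.
    fn build(variant: &str, rest: &[(&str, &str)]) -> Result<Node, String> {
        match variant {
            "Unit" => Ok(Node::Unit),
            "Struct" => {
                let name = rest
                    .iter()
                    .find(|(k, _)| *k == "name")
                    .ok_or_else(|| "missing field `name`".to_string())?
                    .1
                    .to_string();
                Ok(Node::Struct { name })
            }
            other => Err(format!("unknown variant `{}`", other)),
        }
    }

    match fields.split_first() {
        // Fast path (this PR): the tag is the first field, so no buffering
        // is needed; the remaining fields are read directly.
        Some((&("tag", variant), rest)) => build(variant, rest),
        // Slow path: the tag appears later, so all fields must be buffered
        // before the variant is known.
        _ => {
            let buffer: Vec<(&str, &str)> = fields.to_vec(); // the internal buffer
            let variant = buffer
                .iter()
                .find(|(k, _)| *k == "tag")
                .ok_or_else(|| "missing field `tag`".to_string())?
                .1;
            build(variant, &buffer)
        }
    }
}
```

The slow path is roughly what the intermediate buffering in the generated code does today; the fast path is what this PR enables when the tag comes first.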

@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from 84c311d to 94e15ef on November 3, 2020 20:06
@Mingun Mingun changed the title Optimize internal tagged enums -- do not use internal buffer if tag is the first field Optimize internally tagged enums -- do not use internal buffer if tag is the first field Nov 3, 2020
@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from 94e15ef to 5106111 on February 23, 2021 18:34
Member

@dtolnay dtolnay left a comment


The discussion in #1495 focuses on whether this would be worth it for the cost in compile time. The biggest thing wrong with internally tagged enums is not when the deserializer is slow but that they take significantly long to compile. Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.

Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?

@Mingun
Contributor Author

Mingun commented Feb 28, 2021

I cannot agree with the premise of this question. For me, runtime performance is a much more important thing than compile-time performance. Software is written not for the pleasure of developers, but to solve customer problems.

> Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?

I will try to study how to do performance measurements, but any guidance is welcome

@RReverser

> Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.

That sounds odd tbh. Compile times affect only developers, while runtime affects every user of the library / application, which is far more impactful. Why choose compile time over runtime perf here when we don't do that at any other level of development (e.g. opt-level = 0 vs opt-level = 2, etc.)?

@Mingun
Contributor Author

Mingun commented Mar 6, 2021

I've done some research and here are the results. I created a library project with 1000 types and measured the compilation time.

I noticed a small increase in compilation time, about 0.01 sec per type (or 7-30%). I think that is an acceptable price for the better runtime performance.

Test code and raw data

serde-perf.zip

I created a library cargo project with the following lib.rs content:

```rust
use serde::Deserialize;

macro_rules! generate {
  ($(#[$counter:meta])*) => {
    $(
      const _: () = {
        #[$counter]
        #[derive(Deserialize)]
        #[serde(tag = "tag")]
        enum Node {
          Unit,
          Struct {
            name: String,
            list: Vec<Node>,
          },
          // Uncomment for "big enum" tests
          /*
          Newtype1(std::collections::HashMap<String, String>),
          Newtype2(String),
          Newtype3(u32),
          Newtype4(f32),
          Unit1,
          Unit2,
          Unit3,
          Unit4,
          Struct1 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct2 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct3 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct4 { f1: String, f2: u32, f3: bool, f4: f64 },// */
        }
      };
    )*
  };
}
```
// Expanded manually for "expand" tests
generate!(
  /// ...
  /// 1000 lines
  /// ...
);

Tests were run with the command

```
cargo +nightly build -Ztimings
```

Test PC

OS version: Windows_NT x64 10.0.18363
CPUs: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 1992)

Summary table (all times in seconds)

Small enum (2 variants), 1000 types

Derived both Serialize and Deserialize, types generated with generate! macro

| Deserialize + Serialize | master (c261015) (100%) | PR (a169bee) | Diff |
|---|---|---|---|
| all | 47.37 = (46.96 + 47.10 + 48.06)/3 | 58.67 = (59.27 + 59.17 + 57.56)/3 | +11.29 (+29%) |
| codegen | 3.28 = (46.96-43.95 + 47.10-43.92 + 48.06-44.42)/3 | 4.81 = (59.27-53.93 + 59.17-53.85 + 57.56-53.80)/3 | +1.53 (+10%) |

Derived only Deserialize, types generated with generate! macro

| Deserialize | master (c261015) (100%) | PR (a169bee) | Diff |
|---|---|---|---|
| all | 39.25 = (39.16 + 39.66 + 38.94)/3 | 50.08 = (49.70 + 51.62 + 48.93)/3 | +10.83 (+28%) |
| codegen | 3.62 = (39.16-35.31 + 39.66-35.28 + 38.94-36.31)/3 | 4.53 = (49.70-46.17 + 51.62-45.29 + 48.93-45.21)/3 | +0.91 (+25%) |

Derived only Deserialize, types written manually

| Deserialize + expanded | master (c261015) (100%) | PR (a169bee) | Diff |
|---|---|---|---|
| all | 38.76 = (38.38 + 39.67 + 38.23)/3 | 50.36 = (50.34 + 50.59 + 50.14)/3 | +11.60 (+30%) |
| codegen | 3.24 = (38.38-35.41 + 39.67-35.65 + 38.23-35.50)/3 | 5.67 = (50.34-44.34 + 50.59-44.74 + 50.14-45.02)/3 | +2.42 (+75%) |

Big enum (14 variants), 1000 types

Derived only Deserialize, types generated with generate! macro

| Deserialize | master (c261015) (100%) | PR (a169bee) | Diff |
|---|---|---|---|
| all | 241.04 = (236.94 + 239.35 + 246.84)/3 | 257.01 = (257.75 + 258.76 + 254.70)/3 | +16.03 (+7%) |
| codegen | 40.90 = (236.94-197.11 + 239.35-199.61 + 246.84-203.70)/3 | 46.63 = (257.75-211.25 + 258.76-210.17 + 254.70-209.91)/3 | +5.72 (+14%) |

Derived only Deserialize, types written manually

| Deserialize + expanded | master (c261015) (100%) | PR (a169bee) | Diff |
|---|---|---|---|
| all | 238.41 = (238.49 + 236.58 + 240.16)/3 | 254.86 = (258.93 + 254.75 + 250.90)/3 | +16.45 (+7%) |
| codegen | 39.70 = (238.49-198.46 + 236.58-199.66 + 240.16-198.02)/3 | 45.05 = (258.93-209.91 + 254.75-211.19 + 250.90-208.34)/3 | +5.35 (+13%) |

@Mingun Mingun requested a review from dtolnay March 6, 2021 17:10
@pickfire

@Mingun What about runtime performance improvements?

@Mingun
Contributor Author

Mingun commented Mar 26, 2021

I didn't measure it, maybe I should

@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from a169bee to da641b1 on March 10, 2022 16:33
@jarredholman

Would this also fix the incorrect error messages caused by the internal buffer? #1621

@Mingun
Contributor Author

Mingun commented Jul 26, 2022

For the optimized case yes, it should

Contributor Author

@Mingun Mingun left a comment


@dtolnay, @oli-obk, this PR is ready for review again.

Comment on lines +1083 to +1111
```rust
fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: de::Visitor<'de>,
{
    // Covered by tests/test_enum_internally_tagged.rs
    // newtype_unit
    visitor.visit_unit()
}

fn deserialize_unit_struct<V>(
    self,
    _name: &'static str,
    visitor: V,
) -> Result<V::Value, Self::Error>
where
    V: de::Visitor<'de>,
{
    // Covered by tests/test_enum_internally_tagged.rs
    // newtype_unit_struct
    self.deserialize_unit(visitor)
}

fn deserialize_newtype_struct<V>(self, _name: &str, visitor: V) -> Result<V::Value, Self::Error>
where
    V: de::Visitor<'de>,
{
    visitor.visit_newtype_struct(self)
}
```

Contributor Author


I'm not sure whether we should change the behavior of SeqAccessDeserializer and MapAccessDeserializer or introduce new private deserializers. On the one hand, those deserializers were created to support various serde attributes. On the other hand, technically this is a breaking change because those types are public.

@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from c22590d to 8cd44cf on August 25, 2024 16:49
@Mingun
Contributor Author

Mingun commented Aug 25, 2024

I realized that the old Visitor::visit helper method was actually a handwritten DeserializeSeed implementation. So the derive now generates a DeserializeSeed, and all the optimization machinery now lives in normal code.
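The seed pattern referred to here can be shown with a minimal, std-only sketch (a hypothetical `Seed` trait and `VariantSeed` type, not serde's real `DeserializeSeed` API): unlike a plain deserialize implementation, a seed is a value, so it can carry state, such as an already-read tag, into the deserialization of the remaining input.

```rust
/// Hypothetical stand-in for a seed trait: because `deserialize` takes
/// `self` by value, the implementor can carry state into deserialization.
trait Seed<'de> {
    type Value;
    fn deserialize(self, input: &'de [&'de str]) -> Result<Self::Value, String>;
}

#[derive(Debug, PartialEq)]
enum Parsed {
    Unit,
    Pair { a: String, b: String },
}

/// The state: a tag that was already read from the input before the
/// remaining fields are deserialized.
struct VariantSeed<'a> {
    tag: &'a str,
}

impl<'de, 'a> Seed<'de> for VariantSeed<'a> {
    type Value = Parsed;
    fn deserialize(self, input: &'de [&'de str]) -> Result<Parsed, String> {
        match self.tag {
            // The variant is already known, so the rest of the input is
            // consumed directly, without any intermediate buffer.
            "Unit" if input.is_empty() => Ok(Parsed::Unit),
            "Unit" => Err(format!("expected 0 elements, got {}", input.len())),
            "Pair" if input.len() == 2 => Ok(Parsed::Pair {
                a: input[0].to_string(),
                b: input[1].to_string(),
            }),
            "Pair" => Err(format!("expected 2 elements, got {}", input.len())),
            other => Err(format!("unknown variant `{}`", other)),
        }
    }
}
```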

@Mingun
Contributor Author

Mingun commented Aug 30, 2024

As I remember, the changes in this PR depend on the changes in #2445.

As I already said, if you wish, it is possible to leave MapAccessDeserializer and SeqAccessDeserializer untouched and instead introduce new private deserializers.

@dtolnay, @oli-obk, please give your opinion, should I make these changes?

@oli-obk
Member

oli-obk commented Sep 3, 2024

> I didn't measure it, maybe I should

Did this happen?

@Mingun
Contributor Author

Mingun commented Sep 3, 2024

No. Any suggestions for the benchmark are welcome.

@RReverser

> No. Any suggestions for the benchmark are welcome.

It's pretty ancient by now, but in the original issue I referenced binast/binjs-ref@22103b9 where I did a manual implementation of this optimisation just for the types we used.

For testing, I checked out a commit right before that. binast/binjs-ref@53bd87a

The numbers before this PR:

test bench_parsing_reuse_parser       ... bench:  88,088,720 ns/iter (+/- 12,560,451)

The numbers with this PR applied:

test bench_parsing_reuse_parser       ... bench:  66,515,390 ns/iter (+/- 7,634,431)

Note that this is far from a pure JSON benchmark - it uses an external Node.js process to parse JS and produce JSON, and only then parses the output using serde-json, but in that context the -25% perf improvement is even more impressive.

It should be easy to save the JSON output and do a pure serde-json benchmark instead (in my original commit I mentioned it showed a 2x improvement, which seems realistic), but perhaps someone has more modern examples.

Anything touching JS AST represented as JSON (e.g. Deserialize for ESTree Program from https://swc.rs/) should work.

@oli-obk oli-obk self-assigned this Sep 3, 2024
@oli-obk oli-obk self-requested a review September 3, 2024 15:11
```diff
 where
     S: SeqAccess<'de>,
 {
-    Ok(())
+    match tri!(seq.next_element()) {
```

@RReverser RReverser Sep 3, 2024


This behaves quite differently from IgnoredAny.visit_map. I think the behaviour should be consistent, as in, iterate over the entire sequence and ignore its values instead of erroring out on non-empty sequence.

Contributor Author


I tried that initially, but it failed other tests and in general it is not what you want. A unit / unit struct is represented in a sequence as nothing, so we need to ensure that the sequence is empty. This is consistent with the normal behavior, where struct deserialization from a sequence expects an exact number of values, and with the fact that a flattened unit / unit struct is considered equal to a struct without fields.
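The emptiness check described here can be sketched with a hypothetical helper (not the PR's actual code): a unit occupies zero elements in a sequence, so the visitor succeeds only when the sequence is empty and reports the observed length otherwise.

```rust
/// Hypothetical stand-in for the visitor's `visit_seq` behavior: a unit
/// occupies zero elements in a sequence, so deserializing it succeeds
/// only if the sequence is empty.
fn visit_unit_seq<T>(elements: &[T]) -> Result<(), String> {
    if elements.is_empty() {
        Ok(())
    } else {
        // Error message in the spirit of the failing test cited in this thread.
        Err(format!(
            "invalid length {}, expected 0 elements in sequence",
            elements.len()
        ))
    }
}
```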

Contributor Author


Actually, unit_variant_with_unknown_fields is the test that fails if we consume the whole sequence here.

```rust
// Unknown elements are not allowed in sequences
assert_de_tokens_error::<InternallyTagged>(
    &[
        Token::Seq { len: None },
        Token::Str("Unit"), // tag
        Token::I32(0),
        Token::SeqEnd,
    ],
    "invalid length 1, expected 0 elements in sequence",
);
```


> The unit / unit struct represented in sequence as nothing

Hm but "nothing" should be pretty different conceptually from "ignored any". I'd expect a custom check just for the nothing case, whereas ignored any should be able to consume anything thrown at it silently.

Contributor Author


This code tries to read something, no matter what. We expect an empty sequence, so if it contains any element, we fail.


Never mind, I'm sleepy - I thought you were changing how IgnoredAny works everywhere. I've expanded the context of the diff and I see this is a change to this one specific visitor.

Please disregard my original comment 🤦‍♂️

Although I now wonder if visit_map should be changed to check length as well.

Contributor Author


By default, maps in serde allow unknown keys, and when a unit is flattened, all keys become unknown. But you're right -- in the case of #[serde(deny_unknown_fields)] we should return an error if the map is not empty. That's an idea for another PR!

@Mingun
Contributor Author

Mingun commented Sep 15, 2024

@oli-obk, how is your review progressing?

@Mingun
Contributor Author

Mingun commented Oct 6, 2024

@dtolnay, @oli-obk, any feedback? Also on my other PRs, please.

@oli-obk
Member

oli-obk commented Oct 6, 2024

Please refrain from pinging; it's already one of only two PRs in my self-review/assigned list.

@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from 8cd44cf to 132dc81 on October 21, 2024 20:31
Mingun and others added 14 commits October 25, 2024 21:02
(review this commit with "ignore whitespace changes" option on)
… from sequence

failures (2):
    newtype_unit_struct
    unit_variant_with_unknown_fields
failures (1):
    unit_variant_with_unknown_fields

Fixed (1):
    newtype_unit_struct
When the intermediate buffer is used, we can just ignore the data, because it was already read from the original deserializer into the buffer, and the emptiness check was performed elsewhere. Now we read directly from the original deserializer and must ensure the sequence is empty ourselves.

Fixed (1):
    unit_variant_with_unknown_fields
…rst field

failures (3):
    newtype_newtype
    newtype_unit
    newtype_unit_struct
…ccessDeserializer

Fixed (3):
    newtype_newtype
    newtype_unit
    newtype_unit_struct
(review this commit with "ignore whitespace changes" option on)
Deserializer methods are only hints, which the deserializer is not obliged to follow.

Both

- TaggedContentVisitor
- InternallyTaggedUnitVisitor

accept only visit_map and visit_seq, which is what the derived implementation of Deserialize does for structs. Therefore it is fine to call deserialize_map here, as the derived deserialize implementation already does. These structs are not officially public; they are used only by the derive macro.
@Mingun Mingun force-pushed the optimize-internal-tagged-enums branch from 132dc81 to 92c1d80 on October 25, 2024 16:21

Successfully merging this pull request may close these issues.

Internally-tagged enum representation could be more efficient
6 participants