Fix incorrect representation of tuples with skipped fields #2520

Mingun · 2023-07-21T17:36:47Z

This PR fixes #2105 and give some insights about the current situation with skipped fields.

You can see what changed in the generated code by running the following commands:

place skip.txt into test_suite/tests folder and change it extension to .rs

run the following command:

...\serde\test_suite> cargo expand --all-features --test skip > skip-N.rs

where N is some number.

When commit message contains words "Changes in generated code" that means that output of the command above was changed. I recommend to diff it against the previous result and explore the difference. There will be 4 files -- baseline and three changed.

Overall changes over all commits:

Tuple1as0, Tuple1as0Default, Tuple1as0With:
- removed visit_newtype_struct
- changed *_newtype_struct -> *_tuple_struct(0)
Tuple2as1, Tuple2as1Default, Tuple2as1With:
- added visit_newtype_struct

The table below summarizes the current behavior and the new behavior, introduced in this PR. Empty cells in the last column means that the behavior wasn't changed.

The behavior columns shows which methods are used to serialize or deserialize a type. The number in parenthesis is a count of fields which passed to the corresponding method.

Code	Current behavior	Fixed behavior
Unit representations
struct Unit;	`serialize_unit_struct` `deserialize_unit_struct` `visit_unit`
Tuple representations
struct Tuple0();	`serialize_tuple_struct(0)` `deserialize_tuple_struct(0)` `visit_seq`	ℹ️ It would be better to use Unit representation (see below)
// Can be named // Newtype(u8) struct Tuple1(u8);	`serialize_newtype_struct` `deserialize_newtype_struct` `visit_newtype_struct` `visit_seq`
struct Tuple2(u8, u8);	`serialize_tuple_struct(2)` `deserialize_tuple_struct(2)` `visit_seq`
// The same as Tuple0 struct Tuple1as0( #[serde(skip)] u8, ); #2105	`serialize_newtype_struct` `deserialize_newtype_struct` `visit_newtype_struct` `visit_seq`	`serialize_tuple_struct(0)` `deserialize_tuple_struct(0)` `visit_seq` ℹ️ It would be better to use Unit representation (see below)
// The same as Tuple0 struct Tuple2as0( #[serde(skip)] u8, #[serde(skip)] u8, );	`serialize_tuple_struct(0)` `deserialize_tuple_struct(0)` `visit_seq`	ℹ️ It would be better to use Unit representation (see below)
// The same as Tuple1 struct Tuple2as1( #[serde(skip)] u8, u8, );	`serialize_tuple_struct(1)` `deserialize_tuple_struct(1)` `visit_seq`	`serialize_tuple_struct(1)` `deserialize_tuple_struct(1)` `visit_newtype_struct` `visit_seq` ℹ️ It would be better to use Newtype representation (see below)
Struct representations
struct Struct0 {}	`serialize_struct` `deserialize_struct` `visit_seq` `visit_map`
struct Struct1 { a: u8, }	`serialize_struct` `deserialize_struct` `visit_seq` `visit_map`
struct Struct2as0 { #[serde(skip)] a: u8, #[serde(skip)] b: u8, }	`serialize_struct` `deserialize_struct` `visit_seq` `visit_map`
struct Struct2as1 { #[serde(skip)] a: u8, b: u8, }	`serialize_struct` `deserialize_struct` `visit_seq` `visit_map`

Enum variant representations

The following changes also was implemented by I plan to ship them in separate PR due to some debating changes. I left my finding here to have the overall picture about skipped fields in one place.

The first debating change is to how to represent tuple(0) tuples? Should we represent them as a sequences with zero fields, or as a unit structs? Motivation for the unit structs is that that is the only way to skip all fields in a tuple and still represent it as a unit. This is currently done for Enum::Tuple1as0, but only for it. Enum::Tuple2as0 no longer follows this agreement as well as all tuple structs, which introduces more confusion. I am inclined to this option, since to have an always empty sequence is somewhat strange.

The second debating change, maybe we should make the behavior configurable? Introduce a new attribute like #[serde(tuple)] to produce tuple(0) and tuple(1) forms from TupleXas0 and TupleXas1 tuples?

Code	Current behavior	Suggested behavior
Enum::Unit representations
enum Enum { Unit, }	`serialize_unit_variant` `unit_variant`
Enum::Tuple representations
enum Enum { Tuple0(), }	`serialize_tuple_variant(0)` `tuple_variant(0)` `visit_seq`	`serialize_unit_variant` `unit_variant`
// Can be called // Enum::Newtype(u8) enum Enum { Tuple1(u8), }	`serialize_newtype_variant` `newtype_variant`
enum Enum { Tuple2(u8, u8), }	`serialize_tuple_variant(2)` `tuple_variant(2)` `visit_seq`
// The same as // Enum::Tuple0 enum Enum { Tuple1as0( #[serde(skip)] u8, ), }	`serialize_unit_variant` `unit_variant`
// The same as // Enum::Tuple0 enum Enum { Tuple2as0( #[serde(skip)] u8, #[serde(skip)] u8, ), }	`serialize_tuple_variant(0)` `tuple_variant(0)` `visit_seq`	`serialize_unit_variant` `unit_variant`
// The same as // Enum::Tuple1 enum Enum { Tuple2as1( #[serde(skip)] u8, u8, ), } #2224	`serialize_tuple_variant(1)` `tuple_variant(1)` `visit_seq`	`serialize_newtype_variant` `newtype_variant`
Enum::Struct representations
enum Enum { Struct0 {}, }	`serialize_struct_variant(0)` `struct_variant` `visit_seq` `visit_map`
enum Enum { Struct1 { a: u8, }, }	`serialize_struct_variant(1)` `struct_variant` `visit_seq` `visit_map`
enum Enum { // The same as // Enum::Struct0 Struct2as0 { #[serde(skip)] a: u8, #[serde(skip)] b: u8, }, }	`serialize_struct_variant(0)` `struct_variant` `visit_seq` `visit_map`
enum Enum { // The same as // Enum::Struct1 Struct2as1 { #[serde(skip)] a: u8, b: u8, }, }	`serialize_struct_variant(1)` `struct_variant` `visit_seq` `visit_map`

List of inconsistencies

As I said before, the current behavior contains a bunch of inconsistencies.

For example, an enum tuple variant with zero fields represented as a tuple with zero fields, but tuple variant with one skipped field represented as a unit. At the same time, tuple variant with 2 or more fields, all skipped, again represented as a tuple with zero fields. The same is applied to tuples with one effective field -- sometimes it is represented as a newtype, sometimes as a tuple with one field. I suggest to unify behavior across all cases that I summarized in the following table:

Value	Serialized form	Suggested form
Types with zero fields in serialized form
`Tuple0()`	Represented as a tuple with zero fields	unit
`Tuple1as0(Skipped)`	⚠️ Incorrectly represented as a newtype struct (bug #2105)	unit
`Tuple2as0(Skipped, Skipped)`	Represented as a tuple with zero fields	unit
Types with 1 field in serialized form
`Tuple1(x)`	Represented as a newtype `x`
`Tuple2as1(Skipped, x)`	❗ Represented as a tuple with 1 field	newtype
Enum variants with zero fields in serialized form
`Enum::Tuple0()`	Represented as a tuple with zero fields	unit
`Enum::Tuple1as0(Skipped)`	❗ Represented as a unit struct
`Enum::Tuple2as0(Skipped, Skipped)`	Represented as a tuple with zero fields	unit
Enum variants with 1 field in serialized form
`Enum::Tuple1(x)`	Represented as a newtype `x`
`Enum::Tuple2as1(Skipped, x)`	❗ Represented as a tuple with 1 field (#2224)	newtype

Legend:

⚠️ Bugs that leads to compilation errors
❗ Inconsistency in a block of similar table rows

dtolnay

Thanks for the PR and exemplary writeup.

Have you thought about how this direction relates to skip_serializing_if? For example in the following case:

#[derive(serde_derive::Serialize)]
struct Tuple(
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
);

would you consider that the correct behavior needs to serialize using serialize_newtype_struct if exactly one of the functions returns false, and serialize_unit_struct if all true, and serialize_tuple_struct otherwise?

Mingun · 2023-07-31T05:21:34Z

It is hard to say. I never used skip_serializing_if. At first glance I think, that we should serialize that as a tuple in all cases. That behavior will also allow to keep ability to serialize tuples with all skipped fields / only one non-skipped field as tuple(0) / tuple(1) by using skip_serializing_if = "always_true", if default behavior by serializing them as units / newtypes would be undesirable.

In any case the behavior of skip should be carefully documented -- current documentation is unclear about expected results.

dtolnay · 2023-07-31T05:31:54Z

I am definitely on board with the Tuple1as0 change to fix #2105.

Not yet convinced about the Tuple2as1 change however — I think I need to revisit this later to make up my mind about that, especially if we are going to be making serde(skip) intentionally inconsistent with serde(skip_serializing_if = "always_true").

Mingun · 2023-07-31T07:28:01Z

The problem that right now this changes are coupled and decoupling them requires to replace simple and straightforward code with something more complex. This will be especially noticeable on the part that fixes enum representations (coming soon)

dtolnay

I am not interested in changing the Tuple2as1 case at this time. Individual data formats can opt to handle that by treating serialize_tuple_struct(1) the same as serialize_newtype_struct, and deserialize_tuple_struct(1) as deserialize_newtype_struct, if they prefer.

Do you see any way to separate that from the Tuple1as0 fix?

Mingun · 2023-07-31T18:53:20Z

Yes, that should be possible, just should introduce some checks and possible code duplication.

Just to clarify: you don't like changing behavior at all, or visit_newtype_struct could be added (it is side-effect from correct counting of non-skipped fields)?

Mingun · 2023-08-02T16:51:20Z

I dropped the last commit that was changing Tuple2as1 representation. Such structs, however, now supports deserialization using visit_newtype_struct method, which technically a side-effect of fixing bug and will allow to deserializers to handle deserialize_tuple_struct(1) requests as deserialize_newtype_struct, if they prefer
corrected tests accordingly
add a new commit that construct all skipped fields after all stored fields. So if construction default objects is expensive they will not be constructed until needed

dtolnay · 2023-08-07T03:55:29Z

I am not convinced about the last "this should increase performance" commit. Interleaving the skipped and non-skipped fields to construct them in the original source order still seems fine to me. Default objects which are expensive to construct seems like a very uncommon case to begin with, and the commit only makes any difference in the error case if I understand correctly, which is not something we micro-optimize like this.

Mingun · 2023-08-07T11:12:03Z

Ok, I removed that commit. I hope compiler will able to reorder operations.

Default objects which are expensive to construct seems like a very uncommon case to begin with

I think the code is generated also because to shift to generation taking into account all these boring uncommon cases and be sure that everything will implemented efficiently. Shifting this to the user seems to me bad practice, especially since it costs nothing.

Mingun · 2023-08-11T12:54:34Z

I am not convinced about the last "this should increase performance" commit. Interleaving the skipped and non-skipped fields to construct them in the original source order still seems fine to me.

Actually, this commit just unifies current behavior, because for structs deserialization using visit_map already does this:

use serde_derive::Deserialize;

#[derive(Deserialize)]
struct Skip {
    #[serde(skip_deserializing)]
    skipped: i32,
    #[serde(alias = "c")]
    aliased: i32,
}

will generate

#[inline]
fn visit_seq<__A>(
    self,
    mut __seq: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
    __A: _serde::de::SeqAccess<'de>,
{
    let __field0 = _serde::__private::Default::default();
    let __field1 = //...;
    _serde::__private::Ok(Skip {
        skipped: __field0,
        aliased: __field1,
    })
}
#[inline]
fn visit_map<__A>(
    self,
    mut __map: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
    __A: _serde::de::MapAccess<'de>,
{
    let mut __field1: _serde::__private::Option<i32> = _serde::__private::None;
    //...
    _serde::__private::Ok(Skip {
        skipped: _serde::__private::Default::default(),
        aliased: __field1,
    })
}

Mingun · 2023-08-15T10:32:17Z

@dtolnay , all your review comments was resolved, could you look again? I just want to remind you that all changes in the generated code are easy to see if you follow the instructions in the first message with comparing the generated code

Mingun · 2023-09-03T14:07:22Z

@dtolnay, yet another gentleman's reminder...

SeqRefDeserializer::deserialize_any has a special condition for empty sequence, which emits visit_unit. That condition assumes that type would be able to deserialized from unit, but: 1) struct variants was never able to deserialize from it (they expect only visit_map or visit_seq) 2) tuple variants even with zero fields expect visit_seq only. The suggestion to accept visit_unit instead was rejected in serde-rs#2520 Fixes (2): newtype_enum::tuple0 newtype_enum::empty_struct_from_seq

…tion (review this with whitespace changes ignored)

…ction and rename it

…n_place` Those functions too small and used only once (review this with whitespace changes ignored)

Changes in generated code (see the file attached to PR): Tuple1as0: Tuple1as0Default: Tuple1as0With: fixed visit_newtype_struct: use default value instead of deserializing it This fixes compilation error for Deserialize side of Tuple1as0(Skipped) tuple

Enums out of scope for now, because they have too many problems failures (1): tuple_struct::tuple2as1 compilation error (commented): tuple_struct::tuple1as0

…r to Tuple0() struct A side-effect: Tuple2as1(Skipped, x) now also can be deserialized using visit_newtype_struct Changes in generated code (see the file attached to PR): Tuple1as0: Tuple1as0Default: Tuple1as0With: removed visit_newtype_struct, *_newtype_struct -> *_tuple_struct(0) Tuple2as1: Tuple2as1Default: Tuple2as1With: added visit_newtype_struct This commit fixes compilation error and actually fixes the issue as it was reported in serde-rs#2105

Mingun force-pushed the fix-2105 branch from bbd264b to 9ea8899 Compare July 28, 2023 17:36

dtolnay reviewed Jul 31, 2023

View reviewed changes

Mingun force-pushed the fix-2105 branch from 9ea8899 to 1eac41c Compare July 31, 2023 18:30

Mingun mentioned this pull request Jul 31, 2023

Fix incorrect representation of tuple variants with skipped fields #2549

Draft

dtolnay requested changes Jul 31, 2023

View reviewed changes

Mingun force-pushed the fix-2105 branch from 1eac41c to 360b946 Compare August 2, 2023 16:39

Mingun force-pushed the fix-2105 branch from 360b946 to 53535d5 Compare August 6, 2023 16:05

Mingun requested a review from dtolnay August 6, 2023 16:08

Mingun force-pushed the fix-2105 branch from 53535d5 to edb53a2 Compare August 7, 2023 10:49

Mingun mentioned this pull request Aug 8, 2023

Disallow tuple variants with incorrect order of #[serde(default)] attributes #2561

Open

Mingun mentioned this pull request Mar 29, 2024

skip_serializing_if not works when an enum variant is a tuple and the enum has #[serde(tag = "type", content = "value")] #2721

Open

iTitus mentioned this pull request May 4, 2024

Externally tagged enum variants where all fields have #[serde(default)] should allow being written as a unit variant. #2691

Open

Mingun mentioned this pull request Jul 23, 2024

Make it possible to offer roundtrip guarantees for formats that need it #2778

Open

Mingun force-pushed the fix-2105 branch from edb53a2 to 9598ba1 Compare July 23, 2024 17:21

Mingun mentioned this pull request Aug 17, 2024

Fix deserialization of empty structs and tuples in untagged enums #2805

Merged

Mingun added 2 commits October 25, 2024 19:19

Move code that generates reading from a SeqAccess to a dedicated func…

13b56ee

…tion (review this with whitespace changes ignored)

Make deserialize_seq generic, parameterize it with data reading fun…

0e1df73

…ction and rename it

Mingun and others added 5 commits October 25, 2024 19:29

Inline deserialize_newtype_struct and `deserialize_newtype_struct_i…

fc5d8fe

…n_place` Those functions too small and used only once (review this with whitespace changes ignored)

Get rid of #deserializer_var, use more straightforward approach

8b68ced

Add tests for skipped fields in different kinds of structs

52be117

Enums out of scope for now, because they have too many problems failures (1): tuple_struct::tuple2as1 compilation error (commented): tuple_struct::tuple1as0

Mingun force-pushed the fix-2105 branch from 9598ba1 to c1f5859 Compare October 25, 2024 15:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect representation of tuples with skipped fields #2520

Fix incorrect representation of tuples with skipped fields #2520

Mingun commented Jul 21, 2023 •

edited

Loading

dtolnay left a comment

Mingun commented Jul 31, 2023

dtolnay commented Jul 31, 2023

Mingun commented Jul 31, 2023

dtolnay left a comment

Mingun commented Jul 31, 2023 •

edited

Loading

Mingun commented Aug 2, 2023

dtolnay commented Aug 7, 2023

Mingun commented Aug 7, 2023

Mingun commented Aug 11, 2023

Mingun commented Aug 15, 2023

Mingun commented Sep 3, 2023

Fix incorrect representation of tuples with skipped fields #2520

Are you sure you want to change the base?

Fix incorrect representation of tuples with skipped fields #2520

Conversation

Mingun commented Jul 21, 2023 • edited Loading

Enum variant representations

List of inconsistencies

dtolnay left a comment

Choose a reason for hiding this comment

Mingun commented Jul 31, 2023

dtolnay commented Jul 31, 2023

Mingun commented Jul 31, 2023

dtolnay left a comment

Choose a reason for hiding this comment

Mingun commented Jul 31, 2023 • edited Loading

Mingun commented Aug 2, 2023

dtolnay commented Aug 7, 2023

Mingun commented Aug 7, 2023

Mingun commented Aug 11, 2023

Mingun commented Aug 15, 2023

Mingun commented Sep 3, 2023

Mingun commented Jul 21, 2023 •

edited

Loading

Mingun commented Jul 31, 2023 •

edited

Loading