Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[System.Text.Json] Expose a setting disallowing duplicate JSON properties #108521

Open
eiriktsarpalis opened this issue Oct 3, 2024 · 9 comments
Assignees
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 3, 2024

Background

JsonSerializer today tolerates JSON payloads that contain duplicate properties, following a last-write-wins strategy when binding property values. Duplicate properties can be problematic from a security standpoint since they introduce ambiguity, which can be exploited in the context of JSON interoperability vulnerabilities. We should expose an option that prevents duplicate properties from being accepted.

API Proposal

namespace System.Text.Json.Serialization;

public enum JsonDuplicatePropertyHandling
{
    LastWriteWins = 0, // the current default
    FirstWriteWins = 1,
    Error = 2
}

namespace System.Text.Json;

public partial class JsonSerializerOptions
{
    public JsonDuplicatePropertyHandling DuplicatePropertyHandling { get; set; } = JsonDuplicatePropertyHandling.LastWriteWins;
}

public partial class JsonSourceGenerationsOptionsAttribute
{
    public JsonDuplicatePropertyHandling DuplicatePropertyHandling { get; set; } = JsonDuplicatePropertyHandling.LastWriteWins;
}

API Usage

string json = """{ "Value": 1, "Value": -1 }""";
JsonSerializer.Deserialize<MyPoco>(json).Value; // -1

JsonSerializerOptions options = new () { DuplicatePropertyHandling = JsonDuplicatePropertyHandling.FirstWriteWins }
JsonSerializer.Deserialize<MyPoco>(json).Value; // 1

JsonSerializerOptions options = new () { DuplicatePropertyHandling = JsonDuplicatePropertyHandling.Error }
JsonSerializer.Deserialize<MyPoco>(json).Value; // JsonException

record MyPoco(int Value);

Additional Notes

The option should extend to JsonObject but is not applicable to JsonDocument which stores the full JSON payload. We might still be able to enforce lack of duplication which could be expressed as a boolean property on JsonDocumentOptions:

namespace System.Text.Json;

public partial struct JsonDocumentOptions
{
    public bool AllowDuplicateProperties { get; set; } = true;
}

cc @GrabYourPitchforks @JeffreyRichter

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 3, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Oct 3, 2024
@eiriktsarpalis eiriktsarpalis added this to the 10.0.0 milestone Oct 3, 2024
@eiriktsarpalis eiriktsarpalis self-assigned this Oct 3, 2024
@eiriktsarpalis eiriktsarpalis added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Oct 3, 2024
@gregsdennis
Copy link
Contributor

This is surprising to me since both JsonElement and JsonNode explode when trying to access objects that even contain duplicates (even when you're not accessing the duplicate property itself).

Good change. 👍

@eiriktsarpalis
Copy link
Member Author

JsonElement should tolerate duplicates, it's JsonNode that does throw ungracefully and lazily.

@JeffreyRichter
Copy link

I don't see a need for LastWriteWins & FirstWriteWins. How would anyone decide which to use as it really depends on the JSON payload being parsed. If the current behavior is LastWriteWins then I would just keep that and not offer FirstWriteWins at all. Also, perhaps LastPropertyWins (or BottommostPropertyWins) is a better name as I'd like to think about the JSON payload and not how the deserializer works internally.

And I like your AllowDuplicateProperties proposal as this is simpler and then you can ignore the naming issue I mention above. However, usually you want the befault of something to be false and for back-compat, you might want **Disa**llowDuplicateProperties instead so that this defaults to current behavior.

@eiriktsarpalis
Copy link
Member Author

I came up with the distiction between LastWriteWins and FirstWriteWins for a few reasons:

  1. It makes it explicit what the default built-in behavior is (rather than just vaguely "allowing" duplicates)
  2. Adding FirstWriteWins unblocks the scenario where you want to be able to tolerate dupes but at the same time ensure that they are being skipped.

@JeffreyRichter
Copy link

If we were starting from scratch, we would never let dups be tolerated as the resulting behavior is unpredictable.
The ONLY reason to allow it now is for backward compatibility. Documentation could just make it clear what the default behavior is.

No one has asked for LastWriteWins functionality and never should - it is a clear indication that something is wrong somewhere.

So I, as a customer, would never know how to decide between FirstWriteWins & LastWriteWins. Picking one of these is tantamount to picking a random number seed - I'll get one value or the other but I can't predict which as the JSON payload producer can just flip the properties in the JSON payload before sending it to me to deserialize.

@gregsdennis
Copy link
Contributor

(Confirmed that element does handle as last wins.)

@eiriktsarpalis
Copy link
Member Author

If we were starting from scratch, we would never let dups be tolerated as the resulting behavior is unpredictable.
The ONLY reason to allow it now is for backward compatibility. Documentation could just make it clear what the default behavior is.

If we were starting from scratch, duplicate properties would be rejected by default. Aside from backward compatibility concerns however, there is still a case to be made about supporting reads from data sources that aren't perfect and are known to contain duplicates. In that context, a FirstWriteWins behaviour makes more sense from a performance perspective -- you deserialize the first occurrence and skip all other $n -1$ occurrences. This can be important assuming you're implementing non-trivial deserialization logics (e.g. side effects, reference preservation etc.).

@JeffreyRichter
Copy link

You could convince me that I have code that wants to see all values sharing a single property name and to accomplish this, I'd need to parse the json manually or maybe serialize each property to all array of values.

But you will not convince me that randomly selecting a single value from a group and returning that allows me to write code whose behavior I have confidence in and is predictable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

3 participants