Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImmutableArray<T> struct implements IEquatable<T> interface but does not stick to the rules of properly implementing value semantics #77183

Open
oliverzick opened this issue Oct 18, 2022 · 32 comments

Comments

@oliverzick
Copy link

oliverzick commented Oct 18, 2022

Description

The ImmutableArray<T> struct implements the IEquatable<T> interface but does not stick to the rules of implementing value semantics acoording to the IEquatable interface.

When comparing two instance of a ImmutableArray<T> both instance are considered equal only when both instances refer to the same contained array instance. Instead both instances should be considered equal when the contents of the contained collections are equal.

Reproduction Steps

using System.Collections.Immutable;

var collection = Enumerable.Range(1, 5);

var instance1 = collection.ToImmutableArray();
var instance2 = collection.ToImmutableArray();

// Result = False ---> Wrong result for equality in terms of value semantics
var result = instance1.Equals(instance2); 

Console.WriteLine($"Result: {result}"); // Outputs "Result: False"

Expected behavior

using System.Collections.Immutable;

var collection = Enumerable.Range(1, 5);

var instance1 = collection.ToImmutableArray();
var instance2 = collection.ToImmutableArray();

// Result = True ---> Should be true because both instances represent a collection of integers from 1 to 5 which are equal in terms of value semantics defined by IEquatable<T> interface
var result = instance1.Equals(instance2);

Console.WriteLine($"Result: {result}"); // Outputs "Result: True"

Actual behavior

Already described in Description and Reproduction Steps sections.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

The IEquatable<T> interface should by properly implemented according to its specification to ensure value semantics of ImmutableArray<T>. This also includes desired behavior of Equals and GetHashCode methods according to interface specification.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@ghost
Copy link

ghost commented Oct 18, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

The ImmutableArray<T> struct implements the IEquatable<T> interface but does not stick to the rules of implementing value semantics acoording to the IEquatable interface.

When comparing two instance of a ImmutableArray<T> both instance are considered equal only when both instances refer to the same array instance. Instead both instances should be considered equal when the contents of the contained collections are equal.

Reproduction Steps

using System.Collections.Immutable;

var collection = Enumerable.Range(1, 5);

var instance1 = collection.ToImmutableArray();
var instance2 = collection.ToImmutableArray();

// Result = False ---> Wrong result for equality in terms of value semantics
var result = instance1.Equals(instance2); 

Console.WriteLine($"Result: {result}"); // Outputs "Result: False"

Expected behavior

using System.Collections.Immutable;

var collection = Enumerable.Range(1, 5);

var instance1 = collection.ToImmutableArray();
var instance2 = collection.ToImmutableArray();

// Result = True ---> Should be true because both instances represent a collection of integers from 1 to 5 which are equal in terms of value semantics defined by IEquatable<T> interface
var result = instance1.Equals(instance2);

Console.WriteLine($"Result: {result}"); // Outputs "Result: True"

Actual behavior

Already described in Description and Reproduction Steps sections.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

The IEquatable<T> interface should by properly implemented according to its specification to ensure value semantics of ImmutableArray<T>. This also includes desired behavior of Equals and GetHashCode methods according to interface specification.

Author: oliverzick
Assignees: -
Labels:

area-System.Collections

Milestone: -

@Clockwork-Muse
Copy link
Contributor

The ImmutableArray<T> struct implements the IEquatable<T> interface but does not stick to the rules of implementing value semantics acoording to the IEquatable interface.

IEquatable<T> doesn't require types implement deep value semantics. It implies no contract other than some loosely defined "equality" that's up for interpretation by types that implement it. It's not completely uncommon for either shallow equality or ignored properties to be common in implementations in C# or other languages.

@eiriktsarpalis
Copy link
Member

And even if we wanted to implement structural equality at this point, it would most likely be a breaking change.

@RikkiGibson
Copy link
Contributor

I think the typical guidance here is to use SequenceEqual, but the behavior can be a little troubling if you want to deeply compare records which contain collections, for example. (Records use EqualityComparer.Default to compare their fields.) SharpLab

using System;
using System.Collections.Generic;
using System.Linq;

var r1 = new R1(new List<R2> { new R2(new List<int> { 1 }) });
var r2 = new R1(new List<R2> { new R2(new List<int> { 1 }) });

Console.WriteLine(r1.Equals(r2)); // False
Console.WriteLine(Enumerable.SequenceEqual(r1.Items, r2.Items)); // False

Console.WriteLine(r1.Equals(r2)); // False
Console.WriteLine(Enumerable.SequenceEqual(r1.Items[0].NestedItems, r2.Items[0].NestedItems)); // True

record R1(List<R2> Items);
record R2(List<int> NestedItems);

That said, I'm not sure the problem is really "fixable", except that if you want these kinds of equality semantics, to take care to use SequenceEqual or equivalent methods when comparing collections, and perhaps do the same when you implement IEquatable.Equals.

@Sergio0694
Copy link
Contributor

(Partially unrelated) Just wanted to say that while I agree that of course changing the behavior of IEquatable<T> here is definitely a breaking change and not doable, the current behavior is indeed extremely unfortunate when using lots of records like @RikkiGibson mentioned. This is extremely common when writing source generators (as incrementa generators require you to define models for your incremental steps), and the current situation is that the second you have even a single record parameter that's an immutable array, you just fall off a cliff and have to go all the way down to implementing equality manually for the whole record type, which is a pain. It would be nice if there was some way to improve this scenario, as it's arguably pretty common and unfortunately (1) extremely verbose and (2) extremely error prone, as if you don't know this behavior or just forget when refactoring, you'll silently break your entire incremental pipeline after that step 😅

@eiriktsarpalis
Copy link
Member

@Sergio0694 Perhaps exposing an EqualityComparer combinator could help in that regard?

public static class EqualityComparer
{
     public static IEqualityComparer<TList> CreateListComparer<TList, TElement>(IEqualityComparer<TElement> elementComparer) 
        where TList : IList<TElement> { }
}

Related to #33873 and #19644

@oliverzick
Copy link
Author

oliverzick commented Oct 19, 2022

As @RikkiGibson and @Sergio0694 already mentioned the current equatable behavior of the ImmutableArray<T> struct - whether this was intended or not - is a pitfall, unfortunately.

At least I have experienced this on heavily utilizing records (e.g. when working with DDD or object-functional techniques that involve a lot of immutability and types that come with value semantics). So I was introducing new records representing composites that own their contained collections. Then I was looking for some out-of-the-box immutable collection and the ImmutableArray<T> was the only one I could find which is implementing the IEquatable<T> interface.
After replacing a custom immutable collection type that comes with value semantics the corresponding specs and tests were broken. So I digged a bit deeper and detected that the ImmutableArray<T> was actually not what I thought it is, according to its name and implementation of the IEquatable<T> interface. It looks like @Sergio0694 came across a similar situation. 😅

@eiriktsarpalis I know that changing this behavior afterwards would be a breaking change. In my opinion it would be great if there was an out-of-the-box immutable collection or bare enumerable implementation that comes with value semantics and can be used within records.

@Sergio0694
Copy link
Contributor

Sergio0694 commented Oct 19, 2022

"it would be great if there was an out-of-the-box immutable collection or bare enumerable implementation that comes with value semantics and can be used within records."

You know what, that's... Actually a very good idea. I reckon using something like that might very well make authoring records much much simpler, as there would be no more a need to handroll custom comparers every single time 🤔

I've hacked together really quick just to play around:

Old prototype (click to expand):
public readonly struct EquatableArray<T> : IEquatable<EquatableArray<T>>
    where T : IEquatable<T>
{
    private readonly T[] array;

    public bool Equals(EquatableArray<T> array)
    {
        return AsImmutableArray().SequenceEqual(array.AsImmutableArray());
    }

    public override bool Equals([NotNullWhen(true)] object? obj)
    {
        return obj is EquatableArray<T> array && Equals(this, array);
    }

    public override int GetHashCode()
    {
        if (this.array is not T[] array)
        {
            return 0;
        }

        HashCode hashCode = default;

        foreach (T item in array)
        {
            hashCode.Add(item);
        }

        return hashCode.ToHashCode();
    }

    public ImmutableArray<T> AsImmutableArray()
    {
        return Unsafe.As<T[], ImmutableArray<T>>(ref Unsafe.AsRef(in this.array));
    }

    public static EquatableArray<T> FromImmutableArray(ImmutableArray<T> array)
    {
        EquatableArray<T> array2 = default;

        Unsafe.AsRef(in array2.array) = Unsafe.As<ImmutableArray<T>, T[]>(ref array);

        return array2;
    }

    public static implicit operator EquatableArray<T>(ImmutableArray<T> array)
    {
        return FromImmutableArray(array);
    }

    public static implicit operator ImmutableArray<T>(EquatableArray<T> array)
    {
        return array.AsImmutableArray();
    }

    public static bool operator ==(EquatableArray<T> left, EquatableArray<T> right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(EquatableArray<T> left, EquatableArray<T> right)
    {
        return !left.Equals(right);
    }
}

See updated code here.

And you'd use it like so:

record MyRecord(EquatableArray<string> Names, EquatableArray<int> Numbers);

ImmutableArray<string> names = GetNames();
ImmutableArray<int> numbers = GetNumbers();

MyRecord myRecord = new(names, numbers); // Implicit conversion

names = myRecord.Names;
numbers = myRecord.Numbers; // Also implicit conversion

I might actually try this out in a branch of the MVVM Toolkit and see how this goes 😄
It would seem like this would simplify a ton of code!

UPDATE: well this was an absolute success, this is amazing 🎉

image

@GSPP
Copy link

GSPP commented Oct 20, 2022

@oliverzick @Sergio0694 I had that problem as well. I wrote myself a struct wrapper that confers value semantics to any list. Something like:

struct ValueList<T> : IList<T>
{
    IList<T> inner;
    IEqualityComparer<T> elementComparer;

    //...
}

static class ValueList
{
    public static ValueList<T> Create<T>(IList<T> inner) => ...
}

var bytes = ValueList.Create(myByteArray);

That type can ease a lot of pain, especially in the context of LINQ. The main drawback is that the ValueList type leaks into the public surface area of APIs.

@Sergio0694
Copy link
Contributor

Yup this is not something I'd recommend for public APIs. In my scenario though I was just referring to models in incremental generator steps, which are all internal anyway, so that's not an issue. @eiriktsarpalis that experiment was really a success both in the MVVM Toolkit and ComputeSharp, and I've also discovered yesterday that @jkoritzinsky has also been using an approach much like this in all the various COM interop generators as well. I wonder if this same idea could also help you simplify things in the System.Text.Json generators? Just really feels like a much simpler approach for incremental models 😄

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Oct 20, 2022
@oliverzick
Copy link
Author

@Sergio0694 It's good to see that a discussion like this one creates new ideas. And even what you have posted looks great.
Beside this in our projects we mostly require only a bare immutable enumerable to use e.g. in composites by hiding the implementation detail and only want to have an fully equatable and immutable enumerable, something like:

public sealed class ImmutableEnumerable<T> : IEnumerable<T>, IEquatable<ImmutableEnumerable<T>>
{
    // This type is owner of the contained collection; no one can change this from outside.
    // Use normal array for performance and memory footprint reasons; could also be an ImmutableArray<T> to prevent changing items by mistake once it has been created.
    private readonly T[] items;

    private ImmutableEnumerable(T[] items)
    {
        this.items = items;
    }

    public static ImmutableEnumerable<T> Create(IEnumerable<T> items)
        => new(items.ToArray());

    // ToDo: Provide implicit type conversion, if required

    public static bool operator ==(ImmutableEnumerable<T> left, ImmutableEnumerable<T> right)
        => Equals(left, right);

    public static bool operator !=(ImmutableEnumerable<T> left, ImmutableEnumerable<T> right)
        => !(left == right);

    public override int GetHashCode()
    {
        var hashCode = new HashCode();

        foreach (var item in this.items)
        {
            hashCode.Add(item);
        }

        return hashCode.ToHashCode();
    }

    public override bool Equals(object? obj)
        => this.Equals(obj as ImmutableEnumerable<T>);

    public bool Equals(ImmutableEnumerable<T>? other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }

        if (ReferenceEquals(this, other))
        {
            return true;
        }

        return this.items.SequenceEqual(other.items);
    }

    public IEnumerator<T> GetEnumerator()
        => this.items.AsEnumerable().GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator()
        => this.GetEnumerator();
}

We are not exposing the owned collection which in turn prevents access from outside. In addition anyone can create new collection instances by using corresponding ToXxx methods - but I'm telling not something new. 😉

And even there is no restriction on the generic type parameter that enforces to implement IEquatable<T> interface, this is completely up to the used type what kind of equality it comes with.

@mrpmorris
Copy link

I came here to report the same issue.

Without deep comparison, the automatic Equals implemented by record classes doesn't tell us two records are equal, it only tells us if they are equal and they share a common ancestor that created the Immutable* collection, and that's not really very useful.

The same problem exists for all the Immutable* types.

@Sergio0694
Copy link
Contributor

Just wanted to mention that this issue keeps popping up very often whenever I see people writing source generators (myself included). I had to copy-paste my EquatableArray<T> type in over 4 different projects for the same reason, and it's just not ideal. This feels like a common enough issue for source generator authors that it'd be nice if this could either be addressed in ImmutableArray<T> itself, or maybe we should propose a separate API that could be used in these cases? I know some other generator authors (eg. @jkoritzinsky) have alternative solutions to this, but it just seels like everyone is reinventing ways to solve the same issue, so maybe just having a proper solution available for everyone would be better? 🤔

@eatdrinksleepcode
Copy link

Ran into this today, but in my case the lack of structural equality by default didn't just break the caching in the incremental pipeline. My generation function is doing a GroupBy on an element of the model that contains an ImmutableArray<ISymbol>, which results in the generator producing incorrect output. Fortunately my tests caught it, but this is definitely a pit of failure for generators (one of many, unfortunately).

@ChrML
Copy link

ChrML commented Apr 8, 2023

How about special-casing ImmutableArray in the default equality implementation of records? It's not ideal, but just throwing in the idea.

@Sergio0694
Copy link
Contributor

Sergio0694 commented Apr 8, 2023

"My generation function is doing a GroupBy on an element of the model that contains an ImmutableArray<ISymbol>, which results in the generator producing incorrect output."

@eatdrinksleepcode Just read that message — this is a side point to the rest of this conversation, but definitely do not capture symbols in your incremental models, ever. It doesn't matter whether you use a custom comparer or whatever to fix incrementality, symbols should just never flow across incremental steps because they cause compilations to be rooted across incremental runs.

@mrpmorris
Copy link

@Sergio0694 aren't they just data structures?

What is the alternative?

@Joe4evr
Copy link
Contributor

Joe4evr commented Apr 9, 2023

@mrpmorris Create your own type(s) that capture the symbol info you want and copy those over.

@eiriktsarpalis
Copy link
Member

I find that this article captures good guidance on how to write good incremental source generators.

@mrpmorris
Copy link

@Joe4evr that's what I hoped you'd say :)

@drdamour
Copy link

Yup this is not something I'd recommend for public APIs.

why?

@eiriktsarpalis
Copy link
Member

FWIW I ended up defining a bunch of equatable collection types for my own pet project source generators:

https://github.com/eiriktsarpalis/typeshape-csharp/tree/main/src/TypeShape.Roslyn/IncrementalTypes

@ThadHouse
Copy link
Contributor

+1 to fixing this. Requiring 99% of uses of source generators to use a custom type to work at all is a terrible user experience.

@tannergooding
Copy link
Member

tannergooding commented Feb 16, 2024

None of the built-in collection types do equality across the entire sequence of elements for IEquatable<T> and it has never been part of the contract of IEquatable<T> to behave as such. IEquatable<T> in no way requires a specific behavior for value types, reference types, etc. The only thing IEquatable<T> requires is in relation to Equals(object obj) and GetHashCode() where it requires some them to behave consistently across the 3 to ensure that hash sets and other collection types work as expected.

It is an explicit design decision for them to be this way, going back to .NET Framework 1.0. It is non-desirable, potentially very expensive, and may even have security concerns/implications to do element-wise equality by default. -- Security concerns can come in from potentially unbounded or unknown processing time which can lead to things like DDoS opportunities or the like. It can lead to catastrophic performance when used with other collections, dictionaries, etc.

Operations like comparing or formatting collections should do the simple thing by default, because these operations can be expected to be frequently called. You should then have separate APIs, such as SequenceEquals, when you intentionally want to do element-wise operations between collections.

This is highly unlikely to be changed, but a different API/interface might be possible if there is sufficient justification.

@RikkiGibson
Copy link
Contributor

I am interested in the possibility of 'IStructuralEquatable'. I would want for records to be able to easily use that interface for equality checks on fields, and perhaps to add automatic implementation of the interface on record types as well.

However, I also feel like I have seen relatively few requests/upvotes to address the "half-structural" equality that records have in practice. So there's a lot of other features I would want to do first.

@Grastveit
Copy link

Grastveit commented Mar 14, 2024

I thought c# had gotten inherent deep value-equality through records + immutable-collections.
In fact a custom-collection is required? After reading this thread I see why. Still, I wish for some built-in way to complete the "half-structural" equality of records.

Background: While writing equals-assertion in a test, instead of using the assertion-library's slow, 'magic' value-comparison I wanted to utilise fast, well-defined behaviour in c# records. (Better to require c# knowledge.)
To get 'free/no-code' deep comparisons the plan was to use records all the way down. At first mindlessly stumbled with collection-properties that were regular arrays. Then took the immutability and value-equality of records as a queue and tried immutable-array. Was surprised to discover it only did reference-equals.
Then reasoned:
No collection with value-comparison exist -> Have to create custom collection to override Equals.
Don't want to wrap immutable-array to keep no-allocations-benefit -> Must extend it -> Oh no, it is sealed! -> came here.

@thomhurst
Copy link

Surprised at this. Has anyone created a package for an equatable collection?

@eiriktsarpalis
Copy link
Member

Has anyone created a package for an equatable collection?

I've published this package that includes a number of immutable equatable collections that I use in my source generators. Variations of the same types are also used by generators in this repo.

@charlesroddie
Copy link

charlesroddie commented Oct 3, 2024

Why not introduce a warning when an ImmutableArray is cast to IEquatable<'T>?

There seems to be consensus that:

  • There is a pitfall because IEquatable<'T> suggests to many users that equality is structural and will therefore cause bugs if users make this assumption.
  • Changing this to structural equality, or removing the interface, would be breaking changes and are ruled out.

A warning would remove the pitfall without any breaking changes. Even if it is off by default, it would allow users to remove the danger of using this interface by mistake.

@tannergooding
Copy link
Member

tannergooding commented Oct 3, 2024

structural equality is not the same as content equality or sequence equality or whatever other terminology you want to define. That is, structural equality does not mean that it deeply compares all members using structural equality.

Rather, reference equality is the default for reference types and simply means that the reference is compared, not the fields of the type. While structural equality is the default for value types and simply means that Equals is called on each field within the type. -- Put in other words, structural equality has always defaulted to being shallow, not deep. You can have reference types (like records) which do structural equality by default, but that is not the norm and there are potential downsides and even security considerations (such as potential for DDoS) when comparing every element in a sequence.

Thus, while comparing two T[] uses reference equality by default and comparing two struct S { T[] field; } uses structural equality by default, the latter is going to functionally behave the same as the former because the struct only contains a single field that is itself a reference type. This is how .NET has always worked going back to v1.0 over 20 years ago.

This is likewise exactly how types like Span<T> work, how some theoretical ValueList<T> would be expected to work, etc. We expose a separate set of APIs such as SequenceEquals where relevant that are appropriate for doing the type of deeper equality on collection types that's being asked about in this thread. -- It might be viable to request some new ISequenceEquatable interface that allows access to such a SequenceEquals member so that these types of comparisons can be done efficiently (which is a bit more direct and explicitly than simply checking if it implements IEnumerable, which may miss some things and may not be valid for all collection types).

@charlesroddie
Copy link

charlesroddie commented Oct 3, 2024

There are cases where reference equality is useful but these are extremely small in domain modelling. It's a 0.01% case among the code that I am familiar with (which is biased to be predominantly F#, where modelling of domains is prominent). Your "not the norm" includes records and tuples but also strings. Add structural equality on lists, DUs, and record types and you have the F# world in which no one needs to bother with reference equality. The shallow vs deep issue and the content/structural/sequence distinctions go away assuming everything inside has structural equality.

Having equality (semantics) depend on whether something is a struct or a reference type (implementation details) is a horrible idea and something dotnet1.0 clearly got wrong, and something that the generic IEquatable<'T> interface (which dotnet1.0 didn't have) can help to fix. Clearly C# users are also taking IEquatable<'T> to mean semantic equality (i.e. some type of structural) and that is why these users post here that this a pitfall.

For F# and domain modelling the IEquatable<'T> interface has the potential to make equality efficient but preserving structural semantics. So it's important not to have these pitfalls.

@tannergooding
Copy link
Member

Such equality is not a viable default in many (if not most) domains due to the potential for security issues, such as DDoS style attacks. It’s something that already exists has to be explicitly considered and specially handled for strings and which causes numerous potential for issues to otherwise unassuming users.

you ultimately want to support multiple types of equality and need them to be differentiated by some form of contract or interface.

IEquatable is, by design, intended for allowing use of types with collections and sets. This in part also implies having bounded defaults so that otherwise basic operations do not present pita of failure by default.

Sequence equality is a different concept that may warrant its own separate interface

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests