-
-
Notifications
You must be signed in to change notification settings - Fork 420
What are Discriminated Unions?
Consider that all types in programming languages represent sets of possible values. For example, bool
is a set that has two possible values: true
and false
; uint
represents the set of natural numbers from 0
to UInt32.MaxValue
; when you use a generic parameter T
in a type, then T
represents every possible value in all types (unless its refined by a constraint clause - in fact, constraints could be seen as a 'filter' operation on the set of all values).
And so, if types are sets, then what happens if we create a union of sets?
Let's say we have the following enum:
public enum Colour
{
Red,
Green,
Blue
}
And then create a union of Colour
and bool
. Then the set of possible values are:
true
false
Red
Green
Blue
But what happens if we create a union between bool
and bool
? Well, we just get:
true
false
Because duplicate items become one in a union operation. However, if we tag each set (each type) with an additional unique value then we get what's called a Discriminated Union (also known as: tagged union, variant, variant record, choice-type, disjoint union, sum-type or co-product). The easiest way to show that would be to use a tuple, with the first element being the tag and the second element being the value of the type. So, if we try to create a union of Colour
and bool
then we get:
(1, true)
(1, false)
(2, Red)
(2, Green)
(2, Blue)
The integers being the tag in this case.
If we try to union bool
and bool
, we get:
(1, true)
(1, false)
(2, true)
(2, false)
They are discriminated by their 'tag'.
In most functional-first programming languages discriminated-unions are a first-class concept. This is an example from Haskell:
data Maybe a = Just a
| Nothing
Here Just
and Nothing
are type constructing functions that construct a Maybe a
(like Maybe<A>
in C#), but we can think of them as the tag for each set of values the various cases represent.
data Example = First Bool
| Second Bool
This simple example above would look like this in the tuple form shown earlier in this article:
(First, True)
(First, False)
(Second, True)
(Second, False)
The total number of values that are represented by a discriminated union is the sum of all the possible values in the individual cases. Which is why they are often referred to as sum types.
In C# we don't have sum-types yet. And so, the easiest way to represent them is to create a base-class or interface, and then a finite set of derived types:
public interface Maybe<A>
{
public static Maybe<A> Just(A value) => new Just<A>(value);
public static Maybe<A> Nothing() => new Nothing<A>();
}
public class Just<A> : Maybe<A>
{
public readonly A Value;
public Just(A value) =>
Value = value;
}
public class Nothing<A> : Maybe<A>
{
}
This is obviously less convenient than the terse declarations in Haskell, and so we can achieve something similar by using the LanguageExt.CodeGen
:
[Union]
public interface Maybe<A>
{
Maybe<A> Just(A value);
Maybe<A> Nothing();
}
NOTE: The big caveat here is that C# can't do completeness checking when using union-types with C# pattern-matching. However, the code-gen generates a
Match
function that does allow completeness checking.
Another way, which is slightly less powerful than the code-gen solutionm is to use C# records (available from C# 9):
public abstract record Maybe<A>();
public record Just<A>(A value) : Maybe<A>();
public record Nothing<A>() : Maybe<A>();
This is also nice and terse, but also suffers from the lack of completeness checking, but you could add your own Match
function:
public abstract record Maybe<A>()
{
public B Match<B>(Func<Just<A>, B> Just, Func<Nothing<A>, B> Nothing) =>
this switch
{
Just<A> j => Just(j),
Nothing<A> j => Nothing(j),
_ => throw new NotSupportedException()
};
}
So, what are they useful for? Well, the key idea is that they represent a finite set of states, and the mapping of that finite set to another (via pattern matching) is more powerful and easier to reason about than the abstract/extend everything approach of the OO-world. Most programming is actually working on more concrete concepts than we're lead to believe by the OOP purists, and so creating a data model that more closely represents the number of states your application can be in is a good thing.
The above is true most of the time, but the reader should familiarise her or himself with The Expression Problem