Releases: louthy/language-ext
Breaking change: Case matching
The `Case` feature of the collection and union types has changed. Previously it would wrap up the state of the collection or union type into something that could be pattern-matched with C#'s new `switch` expression, i.e.
var result = option.Case switch
{
SomeCase<A> (var x) => x,
NoneCase<A> _ => 0
};
The case wrappers have now gone, and the raw underlying value is returned:
var result = option.Case switch
{
int x => x,
_ => 0
};
The first form has an allocation overhead, because the case-types, like `SomeCase`, needed allocating each time. The new version has an allocation overhead only for value-types, as they are boxed. The classic way of matching, with `Match(Some: x => ..., None: () => ...)`, also has to allocate the lambdas, so there's a potential saving here by using this form of matching.
This also plays nicely with the `is` expression:
var result = option.Case is string name
? $"Hello, {name}"
: "Hello stranger";
There are a couple of downsides, but I think they're worth it:
- `object` is the top-type for all types in C#, so you won't get compiler errors if you match with something completely incompatible with the bound value.
- For types like `Either` you lose the discriminator of `Left` and `Right`, and so if both cases are the same type, it's impossible to discriminate. If you need this, then the classic `Match` method should be used.
Collection types all have 3 case states:
- Empty - will return `null`
- `Count == 1` will return `A`
- `Count > 1` will return `(A Head, Seq<A> Tail)`
For example:
static int Sum(Seq<int> values) =>
values.Case switch
{
null => 0,
int x => x,
(int x, Seq<int> xs) => x + Sum(xs),
};
NOTE: The tail of all collection types becomes `Seq<A>`. This is because `Seq` is much more efficient at walking collections, and so all collection types are wrapped in a lazy `Seq`. Without this, the tail would be rebuilt (reallocated) on every match; for recursive functions like the one above, that would be very expensive.
Massive improvements to Traverse and Sequence
An ongoing thorn in my side has been the behaviour of `Traverse` and `Sequence` for certain pairs of monadic types (when nested). Several reported issues document some of the problems.
The `Traverse` and `Sequence` functions were previously auto-generated by a T4 template, because for 25 monads that's `25 * 25 * 2 = 1250` functions to write. In practice it's a bit less than that, because not all nested monads should have a `Traverse` and `Sequence` function, but it is in the many hundreds of functions.
Because the same issue kept popping up I decided to bite the bullet and write them all by hand. This has a number of benefits:
- The odd rules of various monads when paired can have bespoke code that makes more sense than any auto-generated T4 template could ever build. This fixes the bugs that keep being reported and removes the surprising nature of `Traverse` and `Sequence` working most of the time, but not in all cases.
- I'm able to hand-optimise each function based on what's most efficient for the monad pairing. This is especially powerful for working with `Traverse` and `Sequence` on list/sequence types. The generic T4 code-gen had to create singleton sequences and then concat them, which was super inefficient and could cause stack overflows. Often now I can pre-allocate an array and use a much faster imperative implementation with sequential memory access. Where possible I've tried to avoid nesting lambdas, again in the quest for performance but also to reduce the number of GC objects created. I expect a major performance boost from these changes.
- The lazy stream types `Seq` and `IEnumerable`, when paired with `async` types like `Task`, `OptionAsync`, etc., can now have bespoke behaviour to better handle the concurrency requirements. These types now have `TraverseSerial` and `SequenceSerial`, which process tasks in a sequence one-at-a-time, and `TraverseParallel` and `SequenceParallel`, which process tasks in a sequence concurrently with a window of running tasks. That means it's possible to stop the `Traverse` or `Sequence` operation from thrashing the scheduler.
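The idea of a "window of running tasks" can be sketched with a `SemaphoreSlim`. This is only an illustration of the technique, assuming a semaphore-based throttle; `WindowedTraverse` and `TraverseWindowed` are hypothetical names and this is not how language-ext necessarily implements `TraverseParallel`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WindowedTraverse
{
    // Hypothetical helper: applies f to every item, allowing at most
    // `window` tasks to be in flight at any one time.
    public static async Task<B[]> TraverseWindowed<A, B>(
        IReadOnlyList<A> items, Func<A, Task<B>> f, int window)
    {
        // Pre-allocated result array; each task writes only to its own slot,
        // so the output order matches the input order.
        var results = new B[items.Count];
        using var gate = new SemaphoreSlim(window);

        async Task Run(int index)
        {
            await gate.WaitAsync();          // wait for a free slot in the window
            try { results[index] = await f(items[index]); }
            finally { gate.Release(); }      // free the slot for the next task
        }

        await Task.WhenAll(Enumerable.Range(0, items.Count).Select(i => Run(i)));
        return results;
    }
}
```

With `window: 1` this degenerates into one-at-a-time processing, which is roughly the behaviour described for the serial variants.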
Help
Those are all lovely things, but the problem with writing several hundred functions manually is that there's gonna be bugs in there, especially as I've implemented them in the most imperative way I can to get the max performance out of them.
I have just spent the past three days writing these functions, and frankly, it was a pretty soul-destroying experience. The idea of writing several thousand unit tests fills me with dread, and so if any of you lovely people would like to jump in and help build some unit tests then I would be eternally grateful.
Sharing the load on this one would make sense. If you've never contributed to an open-source project before then this is a really good place to start!
I have...
- Released the updates in `3.4.14-beta` - so if you have unit tests that use `Traverse` and `Sequence` then any feedback on the stability of your tests would be really helpful.
- Created a GitHub project for managing the cards of each file that needs unit tests. It's the first time I've used this, so I'm not sure of its capabilities yet, but it would be great to assign a card to someone so work doesn't end up being doubled up.
- The code is in the `hand-written-traverse` branch.
- The folder with all the functions is `transformers/traverse`.
Things to know
- `Traverse` and `Sequence` take a nested monadic type of the form `MonadOuter<MonadInner<A>>` and flip it so the result is `MonadInner<MonadOuter<A>>`.
- If the outer-type is in a fail state then usually the inner value's fail state is returned, i.e. `Try<Option<A>>` would return `Option<Try<A>>.None` if the outer `Try` was in a `Fail` state.
- If the inner-type is in a fail state then usually that short-cuts any operation. For example `Seq<Option<A>>` would return an `Option<Seq<A>>.None` if any of the `Options` in the `Seq` were `None`.
- Where possible I've tried to rescue a fail value where the old system returned `Bottom`. For example: `Either<Error, Try<A>>`. The new system now knows that the language-ext `Error` type contains an `Exception` and can therefore be used when constructing `Try<Either<Error, A>>`.
- All async pairings are eagerly consumed, even when using `Seq` or `IEnumerable`. `Seq` and `IEnumerable` do have windows for throttling the consumption though.
- `Option` combined with other types that have an error value (like `Option<Try<A>>`, `Option<Either<L, R>>`, etc.) will put `None` into the resulting type (`Try<Option<A>>(None)`, `Either<L, Option<A>>(None)`) if the outer type is `None`. This is because there is no error value to construct an `Exception` or `L` value, and so the only option is to either return `Bottom` or a success value with `None` in it, which I think is slightly more useful. This behaviour is different from the old system. This decision is up for debate, and I'm happy to have it. The choices are: remove the pairing altogether (so there is no `Traverse` or `Sequence` for those types) or return `None` as described above.
Obviously, it helps if you understand this code, what it does and how it should work. I'll make some initial tests over the next few days as guidance.
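As a rough illustration of the short-cutting rule for `Seq<Option<A>>`, combined with the pre-allocated array approach mentioned earlier, here is a sketch that uses nullable value types and arrays as stand-ins for `Option<A>` and `Seq<A>`. `SequenceSketch` is a hypothetical name and this is not the code from the `transformers/traverse` folder.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceSketch
{
    // Flips a list of optional values into an optional array of values.
    // A null element plays the role of None; a null result plays the role
    // of Option<Seq<A>>.None.
    public static A[] Sequence<A>(IReadOnlyList<A?> items) where A : struct
    {
        // Pre-allocate the result once rather than concatenating singletons.
        var result = new A[items.Count];

        for (var i = 0; i < items.Count; i++)
        {
            var item = items[i];
            if (!item.HasValue) return null; // inner fail state short-cuts the whole operation
            result[i] = item.Value;
        }
        return result;
    }
}
```

A unit test for the real functions would presumably check the same two properties: all-success input preserves order, and a single failing element fails the whole result.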
Free monads come to C#
Free monads allow the programmer to take a functor and turn it into a monad for free.
The `[Free]` code-gen attribute provides this functionality in C#.
Below is the classic example of a `Maybe` type (also known as `Option`; here we're using the Haskell naming parlance to avoid confusion with the language-ext type).
[Free]
public interface Maybe<A>
{
[Pure] A Just(A value);
[Pure] A Nothing();
public static Maybe<B> Map<B>(Maybe<A> ma, Func<A, B> f) => ma switch
{
Just<A>(var x) => Maybe.Just(f(x)),
_ => Maybe.Nothing<B>()
};
}
The `Maybe<A>` type can then be used as a monad:
var ma = Maybe.Just(10);
var mb = Maybe.Just(20);
var mn = Maybe.Nothing<int>();
var r1 = from a in ma
from b in mb
select a + b; // Just(30)
var r2 = from a in ma
from b in mb
from _ in mn
select a + b; // Nothing
And so, in 11 lines of code, we have created a `Maybe` monad that captures the short-cutting behaviour of `Nothing`.
But, actually, it's possible to do this in fewer lines of code:
[Free]
public interface Maybe<A>
{
[Pure] A Just(A value);
[Pure] A Nothing();
}
If you don't need to capture bespoke rules in the `Map` function, the code-gen will build it for you.
A monad, a functor, and a discriminated union in 6 lines of code. Nice.
As with the discriminated-unions, `[Free]` types allow for deconstructing the values when pattern-matching:
var txt = ma switch
{
Just<int> (var x) => $"Value is {x}",
_ => "No value"
};
The type 'behind' a free monad (in Haskell or Scala for example) usually has one of two cases:
- `Pure`
- `Free`
`Pure` is what we've used so far, and that's why `Just` and `Nothing` had the `Pure` attribute before them:
[Pure] A Just(A value);
[Pure] A Nothing();
They can be considered terminal values, i.e. just raw data, nothing else. The code generated works in exactly the same way as the common types in language-ext, like `Option`, `Either`, etc. However, if the `[Pure]` attribute is left off the method-declaration then we gain an extra field in the generated case type: `Next`.
`Next` is a `Func<*, M<A>>` - the `*` will be the return type of the method-declaration.
For example:
[Free]
public interface FreeIO<T>
{
[Pure] T Pure(T value);
[Pure] T Fail(Error error);
string ReadAllText(string path);
Unit WriteAllText(string path, string text);
}
If we look at the generated code for the `ReadAllText` case (which doesn't have a `[Pure]` attribute), then we see that the return type of `string` has now been injected into this additional `Next` function, which is provided as the last argument.
public sealed class ReadAllText<T> : FreeIO<T>, System.IEquata...
{
public readonly string Path;
public readonly System.Func<string, FreeIO<T>> Next;
public ReadAllText(string Path, System.Func<string, FreeIO<T>> Next)
{
this.Path = Path;
this.Next = Next;
}
Why is all this important? Well, it allows for actions to be chained together into a continuation-style structure. This is useful for building a sequence of actions, which is very handy for building DSLs.
var dsl = new ReadAllText<Unit>("I:\\temp\\test.txt",
txt => new WriteAllText<Unit>("I:\\temp\\test2.txt", txt,
_ => new Pure<Unit>(unit)));
You should be able to see now why the `[Pure]` types are terminal values. They are used at the end of the chain of continuations to signify a result.
But that's all quite ugly, so we can leverage the monadic aspect of the type:
var dsl = from t in FreeIO.ReadAllText("I:\\temp\\test.txt")
from _ in FreeIO.WriteAllText("I:\\temp\\test2.txt", t)
select unit;
The continuation itself doesn't do anything, it's just a pure data-structure representing the actions of the DSL. And so, we need an interpreter to run it (which you write). This is a simple example:
public static Either<Error, A> Interpret<A>(FreeIO<A> ma) => ma switch
{
Pure<A> (var value) => value,
Fail<A> (var error) => error,
ReadAllText<A> (var path, var next) => Interpret(next(Read(path))),
WriteAllText<A> (var path, var text, var next) => Interpret(next(Write(path, text))),
};
static string Read(string path) =>
File.ReadAllText(path);
static Unit Write(string path, string text)
{
File.WriteAllText(path, text);
return unit;
}
We can then run it by passing it the `FreeIO<A>` value:
var result = Interpret(dsl);
Notice how the result type of the interpreter is `Either`. We can use any result type we like; for example, we could make the interpreter asynchronous:
public static async Task<A> InterpretAsync<A>(FreeIO<A> ma) => ma switch
{
Pure<A> (var value) => value,
Fail<A> (var error) => await Task.FromException<A>(error),
ReadAllText<A> (var path, var next) => await InterpretAsync(next(await File.ReadAllTextAsync(path))),
WriteAllText<A> (var path, var text, var next) => await InterpretAsync(next(await File.WriteAllTextAsync(path, text).ToUnit())),
};
Which can be run in a similar way, but asynchronously:
var res = await InterpretAsync(dsl);
And so, the implementation of the interpreter is up to you. It can also take extra arguments so that state can be carried through the operations. In fact it's very easy to use the interpreter to bury all the messy stuff of your application (the IO, maybe some ugly state management, etc.) in one place. This then allows the code itself (that works with the free-monad) to be referentially transparent.
Another trick is to create a mock interpreter for unit-testing code that uses IO without ever having to do real IO. The logic gets tested, which is often the most important aspect of unit testing, but no real IO occurs. The arguments to the interpreter can be the mocked state.
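A mock interpreter of that kind might look something like the sketch below. To keep it self-contained, the case types are written by hand rather than produced by the `[Free]` code-gen, and every name here (`FileOp`, `Done`, `ReadText`, `WriteText`, `MockFileInterpreter`) is hypothetical. The write continuation is also simplified to take no `Unit` argument.

```csharp
using System;
using System.Collections.Generic;

// Hand-written stand-ins for generated case types (illustrative names only).
public abstract class FileOp<T> { }

public sealed class Done<T> : FileOp<T>
{
    public readonly T Value;
    public Done(T value) { Value = value; }
}

public sealed class ReadText<T> : FileOp<T>
{
    public readonly string Path;
    public readonly Func<string, FileOp<T>> Next;
    public ReadText(string path, Func<string, FileOp<T>> next) { Path = path; Next = next; }
}

public sealed class WriteText<T> : FileOp<T>
{
    public readonly string Path;
    public readonly string Text;
    public readonly Func<FileOp<T>> Next; // simplified: no Unit argument
    public WriteText(string path, string text, Func<FileOp<T>> next)
    { Path = path; Text = text; Next = next; }
}

public static class MockFileInterpreter
{
    // Runs the DSL against an in-memory dictionary instead of the file system.
    // The dictionary is the mocked state passed in as an extra argument.
    public static T Run<T>(FileOp<T> op, IDictionary<string, string> files)
    {
        while (true) // a loop rather than recursion, so long chains don't grow the stack
        {
            switch (op)
            {
                case Done<T> done:
                    return done.Value;
                case ReadText<T> read:
                    op = read.Next(files[read.Path]);
                    break;
                case WriteText<T> write:
                    files[write.Path] = write.Text;
                    op = write.Next();
                    break;
                default:
                    throw new NotSupportedException("Unknown operation");
            }
        }
    }

    // A small program in the DSL: copy a file and return its length.
    public static FileOp<int> CopyAndMeasure(string from, string to) =>
        new ReadText<int>(from, txt =>
            new WriteText<int>(to, txt, () =>
                new Done<int>(txt.Length)));
}
```

The program built by `CopyAndMeasure` is plain data; nothing touches a disk until a real interpreter is chosen, so the same value can be run against the dictionary in tests and against `File` in production.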
Some caveats though:
- The recursive nature of the interpreter means large operations could blow the stack. This can be dealt with using a functional co-routines/trampolining trick, but that's beyond the scope of this doc.
- Although it's the perfect abstraction for IO, it does come with some additional performance costs. Generating the DSL before interpreting it is obviously not as efficient as directly calling the IO functions.
Caveats aside, the free-monad allows for complete abstraction from side-effects, and makes all operations pure. This is incredibly powerful.
Rollback to netstandard2.0 and CodeGeneration.Roslyn 0.6.1
Unfortunately, the previous release with the latest `CodeGeneration.Roslyn` build caused problems due to possible bugs in the `CodeGeneration.Roslyn` plugin system. These issues only manifested in the NuGet package version of `LanguageExt.CodeGen` and not in my project-to-project tests, giving a false sense of security.
After a lot of head-scratching, and attempts at making it work, it seems right to roll it back.
This also means rolling back to `netstandard2.0` so that the old code-gen can work. And so, I have had to also remove the support for `IAsyncEnumerable` with `OptionAsync` and `EitherAsync` until this is resolved.
Apologies to anyone who wasted time on the last release and who might be inconvenienced by the removal of `IAsyncEnumerable` support. I tried so many different approaches and none seemed to be working.
Issues resolved:
Improvements:
- Performance improvements for `Map` and `Lst`
- Performance improvements for all hashing of collections
Any further issues, please feel free to shout on the issues page or gitter.
Migrate to `net461` and `netstandard2.1`
NOTE: I am just investigating some issues with this release relating to the code-gen, keep an eye out for 3.4.3 tonight or tomorrow (12/Feb/2020)
In an effort to slowly get language-ext to the point where .NET Core 3 can be fully supported (with all of the benefits of new C# functionality) I have taken some baby steps towards that world:
Updated the references for `CodeGeneration.Roslyn` to `0.7.5-alpha`
This might seem crazy, but the `CodeGeneration.Roslyn` DLL doesn't end up in your final build (if you set it up correctly), and doesn't get used live even if it does. So, if the code generates correctly at build-time, it works. Therefore, including an `alpha` is low risk.
I have been testing this with my TestBed and unit-tests and working with the `CodeGeneration.Roslyn` team, and the `alpha` seems stable.
A release of `CodeGeneration.Roslyn` is apparently imminent, so if you're not happy with this, then please wait for subsequent releases of language-ext when I've upgraded to the full `CodeGeneration.Roslyn` release. I just couldn't justify the code-gen holding back the development of the rest of language-ext any more.
Updated the minimum .NET Framework and .NET Standard versions
Ecosystem | Old | New |
---|---|---|
.NET Framework | net46 | net461 |
.NET Standard | netstandard2.0 | netstandard2.1 |
`OptionAsync<A>` and `EitherAsync<A>` support `IAsyncEnumerable<A>`
The `netstandard2.1` release supports `IAsyncEnumerable<A>` for `OptionAsync<A>` and `EitherAsync<A>`. This is the first baby-step towards leveraging some of the newer features of C# and .NET Core.
`pipe` prelude function
Allow composition of single argument functions which are then applied to the initial argument.
var split = fun((string s) => s.Split(' '));
var reverse = fun((string[] words) => words.Rev().ToArray());
var join = fun((string[] words) => string.Join(" ", words));
var r = pipe("April is the cruellest month", split, reverse, join); // "month cruellest the is April"
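For readers who want to see the shape of the idea without the library, a three-function `pipe` can be sketched as plain left-to-right function application. `PipeSketch.Pipe` is a hypothetical helper for illustration and is not the language-ext source.

```csharp
using System;
using System.Linq;

public static class PipeSketch
{
    // Applies f, then g, then h to the initial value, left to right.
    public static D Pipe<A, B, C, D>(A value, Func<A, B> f, Func<B, C> g, Func<C, D> h) =>
        h(g(f(value)));

    // Splits a sentence into words, reverses their order, and joins them again.
    public static string ReverseWords(string sentence) =>
        Pipe(
            sentence,
            s => s.Split(' '),
            words => Enumerable.Reverse(words).ToArray(),
            words => string.Join(" ", words));
}
```

Each function's output type must match the next function's input type, which the generic parameters `A`, `B`, `C`, and `D` enforce at compile time.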
Added `Hashable<A>` and `HashableAsync<A>` type-classes
`Hashable<A>` and `HashableAsync<A>` provide the methods `GetHashCode(A x)` and `GetHashCodeAsync(A x)`. There are lots of `Hashable*<A>` class-instances that provide default implementations for common types.
Updates to the `[Record]` and `[Union]` code-gen
The `GetHashCode()` code-gen now uses `Hashable*<A>` for default field hashing. Previously this looked for `Eq*<A>`, where the `*` was the type of the field to hash; now it looks for `Hashable*<A>`.
By default `Equals`, `CompareTo`, and `GetHashCode` use:
// * == the type-name of the field/property
default(Eq*).Equals(x, y);
default(Ord*).CompareTo(x, y);
default(Hashable*).GetHashCode(x);
These provide the default structural functionality for the fields/properties. They can now be overridden with the `Eq`, `Ord`, and `Hashable` attributes:
[Record]
public partial struct Person
{
[Eq(typeof(EqStringOrdinalIgnoreCase))]
[Ord(typeof(OrdStringOrdinalIgnoreCase))]
[Hashable(typeof(HashableStringOrdinalIgnoreCase))]
public readonly string Forename;
[Eq(typeof(EqStringOrdinalIgnoreCase))]
[Ord(typeof(OrdStringOrdinalIgnoreCase))]
[Hashable(typeof(HashableStringOrdinalIgnoreCase))]
public readonly string Surname;
}
The code above will generate a record where the fields `Forename` and `Surname` are both structurally part of the equality, ordering, and hashing. However, the case of the strings is ignored, so:
{ Forename: "Paul", Surname: "Louth" } == { Forename: "paul", Surname: "louth" }
NOTE: Generic arguments aren't allowed in attributes, so this technique is limited to concrete-types only. A future system for choosing the structural behaviour of generic fields/properties is yet to be designed/defined.
Bug fixes
`Non*` attributes respected on `[Record]` and `[Union]` types
The attributes:
- `NonEq` - to opt out of equality
- `NonOrd` - to opt out of ordering
- `NonShow` - to opt out of `ToString`
- `NonHash` - to opt out of `GetHashCode`
- `NonSerializable`, `NonSerialized` - to opt out of serialisation
- `NonStructural == NonEq | NonOrd | NonHash`
- `NonRecord == NonStructural | NonShow | NonSerializable`
can now be used with the `[Record]` and `[Union]` code-gen.
For `[Union]` types you must put the attributes with the arguments:
[Union]
public abstract partial class Shape<NumA, A> where NumA : struct, Num<A>
{
public abstract Shape<NumA, A> Rectangle(A width, A length, [NonRecord] A area);
public abstract Shape<NumA, A> Circle(A radius);
public abstract Shape<NumA, A> Prism(A width, A height);
}
On the `[Record]` types you put them above the fields/properties as normal:
[Record]
public partial struct Person
{
[NonOrd]
public readonly string Forename;
public readonly string Surname;
}
Both the `[Union]` case-types and the `[Record]` types now have a `New` static function which can be used to construct a new object of the respective type. This can be useful when trying to construct types point-free.
Some minor bug fixes to `Try.Filter` and `manyn` in Parsec. Thanks to @bender2k14 and @StefanBertels
Important update: Fix for performance issue in `Lst<A>`
A bug had crept into the `Lst<A>` type which would cause a complete rebuild of the data-structure when performing a transformation operation (like `Add(x)`). This was caught whilst building benchmarks for comparisons with `Seq<A>` and the .NET `ImmutableList<T>` type.
The performance gets exponentially worse as more items are added to the collection, and so if you're using `Lst<A>` for anything at all then it's advised that you get this update.
Luckily, there are now benchmarks in the LanguageExt.Benchmarks project that will pick up issues like these if they arise again in the future.
Collection `ToString` and various fixes
Collections ToString
All of the collection types now have a default `ToString()` implementation for small list-like collections:
"[1, 2, 3, 4, 5]"
And for maps (`HashMap` and `Map`):
"[(A: 1), (B: 2), (C: 3), (D: 4), (E: 5)]"
Larger collections will show `CollectionFormat.MaxShortItems` items and then an ellipsis followed by the number of items remaining, unless the collection is lazy, in which case only the ellipsis will be shown:
"[1, 2, 3, 4, 5 ... 50 more]"
`CollectionFormat.MaxShortItems` can be set directly if the default of `50` items in a `ToString()` isn't suitable for your application.
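The truncation rule can be sketched as follows. This only mirrors the output format shown above for non-lazy collections; `ShortFormat.Format` and its `maxShortItems` parameter are hypothetical, and the library's actual formatting code may differ in its details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortFormat
{
    // Shows at most maxShortItems items, then "... N more" for the rest.
    public static string Format<A>(IReadOnlyCollection<A> items, int maxShortItems)
    {
        var shown = string.Join(", ", items.Take(maxShortItems));
        var remaining = items.Count - maxShortItems;

        return remaining > 0
            ? $"[{shown} ... {remaining} more]"
            : $"[{shown}]";
    }
}
```

For example, formatting the numbers 1 to 55 with a limit of 5 gives `[1, 2, 3, 4, 5 ... 50 more]`, matching the shape of the sample output above.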
In addition to this there are two extra methods per collection type:
string ToFullString(string separator = ", ")
This will build a string from all of the items in the collection.
string ToFullArrayString(string separator = ", ")
This will build a string from all of the items in the collection and wrap it with brackets `[ ]`.
Fixes
HashMap and Map equality consistency
`HashMap` and `Map` had inconsistent equality operators. `HashMap` would compare keys and values and `Map` would compare keys only. I have now unified the default equality behaviour to keys and values. This may be a breaking change for your uses of `Map`.
In addition the `Map` and `HashMap` types now have three typed `Equals` methods:
- `Equals(x, y)` - uses `EqDefault<V>` to compare the values
- `Equals<EqV>(x, y) where EqV : struct, Eq<V>`
- `EqualsKeys(x, y)` - which compares the keys only (equivalent to `Equals<EqTrue>(x, y)`)
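The difference between the two kinds of comparison can be shown with ordinary dictionaries. This is a sketch of the semantics only; `MapEquality` and its methods are hypothetical helpers built on the BCL, not the language-ext `Map` or `HashMap` implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapEquality
{
    // Keys and values must both match (the unified default described above).
    public static bool EqualsKeysAndValues<K, V>(
        IReadOnlyDictionary<K, V> x, IReadOnlyDictionary<K, V> y) =>
        x.Count == y.Count &&
        x.All(kv => y.TryGetValue(kv.Key, out var v) &&
                    EqualityComparer<V>.Default.Equals(kv.Value, v));

    // Only the keys must match; values are ignored.
    public static bool EqualsKeys<K, V>(
        IReadOnlyDictionary<K, V> x, IReadOnlyDictionary<K, V> y) =>
        x.Count == y.Count &&
        x.Keys.All(k => y.ContainsKey(k));
}
```

Two maps with the same keys but different values are equal under the keys-only comparison and unequal under the default one, which is why the change may break code that relied on the old `Map` behaviour.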
`Map` has also had similar changes made to `CompareTo` ordering:
- `CompareTo(x, y)` - uses `OrdDefault<V>` to compare the values
- `CompareTo<OrdV>(x, y) where OrdV : struct, Ord<V>`
- `CompareKeysTo(x, y)` - which compares the keys only (equivalent to `CompareTo<OrdTrue>(x, y)`)
On top of this, `HashSet<A>` now has some performance improvements due to it using a new backing type of `TrieSet<A>` rather than the `TrieMap<A, Unit>`.
Finally, there are improvements to the `Union` serialisation system for code-gen. Thanks @StefanBertels
Happy new year!
Paul
Support for C# pattern-matching
Language-ext was created before the C# pattern-matching feature existed. The default way to match within lang-ext is to use the `Match(...)` methods provided for most types.
There have been requests for the `struct` types to become reference-types so sub-types can represent the cases of types like `Option<A>`, `Either<L, R>`, etc. I don't think this is the best way forward for a number of reasons that I won't go into here, but it would obviously be good to support the C# in-built pattern-matching.
So, now most types have a `Case` property, or in the case of `delegate` types like `Try<A>`, or in-built BCL types like `Task<T>`, a `Case()` extension method.
For example, this is how to match on an `Option<int>`:
var option = Some(123);
var result = option.Case switch
{
SomeCase<int>(var x) => x,
_ => 0 // None
};
Next we can try matching on an `Either<string, int>`:
var either = Right<string, int>(123);
var result = either.Case switch
{
RightCase<string, int>(var r) => r,
LeftCase<string, int>(var _) => 0,
_ => 0 // Bottom
};
This is where some of the issues of C#'s pattern-matching show up: they can get quite verbose compared to calling the `Match` method.
For `async` types you simply have to `await` the `Case`:
var either = RightAsync<string, int>(123);
var result = await either.Case switch
{
RightCase<string, int>(var r) => r,
LeftCase<string, int>(var _) => 0,
_ => 0 // Bottom
};
The delegate types need to use `Case()` rather than `Case`:
var tryOption = TryOption<int>(123);
var result = tryOption.Case() switch
{
SuccCase<int>(var r) => r,
FailCase<int>(var _) => 0,
_ => 0 // None
};
All collection types support `Case` too. They all work with the same matching system, and so the cases are always the same for all collection types:
static int Sum(Seq<int> seq) =>
seq.Case switch
{
HeadCase<int>(var x) => x,
HeadTailCase<int>(var x, var xs) => x + Sum(xs),
_ => 0 // Empty
};