JavaScript enables iterators (producers) to receive inputs from their consumers by passing an argument to the next
call, which results in a bi-directional flow of data and control (e.g. triggering a yield break
in C# lingo, or injecting a throw
). These are also known as generators.
C# iterators are uni-directional for data from producers to consumers (yield return
) and rigidly bi-directional for control flow. A producer can throw (or finish), thus injecting control flow at the consumer site. A consumer can influence control flow in the producer, obviously to produce the next element, but also to trigger a yield break
from the producer's current location (e.g. by calling Dispose
).
Could we enable bi-directional data flow and control flow in C# iterators?
First, let's sketch what possible interfaces would look like. In all that follows, we'll assume that language syntax for consuming (foreach
) and producing (iterator methods) is not tied to interface types, but rather to the members on types classified as iteratable-like or iterator-like, so any type of these "shapes" could be made to work with such language features.
Interfaces for bi-directional synchronous iterators could look as follows:
interface IBiterable<in TInput, out TOutput>
{
IBiterator<TInput, TOutput> GetBiterator();
}
interface IBiterator<in TInput, out TOutput> : IDisposable
{
#if OPTION1
bool MoveNext(TInput input);
TOutput Current { get; }
#else
TOutput TryGetNext(TInput input, out bool success);
#endif
}
Note that uni-directional iterators can be implemented using bi-directional iterators by using a Boolean
input type that encodes a shouldContinue
flag sent from the consumer to the producer. This allows the iterator to be cleaned up by injecting a yield break
branch at the current position in the iterator, in a way similar to how Dispose
works on iterators today. Note that passing a Boolean
for this purpose is akin to passing a CancellationToken
.
Besides passing data to the producer, it is trivially possible to influence control flow in the producer by using a suitable type for TInput
. For example, one could use a Func<T>
which can send either values to the producer, or cause the producer to throw an exception upon evaluating the function. Another option is to pass a ValueOrError<T>
discriminated union type.
Interfaces for asynchronous bi-directional iterators could look as follows:
interface IAsyncBiterable<in TInput, out TOutput>
{
IAsyncBiterator<TInput, TOutput> GetAsyncBiterator();
}
interface IAsyncBiterator<in TInput, out TOutput> : IAsyncDisposable
{
#if OPTION1
Task<bool> MoveNextAsync(TInput input);
TOutput Current { get; }
#else
Task<bool> WaitForNextAsync();
TOutput TryGetNext(TInput input, out bool success);
#endif
}
Note that asynchronous iterators readily would get exception or cancellation injection by merely using Task<T>
for the TInput
type, and by having producers await
the received input, which enables injecting an exception by passing a task in the Faulted
state. Cancellation can be injected as well, by passing a task in the Canceled
state. Or, if cancellation is the only irregular control flow to be injected in the producer, one could use CancellationToken
for the the TInput
parameter type.
How would one consume a bi-directional iterable or iterator type? Let's focus on iterables, which immediately shows how iterators can be consumed. Let's also assert that the expressive power of OPTION1
suffices for the discussion; the way a consumer works is not fundamentally impacted if the alternative interface choice is made.
First, let's look at a uni-directional foreach
loop, containing break
and continue
statements to illustrate all possible branches:
foreach [await] (TOutput output in source)
{
if (b)
break;
if (c)
continue;
Use(output);
}
This translates more or less to:
using [await] (var e in source.Get[Async]Enumerator())
{
while ([await] e.MoveNext[Async]())
{
if (b)
break;
if (c)
continue;
TOutput output = e.Current;
Use(output);
}
}
For enumeration over a bi-directional iterable, we require some way to send a value (assignable to TInput
) to the producer. This is required for all continue
statements, but also upon starting the loop, which has an implicit continue
statement. This smells like definite assignment, because we need to have a TInput
value available each time we wish to push the consumer forward. Given that the consumer receives values of type TOutput
, it feels like there's an implicit in
for values received, as if we can write:
foreach (in TOutput output in source)
{
...
}
In fact, C# does not allow assignment to an iteration variable, which is treated as a readonly variable, which is what the new C# 7.2 feature of in
represents as well.
For a bi-directional iterable, data is flowing out
of the consumer into the producer, so it seems logical to use out
for this:
foreach ((in TOutput output, out TInput input) in source)
{
...
}
Now, we get definite assignment requirements at the consumer side, requiring input
to be assigned before control flow leaves some scope. Regularly, that scope is the entire method body for a method with an out
parameter. In this case, this scope is the body of the foreach
loop. However, that still brings with it a further complication as to how to provide an initial value. On way to circumvent this would be to require the output parameter to be declared outside the loop:
TInput input = default;
foreach ((in TOutput output, out input) in source)
{
...
}
That would work just fine and it would nicely work with parameters passed to the containing method, for example to flow a CancellationToken
down into the iterator. We have to assign it before entering the loop, we could set it to something else while advancing through the loop, but the callee (the producing iterator) cannot assign to it, because the direction flips from their perspective (what's out
for us, is in
for them).
One thing is unsettling about this, namely that iteration variables cannot be declared outside the foreach
loop, i.e. one can't write the following today:
TOutput output;
foreach (TOutput output in source)
{
...
}
This leaves an asymmetry in foreach
for its in
and out
parameters. One option would be to lift this restriction, but note that output
will not be definitely assigned after exiting from the foreach
loop, thus definite assignment may have to become flow-based, which is already the case for the analysis for C# 8.0 nullable reference types.
Alternatively, we can allow the declaration site of the out
parameter to have an initializer, to restore the symmetry the other way. All variables have to be declared in the foreach
loop, but assignments can be made:
foreach ((in TOutput output, out TInput input = default) in source)
{
...
}
Note this would still allow to copy in some value from another value, such as the CancellationToken
passed to the containing method. One merely ends up with another variable, which could even be desirable to contain the effects of assignment to this variable. If this route is viable, we can entertain the thought of having default values for out
parameters, which could be handy for bail-out style code:
bool TryGetValue(string s, out int x = 0)
{
if (...)
{
x = 42;
return true;
}
// don't bother me with definite assignment rules here
return false;
}
Dreaming even further, the whole method could have a default return value, because the return type is akin to an out
parameter anyway:
bool TryGetValue(string s, out int x = 0) = false
{
if (...)
{
x = 42;
return true;
}
}
If one squints, this is not too dissimilar from read-only property initializer syntax, but the default value for a return type could be constrained to the same restrictions imposed on default parameter values (i.e. constant).
With all of this, we get the following as a bi-directional loop construct, at least conceptually:
foreach ((in TOutput output, out TInput input = default) in source)
{
...
}
The scope of output
and input
are both constrained to the loop. The use of a continue
statement is obviously allowed and would not require re-assignment of the input
variable. After all, when entering the loop, input
will have been definitely assigned. However, the consumer could re-assign the value during the iteration, which seems totally fine as well, and enables bi-directional communication simply by assigning to the input, either before implicit iteration (falling from the bottom of the loop is an implicit continue
) or at every continue
site:
foreach ((in StockTick tick, out Priority priority = Priority.Normal) in stockTicks)
{
if (stock.Change < -0.05)
{
// The market is crashing, we dropped more than 5 percent. Act quickly.
...
priority = Priority.High;
}
else if (stock.Change > 0.05)
{
// The market is booming, relax and cash in.
...
priority = Priority.Low;
}
else
{
// Just another day on the market, stay alert.
...
priority = Priority.Medium;
}
}
The example shown here shows a use case that's not uncommon with cloud resources, where one wants to throttle request rates or adjust priorities (often correlated to the cloud consumption bill at the end of the month). For example, services like CosmosDB have request units (RUs), and for each continuation of a request to enumerate the results of a query, the amount of RUs spent is made available to the consumer. With a pattern like the one shown above, the consumer could not only receive these RUs but possibly influence the producer based on calculations over RUs.
This shows a dial which is different from a Boolean flag to keep going or to cancel, but one can see that just using a shouldContinue
Boolean value would readily turn break
into continue
after setting shouldContinue
to false
:
foreach ((in int x, out bool shouldContinue = true) in xs)
{
// I could do this...
if (x < 0)
{
break;
}
// ...or this...
if (x < 0)
{
shouldContinue = false;
continue;
}
// ...or even this...
shouldContinue = x >= 0;
...
}
Note that the last form of breaking is almost akin to pushing a loop condition down to the producer.
One possible (syntactic) ambiguity here is with tuple syntax and loop variables allowing for decomposition syntax. A thought could be to detect in
and out
modifiers to disambiguate. Also, if the iterable produces tuples, decomposition would still work, but just require more parentheses:
// Unidirectional
foreach ((int x, int y) in points)
{
...
}
// Bidirectional
foreach await ((in (int x, int y), out (int w, int h) resolution = (1024, 768)) in renderer)
{
...
}
The second example shows an image rendering iterable that allows for dynamic re-adjustment of the resolution being rendered, using a tuple that's passed in. If resolution changes during iteration, the producer can adjust its logic by returning more or less points than before. The consumer could decide to dial up a notch if the enumeration of pixels is going fast, effectively getting to progressive rendering. I've thrown in await
in the example above, just to show the pattern should extend to asynchronous iterables as well, which makes sense for such an example where rendering a scene could be computationally intensive (or involve I/O to specialized hardware or a cloud service). I'll leave it to the reader to trivially extend the example to accept a cancellation token as well.
What could the lowered form of such a foreach
loop look like?
foreach [await] ((in TOutput x, out TInput y = z) in xs)
{
if (b)
break;
if (c)
continue;
Use(x);
}
Can be lowered into the following form:
{
TInput y = z;
using [await] (var e in xs.Get[Async]Biterator())
{
while ([await] e.MoveNext[Async](y))
{
TOutput x = e.Current;
if (b)
break;
if (c)
continue;
Use(x);
}
}
}
Note that out
parameters can't be captured in anonymous methods or lambda expressions, so the issue around the scope where the iteration variable is introduced (previuously requiring a breaking change to C#) does not apply here. The out
parameter used by the iteration can be declared outside the loop.
In all of the above, an alternative to out
could be ref
but it feels strange the producer could then assign to the value as well, in addition to the already existing bi-directional flow enabled by the in
and out
pair.
On the production side, iterator methods can be used, but these now require a means to accept input that's passed in from the consumer. This could be done by making yield return
an expression rather than a statement.
[async] I[Async]Biterable<TInput, TOutput> Iterator<TInput, TOutput>(...)
{
...
TInput y = yield return x;
...
}
However, this poses the same challenge as for the consumer side with foreach
. How does it get the value required for the initial iteration? In the sample above, the body of the iterator does not have any means to get a value for y
without calling yield return
. This follows the JavaScript design which seems unsettling.
Another approach could be to make value
a contextual keyword (i.e. similar to setters for properties and indexers) and have the value come in that way, while retaining yield return
as a statement. By reading value
at the start and after every yield return
, the producer can receive the values received from the consumer.
[async] I[Async]Biterable<TInput, TOutput> Iterator<TInput, TOutput>(...)
{
TInput y0 = value;
...
yield return x;
TInput y1 = value;
...
}
Another keyword could be used of course. A concrete example would be to flow cancellation into an async iterator by using a bi-directional asynchronous iterator:
async IAsyncBiterable<CancellationToken, TOutput> Iterator<TOutput>(...)
{
if (value.IsCancellationRequested)
{
yield break;
}
...
yield return x;
value.ThrowIfCancellationRequested();
...
}
Obviously, the return type could have a generic arity of 1 and be a "cancellable async iterable" type, which would be a framework notion rather than a language construct, thus retaining the language's inawareness of cancellation constructs:
interface IAsyncEnumerable[WithCancellation]<out T>
{
IAsyncEnumerator<T> GetAsyncEnumerator();
}
interface IAsyncEnumerator[WithCancellation]<out T> : IAsyncDisposable
{
Task<bool> MoveNextAsync([CancellationToken token = default]);
T Current { get; }
}
Note that an optional cancellation token could be provided on a single such interface. When writing an iterator, that'd simply result in the producer observing a default
value for a cancellation token through the value
variable. On the consumer side, one could relate optional input parameters (on MoveNext[Async]
) to an optional output parameter (the out
parameter in the variable declaration list for foreach [await]
), so the consumer doesn't have to write it either.
For example, let's assume an async enumerator with fine-grained cancellation, doing away with a specialized interface to support this:
interface IAsyncEnumerator<out T> : IAsyncDisposable
{
Task<bool> MoveNextAsync(CancellationToken token = default);
T Current { get; }
}
An iterator could be written as follows:
async IAsyncEnumerable<int> GetNumbersAsync()
{
value.ThrowIfCancellationRequested();
await Task.Delay(1000);
yield return 42;
}
The type of value
is derived from the single (optional) parameter type on MoveNextAsync
. An iterator-like type would have at most one parameter for its MoveNext[Async]
method. If it has no parameter, it's a one-directional iterator, and value
has no special meaning. If it has a single parameter, value
represents this parameter.
On the consuming side, one can write the following:
foreach await ((in int x, out CancellationToken token = ...) in GetNumbersAsync())
{
...
}
but one can also omit the out
parameter altogether, because the MoveNextAsync
method has declared the parameter to be optional:
foreach await (int x in GetNumbersAsync())
{
...
}
The iterator will always see default(CancellationToken)
on its value
variable.
All of this enables to dial in to the degree of cancellation involvement one likes. If you don't want to see any of it, and simply perform cancellation out of band (e.g. inside the loop body or by sending it to sources upon initiating the iteration), you can simply forget about it. But if one wants cancellation (or some other concern) to be deeply wired between consumer and producer, one can opt in to it by using an out
parameter on the consumer side.
An alternative to using value
(or some other keyword, contextual or otherwise) would be to introduce an iterator
method that offers a declaration site for these input parameters, that may not be passed to the parent iterable (if any) because they are scoped to a single consumer:
IAsyncEnumerable<int> GetNumbersAsync(int min, int max)
{
if (min > max)
{
throw new ArgumentOutOfRangeException(nameof(min));
}
return Iterator();
async iterator IAsyncEnumerable<int> Iterator(in CancellationToken token)
{
value.ThrowIfCancellationRequested();
await Task.Delay(1000);
yield return 42;
}
}
Note that the use of local functions is already common for iterators, in order to perform eager argument validation rather than seeing these effects deferred until iteration time. With the code above, it feels very natural that the types that occur in the foreach
variable declaration list are CancellationToken
and int
, but with the in
and out
positions flipped:
var xs = GetNumbersAsync(0, 10);
foreach await ((in int x, out CancellationToken token = ...) in xs)
{
...
}
That is, iterator
and foreach
are each other's mirror images, and all typing is inferred from the MoveNext[Async]
method and the Current
property:
class SomeIterator
{
bool MoveNext(/* in */ Foo foo);
/* out */ Bar Current { get; }
}
transposes to the corresponding consumer site declaration:
foreach ((in Bar bar, out Foo foo = ...) in someIterator)
{
...
}
This would even be more direct when using the alternative TryGetNext
method, where the order of output and input would be the same as on the consumer side. Allowing some relaxation of syntax to drive this point home:
class SomeIterator
{
out Bar TryGetNext(
in Foo foo , out bool success);
}
foreach ((in Bar bar, out Foo foo = ...) in someIterator) { ... }
As illustrated earlier, this bi-directional pattern enables all sorts of communication from the consumer to the producer. Examples are plenty. Let's just entertain the thought of using a Task
as an input type to an asynchronous iterator. That alone would enable injection of cancellation but also injection of failure using an exception type.
foreach ((in int x, out Task t = Task.CompletedTask) x in xs)
{
if (b)
{
// Hey, stop (if you please)!
t = Task.FromException(new Exception("Stop!"));
continue;
}
if (c)
{
// Hey, cancel (I can't give you my token, but can wrap it)!
t = Task.FromCanceled(token);
continue;
}
if (d)
{
// Hey, take a break (for a while)!
t = Task.FromDelay(1000);
continue;
}
...
}
The last example of injecting a delay seems like something the caller could do just as well. However, nobody is stating how such a received task is to be used. That's solely a contract between a producer and a consumer that happen to share the iterable and iterator pattern amongst them. For example, a bi-directional asynchronous iterator accepting tasks could use them to WhenAny
some asynchronous activity with the task the consumer passed in. That would allow such a producer/consumer pair to have a contract of liveness where the consumer dictates how long it is willing to wait for the outcome of the next operation, while being able to dynamically change this over time.
Other real-world examples include passing compute credit from consumer to producer (e.g. for a cloud service being invoked), injection of timeouts, refreshing of tokens used in long-running computations, etc. With some imagination, one can see generalized co-routine style programming as well, and all sorts of channels.
LINQ query operators for asynchronous iterable sequences (using "iterable" to refer to a bigger class of types that work with foreach await
and iterators, including "enumerable" sequences) could naturally support deep cancellation this way, which would help with deep cancellation on a per-iteration basis, without requiring injection of cancellation at the source or at each callback (e.g. a predicate or selector function). Cancellation can be wired by query operators to their sources, but could also be observed in possible run-away computations that are eager (e.g. OrderBy
) or bridge with non-cooperative sources (e.g. SelectMany
with synchronous inner sequences).
An example is shown below, assuming the value
contextual keyword providing access to the cancellation token, as shown earlier (and assuming that MoveNextAsync
accepts such a parameter):
public static IAsyncEnumerable<R> SelectMany<T, R>(this IAsyncEnumerable<T> source,
Func<T, IEnumerable<R>> selector)
{
if (source == null)
throw new ArgumentNullExcecption(nameof(source));
if (selector == null)
throw new ArgumentNullExcecption(nameof(selector));
return Iterator();
async IAsyncEnumerable<R> Iterator()
{
value.ThrowIfCancellationRequested();
foreach await ((in T outer, out CancellationToken token = value) in source)
{
foreach (R inner in selector(outer))
{
value.ThrowIfCancellationRequested(); // or could use token
yield return inner;
}
}
}
}
Note we can pass the cancellation token down to the enumeration over the outer source, which may support cancellation based on inductive reasoning (the type says it supports it, and a concrete iterator might do it). However, the inner sequence does not support cancellation, so we can inject it ourselves here.
Eager query operators such as OrderBy
(with potentially any number of ThenBy
n-ary orderings applied on top) could benefit greatly from this because both the source enumeration and the sorting logic could be interrupted for cancellation.