procrastination is a modular, zero-dependency library for Java 11 that provides:
- lazily evaluated, memoizing, purely functional data structures
- algebraic data types with ad hoc pattern matching over data constructors
- a comprehensive, extensible, replayable alternative to the
Stream
API - stack-safe tail-recursive anonymous functions via trampolines and fixed points
The goal is to help you write better code by making it harder to introduce bugs. To that end, this project borrows ideas from functional programming languages but strives to make them accessible. If you're comfortable with streams and optionals, then you already know enough about functional programming to be productive with this library!
Lazy evaluation means that the data structures procrastinate, doing the minimum amount of work required and putting it off for as long as possible, only computing each of their elements on demand. They can also be memoized such that each element is computed at most once, the first time it's requested, and then cached.
And because the data structures are purely functional, they are persistent. Instead of mutators, methods are provided that return a new version of a given data structure reflecting the desired changes, leaving the original intact. This is implemented efficiently via structural sharing, which is safe because the data structures are structurally immutable (elements cannot be added, removed, or replaced).
The data structures are designed to emulate algebraic data types, with basic "data constructor" static factory methods
mirrored by abstract match
instance methods simulating pattern matching. This is not a general pattern-matching
facility; there is no matching against literals, no wildcard patterns, no nested patterns. It simply makes it possible
to distinguish which data constructor was used and extract the components in a single step.
Everything else is ultimately defined in terms of the data constructors and match
methods. None of the classes hide
anything interesting. They don't have any instance fields. They each have just one constructor, declared private,
taking no arguments, with an empty body, only used by the core static factory methods. If some useful operation is
missing, anyone can define it externally as a static method just as easily as writing an instance method inside the
class.
While it is easy to define new operations on these types, it is impossible to add new cases. Since the classes only
expose static factory methods and not their constructors, they are effectively sealed types, so the match
methods
will always exhaustively cover every case.
None of the data structures allow null elements. Although they can't determine up front whether a lazy element is null,
they will always throw NullPointerException
before returning a null element to a caller.
Sequence
is an ordered collection of zero or more elements. It is recursively defined: either a sequence
is empty
, or it's constructed
from a head element and a tail sequence. Conversely, the instance
method match(BiFunction,Supplier)
pulls a sequence apart: if the sequence is non-empty, it applies the given
binary function to its head and tail and returns the result, otherwise it returns a default value produced by the given
supplier.
Because sequences are lazy, it's easy to work with very large or even infinite sequences. But beware eager operations!
For example, hashing
a sequence requires traversing the entire thing and will never return if it's infinite.
Other operations can short-circuit, like determining if a sequence contains
a given element, so for an
infinite sequence they might return or they might not. Even if a sequence is finite, eager methods like these can still
throw OutOfMemoryError
if the sequence is memoized and can't be garbage-collected.
Maybe
is a container that either holds a single element or is empty. It can be thought of as a value that
may or may not exist, or as a sequence with at most one element. It is often used to model potential failure. Maybe
is a lazy alternative to Optional
.
Either
is a container with exactly one element that can take on one of two possible values, labeled
left
and right
, which may have different types. Either
can also be used to model failure, by
convention failing on the left and succeeding on the right (mnemonically, right is correct). But unlike an empty
Maybe
, it allows information to be attached to the failure case (for example, an exception, or a string error
message). In that sense, Either
is the data-structure analogue of Java's checked exceptions.
Pair
is a 2-tuple: an ordered collection with exactly two elements, which may have different types. Pair
is
similar to Map.Entry
, but by contrast it is persistent and conceptually more general.
Sequence
offers an alternative to the Stream
API introduced in Java 8. Like streams, there
are a variety of methods to go back and forth between sequences and other representations, including collections,
arrays, and streams. Unlike streams, sequences can be replayed any number of times (although this does mean that
sequences derived from one-shot sources like iterators must be memoized). Sequences provide a much more comprehensive
API, and it's significantly easier to define new functionality for sequences.
One of the biggest goals of the Stream API was parallel processing, which is why streams were designed around
Spliterator
. So, processing a given stream in a way that isn't covered by the API means working
directly with its spliterator, and creating a new stream in a way that isn't covered by the API means implementing a
spliterator. Let's define the scanLeft
operator, which returns every intermediate result of a left fold (a left fold
iteratively applies a binary function to combine all of the elements into a single value, accumulating from left to
right). Here's a basic implementation for streams:
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
static <T, R> Stream<R> scanLeft(Stream<T> stream, R initial, BiFunction<R, T, R> function) {
var estimatedSize = Long.MAX_VALUE;
var characteristics = 0;
var spliterator = new AbstractSpliterator<R>(estimatedSize, characteristics) {
Spliterator<T> source = null;
R result = initial;
@Override
public boolean tryAdvance(Consumer<? super R> action) {
if (source == null) {
source = stream.spliterator();
} else if (!source.tryAdvance(element -> result = function.apply(result, element))) {
return false;
}
action.accept(result);
return true;
}
};
var parallel = false;
return StreamSupport.stream(spliterator, parallel);
}
It's imperative, stateful, and the control flow is a little hard to follow; not exactly pleasant. (Note that the
characteristics and estimated size of the source stream's spliterator are ignored.) On the other hand, because
sequences are recursively defined, they admit a concise, natural recursive implementation of scanLeft
that
anyone could easily write if it weren't already included:
import io.github.gdejohn.procrastination.Sequence;
import java.util.function.BiFunction;
static <T, R> Sequence<R> scanLeft(Sequence<T> sequence, R initial, BiFunction<R, T, R> function) {
return Sequence.cons(
initial,
() -> sequence.match(
(head, tail) -> scanLeft(tail, function.apply(initial, head), function),
() -> Sequence.empty()
)
);
}
Compared to the stream implementation, it's declarative, straightforward, and a lot less code; not bad! (With a little
bit of code golfing, the method body could even fit comfortably on a single line!) The trade-off is overhead from the
extra allocation and indirection, and there's no parallel processing. The priority here is developer ergonomics. If you
ever find yourself reaching for something in the Stream
API that isn't there, and you don't need parallelism, give
Sequence
a try!
Applying imperative idioms to sequences is ugly and error prone; recursive data structures call for recursive
algorithms. Unfortunately, Java isn't very recursion friendly: deep call stacks soon run into StackOverflowError
, and
tail recursion doesn't help because tail calls aren't eliminated. That wasn't a problem for scanLeft
because it's
lazy, but whenever a large number of elements must be eagerly traversed, stack overflow is waiting to pounce. Enter
trampolines.
Trampoline
transforms tail recursion into a stack-safe loop. To trampoline a tail-recursive method with
return type R
, change the return type to Trampoline<R>
, wrap the expressions returned in base cases with the static
factory method Trampoline::terminate
, suspend recursive calls in Supplier
lambda
expressions, and wrap the suspended recursive calls with the static factory method Trampoline::call
. For
example:
/** True if and only if the sequence contains at least one element that satisfies the predicate. */
static <T> Trampoline<Boolean> any(Sequence<T> sequence, Predicate<T> predicate) {
return sequence.match(
(head, tail) -> predicate.test(head) ? terminate(true) : call(() -> any(tail, predicate)),
() -> terminate(false)
);
}
Use the instance method Trampoline::evaluate
to get the result:
boolean result = any(sequence, predicate).evaluate();
Tail recursion often requires additional "accumulator" parameters, and trampolining means that the result must be unwrapped. These are irrelevant and burdensome implementation details that shouldn't be exposed to the caller, so the usual practice is to delegate to a private helper method. Alternatively, the recursive computation can be defined inline!
Functions::fix
returns the fixed point of a unary operator on functions, enabling anonymous recursion. Lambda
expressions by definition are unnamed, making explicit recursion impossible. The trick here is to abstract the
recursive call by taking the function itself as an argument and letting fix
tie the knot. For example:
Function<Integer, Integer> factorial = fix(f -> n -> n == 0 ? 1 : n * f.apply(n - 1));
And fix
is implemented like this:
static <T, R> Function<T, R> fix(UnaryOperator<Function<T, R>> function) {
return function.apply(argument -> fix(function).apply(argument));
}
This is almost, but not quite, the fabled Y combinator. Technically, combinators aren't allowed to use explicit recursion. But it doesn't need to be a combinator, it only needs to output fixed points. And it does!
fix
works on trampolined and curried functions as well. Trampoline::evaluate
isn't just an instance
method, it's also overloaded as an all-in-one static helper method that accepts a trampolined anonymous function in
unfixed form and an appropriate number of arguments, fixes the function to make it recursive, applies it to the
arguments, evaluates the resulting trampoline, and returns the unwrapped value. And to complement this pattern, the
static factory method Trampoline::call
is overloaded to accept curried functions and matching
arguments. For example:
static int length(Sequence<?> sequence) {
return Trampoline.evaluate(
sequence,
0,
f -> seq -> n -> seq.match(
(head, tail) -> call(f, tail, n + 1),
() -> terminate(n)
)
);
}
That's a stack-safe tail-recursive local helper function taking full advantage of type inference and pattern matching! You might just forget that you're writing Java!
If you use this library, I want to hear from you! Tell me what you think should be changed, what should be added, what should be removed, what needs better documentation. Come chat with me on Gitter, and check the issues to see what's already on my radar.
The examples below build from the latest commit; to use a particular version, replace master-SNAPSHOT
with a version
number. Check the releases for the available versions, links to their Javadocs, changelogs, and known issues. See
JitPack for instructions on using other build tools.
Add JitPack at the end of the repositories in your root build.gradle
:
allprojects {
repositories {
// ...
maven { url 'https://jitpack.io' }
}
}
And add the dependency:
dependencies {
implementation 'io.github.gdejohn:procrastination:master-SNAPSHOT'
}
Add JitPack to the repositories in your pom.xml
:
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
And add the dependency:
<dependency>
<groupId>io.github.gdejohn</groupId>
<artifactId>procrastination</artifactId>
<version>master-SNAPSHOT</version>
</dependency>
The JShell script procrastination.jsh
makes it easy to play around with this library, assuming JDK 11 and a
recent version of Maven are installed and present on your PATH
. Just clone the repository (or download and extract),
and from its root directory run mvn compile
and jshell procrastination.jsh
. The
script adds the module to the JShell environment and imports all of the top-level types and their public static
members.