The humble comma. In haskell it does many jobs, for values, types, and
constraints. The category theorists will tell you that the simple type (a,b)
is a product, justifying the metaphor by if type a
has n
values, and type
b
has m
values, then type (a,b)
has exactly n * m
values.
The same goes for constraints. The constraint (c,d)
is also a product, but since
we think of constraints in more binary terms (they either hold or they don't), we can
also say that it's an intersection of those two constraints.
In addition to product types, Haskell also has sum types, sometimes called
tagged unions in other languages. Either a b
is the canonical example -
justifying the "sum" metaphor by if type a
has n
values, and type b
had
m
values, then type Either a b
has exactly n + m
.
Either
can be used as a building block for other sum types:
Bool
is isomorphic toEither () ()
Maybe a
is isomorphic toEither () a
Ordering
is isomorphic toEither () (Either () ())
What I'm going to do in this article is define a constraint-level equivalent
of Either
to describe the union of two constraints.
Since this is a literate haskell file, we need to specify all our language extensions and imports up front.
{-# OPTIONS_GHC -Wall -Werror -Wno-name-shadowing #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}
module ConstraintUnions where
import Prelude hiding (Either(..), either) -- only for didactic reasons
import Data.Typeable -- only used for one of the examples
Haskell's algebraic data types inherently support sum and product types,
so Either
is defined as a normal type:
data Either a b = Left a | Right b
Either a b
values can be consumed via pattern matching, but there's also
the handy either
function in Prelude
that can do the matching for you:
either :: (a -> r) -> (b -> r) -> Either a b -> r
either f _ (Left a) = f a
either _ g (Right b) = g b
If we swap the argument order around, we can convert Either
to its church
encoding and back again:
type ChurchEither a b = forall r. (a -> r) -> (b -> r) -> r
toChurchEither :: Either a b -> ChurchEither a b
toChurchEither e f g = either f g e
ofChurchEither :: ChurchEither a b -> Either a b
ofChurchEither c = c Left Right
We'll use this church encoding of Either
to define our constraint level
union, the ||
class.
class c || d where
resolve :: (c => r) -> (d => r) -> r
infixr 2 ||
The name may seem a little cute over something a bit more verbose like EitherC
, but
it leads to clearly understood code in context, so I like it.
In addition, using ||
borrows from the same boolean metaphor that I normally
read constraints with. It's also commonly associated with a left-to-right bias,
which will also apply to our typeclass as we'll see below.
At this point, the tempting thing is to define two instances and be done with it:
instance {-# OVERLAPPABLE #-} d => (c || d) where resolve _ r = r
instance {-# OVERLAPPING #-} c => (c || d) where resolve r _ = r
The logical statements d => (c || d)
and c => (c || d)
are fairly well
understood, and letting the latter override the former is consistent
with the common use of ||
in many programming languages from the C-family
and beyond. Why require d
if you've got c
?
GHC, however, would have none of this. When resolving instances it has no
notion of what's on the left-hand side of the =>
in the instance declarations
so the two instances it sees are for (c || d)
and (c || d)
. These aren't
merely overlapping, they're duplicates, and it has no idea how to choose
between them.
With that road closed off, it's not immediately obvious what clever trick to
use to define sensible instances for ||
, so let's table instances for a
moment, and consider how we might use the class instead.
Suppose we're feeling nostalgic for perl, and we want to compare a string and a value for equality. There's two obvious ways to define such equality:
- use
show
to convert the value to aString
, and useString
's definition of equality, - use
read
to convert theString
to a value, and use the value type's definition of equality.
Either one would work, and ||
lets us say so:
equalsString0 :: forall a. (Show a || (Eq a, Read a)) => a -> String -> Bool
equalsString0 = resolve @(Show a) @(Eq a, Read a)
(\a s -> show a == s)
(\a s -> a == read s)
GHC needed some help figuring out what instance of resolve
we needed, so we
used TypeApplications
to tell what constraints we were choosing between.
With a little light testing, equalsString0
seems to give us the results
we expect on good inputs, and not horrible error messages on bad ones:
{-$-----------------------------------------------------------------------------
>>> (5 :: Int) `equalsString0` "5"
True
>>> (5 :: Int) `equalsString0` "5.0"
False
>>> (+) @Int `equalsString0` "(+)"
...
• No instance for (Eq (Int -> Int -> Int))
arising from a use of ‘equalsString0’
...
-}
Of course, to actually call it, we had to define some instances for ||
:
instance (Show Int || d) where resolve = \r _ -> r
instance d => (Show (a -> b) || d) where resolve = \_ r -> r
Structurally, these are very similar to the duplicate instances attempted above
except that c
is fully specified here.
In fact, these two fit the boilerplate we'll be using for instances c || d
for unqualified c
. If c
always holds, we'll use the former, and if c
never holds, we'll use the latter. Note that neither specifies d
, so we
avoid needing a number of instances of ||
quadratic in the number of
constraints.
For those of you already sucking on your teeth at the word "boilerplate", I promise to return to this later.
Returning to the equalsString0
, I may have spoke too soon when it comes to
testing:
instance (Show Float || d) where resolve = \r _ -> r
{-$-----------------------------------------------------------------------------
>>> (5 :: Float) `equalsString0` "5.0"
True
>>> (5 :: Float) `equalsString0` "5"
False
-}
Float
has both a Show
instance as well as Eq
and Read
instances, but
due to the ||
instances' left-to-right bias, we used the Show
branch to
compare "5.0" and "5" in the second example above.
We can define a different function that prefers to compare values to strings:
equalsString1 :: forall a. ((Eq a, Read a) || Show a) => a -> String -> Bool
equalsString1 = resolve @(Eq a, Read a) @(Show a)
(\a s -> a == read s)
(\a s -> show a == s)
And now we get different results on Float
values than before:
instance (Eq Float || d) where resolve = \r _ -> r
instance (Read Float || d) where resolve = \r _ -> r
{-$-----------------------------------------------------------------------------
>>> (5 :: Float) `equalsString1` "5"
True
>>> (5 :: Float) `equalsString1` "5.000"
True
-}
While still getting sensible errors:
instance d => (Eq (a -> b) || d) where resolve = \_ r -> r
instance d => (Read (a -> b) || d) where resolve = \_ r -> r
{-$-----------------------------------------------------------------------------
>>> (+) @Int `equalsString1` "(+)"
...
• No instance for (Show (Int -> Int -> Int))
arising from a use of ‘equalsString1’
...
-}
Though we do need to define our first non-boilerplate instance of ||
in
order to tell GHC that we can derive the union of an intersection from the
intersection of unions:
inLeft :: forall c d r. c => (c => r) -> (d => r) -> r
inLeft r _ = r
inRight :: forall c d r. d => (c => r) -> (d => r) -> r
inRight _ r = r
instance ((c0 || d), (c1 || d)) => (c0, c1) || d where
resolve = resolve @c0 @d (resolve @c1 @d inLeft inRight) inRight where
instance ((c0 || d), ((c1,c2) || d)) => (c0, c1, c2) || d where
resolve = resolve @c0 @d (resolve @(c1,c2) @d inLeft inRight) inRight where
-- ... and so on for all reasonable intersection sizes
Returning to equalsString1
, more testing reveals that it has a new
bug that equalsString0
didn't:
instance (Eq Int || d) where resolve = \r _ -> r
instance (Read Int || d) where resolve = \r _ -> r
{-$-----------------------------------------------------------------------------
>>> (5 :: Int) `equalsString1` "5"
True
>>> (5 :: Int) `equalsString1` "5.000"
*** Exception: Prelude.read: no parse
-}
Curse of the partial read
!
Well, we can fix this. We'll just add a Show a
constraint to the left-hand-side
of the ||
. If there's a bad parse, we'll default to string comparison.
equalsString2 :: forall a. ((Eq a, Read a, Show a) || Show a) => a -> String -> Bool
equalsString2 = resolve @(Eq a, Read a, Show a) @(Show a)
(\a s -> case reads s of [(a', "")] -> a == a' ; _ -> show a == s)
(\a s -> show a == s)
And this seems to pass all our smoke tests.
{-$-----------------------------------------------------------------------------
>>> (5 :: Float) `equalsString2` "5"
True
>>> (5 :: Float) `equalsString2` "5.000"
True
>>> (+) @Int `equalsString2` "(+)"
...
• No instance for (Show (Int -> Int -> Int))
arising from a use of ‘equalsString2’
...
>>> (5 :: Int) `equalsString2` "5"
True
>>> (5 :: Int) `equalsString2` "5.000"
False
-}
And we didn't have to define any new instances. That's nice.
If we look at the constraint for equalsString2
algebraically, as sums and
products, we have:
Eq * Read * Show + Show
Which, by highschool algebra we know is:
(Eq * Read + 1) * Show
The sum type a + 1
has a more familiar name in Haskell - Maybe a
, so we
just want the constraint-level equivalent, MaybeC
:
equalsStringX :: forall a. (MaybeC (Eq a, Read a), Show a) => a -> String -> Bool
Just as we can define Maybe
in terms of Either
, we can define MaybeC
in terms
of ||
.
type MaybeC c = c || ()
GHC needs a little help to figure out that () => a
is isomorphic to a
, so
we provide a helper function that does just that:
given :: forall c r. MaybeC c => (c => r) -> r -> r
given = resolve @c @() inJust inNothing where
inJust :: forall c r. c => (c => r) -> r -> r
inJust r _ = r
inNothing :: forall c r. (c => r) -> r -> r
inNothing _ r = r
MaybeC
is pretty often used with only one typeclass, so we can inject a
little syntactical sugar for that case:
type p? a = MaybeC (p a)
(If you thought ||
was too cute a name, you're going to hate ?
)
Here's a simple idea - a function that yields a string for any value, even if
its type doesn't have a Show
instance. Obviously we want to use the Show
instance if we can, but if none is defined, we'll just print some sensible
default value.
showAny0 :: forall a. Show? a => a -> String
showAny0 = given @(Show a) show (const "_")
{-$-----------------------------------------------------------------------------
>>> showAny0 (5 :: Int)
"5"
>>> showAny0 $ (+) @Int
"_"
-}
(No new instances required! Our previous boilerplate suffices.)
That _
is rather lackluster, especially since for many types
we can use Typeable
to print the type itself, so let's add that:
showTypeOf :: forall a. Typeable a => a -> String
showTypeOf = const (show . typeRep $ Proxy @a)
showAny1 :: forall a. (Show? a, Typeable? a) => a -> String
showAny1 = given @(Show a) show
. given @(Typeable a) (showString "_ :: " . showTypeOf)
$ const "_"
instance (Typeable Int || d) where resolve = \r _ -> r
instance (Typeable a || d, Typeable b || d) => (Typeable (a -> b) || d) where
resolve = resolve @(Typeable a, Typeable b) @d
{-$-----------------------------------------------------------------------------
>>> showAny1 (5 :: Int)
"5"
>>> showAny1 $ (+) @Int
"_ :: Int -> Int -> Int"
-}
Here, the Typeable (a -> b) || d
instance deviates from the boilerplate
instances we've seen so far. This is the boilerplate for a conditional
constraint - one that depends on other constraints (the ones we've seen thus
far have been unqualified). a -> b
is Typeable
only if a
and b
are,
and this instance reflects that by reusing the "intersection of unions"
instance from above.
Nobody likes boilerplate, and that is a major drawback. But as complaints go, the monotony's only the tip of the iceberg.
Though we may have avoided being quadratic in the number of constraints by never specifying the second constraint in the boilerplate, and avoided needing to define countably infinite instances by breaking products of constraints down, to be complete, we're still linear in the number of possible constraints - not just the ones that hold.
This means, for completeness, for just the single-parameter typeclasses, we need to define one instance for every possible datatype. For multi-parameter typeclasses it's worse.
And that's even before you consider the question of where these instances live.
If they all live in the module that defines ||
, then the package containing
that module has a dependency on every other package that defines a datatype or
a typeclass.
Sigh.
We can avoid defining a quadratic number of boilerplate instances for the
~
-equality constraints by using overlapping instances:
instance {-# OVERLAPPABLE #-} d => (a ~ b) || d where resolve = \_ a -> a
instance {-# OVERLAPPING #-} (a ~ a) || d where resolve = \a _ -> a
{-$-----------------------------------------------------------------------------
>>> given @(Int ~ Char) True False
False
>>> given @(Char ~ Char) True False
True
-}
The alternative to generating the boilerplate instances by hand is to have GHC generate them for us. After all, writing code we don't want to write is what a compiler's for, and it's in a position to see which instances are actually needed, so it doesn't need to blithely implement all possible instances.
However, this is some new LANGUAGE
pragma type dreaming, and I'm still not
certain that the only boilerplate instances are the three used above:
-- constraint `C` holds unconditionally, i.e. we have `instance C where...`
instance (C || d) where resolve = \a _ -> a
-- constraint `C` holds conditionally, i.e. we have `instance C' => C where...`
instance (C' || d) => (C || d) where resolve = resolve @C' @d
-- constraint `C` holds under no conditions
instance d => (C || d) where resolve = \_ a -> a
This README.md file is a literate haskell file, for use with markdown-unlit
.
To allow GHC to recognize it, it's softlinked as ConstraintUnions.lhs
, which you can compile with
$ ghc -pgmL markdown-unlit ConstraintUnions.lhs
Many of the above examples are doctest
-compatible, and can be run with
$ doctest -pgmL markdown-unlit ConstraintUnions.lhs
Alternately, you can have cabal manage the dependencies and compile and test this with:
$ cabal install --dependencies-only --enable-tests
$ cabal build
$ cabal test