Skip to content

andy-morris/hashcons

Repository files navigation

hashcons: Hash-consing and memoisation for Haskell

This library provides hash-consing (a.k.a. interning) and memoisation, with a mostly-clean interface. This library does all the dirty tricks so you don't have to.

Quick start

  1. Make instances of HashCons (and its superclasses Eq and Hashable) for your types. Be aware that HashCons instances can't have type variables or contexts.
  2. Wrap types which might have large values (Text, ASTs, etc.) in HC.
  3. Use memo (memo2, memo3, memo4) to memoise functions.

Tutorial

Imagine you have some recursive datatype like this one:

type Id = Text

data Expr =
    Var !Id
  | App !Expr !Expr
  | Lam !Id   !Expr
  deriving (Eq, Generic, Hashable)

Clearly, checking equality of an expression might require traversing the whole tree. We may also have duplicates of large data structures taking up lots of memory.

Hash-consing

We can solve both of these problems at once by storing all Expr values in a global table and tagging them, so that equality of tags coincides with equality of the values. This is what the HC type does. Using HC requires the inner type to be an instance of HashCons, which in turn requires Eq and Hashable

{-# LANGUAGE DeriveAnyClass, DeriveGeneric, PatternSynonyms, ViewPatterns #-}

import Data.HashCons (HashCons, HC, hc, getVal)

type Id = HC Text

type Expr = HC Expr'

data Expr' =
    Var' !Id
  | App' !Expr !Expr
  | Lam' !Id   !Expr
  deriving (Eq, Generic, Hashable, HashCons)

{-# COMPLETE Var, App, Lam #-}
pattern Var :: Id -> Expr
pattern Var x = Var' (HC x)
-- and similarly for App & Lam

Equality of Expr (i.e., HC Expr') now amounts to checking the equality of two Ints, and likewise for hashing. Notice also that the subterms of an Expr' are individually hash-consed, so that:

  • the benefits of hash-consing are kept while traversing a term;
  • constructing an Expr only requires hashing at most three atoms—one for the constructor, and one for each field—rather than a whole tree.

Memoisation

Now you've started to use your datatype in some recursive functions:

subst :: Expr -> Id -> Expr -> Expr
subst a@(Var y)   x e = if x == y then e else a
subst   (App s t) x e = App (subst s x e) (subst t x e)
subst a@(Lam y s) x e = if x == y then a else Lam y (subst s x e)

You can also use this library to memoise functions. This keeps the results of previous function calls, so they can be reused. Memoising a function is simple:

import Data.HashCons.Memo (memo3)

subst :: Expr -> Id -> Expr -> Expr
subst = memo3 $ \a x e -> case a of
  Var y   -> if x == y then e else a
  App s t -> App (subst s x e) (subst t x e)
  Lam y s -> if x == y then a else Lam y (subst s x e)

Functions memo, memo2, up to memo4 are provided by the library. Functions with higher numbers of arguments can be memoised by chaining the existing functions.

The type of memo might look a little strange:

memo :: (MemoArg a, CanFinalize a ~ True) => (a -> b) -> a -> b

The memo table uses finalisers to prune old entries from the memo table. The type family CanFinalize, part of MemoArg, is used to ensure that the argument actually can run finalisers reliably, since most datatypes can't. (The only data type currently declared to run finalisers is HC.) If you don't want this check, and don't mind if the memo table continues to grow forever, you can use the uncheckedMemo family of functions, which doesn't care about the value of CanFinalize.

The memoN functions only check the first argument—as long as it can run finalisers, the table will be pruned.

Requested contributions

I didn't really know what instances of HashCons would be useful. If you need an instance for a type from a package this one depends on, open an issue.

About

don't use this, it doesn't work

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published