This talk covers a novel approach to "name binding" in syntax trees for programming languages that makes it much easier to write compilers and interpreters with a higher degree of assurance.
2. We use names in lots of contexts, but any
program that deals with names has to deal
with a number of issues such as
capture avoidance
deciding alpha equivalence
… and others that will come up as we go.
3. The dumbest thing that could possibly work:
type Name = String
data Exp
= Var Name
| Exp :@ Exp
| Lam Name Exp
Var "x"
Lam "x" (Var "x")
Lam "x" (Lam "y" (Var "x"))
4. Blindly substituting Lam "x" (Var "y") into
Lam "y" (Var "z")
for "z" would yield
Lam "y" (Lam "x" (Var "y"))
which captures the free variable "y": it now refers to
the "y" bound by the outer lambda.
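A minimal sketch of the capture problem, assuming the Exp type from the previous slide. The name naiveSubst is made up for illustration; it substitutes without ever renaming binders, which is exactly the mistake described above.

```haskell
type Name = String

infixl 9 :@

data Exp
  = Var Name
  | Exp :@ Exp
  | Lam Name Exp
  deriving (Eq, Show)

-- Hypothetical helper: replaces free occurrences of x by s, but never
-- renames a binder, so free variables of s can be captured.
naiveSubst :: Name -> Exp -> Exp -> Exp
naiveSubst x s = go
  where
    go e@(Var v)
      | v == x    = s
      | otherwise = e
    go (f :@ a) = go f :@ go a
    go e@(Lam v b)
      | v == x    = e             -- x is shadowed below this binder
      | otherwise = Lam v (go b)  -- never checks whether v is free in s

-- Substituting \x. y for z in \y. z: the free "y" ends up bound.
captured :: Exp
captured = naiveSubst "z" (Lam "x" (Var "y")) (Lam "y" (Var "z"))
```

Here captured evaluates to Lam "y" (Lam "x" (Var "y")), the term shown on the slide.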
5. Lam "x" (Var "x")
and
Lam "y" (Var "y")
both mean the same thing, and it'd be nice to be
able to check this easily, make them hash the
same way for CSE, etc.
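To make "tedious" concrete, here is one way alpha equivalence could be checked on the named representation. This is a sketch, not code from the talk: alphaEq and index are illustrative names, and the approach compares binder positions instead of binder names.

```haskell
type Name = String

infixl 9 :@

data Exp
  = Var Name
  | Exp :@ Exp
  | Lam Name Exp

-- Each environment lists the binders in scope, innermost first.
alphaEq :: Exp -> Exp -> Bool
alphaEq = go [] []
  where
    go l r (Var x) (Var y) =
      case (index x l, index y r) of
        (Just i, Just j)   -> i == j  -- both bound: same binder position?
        (Nothing, Nothing) -> x == y  -- both free: same name?
        _                  -> False
    go l r (f :@ a) (g :@ b) = go l r f g && go l r a b
    go l r (Lam x a) (Lam y b) = go (x : l) (y : r) a b
    go _ _ _ _ = False

    -- Position of the innermost binder with this name, if any.
    index :: Name -> [Name] -> Maybe Int
    index n = lookup n . flip zip [0 ..]
```

With this, alphaEq (Lam "x" (Var "x")) (Lam "y" (Var "y")) holds, while Lam "x" (Var "y") and Lam "y" (Var "y") are told apart because "y" is free in one and bound in the other.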
6. There is a cottage industry of solutions to the naming
problem.
Naïve substitution
Barendregt Convention
HOAS
Weak HOAS / PHOAS
“I am not a Number: I am a Free Variable!”
Locally Nameless Syntax with de Bruijn Indices
Unbound, mixing Barendregt and Locally Nameless.
etc.
I will not be addressing all of these here, just a few.
7. Just go look for names that avoid capture.
Pros:
Pretty syntax trees
Easy to get started with
Cons:
Easy even for experts to make mistakes!
Alpha Equivalence checking is tedious.
REALLY SLOW
8. subst :: Name -> Exp -> Exp -> Exp
subst x s b = sub b where
  sub e@(Var v)
    | v == x = s
    | otherwise = e
  sub e@(Lam v e')
    | v == x = e
    | v `elem` fvs = Lam v' (sub e'')
    | otherwise = Lam v (sub e')
    where v' = newId vs
          e'' = subst v (Var v') e'
  sub (f :@ a) = sub f :@ sub a
  fvs = freeVars s
  vs = fvs `union` allVars b
newId :: [Name] -> Name
newId vs = head (someEnormousPoolOfNames \\ vs)
-- go find a name that isn't taken!
(based on code by Lennart Augustsson)
9. Make sure that every binder binds a globally unique name.
Pros:
“Secrets of the GHC Inliner” describes ‘the Rapier’ which can make
this Fast.
Cons:
Easy even for experts to screw up
Alpha Equivalence is tedious
Need a globally unique variable supply (e.g. my concurrent-supply)
The obvious implementation technique chews through a
scarily large number of variable IDs.
10. Borrow substitution from the host language!
data Exp a
= Var a
| Lam (Exp a -> Exp a)
| Exp a :@ Exp a
11. Pros:
Provides _really_ fast substitution
Cons:
Doesn’t work in theorem provers
(Exp occurs in negative position)
Hard to work under Binders!
Exotic terms
Alpha equivalence checking is tedious
Variants such as Weak HOAS/PHOAS exist to address some
of these issues at the expense of other problems.
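A small sketch of what "borrowing substitution from the host language" looks like in practice, assuming the HOAS Exp type above. The names whnf, konst and asVar are illustrative; asVar only exists because this Exp cannot derive Eq or Show (it contains functions).

```haskell
infixl 9 :@

data Exp a
  = Var a
  | Lam (Exp a -> Exp a)
  | Exp a :@ Exp a

-- Weak head normal form: beta reduction is plain function application,
-- so there is no substitution code to get wrong.
whnf :: Exp a -> Exp a
whnf (f :@ a) = case whnf f of
  Lam k -> whnf (k a)
  f'    -> f' :@ a
whnf e = e

-- \x. \y. x, written with a host-language lambda for each binder.
konst :: Exp a
konst = Lam (\x -> Lam (\_ -> x))

-- Hypothetical observer for the result of evaluation.
asVar :: Exp a -> Maybe a
asVar (Var a) = Just a
asVar _       = Nothing
```

Evaluating asVar (whnf (konst :@ Var "a" :@ Var "b")) gives Just "a". Inspecting the body of a Lam, on the other hand, means applying the function to something, which is the "hard to work under binders" problem.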
12. M’colleague Bob Atkey once memorably described the
capacity to put up with de Bruijn indices as a Cylon
detector, the kind of reverse Turing Test that the humans in
Battlestar Galactica invent, the better to recognize one
another by their common inadequacies. He had a point.
—Conor McBride
“I am not a number, I am a classy hack”
13. Split variables into Bound and Free.
data Exp a
= Free a
| Bound !Int
| Exp a :@ Exp a
| Lam (Exp a)
Bound variables reference the variable being bound by the
lambda n lambdas out. Substitution has to renumber all the
variables.
abstract :: Eq a => a -> Exp a -> Exp a
instantiate :: Exp a -> Exp a -> Exp a
14. Split variables into Bound and Free.
newtype Scope f a = Scope (f a)
data Exp a
= Free a
| Bound !Int
| Exp a :@ Exp a
| Lam (Scope Exp a)
Bound variables reference the variable being bound by the
lambda n lambdas out. Substitution has to renumber all the
variables.
abstract :: Eq a => a -> Exp a -> Scope Exp a
instantiate :: Exp a -> Scope Exp a -> Exp a
15. abstract :: Eq a => a -> Exp a -> Scope Exp a
abstract me expr = Scope (letmeB 0 expr) where
  letmeB this (Free you)
    | you == me = Bound this
    | otherwise = Free you
  letmeB this (Bound that) = Bound that
  letmeB this (fun :@ arg) =
    letmeB this fun :@ letmeB this arg
  letmeB this (Lam (Scope body)) =
    Lam (Scope (letmeB (succ this) body))
(Based on code by Conor McBride from “I am not a number: I am a free variable”)
16. instantiate :: Exp a -> Scope Exp a -> Exp a
instantiate what (Scope body) = what'sB 0 body
  where
    what'sB this (Bound that)
      | this == that = what
      | otherwise = Bound that
    what'sB this (Free you) = Free you
    what'sB this (fun :@ arg) =
      what'sB this fun :@ what'sB this arg
    what'sB this (Lam (Scope body)) =
      Lam (Scope (what'sB (succ this) body))
(Based on code by Conor McBride from “I am not a number: I am a free variable”)
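A self-contained sketch of how abstract and instantiate might be used together, assuming the locally nameless Exp with constructors spelled Free and Bound as in the data declaration. The helpers lam and beta are illustrative names, and the Eq instance is written out by hand only to keep the example independent of deriving details.

```haskell
newtype Scope f a = Scope (f a)

infixl 9 :@

data Exp a
  = Free a
  | Bound !Int
  | Exp a :@ Exp a
  | Lam (Scope Exp a)

-- Plain structural equality; bound variables are numbers, so this is
-- already alpha equivalence.
instance Eq a => Eq (Exp a) where
  Free x == Free y = x == y
  Bound i == Bound j = i == j
  (f :@ a) == (g :@ b) = f == g && a == b
  Lam (Scope a) == Lam (Scope b) = a == b
  _ == _ = False

abstract :: Eq a => a -> Exp a -> Scope Exp a
abstract me expr = Scope (go 0 expr)
  where
    go this (Free you)
      | you == me = Bound this
      | otherwise = Free you
    go _ (Bound that) = Bound that
    go this (fun :@ arg) = go this fun :@ go this arg
    go this (Lam (Scope body)) = Lam (Scope (go (succ this) body))

instantiate :: Exp a -> Scope Exp a -> Exp a
instantiate what (Scope body) = go 0 body
  where
    go this (Bound that)
      | this == that = what
      | otherwise    = Bound that
    go _ (Free you) = Free you
    go this (fun :@ arg) = go this fun :@ go this arg
    go this (Lam (Scope b)) = Lam (Scope (go (succ this) b))

-- Hypothetical smart constructor: bind a named free variable.
lam :: Eq a => a -> Exp a -> Exp a
lam v body = Lam (abstract v body)

-- Hypothetical one-step beta reduction at the root.
beta :: Exp a -> Exp a
beta (Lam b :@ a) = instantiate a b
beta e = e
```

With these definitions lam "x" (Free "x") == lam "y" (Free "y"), and beta (lam "x" (Free "x" :@ Free "z") :@ Free "w") reduces to Free "w" :@ Free "z".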
17. newtype Scope f a = Scope (f a)
data Exp a
= Free a
| Bound !Int
| Exp a :@ Exp a
| Lam (Scope Exp a)
deriving (Functor, Foldable, Traversable)
We can make an instance of Monad for Exp, but it is an
awkward one-off experience.
18. Pros:
Scope, abstract, and instantiate make it harder
to screw up walking under binders.
Alpha equivalence is just (==)
We can make a Monad for Exp.
We can use Traversable to find free variables,
close terms, etc.
Cons:
This succ’s a lot. (Slow)
Illegal terms such as Lam (Scope (Bound 2))
Have to define abstract/instantiate for each
type.
The Monad for Exp is a one-off deal.
19. data Exp a
= Var a
| Exp a :@ Exp a
| Lam (Exp (Maybe a))
(based on Bird and Paterson)
20. data Incr a = Z | S a
data Exp a
= Var a
| Exp a :@ Exp a
| Lam (Exp (Incr a))
(based on Bird and Paterson)
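A few example terms in this nested representation may help. This is a sketch: konst, applyToY and freeVars are illustrative names, not part of the talk. Z refers to the nearest enclosing lambda and each S skips one binder outward.

```haskell
data Incr a = Z | S a

infixl 9 :@

data Exp a
  = Var a
  | Exp a :@ Exp a
  | Lam (Exp (Incr a))

-- \x. \y. x : the variable skips the inner binder with S, then hits Z.
konst :: Exp a
konst = Lam (Lam (Var (S Z)))

-- \x. x y : "y" is free, so it sits under one S for the one binder
-- it passes on the way out.
applyToY :: Exp String
applyToY = Lam (Var Z :@ Var (S "y"))

-- Free variables by direct recursion. The type signature is required
-- because the recursive call under Lam is at type Exp (Incr a).
freeVars :: Exp a -> [a]
freeVars (Var a)  = [a]
freeVars (f :@ x) = freeVars f ++ freeVars x
freeVars (Lam b)  = [a | S a <- freeVars b]
```

Because konst has type Exp a for every a, the type checker guarantees it is closed: a body such as Lam (Var (S Z)) would not typecheck at that type.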
21. data Incr a = Z | S a
newtype Scope f a = Scope (f (Incr a))
data Exp a
= Var a
| Exp a :@ Exp a
| Lam (Scope Exp a)
instance MonadTrans Scope where
  lift = Scope . fmap S
-- Scope is just MaybeT, a monad transformer in its own
-- right, but lift is slow.
22. instance Monad Exp where
Var a >>= f = f a
x :@ y >>= f = (x >>= f) :@ (y >>= f)
Lam b >>= f = Lam (b >>= lift . f)
You can derive Foldable and Traversable.
Then Data.Foldable.toList can obtain the free
variables in a term, and (>>=) does capture
avoiding substitution!
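A runnable sketch of the claim above, under two simplifying assumptions: the Scope newtype is inlined into the Lam constructor, and the Functor and Applicative instances that current GHC requires are added. The helper subst is an illustrative name.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import Control.Monad (ap)
import Data.Foldable (toList)

data Incr a = Z | S a
  deriving (Functor, Foldable, Traversable)

infixl 9 :@

data Exp a
  = Var a
  | Exp a :@ Exp a
  | Lam (Exp (Incr a))
  deriving (Functor, Foldable, Traversable)

instance Applicative Exp where
  pure = Var
  (<*>) = ap

instance Monad Exp where
  Var a    >>= f = f a
  (x :@ y) >>= f = (x >>= f) :@ (y >>= f)
  Lam b    >>= f = Lam (b >>= g)
    where
      g Z     = Var Z
      g (S a) = fmap S (f a)  -- the O(n) lift: every variable gets an S

-- Hypothetical helper: substitute s for the free variable x.
subst :: Eq a => a -> Exp a -> Exp a -> Exp a
subst x s e = e >>= \v -> if v == x then s else Var v

-- The capture example from earlier: put \x. y in place of z in \y. z.
noCapture :: Exp String
noCapture = subst "z" (Lam (Var (S "y"))) (Lam (Var (S "z")))
```

Here noCapture is Lam (Lam (Var (S (S "y")))), and toList noCapture is ["y"]: the substituted "y" is still free rather than captured by the outer lambda.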
23. Pros:
The Monad is easy to define
Foldable/Traversable for free variables
Capture avoiding substitution for free
Cons:
It still succs a lot. lift is O(n).
24. If we could succ an entire expression instead of on each
individual variable we would succ less.
Instantiation wouldn’t have to walk into that expression
at all, and we could lift an Exp into Scope in O(1) instead
of O(n).
This requires polymorphic recursion, but we support
that. Go Haskell!
This is the ‘generalized de Bruijn’ as described by Bird
and Paterson without the rank-2 types mucking up the
description and abstracted into a monad transformer.
25. data Incr a = Z | S a
newtype Scope f a = Scope { unscope :: f (Incr (f a)) }
instance Monad f => Monad (Scope f) where
  return = Scope . return . S . return
  Scope e >>= f = Scope $ e >>= \v -> case v of
    Z -> return Z
    S ea -> ea >>= unscope . f
instance MonadTrans Scope where
  lift = Scope . return . S
26. Pros:
The Monad is easy to define
Foldable/Traversable for Free Variables
Capture avoiding substitution for free
Cons:
Alpha equivalence is slightly harder,
because you have to quotient out the position
of the ‘Succ’s.
27. abstract :: (Monad f, Eq a) => a -> f a -> Scope f a
abstract x e = Scope (liftM k e) where
  k y | x == y = Z
      | otherwise = S (return y)
instantiate :: Monad f => f a -> Scope f a -> f a
instantiate r (Scope e) = e >>= \v -> case v of
  Z -> r
  S a -> a
We can define these operations once and for all, independent
of our expression type!
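As a sketch of "once and for all", the generic abstract and instantiate can be plugged into a small expression type. Assumptions: bindScope, lam and apply are illustrative names, and the Functor and Applicative instances are only there because current GHC requires them.

```haskell
import Control.Monad (ap, liftM)

data Incr a = Z | S a

newtype Scope f a = Scope { unscope :: f (Incr (f a)) }

abstract :: (Monad f, Eq a) => a -> f a -> Scope f a
abstract x e = Scope (liftM k e)
  where
    k y | x == y    = Z
        | otherwise = S (return y)

instantiate :: Monad f => f a -> Scope f a -> f a
instantiate r (Scope e) = e >>= \v -> case v of
  Z   -> r
  S a -> a

-- Hypothetical helper: substitute inside a scope; whole subterms
-- stored under S are bound in one go.
bindScope :: Monad f => Scope f a -> (a -> f b) -> Scope f b
bindScope (Scope m) f = Scope (liftM step m)
  where
    step Z     = Z
    step (S e) = S (e >>= f)

infixl 9 :@

data Exp a = Var a | Exp a :@ Exp a | Lam (Scope Exp a)

instance Functor Exp where fmap = liftM
instance Applicative Exp where
  pure = Var
  (<*>) = ap
instance Monad Exp where
  Var a    >>= f = f a
  (x :@ y) >>= f = (x >>= f) :@ (y >>= f)
  Lam s    >>= f = Lam (bindScope s f)

lam :: Eq a => a -> Exp a -> Exp a
lam v b = Lam (abstract v b)

apply :: Exp a -> Exp a -> Exp a
apply (Lam s) a = instantiate a s
apply f a       = f :@ a

-- Abstract over "x", then instantiate it with w.
t1 :: Exp String
t1 = instantiate (Var "w") (abstract "x" (Var "x" :@ Var "y"))

-- Substitute y for z under the binder \y, then apply to q.
t2 :: Exp String
t2 = apply (lam "y" (Var "z") >>= \v -> if v == "z" then Var "y" else Var v)
           (Var "q")
```

t1 is Var "w" :@ Var "y", and t2 is Var "y" rather than Var "q": the substituted "y" was not captured by the binder.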
28. Not every language is the untyped lambda
calculus. Sometimes you want to bind multiple
variables at the same time, say for a pattern or
recursive let binding, or to represent all the
variables bound by a single quantifier in a single
pass.
So let's go back and enrich our binders so they
can bind multiple variables by generalizing
generalized de Bruijn.
29. data Var b a = B b | F a
newtype Scope b f a = Scope { unscope :: f (Var b (f a)) }
instance Monad f => Monad (Scope b f)
instance MonadTrans (Scope b)
abstract :: Monad f => (a -> Maybe b) -> f a -> Scope b f a
instantiate :: Monad f => (b -> f a) -> Scope b f a -> f a
fromScope :: Monad f => Scope b f a -> f (Var b a)
toScope :: Monad f => f (Var b a) -> Scope b f a
substitute :: (Monad f, Eq a) => a -> f a -> f a -> f a
class Bound t where
  (>>>=) :: Monad m => t m a -> (a -> m b) -> t m b
instance Bound (Scope b)
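A self-contained sketch of binding several variables at once with the generalized Scope. It re-implements abstract and instantiate locally, following the signatures above, so it does not depend on the library. The binder-free term type Tm is a stand-in invented for the example.

```haskell
import Control.Monad (ap, liftM)
import Data.List (elemIndex)

data Var b a = B b | F a

newtype Scope b f a = Scope { unscope :: f (Var b (f a)) }

-- The function decides which free variables become bound, and which
-- bound name (here: an index) each one gets.
abstract :: Monad f => (a -> Maybe b) -> f a -> Scope b f a
abstract f e = Scope (liftM k e)
  where
    k y = case f y of
      Just z  -> B z
      Nothing -> F (return y)

-- Simultaneously replace every bound variable.
instantiate :: Monad f => (b -> f a) -> Scope b f a -> f a
instantiate k (Scope e) = e >>= \v -> case v of
  B b -> k b
  F a -> a

infixl 9 :@

-- Hypothetical minimal term type, just variables and application.
data Tm a = V a | Tm a :@ Tm a
  deriving (Eq, Show)

instance Functor Tm where fmap = liftM
instance Applicative Tm where
  pure = V
  (<*>) = ap
instance Monad Tm where
  V a      >>= f = f a
  (x :@ y) >>= f = (x >>= f) :@ (y >>= f)

-- Bind "x" and "y" in one pass, numbering them 0 and 1.
body :: Scope Int Tm String
body = abstract (`elemIndex` ["x", "y"]) (V "x" :@ V "y" :@ V "z")

-- Instantiate both at once.
result :: Tm String
result = instantiate ([V "a", V "b"] !!) body
```

result is V "a" :@ V "b" :@ V "z": both bound variables were replaced in a single traversal and the free "z" was left alone.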
30. data Exp a
  = V a
  | Exp a :@ Exp a
  | Lam (Scope () Exp a)
  | Let [Scope Int Exp a] (Scope Int Exp a)
  deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)
instance Monad Exp where
  return = V
  V a >>= f = f a
  (x :@ y) >>= f = (x >>= f) :@ (y >>= f)
  Lam e >>= f = Lam (e >>>= f)
  Let bs b >>= f = Let (map (>>>= f) bs) (b >>>= f)
31. abstract1 :: (Monad f, Eq a) => a -> f a -> Scope () f a
abstract :: Monad f => (a -> Maybe b) -> f a -> Scope b f a
lam :: Eq a => a -> Exp a -> Exp a
lam v b = Lam (abstract1 v b)
let_ :: Eq a => [(a,Exp a)] -> Exp a -> Exp a
let_ bs b = Let (map (abstr . snd) bs) (abstr b)
where abstr = abstract (`elemIndex` map fst bs)
infixr 0 !
(!) :: Eq a => a -> Exp a -> Exp a
(!) = lam
32. instantiate :: Monad f => (b -> f a) -> Scope b f a -> f a
instantiate1 :: Monad f => f a -> Scope () f a -> f a
whnf :: Exp a -> Exp a
whnf e@V{} = e
whnf e@Lam{} = e
whnf (f :@ a) = case whnf f of
Lam b -> whnf (instantiate1 a b)
f' -> f' :@ a
whnf (Let bs b) = whnf (inst b)
where es = map inst bs
inst = instantiate (es !!)
33. fromScope :: Monad f => Scope b f a -> f (Var b a)
toScope :: Monad f => f (Var b a) -> Scope b f a
nf :: Exp a -> Exp a
nf e@V{} = e
nf (Lam b) = Lam $ toScope $ nf $ fromScope b
nf (f :@ a) = case whnf f of
Lam b -> nf (instantiate1 a b)
f' -> nf f' :@ nf a
nf (Let bs b) = nf (inst b)
where es = map inst bs
inst = instantiate (es !!)
34. closed :: Traversable f => f a -> Maybe (f b)
closed = traverse (const Nothing)
A closed term has no free variables, so you can
treat the free variable type as anything you
want.
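A small sketch of closed in action. For brevity it uses the Maybe-based nested Exp from earlier in the talk rather than the Scope-based one, and the example terms are made up.

```haskell
{-# LANGUAGE DeriveTraversable #-}

infixl 9 :@

data Exp a
  = Var a
  | Exp a :@ Exp a
  | Lam (Exp (Maybe a))
  deriving (Functor, Foldable, Traversable)

-- Fails as soon as it meets a free variable. If there are none, the
-- term is rebuilt unchanged at any variable type we like.
closed :: Traversable f => f a -> Maybe (f b)
closed = traverse (const Nothing)

-- \x. x has no free variables.
identityTerm :: Exp String
identityTerm = Lam (Var Nothing)

-- \x. x y mentions the free variable "y".
openTerm :: Exp String
openTerm = Lam (Var Nothing :@ Var (Just "y"))
```

closed identityTerm :: Maybe (Exp Int) is a Just, retyping the term from Exp String to Exp Int, while closed openTerm is Nothing.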
37. data Exp a
= V a
| Exp a :@ Exp a
| Lam !Int (Pat Exp a) (Scope Int Exp a)
| Let !Int [Scope Int Exp a] (Scope Int Exp a)
| Case (Exp a) [Alt Exp a]
deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)
data Pat f a
= VarP
| WildP
| AsP (Pat f a)
| ConP String [Pat f a]
| ViewP (Scope Int f a) (Pat f a)
deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)
data Alt f a = Alt !Int (Pat f a) (Scope Int f a)
deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)
38. instance Monad Exp where
return = V
V a >>= f = f a
(x :@ y) >>= f = (x >>= f) :@ (y >>= f)
Lam n p e >>= f = Lam n (p >>>= f) (e >>>= f)
Let n bs e >>= f = Let n (map (>>>= f) bs) (e >>>= f)
Case e as >>= f = Case (e >>= f) (map (>>>= f) as)
instance Bound Pat where
VarP >>>= _ = VarP
WildP >>>= _ = WildP
AsP p >>>= f = AsP (p >>>= f)
ConP g ps >>>= f = ConP g (map (>>>= f) ps)
ViewP e p >>>= f = ViewP (e >>>= f) (p >>>= f)
instance Bound Alt where
Alt n p b >>>= f = Alt n (p >>>= f) (b >>>= f)
39. data P a = P { pattern :: [a] -> Pat Exp a, bindings :: [a] }
varp :: a -> P a
varp a = P (const VarP) [a]
wildp :: P a
wildp = P (const WildP) []
conp :: String -> [P a] -> P a
conp g ps = P (ConP g . go ps) (ps >>= bindings)
where
go (P p as:ps) bs = p bs : go ps (bs ++ as)
go [] _ = []
lam :: Eq a => P a -> Exp a -> Exp a
lam (P p as) t = Lam (length as) (p []) (abstract (`elemIndex` as) t)
ghci> lam (varp "x") (V "x")
Lam 1 VarP (Scope (V (B 0)))
ghci> lam (conp "Hello" [varp "x", wildp]) (V "y")
Lam 1 (ConP "Hello" [VarP,WildP]) (Scope (V (F (V "y"))))
40. Deriving Eq, Ord, Show and Read requires some tomfoolery. The issue is
that Scope uses polymorphic recursion.
So the most direct way of implementing Eq (Scope b f a) would require
instance (Eq (f (Var b (f a))), Eq (Var b (f a)), Eq (f a), Eq a) => Eq (Scope b f a)
And then Exp would require:
instance (Eq a, Eq (Pat Exp a), Eq (Scope Int Exp a), Eq (Alt Exp a)) => Eq (Exp a)
Plus all the things required by Alt, Pat, and Scope!
Moreover, these would require flexible contexts, taking us out of Haskell
98/2010.
Blech!
41. My prelude-extras package defines a number of boring typeclasses like:
class Eq1 f where
(==#) :: Eq a => f a -> f a -> Bool
(/=#) :: Eq a => f a -> f a -> Bool
class Eq1 f => Ord1 f where
compare1 :: Ord a => f a -> f a -> Ordering
class Show1 f where
showsPrec1 :: Show a => Int -> f a -> ShowS
class Read1 f where
readsPrec1 :: Read a => Int -> ReadS (f a)
readList1 :: Read a => ReadS [f a]
42. Bound defines:
instance (Functor f, Show b, Show1 f, Show a) => Show (Scope b f a)
instance (Functor f, Read b, Read1 f, Read a) => Read (Scope b f a)
instance (Monad f, Ord b, Ord1 f, Ord a) => Ord (Scope b f a)
instance (Monad f, Eq b, Eq1 f, Eq a) => Eq (Scope b f a)
So you just need to define
instance Eq1 Exp where (==#) = (==)
instance Ord1 Exp where compare1 = compare
instance Show1 Exp where showsPrec1 = showsPrec
instance Read1 Exp where readsPrec1 = readsPrec
Why do some use Monad? Ord and Eq perform a non-structural equality
comparison so that (==) is alpha-equality!
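The same trick can be sketched with the Eq1 class that ships in base's Data.Functor.Classes. Note this is a different API from the prelude-extras classes shown above: it uses liftEq and eq1 in place of (==#). Pair and Wrap are toy types made up for the example.

```haskell
import Data.Functor.Classes (Eq1 (..), eq1)

-- A toy functor standing in for an expression type.
data Pair a = Pair a a
  deriving Eq

instance Eq1 Pair where
  liftEq eq (Pair a b) (Pair c d) = eq a c && eq b d

-- Like Scope, Wrap nests f inside f. A direct instance would need an
-- Eq (f (f a)) context; Eq1 f plus Eq a says the same thing in
-- Haskell 98 style.
newtype Wrap f a = Wrap (f (f a))

instance (Eq1 f, Eq a) => Eq (Wrap f a) where
  Wrap x == Wrap y = liftEq eq1 x y
```

For instance, Wrap (Pair (Pair 1 2) (Pair 3 4)) == Wrap (Pair (Pair 1 2) (Pair 3 4)) holds, using only the Eq1 Pair instance and Eq on the elements.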
43. We can define languages that have strongly typed variables by
moving to much scarier types. =)
type Nat f g = forall x. f x -> g x
class HFunctor t where
hmap :: Nat f g -> Nat (t f) (t g)
class HFunctor t => HTraversable t where
htraverse :: Applicative m => (forall x. f x -> m (g x)) -> t f a -> m (t g a)
class HFunctor t => HMonad t where
hreturn :: Nat f (t f)
(>>-) :: t f a -> Nat f (t g) -> t g a
44. data Equal a b where
Refl :: Equal a a
class EqF f where
(==?) :: f a -> f b -> Maybe (Equal a b)
data Var b f a where
B :: b a -> Var b f a
F :: f a -> Var b f a
newtype Scope b t f a = Scope { unscope :: t (Var b (t f)) a }
abstract :: HMonad t =>
(forall x. f x -> Maybe (b x)) -> Nat (t f) (Scope b t f)
instantiate :: HMonad t => Nat b (t f) -> Nat (Scope b t f) (t f)
class HBound s where
(>>>-) :: HMonad t => s t f a -> Nat f (t g) -> s t g a
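To show what an EqF instance buys you, here is a sketch with a tiny GADT of type tags. Ty and castWith are made up for illustration; Equal and EqF are as on the slide.

```haskell
{-# LANGUAGE GADTs #-}

data Equal a b where
  Refl :: Equal a a

class EqF f where
  (==?) :: f a -> f b -> Maybe (Equal a b)

-- Hypothetical variable tags that know their own type.
data Ty a where
  TInt  :: Ty Int
  TBool :: Ty Bool

instance EqF Ty where
  TInt  ==? TInt  = Just Refl
  TBool ==? TBool = Just Refl
  _     ==? _     = Nothing

-- Matching on Refl teaches the type checker that a and b coincide.
castWith :: Equal a b -> a -> b
castWith Refl x = x
```

A successful comparison returns evidence rather than a Bool, so abstract can safely turn a matching free variable into a bound one of the right type: fmap (\r -> castWith r (3 :: Int)) (TInt ==? TInt) is Just 3, while TInt ==? TBool is Nothing.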
45. Dependently typed languages build up a lot of
crap in memory. It’d be nice to share memory
for it, since most of it is very repetitive.
46. Bound provides a small API for dealing with
abstraction/instantiation for complex binders
that combines the nice parts of “I am not a
number: I am a free variable” with the “de Bruijn
notation as a nested data type” while avoiding
the complexities of either.
You just supply it a Monad and Traversable
No variable supply is needed, no pool of names
Substitution is very efficient
Introduces no exotic or illegal terms
Simultaneous substitution for complex binders
Your code never sees a de Bruijn index
49. data Ix :: [*] -> * -> * where
Z :: Ix (a ': as) a
S :: Ix as b -> Ix (a ': as) b
data Vec :: (* -> *) -> [*] -> * where
HNil :: Vec f '[]
(:::) :: f b -> Vec f bs -> Vec f (b ': bs)
data Lit t where
Integer :: Integer -> Lit Integer
Double :: Double -> Lit Double
String :: String -> Lit String
data Remote :: (* -> *) -> * -> * where
Var :: f a -> Remote f a
Lit :: Lit a -> Remote f a
Lam :: Scope (Equal b) Remote f a -> Remote f (b -> a)
Let :: Vec (Scope (Ix bs) Remote f) bs -> Scope (Ix bs) Remote f a -> Remote f a
Ap :: Remote f (a -> b) -> Remote f a -> Remote f b
50. lam_ :: EqF f => f a -> Remote f b -> Remote f (a -> b)
lam_ v f = Lam (abstract (v ==?) f)
-- let_ actually winds up becoming much trickier to define
-- requiring a MonadFix and a helper monad.
two12121212 = let_ $ mdo
  x <- def (cons 1 z)
  z <- def (cons 2 x)
  return z