Ornery Types and Their Associated Grammars

My goals are simple: Represent an AST in Haskell and have the type system accurately describe what is permitted in the language. Almost immediately I was foiled.

Consider this deceptively simple language:

L ::= A | B | C | D
A ::= 'a'
B ::= 'b'
C ::= 'c' '[' {B} ']'
D ::= 'd' '[' {L} ']'

Strings in this language:

c [ b ]
c [ b b b ]
d [ b a b ]
d [ a b c [ b b ] ]

Strings NOT in this language:

c [ a b a ]
a b

Now that I know what language I’m dealing with I can postulate a few Haskell types that might fit the AST bill. I can start by trying the most direct and obvious approach using an algebraic data type:

type L

To which I will add a constructor for A:

    = A

And a constructor for B:

    | B

I’ll skip C for a moment and add D:

    | D [L]

So far I have:

type L
    = A
    | B
    | D [L]

Which leaves me with C and as you have probably surmised this is where the trouble begins. I could add a constructor for C similar to D:

    | C [L]

But this violates the stated goal of having the AST types tightly match what is actually permitted in the language. In this case the AST would permit the forbidden string “c [ a b a ]”. It would be lovely if I could say:

    | C [B]

But B isn’t a type it’s a constructor… Well I could fix that by pulling B out of L and adding in C:

type B = B

type L
    = A
    | D [L]
    | C [B]

Except that I’ve lost the ability to represent strings such as “d [ b a b ]”. Another option would be to parameterize our type L like so:

type B = B
type L b
    = A
    | B' b
    | D [L]
    | C [b]

Which when using B’ B and C [B] permits me to represent all the strings I’m looking for. It introduces an oddity however. The elements of C’s list won’t be directly comparable to those of B’. While I’ve no proof of this, I suspect this will lead to some painful acrobatics later on when trying to write functions that manipulate the AST. Similarly, I could ‘wrap’ all the constructors of L:

type A = A
type B = B
type C = C [B]
type D = D [L]
type L = LA A | LB B | LC C | LD D

Like before I cannot directly compare B with LB, but I can always count on unpacking an L when doing comparisons and so should be easier. Overall I rank these solutions ‘so so’. Can I do better? Maybe if I use existential types and type classes?

{-# LANGUAGE ExistentialQuantification #-}

class Show a => L a

data A = A deriving Show
instance L A

data B = B deriving Show
instance L B

data C = C [B] deriving Show
instance L C

data D = D [AnyL] deriving Show
instance L D

data AnyL = forall a. L a => AnyL a
instance L AnyL
instance Show AnyL where
    show (AnyL a) = show a

Not much better really since I can’t compare AnyL to B directly. Actually, its worse since I can’t unpack AnyL. Not all that surprising as that’s sort of the whole premise of existential types, but very very inconvenient. This solution has the dubious honor of being viewed by some as an anti-pattern. Using GADTs I end up with a similar problem:


data A = A deriving Show
data B = B deriving Show
data C = C deriving Show
data D = D deriving Show

data L a where
    LA :: A -> L A
    LB :: B -> L B
    LC :: C -> [L B] -> L C
    LD :: D -> [L b] -> L D

instance Show (L a) where
    show (LA a) = show a
    show (LB a) = show a
    show (LC a b) = show (a, b)
    show (LD a b) = show (a, b)

I have successfully restricted C to type L B (which is different from all previous examples actually), but now D is restricted to only one type of L. It won’t accept the string “d [ a b a ]”. Without reintroducing AnyL I also can’t see a way to make D general to all L. Which leaves me with two mediocre solutions and a headache.

Monad. Monad.

> import Control.Monad
> cartesian xs ys =
> 	liftM2 pair xs ys
> 	where pair a b = (a, b)
$ cartesian [1,2,3] [4,5,6]

Consider my imperative brain just eaten by a monad. I had come to understand monads to be something like a way to string computations together so that they would occur in order. For example:

> printme z = do
> 	print "One"
> 	print "Two"
> 	print $ "Many " ++ z

May the Count and XKCD forgive me, this does just what you’d expect:

$ printme "ah ah ah"
"Many ah ah ah"

Now consider this doozy from the Monad wiki article [1]:

>  a = do x <- [3..4]
> 	[1..2]
> 	return (x, 42)

What do you suppose this does? Straight away it appears that the [1..2] does nothing right? Regardless of the meaning of the x <- [3..4] … right? It’s not so:

$ a

Cue confusion.

Let’s simplify that example a bit:

> b = do x <- [3..4]
> 	return (x, 42)
$ b

Mmm hmm, yeah. x <- [3..4] is really doing something weird here. According to the wiki article [1] the do notation is really just sugar for this:

> a2 = [3..4] >>= (\x -> [1..2] >>= (\_ -> return (x, 42)))

Ok, combine that with the definition:

instance Monad [] where
	m >>= f  = concatMap f m
	return x = [x]
	fail s   = []

(Wait, List is a monad? Yup.)

The article then helpfully goes on to show that replacing >>= with concatMap produces just the list we saw for a. Which is to say, when you use lists in monads you are mapping the values over the remainer of the monad: ala foreach.

To illustrate we have a cartesian product:

> cartesian0 xs ys = do		-- given a list of xs and ys
> 	x <- xs			-- for each x
> 	y <- ys			--   for each y
> 	return $ (x, y)		--     append the tuple (x, y)

On that same page though we have the definition for liftM2 which looks an awful lot like what we just wrote for cartesian0:

liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
liftM2 op mx my = do
	x <- mx
	y <- my
	return (op x y)

Thus the definition from the top:

cartesian xs ys =
	liftM2 pair xs ys
	where pair a b = (a, b)

But the fun doesn’t stop there! Looks like a pretty slick way to perform a computation that typically would require an accumulator eh? Tada! I give you list comprehensions in disguise. Or rather list comprehensions are monads in disguise [2].

> cartesian1 xs ys = [(x, y) | x <- xs, y <- ys]

Under Construction

Site rebirth take-N. The short of it: Google Sites fails as a blog.


Get every new post delivered to your Inbox.