## Ornery Types and Their Associated Grammars

My goals are simple: Represent an AST in Haskell and have the type system accurately describe what is permitted in the language. Almost immediately I was foiled.

Consider this deceptively simple language:

``````L ::= A | B | C | D
A ::= 'a'
B ::= 'b'
C ::= 'c' '[' {B} ']'
D ::= 'd' '[' {L} ']'``````

Strings in this language:

``````a
b
c [ b ]
c [ b b b ]
d [ b a b ]
d [ a b c [ b b ] ]``````

Strings NOT in this language:

``````c [ a b a ]
a b``````

Now that I know what language I’m dealing with I can postulate a few Haskell types that might fit the AST bill. I can start by trying the most direct and obvious approach using an algebraic data type:

``type L``

To which I will add a constructor for A:

``    = A``

And a constructor for B:

``    | B``

I’ll skip C for a moment and add D:

``    | D [L]``

So far I have:

``````type L
= A
| B
| D [L]``````

Which leaves me with C and as you have probably surmised this is where the trouble begins. I could add a constructor for C similar to D:

``    | C [L]``

But this violates the stated goal of having the AST types tightly match what is actually permitted in the language. In this case the AST would permit the forbidden string “c [ a b a ]”. It would be lovely if I could say:

``    | C [B]``

But B isn’t a type it’s a constructor… Well I could fix that by pulling B out of L and adding in C:

``````type B = B

type L
= A
| D [L]
| C [B]``````

Except that I’ve lost the ability to represent strings such as “d [ b a b ]”. Another option would be to parameterize our type L like so:

``````type B = B
type L b
= A
| B' b
| D [L]
| C [b]``````

Which when using B’ B and C [B] permits me to represent all the strings I’m looking for. It introduces an oddity however. The elements of C’s list won’t be directly comparable to those of B’. While I’ve no proof of this, I suspect this will lead to some painful acrobatics later on when trying to write functions that manipulate the AST. Similarly, I could ‘wrap’ all the constructors of L:

``````type A = A
type B = B
type C = C [B]
type D = D [L]
type L = LA A | LB B | LC C | LD D``````

Like before I cannot directly compare B with LB, but I can always count on unpacking an L when doing comparisons and so should be easier. Overall I rank these solutions ‘so so’. Can I do better? Maybe if I use existential types and type classes?

``````{-# LANGUAGE ExistentialQuantification #-}

class Show a => L a

data A = A deriving Show
instance L A

data B = B deriving Show
instance L B

data C = C [B] deriving Show
instance L C

data D = D [AnyL] deriving Show
instance L D

data AnyL = forall a. L a => AnyL a
instance L AnyL
instance Show AnyL where
show (AnyL a) = show a``````

Not much better really since I can’t compare AnyL to B directly. Actually, its worse since I can’t unpack AnyL. Not all that surprising as that’s sort of the whole premise of existential types, but very very inconvenient. This solution has the dubious honor of being viewed by some as an anti-pattern. Using GADTs I end up with a similar problem:

``````{-# LANGUAGE GADTs #-}

data A = A deriving Show
data B = B deriving Show
data C = C deriving Show
data D = D deriving Show

data L a where
LA :: A -> L A
LB :: B -> L B
LC :: C -> [L B] -> L C
LD :: D -> [L b] -> L D

instance Show (L a) where
show (LA a) = show a
show (LB a) = show a
show (LC a b) = show (a, b)
show (LD a b) = show (a, b)``````

I have successfully restricted C to type L B (which is different from all previous examples actually), but now D is restricted to only one type of L. It won’t accept the string “d [ a b a ]”. Without reintroducing AnyL I also can’t see a way to make D general to all L. Which leaves me with two mediocre solutions and a headache.

• Luke
• December 2nd, 2010

I liked your final solution before you started trying to get all magical:

type A = A
type B = B
type C = C [B]
type D = D [L]
type L = LA A | LB B | LC C | LD D

I find this solution to very directly and elegantly model the problem description, not ugly at all.

• Trying and failing ;o] That solution is ok, but as the grammar’s complexity increases more boilerplate L* constructors will be needed (and continues to get worse the more collection types you need).

1. Yet another solution to add to the mix:

data A
data B
data C
data D

data L

data AST t where
A :: AST A
B :: AST B
C :: [AST B] -> AST C
D :: [AST L] -> AST D
L :: AST t -> AST L

I didn’t think it was possible, but apparently a GADT ‘forall’ can be unpacked (as opposed to the existential which could not). Which makes this possible:

tostr :: AST L -> [Char]
tostr (L A) = “A”
tostr (L B) = “B”
tostr (L _) = “C or D”

> map tostr \$ [L A, L B, L A, L \$ C [B, B]]
[“A”,”B”,”A”,”C or D”]