Geeky things and other research from a work in progress

2009-02-19

The strict danger of switching from pure to monadic Template Haskell

Due to several issues with the Template Haskell (TH) deriving in EMGM (which I knew were there, but needed significant motivation to fix), I realized it was time to convert much of my code from pure to monadic. From the beginning, I developed it to be as non-monadic (i.e. not in Q) as possible for all the reasons one is supposed to. This led to several challenges that were not too difficult to surmount at the time. Namely, reify and newName had to be used at the top level of my function hierarchy, and their values had be passed down. But now I really need reify in a few places in the depths of the code, and it is quite impractical to do otherwise. For example, I need to expand type synonyms, and if I do that at the top level, every name is now either a name or some adaptation of a type declaration.

Most of the refactoring was surprisingly easy. I simply changed uppercase constructors to their lowercase, monadic equivalents (see the docs for the difference). However, I ran into a problem in which a function specific to bifunctor types was getting applied to monomorphic types. This lead to strange errors (compile-time of course, since it's TH) that told me something had changed in the way the functions were used. It was very likely not the programming logic itself, because I was carefully migrating individual functions a few at a time and testing as I went. But somehow, once I reached the point in the code where I needed reify, I started getting these error reports (my code, though it was for unexpected inputs).

After several hours of tracing through with report, I found the problem. Here is the blah-ified (and simplified) original code:

-- Non-monadic:
blah :: ... -> ...
blah ... = ...

-- Monadic wrapper
blahM :: ... -> Q [...]
blahM ... = return [blah ...]

-- Exposed
fun :: ... -> Q ...
fun ... = do
  a <- ...
  b <- blahM ...
  let x =
        case ... of
          1 -> a
          2 -> b
          _ -> []
  return (... ++ x)

Most of the logic lies under blah, so it follows that I monadicized (monadified?) blah which required a minor changed to blahM:

blah :: ... -> Q ...
blah ... = ...

blahM :: ... -> Q [...]
blahM ... = do
  x <- blah ...
  return [x]

Now, this didn't affect anything immediately. I kept converting my functions to the monadic religion, and at some point, I suppose I evangelized too much. Then, I believe that blah, which by the original design was meant to be called lazily, became too strict. It was getting called every time, instead of only when the case expression matched on 2. Once I realized this, it was evident that I needed to change fun.

fun :: ... -> Q ...
fun ... = do
  x <-
    case ... of
      1 -> ...
      2 -> blahM ...
      _ -> return []
  return (... ++ x)

Now, everything is strictly normal again. No strange errors to keep me up at night. I believe now that blahM was always getting evaluated even though blah was not. So, as I pushed the monad down into the code, more and more things were getting strictly evaluated.

I keep learning that I don't understand laziness as well as I think I do. Or maybe it's that I don't understand strictness. Or perhaps I'm too lazy to strictly learn both.

I also resolve never to write non-monadic Template Haske11 code again (excluding small and/or simple functions, of course). The amount of work required is not worth the benefits gained.

2009-02-15

Incremental fold, a design pattern

I recently read the article "How to Refold a Map" by David F. Place in The Monad.Reader Issue 11. I've been thinking about incremental algorithms in Haskell for some time, and I realized that Place has written a specific instance (and optimization) of a more general concept: the incremental fold.

In this article, I demonstrate a design pattern for converting a datatype and related functions into an incremental fold. The pattern is not difficult to comprehend, but it would be nice to improve upon it. I explore a few improvements and issues with those improvements. Ultimately, I'd like to see this functionality in a program instead of a design pattern.

Note: This is a literate Haskell article. You can copy the text of the entire article, paste it into a new file called IncrementalTreeFold.lhs, and load it into GHCi.


> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE ScopedTypeVariables #-}

> module IncrementalTreeFold where
> import Prelude hiding (elem)
> import qualified Data.Char as Char (ord)

Introducing a Typical Binary Tree

Before we get to the conversion, let's choose an appropriate datatype. Place adapted the Map type used in Data.Map (or Set in Data.Set). To simplify my presentation, I will use an ordered binary tree with labeled nodes.


> data Tree a
>   = Tip
>   | Bin a (Tree a) (Tree a)
>   deriving Show

Next, let's introduce some useful functions. An incremental fold is not necessarily like applying a fold function (a.k.a. a catamorphism, not a crush function that has become known as a fold) to a value directly. Instead, as I will later show, it integrates into existing functions that manipulate values. That said, we should have some functions for building Trees. Here is the beginning of a Tree API. (There are a number of other operations, e.g. delete and lookup, that can easily be added but do not contribute much to the discussion.)

empty builds a tree with no elements.


> empty :: (Ord a) => Tree a
> empty = Tip

singleton builds a tree with a single element.


> singleton :: (Ord a) => a -> Tree a
> singleton x = Bin x Tip Tip

insert puts a value in the appropriate place given a left-to-right ordering of values in the tree.


> insert :: (Ord a) => a -> Tree a -> Tree a
> insert x t =
>   case t of
>     Tip ->
>       singleton x
>     Bin y lt rt ->
>       case compare x y of
>         LT -> Bin y (insert x lt) rt
>         GT -> Bin y lt (insert x rt)
>         EQ -> Bin x lt rt

fromList creates a tree from a list of values.


> fromList :: (Ord a) => [a] -> Tree a
> fromList = foldr insert empty

elem determines if a value is an element of a tree.


> elem :: (Ord a) => a -> Tree a -> Bool
> elem x t =
>   case t of
>     Tip ->
>       False
>     Bin y lt rt ->
>       case compare x y of
>         LT -> elem x lt
>         GT -> elem x rt
>         EQ -> True

Now, using our library of sorts, we can create binary search tree and check if a value is in the tree.


> test1 = 37 `elem` fromList [8,23,37,82,3]

Tree Folds

Suppose that we now want the size of the tree. For good abstraction and high reuse, we create a fold function.


> data Alg a b = Alg { ftip :: b, fbin :: a -> b -> b -> b }

> fold :: Alg a b -> Tree a -> b
> fold alg = go
>   where
>     go Tip           = ftip alg
>     go (Bin x lt rt) = fbin alg x (go lt) (go rt)

fold allows us to write a simple size function.


> size :: Tree a -> Int
> size = fold (Alg 0 (\_ lr rr -> 1 + lr + rr))

I use the datatype Alg here to contain the algebra of the fold. In size, we simply replace each constructor in the algebra of Tree with a corresponding element from the algebra of integer addition. Since you're reading this article, you're probably a Haskell programmer and already familiar with the sorts of functions that can be written with folds. Here are a few others.


> filter :: (a -> Bool) -> Tree a -> [a]
> filter f = fold (Alg [] (\x lr rr -> if f x then [x] else [] ++ lr ++ rr))

> ord :: Tree Char -> Tree Int
> ord  = fold (Alg Tip (\x lt rt -> Bin (Char.ord x) lt rt))

Incremental Change

Now that we have a grasp on using a fold on a datatype, I would like to show how to extend my binary tree "library" defined above to support an incremental fold. The incremental fold can (I believe) do everything a traditional fold can do, but it does it during Tree construction instead of externally in a separate function. This means that every time we produce a new Tree (via singleton, insert, or fromList for example), we get a new result of the incremental fold.

Transforming our library into an incremental calculating machine involves several steps. The first step is extending the datatype to hold the incremental result. Since we want to be polymorphic in the result type, we add a type parameter r to the Tree type constructor. And since each constructor may possibly have an incremental result, it must also be extended with a place holder for r.


> data Tree' a r
>   = Tip' r
>   | Bin' a (Tree' a r) (Tree' a r) r
>   deriving Show

For convenience and possibly to hide the modified constructors from the outside world, we add a function for retrieving the increment result.


> result' :: Tree' a r -> r
> result' (Tip' r)       = r
> result' (Bin' _ _ _ r) = r

As I mentioned earlier, the machinery of the fold is now in the construction. To implement this second step, we use smart constructors.


> tip' :: Alg a r -> Tree' a r
> tip' alg = Tip' (ftip alg)

> bin' :: Alg a r -> a -> Tree' a r -> Tree' a r -> Tree' a r
> bin' alg x lt rt = Bin' x lt rt (fbin alg x (result' lt) (result' rt))

Both tip' and bin' construct new values of Tree' a r and using the algebra, calculate the incremental result to be stored in each value. Thus, the actual fold operation is "hidden" in the construction of values.

Now, in order to put the incremental fold to work in a function, we simply (1) add the algebra to the function's arguments, (2) add an wildcard pattern for the result field in constructor patterns, and (3) replace applications of the constructors with that of their incremental cousins. Here's an example of the singleton and insert functions modified for incremental folding.


> singleton' :: (Ord a) => Alg a r -> a -> Tree' a r
> singleton' alg x = bin' alg x (tip' alg) (tip' alg)

> insert' :: (Ord a) => Alg a r -> a -> Tree' a r -> Tree' a r
> insert' alg x t =
>   case t of
>     Tip' _ ->
>       singleton' alg x
>     Bin' y lt rt _ ->
>       case compare x y of
>         LT -> bin' alg y (insert' alg x lt) rt
>         GT -> bin' alg y lt (insert' alg x rt)
>         EQ -> bin' alg x lt rt

Comparing these functions with the initial versions, we see that the changes are readily apparent. Modify every other Tree'-hugging function in the same manner, and you have a design pattern for an incremental fold!

Improving the Incremental Implementation

Of course, you may complain that there's some amount of boilerplate work involved. For example, we have to add this alg argument everywhere. Let's try to replace that with a type class.


< class Alg'' a r where
<   ftip'' :: r
<   fbin'' :: a -> r -> r -> r

And we redefine our smart constructors.


< tip'' :: (Alg' a r) => Tree' a r
< tip'' = Tip' ftip''

But there's a problem here! GHC reports that it Could not deduce (Alg'' a r) from the context (Alg'' a1 r). The poor compiler cannot infer the type of the parameter a since ftip'' has only type r.

Let's try another version of the class. In this one, we add a dummy argument to ftip' in order to force GHC to correctly infer the full type.


> class Alg'' a r where
>   ftip'' :: a -> r
>   fbin'' :: a -> r -> r -> r

> tip'' :: forall a r . (Alg'' a r) => Tree' a r
> tip'' = Tip' (ftip'' (undefined :: a))

> bin'' :: (Alg'' a r) => a -> Tree' a r -> Tree' a r -> Tree' a r
> bin'' x lt rt = Bin' x lt rt (fbin'' x (result' lt) (result' rt))

This provides one (not very pretty) solution to the problem. I'm able to get around the need to require an argument for tip'' by using lexically scoped type variables. But it doesn't remove the ugly type from ftip'', and the user is forced to ignore it when writing an instance.

The functions can now be rewritten with the Alg'' constraint.


> empty'' :: (Ord a, Alg'' a r) => Tree' a r
> empty'' = tip''

> singleton'' :: (Ord a, Alg'' a r) => a -> Tree' a r
> singleton'' x = bin'' x tip'' tip''

> insert'' :: (Ord a, Alg'' a r) => a -> Tree' a r -> Tree' a r
> insert'' x t =
>   case t of
>     Tip' _ ->
>       singleton'' x
>     Bin' y lt rt _ ->
>       case compare x y of
>         LT -> bin'' y (insert'' x lt) rt
>         GT -> bin'' y lt (insert'' x rt)
>         EQ -> bin'' x lt rt

> fromList'' :: (Ord a, Alg'' a r) => [a] -> Tree' a r
> fromList'' = foldr insert'' empty''

These versions look more like the non-incremental implementations above. To use them, we need to declare an instance of Alg'' with an appropriate algebra for our desired incremental result. Here's how we would rewrite size.


> newtype Size = Size { unSize :: Int }

> instance Alg'' a Size where
>   ftip'' _ = Size 0
>   fbin'' _ lr rr = Size (1 + unSize lr + unSize rr)

> size'' :: Tree' a Size -> Int
> size'' = unSize . result'

> test2 = size'' $ insert'' 's' $ insert'' 'p' $ insert'' 'l' $ fromList'' "onderzoek"

Size is still defined as a fold, but the result is incrementally built with each application of a library function. This can have a nice performance boost as Place also found in his article.

Generic Thoughts

On reflecting over my implementation, I really don't like the dummy arguments required by constructors like Tip. There are other approaches to dealing with this, but I haven't yet found a better one. If you use a functional dependency such as r -> a in the definition of Alg'', then a would be uniquely determined by r. In the case of size'', we would have to specify a concrete element type for Tree' instead of the parameter a (or use undecidable instances). Perhaps, dear reader, you might have a better solution?

The incremental fold pattern is great for documenting an idea, but it has several downsides: (1) The obvious one is that it requires modifying a datatype and code. This is not always desirable and often not practical. (2) Implementing an incremental fold can involve a lot of boilerplate code and many, small changes that are monotonous and boring. It's very easy to make mistakes. In fact, I made several copy-paste-and-forget-to-change errors while writing this article.

As Jeremy Gibbons and others have shown us, design patterns are better as programs. Since the code is so regular, it seems very receptive to some generic programming. I plan to explore this further, possibly using one of the many generics libraries available for Haskell or designing a new one. Suggestions and feedback are welcome.

Update 2008-03-30: The source code for this entry is now available at GitHub.