Geeky things and other research from a work in progress

Showing posts with label design-pattern. Show all posts

2009-03-31

Incremental attributes

I previously wrote about a design pattern I called an incremental fold (or catamorphism). I described it as a design pattern, because, as written, it cannot be factored into code. That is, it is a pattern for designing part of a program.

The pattern I presented is a useful way to implement functions that can be expressed as catamorphisms such that the result is incrementally computed for each operation on a value of a datatype. Unlike a fold defined directly as a function, which traverses an entire value, the incremental fold only traverses parts that are updated. For some values, this may provide a performance benefit.

This post shows how we can adapt the above idea to a more general concept that I'm calling incremental attributes. It's more general in that incremental attributes can express the incremental fold as well as other flows of incremental computation.

Review of the incremental fold

First, let's review the implementation of the incremental fold.

[Note: This is not a literate Haskell article, because there's too much duplicated code required; however, all source files are available.]

module IncrementalAttributes1Synthesized where

data Tree a s
  = Tip s
  | Bin a (Tree a s) (Tree a s) s
  deriving Show

data Alg a s
  = Alg { stip :: s, sbin :: a -> s -> s -> s }

result :: Tree a s -> s
result (Tip s) = s
result (Bin _ _ _ s) = s

tip :: Alg a s -> Tree a s
tip alg = Tip (stip alg)

bin :: Alg a s -> a -> Tree a s -> Tree a s -> Tree a s
bin alg x lt rt = Bin x lt rt (sbin alg x (result lt) (result rt))

empty :: (Ord a) => Alg a s -> Tree a s
empty = tip

singleton :: (Ord a) => Alg a s -> a -> Tree a s
singleton alg x = bin alg x (tip alg) (tip alg)

insert :: (Ord a) => Alg a s -> a -> Tree a s -> Tree a s
insert alg x t =
  case t of
    Tip _ ->
      singleton alg x
    Bin y lt rt _ ->
      case compare x y of
        LT -> bin alg y (insert alg x lt) rt
        GT -> bin alg y lt (insert alg x rt)
        EQ -> bin alg x lt rt

fromList :: (Ord a) => Alg a s -> [a] -> Tree a s
fromList alg = foldr (insert alg) (empty alg)

heightAlg :: Alg a Integer
heightAlg = Alg 0 (\_ x y -> 1 + max x y)

t1 = fromList heightAlg "azbycx"

This will be the starting point for our discussion. We have a basic binary tree with an algebra type that gives the fold functions, stip and sbin. The application of the algebra is blended into the utility functions just as it was with the incremental fold. I have chosen to keep the representation simple, so the algebra is passed around as an argument to the functions. This could, of course, be done with type classes. Lastly, we have an example algebra that determines the height of a tree. To get the height of the example t1, simply type result t1 after loading this file into GHCi.
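As a quick sanity check on the claim that the stored result is the same thing an ordinary fold would compute, here is a small self-contained sketch. It repeats the definitions above in condensed form; foldTree, sizeAlg, and consistent are names I made up for illustration and are not part of the source files.

```haskell
-- Condensed copy of the definitions above.
data Tree a s
  = Tip s
  | Bin a (Tree a s) (Tree a s) s
  deriving Show

data Alg a s = Alg { stip :: s, sbin :: a -> s -> s -> s }

result :: Tree a s -> s
result (Tip s)       = s
result (Bin _ _ _ s) = s

tip :: Alg a s -> Tree a s
tip alg = Tip (stip alg)

bin :: Alg a s -> a -> Tree a s -> Tree a s -> Tree a s
bin alg x lt rt = Bin x lt rt (sbin alg x (result lt) (result rt))

insert :: Ord a => Alg a s -> a -> Tree a s -> Tree a s
insert alg x t =
  case t of
    Tip _ -> bin alg x (tip alg) (tip alg)
    Bin y lt rt _ ->
      case compare x y of
        LT -> bin alg y (insert alg x lt) rt
        GT -> bin alg y lt (insert alg x rt)
        EQ -> bin alg x lt rt

fromList :: Ord a => Alg a s -> [a] -> Tree a s
fromList alg = foldr (insert alg) (tip alg)

heightAlg :: Alg a Integer
heightAlg = Alg 0 (\_ x y -> 1 + max x y)

-- Hypothetical helper: an ordinary fold that ignores the cached results
-- and traverses the whole tree.
foldTree :: Alg a s -> Tree a s -> s
foldTree alg (Tip _)         = stip alg
foldTree alg (Bin x lt rt _) = sbin alg x (foldTree alg lt) (foldTree alg rt)

-- Hypothetical second algebra: the number of elements in the tree.
sizeAlg :: Alg a Int
sizeAlg = Alg 0 (\_ l r -> 1 + l + r)

t1 :: Tree Char Integer
t1 = fromList heightAlg "azbycx"

-- The cached result at the root should equal a full traversal.
consistent :: Bool
consistent = result t1 == foldTree heightAlg t1
```

In GHCi, result t1 should give 4 and consistent should be True. Swapping in sizeAlg needs no other changes, which is the point of passing the algebra around as a value.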

Incrementally inherited attributes

Suppose that, instead of height, we wanted to incrementally compute the depth of every node. Thus, we would attach a value to each node that stored its distance from the root. We can't do that with the above implementation, because attributes are only fed "upwards" or from the leaves to the root. We have no way of passing information downwards. The solution is to use inherited attributes.

Before presenting the code, here is a little side note. You may notice the use of the words synthesized and inherited here. The terminology comes from the study of attribute grammars, extending context-free grammars to support semantic operations. Wouter Swierstra wrote a great tutorial on attribute grammars in Haskell for The Monad Reader in 2005. In fact, I use an example from there at the end of this article. You can think of synthesized as "produced by the children for the parent" and inherited as "passed down from the parent to the children."

As you can now imagine, inherited attributes will allow us to bring information to the leaves. Many of the changes to the code are trivial, so we ignore them. The relevant changes are the following:

data Alg a i
  = Alg { itip :: i -> i, ibin :: a -> i -> i }

tip :: Alg a i -> i -> Tree a i
tip alg i = Tip (itip alg i)

bin :: Alg a i -> i -> a -> Tree a i -> Tree a i -> Tree a i
bin alg i x lt rt = Bin x (update i lt) (update i rt) i
  where
    update i' t =
      case t of
        Tip _ ->
          tip alg i'
        Bin y ylt yrt _ ->
          let s = ibin alg y i' in
          Bin y (update s ylt) (update s yrt) s

The datatype Alg (that I'm still calling the algebra, though that may not be proper use of the category theoretical term) now has functions that take an inherited attribute from a parent and create a new inherited attribute to be stored with the node and passed on to its children. The change to bin is the more complicated of the changes, because once a Bin constructor is constructed, all of its child nodes must be updated with new inherited values.

To implement an algebra for depth, we do the following:

depthAlg :: Alg a Int
depthAlg = Alg (+1) (const (+1))

t1 = fromList depthAlg 0 "azbycx"

Load the code and check the result to see for yourself what it looks like.
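If you don't have the source files at hand, the following is a minimal sketch of what the complete inherited version might look like. The datatype, Alg, tip, and depthAlg come from the text; bin follows the definition above, written with distinct names for the inner pattern variables. The insert and fromList shown here are my guess at the "trivial" changes (each takes the root's inherited value as an extra argument), and labels is a hypothetical helper for inspecting the result.

```haskell
data Tree a i
  = Tip i
  | Bin a (Tree a i) (Tree a i) i
  deriving Show

data Alg a i = Alg { itip :: i -> i, ibin :: a -> i -> i }

tip :: Alg a i -> i -> Tree a i
tip alg i = Tip (itip alg i)

-- Store i at the new node, then push fresh inherited values down
-- through both subtrees.
bin :: Alg a i -> i -> a -> Tree a i -> Tree a i -> Tree a i
bin alg i x lt rt = Bin x (update i lt) (update i rt) i
  where
    update i' t =
      case t of
        Tip _ -> tip alg i'
        Bin y ylt yrt _ ->
          let j = ibin alg y i'
          in  Bin y (update j ylt) (update j yrt) j

-- Assumed: the root's inherited value is threaded through as an argument.
-- The outermost bin rewrites every attribute below it, so the value
-- passed to the recursive calls does not affect the final tree.
insert :: Ord a => Alg a i -> i -> a -> Tree a i -> Tree a i
insert alg i x t =
  case t of
    Tip _ -> bin alg i x (tip alg i) (tip alg i)
    Bin y lt rt _ ->
      case compare x y of
        LT -> bin alg i y (insert alg i x lt) rt
        GT -> bin alg i y lt (insert alg i x rt)
        EQ -> bin alg i x lt rt

fromList :: Ord a => Alg a i -> i -> [a] -> Tree a i
fromList alg i = foldr (insert alg i) (tip alg i)

depthAlg :: Alg a Int
depthAlg = Alg (+1) (const (+1))

-- Hypothetical helper: in-order list of (element, inherited attribute).
labels :: Tree a i -> [(a, i)]
labels (Tip _)         = []
labels (Bin x lt rt i) = labels lt ++ [(x, i)] ++ labels rt

t1 :: Tree Char Int
t1 = fromList depthAlg 0 "azbycx"
```

With these definitions, labels t1 should pair 'x' (the root) with 0, 'c' and 'y' with 1, 'b' and 'z' with 2, and 'a' with 3. Note the cost: every call to bin walks the whole subtree to refresh the inherited values, so this is less incremental than the synthesized case.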

One is not enough

Now that we have use cases for synthesized and inherited incremental attributes, we're going to want both. Fortunately, that's not too difficult. The new datatypes are simply a product of the two previous:

data Tree a i s
  = Tip i s
  | Bin a (Tree a i s) (Tree a i s) i s
  deriving Show

data Alg a i s
  = Alg { itip :: i -> i, ibin :: a -> i -> i,
          stip :: s, sbin :: a -> s -> s -> s }

You can now see why I was using s and i to distinguish the types of the attributes. Again, most of the code modifications are trivial, and the bin function needs special attention.

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt =
  Bin x (update i lt) (update i rt) i (sbin alg x (sresult lt) (sresult rt))
  where
    update i' t =
      case t of
        Tip _ _ ->
          tip alg i'
        Bin y ylt yrt _ s ->
          let j = ibin alg y i' in
          Bin y (update j ylt) (update j yrt) j s

Defining an algebra for both depth and height is no more difficult than defining each alone.

depthAndHeightAlg :: Alg a Int Int
depthAndHeightAlg = Alg (+1) (const (+1)) 1 (\_ x y -> 1 + max x y)
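To see both attributes side by side, here is a sketch that fills in the pieces the text skips. The datatypes, bin, and depthAndHeightAlg are taken from above; iresult, sresult, tip, insert, fromList, and annotations are my assumptions about the obvious definitions, not code from the source files.

```haskell
data Tree a i s
  = Tip i s
  | Bin a (Tree a i s) (Tree a i s) i s
  deriving Show

data Alg a i s = Alg
  { itip :: i -> i
  , ibin :: a -> i -> i
  , stip :: s
  , sbin :: a -> s -> s -> s
  }

-- Assumed accessors for the two stored attributes.
iresult :: Tree a i s -> i
iresult (Tip i _)       = i
iresult (Bin _ _ _ i _) = i

sresult :: Tree a i s -> s
sresult (Tip _ s)       = s
sresult (Bin _ _ _ _ s) = s

-- Assumed: a tip combines the two earlier versions of tip.
tip :: Alg a i s -> i -> Tree a i s
tip alg i = Tip (itip alg i) (stip alg)

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt =
  Bin x (update i lt) (update i rt) i (sbin alg x (sresult lt) (sresult rt))
  where
    -- Inherited values are refreshed; synthesized values are kept as-is.
    update i' t =
      case t of
        Tip _ _ -> tip alg i'
        Bin y ylt yrt _ s ->
          let j = ibin alg y i'
          in  Bin y (update j ylt) (update j yrt) j s

insert :: Ord a => Alg a i s -> i -> a -> Tree a i s -> Tree a i s
insert alg i x t =
  case t of
    Tip _ _ -> bin alg i x (tip alg i) (tip alg i)
    Bin y lt rt _ _ ->
      case compare x y of
        LT -> bin alg i y (insert alg i x lt) rt
        GT -> bin alg i y lt (insert alg i x rt)
        EQ -> bin alg i x lt rt

fromList :: Ord a => Alg a i s -> i -> [a] -> Tree a i s
fromList alg i = foldr (insert alg i) (tip alg i)

depthAndHeightAlg :: Alg a Int Int
depthAndHeightAlg = Alg (+1) (const (+1)) 1 (\_ x y -> 1 + max x y)

-- Hypothetical helper: in-order list of (element, depth, height).
annotations :: Tree a i s -> [(a, i, s)]
annotations (Tip _ _)         = []
annotations (Bin x lt rt i s) = annotations lt ++ [(x, i, s)] ++ annotations rt

t1 :: Tree Char Int Int
t1 = fromList depthAndHeightAlg 0 "azbycx"
```

Because this algebra gives tips a height of 1, sresult t1 comes out as 5 here, one more than heightAlg reports for the same tree, while iresult t1 is the 0 we passed in at the root.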

Feedback

You probably know where this is going by now. There's that famous saying, "what goes down must come up." We want more than just two separate directions of information flow. We want to utilize the information flowing toward the leaves to help determine that which flows up to the root or vice versa. A simple example of this is a counter that annotates each node with its rank in an in-order traversal. This can't be done with just synthesized or inherited attributes, because it depends on a combination of input from the parent, children, and siblings for each node.

The code is similar to the previous implementation, but the differences in Alg are important.

data Alg a i s
  = Alg { ftip :: i -> s, fbin :: a -> i -> s -> s -> (i, i, s) }

Each node now has a single inherited attribute, because it has a single parent. We use the synthesized attributes to store a local result, so each constructor only has one as an output. For the Bin constructor, we have a pair of incoming synthesized values and a pair of outgoing inherited values. The left component in each pair is associated with the left child, and the right with the right child. This allows us to have information flow up from the synthesized attribute of the left child and down to the inherited attribute of the right or in the opposite direction.

The bin is again tricky to write correctly.

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt = update i (Bin x lt rt undefined undefined)
  where
    update j t =
      case t of
        Tip _ _ ->
          tip alg j
        Bin y ylt yrt _ _ ->
          let (li, ri, s) = fbin alg y j (sresult zlt) (sresult zrt)
              zlt = update li ylt
              zrt = update ri yrt
          in Bin y zlt zrt j s

Notice the circular programming here. The definition and uses of, for example, li and zlt show that we could easily loop infinitely. This depends on how the specific algebra functions are implemented. Here is the "counter example":

newtype CounterI = CI { cntI :: Int } deriving Show
data CounterS = CS { size :: Int, cntS :: Int } deriving Show

counterAlg :: Alg a CounterI CounterS
counterAlg = Alg ft fb
  where

    ft :: CounterI -> CounterS
    ft i = CS { size = 1, cntS = cntI i }

    fb :: a -> CounterI -> CounterS -> CounterS -> (CounterI, CounterI, CounterS)
    fb _ i ls rs =
      ( i                                   -- left
      , CI { cntI = 1 + cntI i + size ls }  -- right
      , CS { size = 1 + size ls + size rs
           , cntS = cntI i + size ls }
      )

t1 = fromList counterAlg (CI { cntI = 0 }) "azbycx"

I've relied heavily on record syntax to document the flow of information. Notice in fb how the i is directly inherited by the left child and how the right child inherits the new count that depends on the size of the left subtree and the inherited count of its parent. As shown in this example, the dependency flow must be unidirectional for one desired result. But there's no reason we can't go up, down, and then up again (for example).
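Here is a self-contained sketch of the counter that you can load directly. Alg, bin, and counterAlg are as given above; the definitions of tip, sresult, insert, and fromList are my assumptions about the parts not shown, and ranks is a hypothetical helper for reading the counts back out.

```haskell
data Tree a i s
  = Tip i s
  | Bin a (Tree a i s) (Tree a i s) i s
  deriving Show

data Alg a i s
  = Alg { ftip :: i -> s, fbin :: a -> i -> s -> s -> (i, i, s) }

sresult :: Tree a i s -> s
sresult (Tip _ s)       = s
sresult (Bin _ _ _ _ s) = s

-- Assumed: a tip stores the inherited value it receives unchanged.
tip :: Alg a i s -> i -> Tree a i s
tip alg i = Tip i (ftip alg i)

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt = update i (Bin x lt rt undefined undefined)
  where
    update j t =
      case t of
        Tip _ _ -> tip alg j
        Bin y ylt yrt _ _ ->
          -- Circular: li and ri feed the children, whose synthesized
          -- results feed back into fbin. Laziness makes this work as
          -- long as the algebra itself has no cyclic dependency.
          let (li, ri, s) = fbin alg y j (sresult zlt) (sresult zrt)
              zlt = update li ylt
              zrt = update ri yrt
          in  Bin y zlt zrt j s

insert :: Ord a => Alg a i s -> i -> a -> Tree a i s -> Tree a i s
insert alg i x t =
  case t of
    Tip _ _ -> bin alg i x (tip alg i) (tip alg i)
    Bin y lt rt _ _ ->
      case compare x y of
        LT -> bin alg i y (insert alg i x lt) rt
        GT -> bin alg i y lt (insert alg i x rt)
        EQ -> bin alg i x lt rt

fromList :: Ord a => Alg a i s -> i -> [a] -> Tree a i s
fromList alg i = foldr (insert alg i) (tip alg i)

newtype CounterI = CI { cntI :: Int } deriving Show
data CounterS = CS { size :: Int, cntS :: Int } deriving Show

counterAlg :: Alg a CounterI CounterS
counterAlg = Alg ft fb
  where
    ft i = CS { size = 1, cntS = cntI i }
    fb _ i ls rs =
      ( i                                   -- left
      , CI { cntI = 1 + cntI i + size ls }  -- right
      , CS { size = 1 + size ls + size rs
           , cntS = cntI i + size ls }
      )

-- Hypothetical helper: in-order list of (element, rank).
ranks :: Tree a i CounterS -> [(a, Int)]
ranks (Tip _ _)         = []
ranks (Bin x lt rt _ s) = ranks lt ++ [(x, cntS s)] ++ ranks rt

t1 :: Tree Char CounterI CounterS
t1 = fromList counterAlg (CI { cntI = 0 }) "azbycx"
```

Since ft gives every tip a size of 1, the tips take part in the numbering too. The elements therefore land on the odd ranks: ranks t1 should pair 'a' through 'z' with 1, 3, 5, 7, 9, and 11, and the size stored at the root should be 13 (six elements plus seven tips).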

Revisiting the diff problem.

As I mentioned, Wouter wrote a good introduction to attribute grammars in Haskell (which I highly recommend that you read). He focuses on the use of the UUAG system to generate code for solving problems that are harder to solve with traditional functional programming techniques. He describes the problem as follows:

Suppose we want to write a function diff :: [Float] -> [Float] that given a list xs, calculates a new list where every element x is replaced with the difference between x and the average of xs. Similar problems pop up in any library for performing statistical calculations.

Great problem! And we can solve it using incremental attributes in Haskell instead of in UUAG's attribute grammar syntax.

newtype DiffI = DI { avg :: Float } deriving Show
data DiffS = DS { sumD :: Float, len :: Float, res :: Float } deriving Show

diffAlg :: Alg Float DiffI DiffS
diffAlg = Alg ft fb
  where

    ft :: DiffI -> DiffS
    ft i =
      DS { sumD = 0
         , len = 0
         , res = 0
         }

    fb :: Float -> DiffI -> DiffS -> DiffS -> (DiffI, DiffI, DiffS)
    fb x i ls sr =
      ( i
      , i
      , DS { sumD = x + sumD ls + sumD sr
           , len = 1 + len ls + len sr
           , res = x - avg i
           }
      )

The implementation is not too much more difficult than the attribute grammar solution. We don't have the clean separation of concerns, but adding another attribute only means adding another field in DI or DS depending on whether it's inherited or synthesized.

Oh, but we're not done! Where's the actual average generated? Ah right, that's fed to the root inherited attribute.

t2 = let val = fromList diffAlg (DI { avg = a }) [1,4,1.5,3.5,2,3,2.5]
         s = sresult val
         a = sumD s / len s
     in val

Here's another example of circular programming. Due to the way we implemented the application of the algebra, we can take advantage of lazy evaluation to ensure that the sum and length (and thus average) are incrementally computed and, as a result, the difference (res) is determined as needed for each node.
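If the knot-tying in t2 looks like magic, it may help to see the same trick in isolation. The sketch below is not the incremental version and is not from Wouter's article; it is just a plain-list diff, written by me for illustration, in which the average is used inside the very traversal that computes the sum and length it is derived from.

```haskell
-- A single traversal of the list. The average is defined in terms of the
-- traversal's own results, and lazy evaluation ties the knot.
diff :: [Float] -> [Float]
diff xs = res
  where
    (res, total, n) = go xs
    avg = total / n

    go :: [Float] -> ([Float], Float, Float)
    go []       = ([], 0, 0)
    go (y : ys) =
      let (rs, t, k) = go ys
      in  (y - avg : rs, y + t, 1 + k)
```

Each element of the output is a thunk mentioning avg, and avg is only forced once someone looks at an element, by which time the sum and length are available. With the list from t2, diff [1,4,1.5,3.5,2,3,2.5] should give [-1.5,1.5,-1.0,1.0,-0.5,0.5,0.0], since the average is 2.5.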

2009-02-15

Incremental fold, a design pattern

I recently read the article "How to Refold a Map" by David F. Place in The Monad.Reader Issue 11. I've been thinking about incremental algorithms in Haskell for some time, and I realized that Place has written a specific instance (and optimization) of a more general concept: the incremental fold.

In this article, I demonstrate a design pattern for converting a datatype and related functions into an incremental fold. The pattern is not difficult to comprehend, but it would be nice to improve upon it. I explore a few improvements and issues with those improvements. Ultimately, I'd like to see this functionality in a program instead of a design pattern.

Note: This is a literate Haskell article. You can copy the text of the entire article, paste it into a new file called IncrementalTreeFold.lhs, and load it into GHCi.


> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE ScopedTypeVariables #-}

> module IncrementalTreeFold where
> import Prelude hiding (elem)
> import qualified Data.Char as Char (ord)

Introducing a Typical Binary Tree

Before we get to the conversion, let's choose an appropriate datatype. Place adapted the Map type used in Data.Map (or Set in Data.Set). To simplify my presentation, I will use an ordered binary tree with labeled nodes.


> data Tree a
>   = Tip
>   | Bin a (Tree a) (Tree a)
>   deriving Show

Next, let's introduce some useful functions. An incremental fold is not necessarily like applying a fold function (a.k.a. a catamorphism, not a crush function that has become known as a fold) to a value directly. Instead, as I will later show, it integrates into existing functions that manipulate values. That said, we should have some functions for building Trees. Here is the beginning of a Tree API. (There are a number of other operations, e.g. delete and lookup, that can easily be added but do not contribute much to the discussion.)

empty builds a tree with no elements.


> empty :: (Ord a) => Tree a
> empty = Tip

singleton builds a tree with a single element.


> singleton :: (Ord a) => a -> Tree a
> singleton x = Bin x Tip Tip

insert puts a value in the appropriate place given a left-to-right ordering of values in the tree.


> insert :: (Ord a) => a -> Tree a -> Tree a
> insert x t =
>   case t of
>     Tip ->
>       singleton x
>     Bin y lt rt ->
>       case compare x y of
>         LT -> Bin y (insert x lt) rt
>         GT -> Bin y lt (insert x rt)
>         EQ -> Bin x lt rt

fromList creates a tree from a list of values.


> fromList :: (Ord a) => [a] -> Tree a
> fromList = foldr insert empty

elem determines if a value is an element of a tree.


> elem :: (Ord a) => a -> Tree a -> Bool
> elem x t =
>   case t of
>     Tip ->
>       False
>     Bin y lt rt ->
>       case compare x y of
>         LT -> elem x lt
>         GT -> elem x rt
>         EQ -> True

Now, using our library of sorts, we can create a binary search tree and check if a value is in the tree.


> test1 = 37 `elem` fromList [8,23,37,82,3]

Tree Folds

Suppose that we now want the size of the tree. For good abstraction and high reuse, we create a fold function.


> data Alg a b = Alg { ftip :: b, fbin :: a -> b -> b -> b }

> fold :: Alg a b -> Tree a -> b
> fold alg = go
>   where
>     go Tip           = ftip alg
>     go (Bin x lt rt) = fbin alg x (go lt) (go rt)

fold allows us to write a simple size function.


> size :: Tree a -> Int
> size = fold (Alg 0 (\_ lr rr -> 1 + lr + rr))

I use the datatype Alg here to contain the algebra of the fold. In size, we simply replace each constructor in the algebra of Tree with a corresponding element from the algebra of integer addition. Since you're reading this article, you're probably a Haskell programmer and already familiar with the sorts of functions that can be written with folds. Here are a few others.


> filter :: (a -> Bool) -> Tree a -> [a]
> filter f = fold (Alg [] (\x lr rr -> (if f x then [x] else []) ++ lr ++ rr))

> ord :: Tree Char -> Tree Int
> ord  = fold (Alg Tip (\x lt rt -> Bin (Char.ord x) lt rt))

Incremental Change

Now that we have a grasp on using a fold on a datatype, I would like to show how to extend my binary tree "library" defined above to support an incremental fold. The incremental fold can (I believe) do everything a traditional fold can do, but it does it during Tree construction instead of externally in a separate function. This means that every time we produce a new Tree (via singleton, insert, or fromList for example), we get a new result of the incremental fold.

Transforming our library into an incremental calculating machine involves several steps. The first step is extending the datatype to hold the incremental result. Since we want to be polymorphic in the result type, we add a type parameter r to the Tree type constructor. And since each constructor may have an incremental result, it must also be extended with a placeholder for r.


> data Tree' a r
>   = Tip' r
>   | Bin' a (Tree' a r) (Tree' a r) r
>   deriving Show

For convenience and possibly to hide the modified constructors from the outside world, we add a function for retrieving the incremental result.


> result' :: Tree' a r -> r
> result' (Tip' r)       = r
> result' (Bin' _ _ _ r) = r

As I mentioned earlier, the machinery of the fold is now in the construction. To implement this second step, we use smart constructors.


> tip' :: Alg a r -> Tree' a r
> tip' alg = Tip' (ftip alg)

> bin' :: Alg a r -> a -> Tree' a r -> Tree' a r -> Tree' a r
> bin' alg x lt rt = Bin' x lt rt (fbin alg x (result' lt) (result' rt))

Both tip' and bin' construct new values of Tree' a r and, using the algebra, calculate the incremental result to be stored in each value. Thus, the actual fold operation is "hidden" in the construction of values.

Now, in order to put the incremental fold to work in a function, we simply (1) add the algebra to the function's arguments, (2) add a wildcard pattern for the result field in constructor patterns, and (3) replace applications of the constructors with those of their incremental cousins. Here's an example of the singleton and insert functions modified for incremental folding.


> singleton' :: (Ord a) => Alg a r -> a -> Tree' a r
> singleton' alg x = bin' alg x (tip' alg) (tip' alg)

> insert' :: (Ord a) => Alg a r -> a -> Tree' a r -> Tree' a r
> insert' alg x t =
>   case t of
>     Tip' _ ->
>       singleton' alg x
>     Bin' y lt rt _ ->
>       case compare x y of
>         LT -> bin' alg y (insert' alg x lt) rt
>         GT -> bin' alg y lt (insert' alg x rt)
>         EQ -> bin' alg x lt rt

Comparing these functions with the initial versions, we see that the changes are readily apparent. Modify every other Tree'-hugging function in the same manner, and you have a design pattern for an incremental fold!
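To make "in the same manner" concrete, here is a standalone sketch that applies the three steps to fromList and elem as well. The primed names empty', fromList', and elem', along with sizeAlg, are my own additions for illustration; everything else repeats the definitions above so the block can be loaded by itself.

```haskell
data Tree' a r
  = Tip' r
  | Bin' a (Tree' a r) (Tree' a r) r
  deriving Show

data Alg a r = Alg { ftip :: r, fbin :: a -> r -> r -> r }

result' :: Tree' a r -> r
result' (Tip' r)       = r
result' (Bin' _ _ _ r) = r

tip' :: Alg a r -> Tree' a r
tip' alg = Tip' (ftip alg)

bin' :: Alg a r -> a -> Tree' a r -> Tree' a r -> Tree' a r
bin' alg x lt rt = Bin' x lt rt (fbin alg x (result' lt) (result' rt))

insert' :: Ord a => Alg a r -> a -> Tree' a r -> Tree' a r
insert' alg x t =
  case t of
    Tip' _ -> bin' alg x (tip' alg) (tip' alg)
    Bin' y lt rt _ ->
      case compare x y of
        LT -> bin' alg y (insert' alg x lt) rt
        GT -> bin' alg y lt (insert' alg x rt)
        EQ -> bin' alg x lt rt

-- Steps (1) and (3): take the algebra, use the smart constructor.
empty' :: Alg a r -> Tree' a r
empty' = tip'

fromList' :: Ord a => Alg a r -> [a] -> Tree' a r
fromList' alg = foldr (insert' alg) (empty' alg)

-- elem only inspects the tree and never builds one, so it needs
-- step (2), the wildcard patterns, but no algebra argument.
elem' :: Ord a => a -> Tree' a r -> Bool
elem' x t =
  case t of
    Tip' _ -> False
    Bin' y lt rt _ ->
      case compare x y of
        LT -> elem' x lt
        GT -> elem' x rt
        EQ -> True

-- Hypothetical algebra: the size fold from earlier, as a value.
sizeAlg :: Alg a Int
sizeAlg = Alg 0 (\_ lr rr -> 1 + lr + rr)
```

For example, result' (fromList' sizeAlg [8,23,37,82,3]) should be 5 with no separate traversal. Inserting an element that is already present leaves the size at 5, because the EQ branch rebuilds the node rather than adding one.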

Improving the Incremental Implementation

Of course, you may complain that there's some amount of boilerplate work involved. For example, we have to add this alg argument everywhere. Let's try to replace that with a type class.


< class Alg'' a r where
<   ftip'' :: r
<   fbin'' :: a -> r -> r -> r

And we redefine our smart constructors.


< tip'' :: (Alg'' a r) => Tree' a r
< tip'' = Tip' ftip''

But there's a problem here! GHC reports that it Could not deduce (Alg'' a r) from the context (Alg'' a1 r). The poor compiler cannot infer the type of the parameter a since ftip'' has only type r.

Let's try another version of the class. In this one, we add a dummy argument to ftip'' in order to force GHC to correctly infer the full type.


> class Alg'' a r where
>   ftip'' :: a -> r
>   fbin'' :: a -> r -> r -> r

> tip'' :: forall a r . (Alg'' a r) => Tree' a r
> tip'' = Tip' (ftip'' (undefined :: a))

> bin'' :: (Alg'' a r) => a -> Tree' a r -> Tree' a r -> Tree' a r
> bin'' x lt rt = Bin' x lt rt (fbin'' x (result' lt) (result' rt))

This provides one (not very pretty) solution to the problem. I'm able to get around the need to require an argument for tip'' by using lexically scoped type variables. But it doesn't remove the ugly type from ftip'', and the user is forced to ignore it when writing an instance.

The functions can now be rewritten with the Alg'' constraint.


> empty'' :: (Ord a, Alg'' a r) => Tree' a r
> empty'' = tip''

> singleton'' :: (Ord a, Alg'' a r) => a -> Tree' a r
> singleton'' x = bin'' x tip'' tip''

> insert'' :: (Ord a, Alg'' a r) => a -> Tree' a r -> Tree' a r
> insert'' x t =
>   case t of
>     Tip' _ ->
>       singleton'' x
>     Bin' y lt rt _ ->
>       case compare x y of
>         LT -> bin'' y (insert'' x lt) rt
>         GT -> bin'' y lt (insert'' x rt)
>         EQ -> bin'' x lt rt

> fromList'' :: (Ord a, Alg'' a r) => [a] -> Tree' a r
> fromList'' = foldr insert'' empty''

These versions look more like the non-incremental implementations above. To use them, we need to declare an instance of Alg'' with an appropriate algebra for our desired incremental result. Here's how we would rewrite size.


> newtype Size = Size { unSize :: Int }

> instance Alg'' a Size where
>   ftip'' _ = Size 0
>   fbin'' _ lr rr = Size (1 + unSize lr + unSize rr)

> size'' :: Tree' a Size -> Int
> size'' = unSize . result'

> test2 = size'' $ insert'' 's' $ insert'' 'p' $ insert'' 'l' $ fromList'' "onderzoek"

Size is still defined as a fold, but the result is incrementally built with each application of a library function. This can give a nice performance boost, as Place also found in his article.

Generic Thoughts

On reflecting over my implementation, I really don't like the dummy arguments required by constructors like Tip. There are other approaches to dealing with this, but I haven't yet found a better one. If you use a functional dependency such as r -> a in the definition of Alg'', then a would be uniquely determined by r. In the case of size'', we would have to specify a concrete element type for Tree' instead of the parameter a (or use undecidable instances). Perhaps, dear reader, you might have a better solution?
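One possible answer, offered as a sketch rather than a recommendation: on GHC 8.0 or later, the TypeApplications and AllowAmbiguousTypes extensions let the class method keep the plain type r, with the element type supplied explicitly at the call site. This was not available when the article was written, and the unprimed names below are mine, chosen so the block stands alone.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

data Tree a r
  = Tip r
  | Bin a (Tree a r) (Tree a r) r
  deriving Show

result :: Tree a r -> r
result (Tip r)       = r
result (Bin _ _ _ r) = r

-- ftip mentions only r, so its type is ambiguous in a.
-- AllowAmbiguousTypes lets the declaration through.
class Alg a r where
  ftip :: r
  fbin :: a -> r -> r -> r

-- The type application picks the element type; no dummy argument.
tip :: forall a r. Alg a r => Tree a r
tip = Tip (ftip @a)

bin :: Alg a r => a -> Tree a r -> Tree a r -> Tree a r
bin x lt rt = Bin x lt rt (fbin x (result lt) (result rt))

insert :: (Ord a, Alg a r) => a -> Tree a r -> Tree a r
insert x t =
  case t of
    Tip _ -> bin x tip tip
    Bin y lt rt _ ->
      case compare x y of
        LT -> bin y (insert x lt) rt
        GT -> bin y lt (insert x rt)
        EQ -> bin x lt rt

fromList :: (Ord a, Alg a r) => [a] -> Tree a r
fromList = foldr insert tip

newtype Size = Size { unSize :: Int }

-- The instance no longer has an argument to ignore.
instance Alg a Size where
  ftip = Size 0
  fbin _ lr rr = Size (1 + unSize lr + unSize rr)

size :: Tree a Size -> Int
size = unSize . result
```

With this, size (fromList "onderzoek") should evaluate to 7, the number of distinct letters. The trade-off is that ftip can only be called with an explicit type application, so the ambiguity has moved from the instance writer to the author of the smart constructor.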

The incremental fold pattern is great for documenting an idea, but it has several downsides: (1) The obvious one is that it requires modifying a datatype and code. This is not always desirable and often not practical. (2) Implementing an incremental fold can involve a lot of boilerplate code and many small changes that are monotonous and boring. It's very easy to make mistakes. In fact, I made several copy-paste-and-forget-to-change errors while writing this article.

As Jeremy Gibbons and others have shown us, design patterns are better as programs. Since the code is so regular, it seems very receptive to some generic programming. I plan to explore this further, possibly using one of the many generics libraries available for Haskell or designing a new one. Suggestions and feedback are welcome.

Update 2008-03-30: The source code for this entry is now available at GitHub.