# splonderzoek

Geeky things and other research from a work in progress

## 2009-03-31

### Incremental attributes

I previously wrote about a design pattern I called an incremental fold (or catamorphism). I described it as a design pattern, because, as written, it cannot be factored into code. That is, it is a pattern for designing part of a program.

The pattern I presented is a useful way to implement functions that can be expressed as catamorphisms such that the result is incrementally computed for each operation on a value of a datatype. Unlike a fold defined directly as a function, which traverse an entire value, the incremental fold only traverses parts that are updated. For some values, this may provide a performance benefit.

This post shows how we can adapt the above idea to a more general concept that I'm calling incremental attributes. It's more general in that incremental attributes can express the incremental fold as well as other flows of incremental computation.

#### Review of the incremental fold

First, let's review the implementation of the incremental fold.

[Note: This is not a literate Haskell article, because there's too much duplicated code required; however, all source files are available.]

``module IncrementalAttributes1Synthesized wheredata Tree a s  = Tip s  | Bin a (Tree a s) (Tree a s) s  deriving Showdata Alg a s  = Alg { stip :: s, sbin :: a -> s -> s -> s }result :: Tree a s -> sresult (Tip s)       = sresult (Bin _ _ _ s) = stip :: Alg a s -> Tree a stip alg = Tip (stip alg)bin :: Alg a s -> a -> Tree a s -> Tree a s -> Tree a sbin alg x lt rt = Bin x lt rt (sbin alg x (result lt) (result rt))empty :: (Ord a) => Alg a s -> Tree a sempty = tipsingleton :: (Ord a) => Alg a s -> a -> Tree a ssingleton alg x = bin alg x (tip alg) (tip alg)insert :: (Ord a) => Alg a s -> a -> Tree a s -> Tree a sinsert alg x t =  case t of    Tip _ ->      singleton alg x    Bin y lt rt _ ->      case compare x y of        LT -> bin alg y (insert alg x lt) rt        GT -> bin alg y lt (insert alg x rt)        EQ -> bin alg x lt rtfromList :: (Ord a) => Alg a s -> [a] -> Tree a sfromList alg = foldr (insert alg) (empty alg)heightAlg :: Alg a IntegerheightAlg = Alg 0 (\_ x y -> 1 + max x y)t1 = fromList heightAlg "azbycx"``

This will be the starting point for our discussion. We have a basic binary tree with an algebra type that gives the fold functions, `stip` and `sbin`. The application of the algebra is blended into the utility functions just as it was with the incremental fold. I have chosen to keep the representation simple, so the algebra is passed around as an argument to the functions. This could, of course, be done with type classes. Lastly, we have an example algebra that determines the height of a tree. To get the height of the example `t1`, simply type `result t1` after loading this file into GHCi.

#### Incrementally inherited attributes

Suppose that, instead of height, we wanted to incrementally compute the depth of every node. Thus, we would attach a value to each node that stored its distance from the root. We can't do that with the above implementation, because attributes are only fed "upwards" or from the leaves to the root. We have no way of passing information downwards. The solution is to use inherited attributes.

Before presenting the code, here is a little side note. You may notice the use of the words synthesized and inherited here. The terminology comes from the study of attribute grammars, extending context-free grammars to support semantic operations. Wouter Swierstra wrote a great tutorial on attribute grammars in Haskell for The Monad Reader in 2005. In fact, I use an example from there at the end of this article. You can think of synthesized as "produced by the children for the parent" and inherited as "passed down from the parent to the children."

As you can now imagine, inherited attributes will allow us to bring information to the leaves. Many of the changes to the code are trivial, so we ignore them. The relevant changes are the following:

``data Alg a i  = Alg { itip :: i -> i, ibin :: a -> i -> i }tip :: Alg a i -> i -> Tree a itip alg i = Tip (itip alg i)bin :: Alg a i -> i -> a -> Tree a i -> Tree a i -> Tree a ibin alg i x lt rt = Bin x (update i lt) (update i rt) i  where    update i' t =      case t of        Tip _ ->          tip alg i'        Bin x lt rt _ ->          let s = ibin alg x i' in          Bin x (update s lt) (update s lt) s``

The datatype `Alg` (that I'm still calling the algebra, though that may not be proper use of the category theoretical term) now has functions that take an inherited attribute from a parent and create a new inherited attribute to be stored with the node and passed on to its children. The change to `bin` is the more complicated of the changes, because once a `Bin` constructor is constructed, all of its child nodes must be updated with new inherited values.

To implement an algebra for depth, we do the following:

``depthAlg :: Alg a IntdepthAlg = Alg (+1) (const (+1))t1 = fromList depthAlg 0 "azbycx"``

Load the code and check the result to see for yourself what it looks like.

#### One is not enough

Now that we have use cases for synthesized and inherited incremental attributes, we're going to want both. Fortunately, that's not too difficult. The new datatypes are simply a product of the two previous:

``data Tree a i s  = Tip i s  | Bin a (Tree a i s) (Tree a i s) i s  deriving Showdata Alg a i s  = Alg { itip :: i -> i, ibin :: a -> i -> i,          stip :: s,      sbin :: a -> s -> s -> s }``

You can now see why I was using `s` and `i` to distinguish the types of the attributes. Again, most of the code modifications are trivial, and the `bin` function needs special attention.

``bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i sbin alg i x lt rt =  Bin x (update i lt) (update i rt) i (sbin alg x (sresult lt) (sresult rt))  where    update i' t =      case t of        Tip _ _ ->          tip alg i'        Bin y ylt yrt _ s ->          let j = ibin alg y i' in          Bin y (update j ylt) (update j yrt) j s``

Defining an algebra for both depth and height is no more difficult than defining each alone.

``depthAndHeightAlg :: Alg a Int IntdepthAndHeightAlg = Alg (+1) (const (+1)) 1 (\_ x y -> 1 + max x y)``

#### Feedback

You probably know where this is going by now. There's that famous saying, "what goes down must come up." We want more than just two separate directions of information flow. We want to utilize the information flowing toward the leaves to help determine that which flows up to the root or vice versa. A simple example of this is a counter that annotates each node with its rank in an in-order traversal. This can't be done with just synthesized or inherited attributes, because it depends on a combination of input from the parent, children, and siblings for each node.

The code is similar to the previous implementation, but the differences in `Alg` are important.

``data Alg a i s  = Alg { ftip :: i -> s, fbin :: a -> i -> s -> s -> (i, i, s) }``

Each node now has a single `i`nherited attribute, because it has a single parent. We use the `s`ynthesized attributes to store a local result, so each constructor only has one as an output. For the `Bin` constructor, we have a pair of incoming synthesized values and a pair of outgoing inherited values. The left component in each pair is associated with the left child, and the right with the right child. This allows us to have information flow up from the synthesized attribute of the left child and down to the inherited attribute of the right or in the opposite direction.

The `bin` is again tricky to write correctly.

``bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i sbin alg i x lt rt = update i (Bin x lt rt undefined undefined)  where    update j t =      case t of        Tip _ _ ->          tip alg j        Bin y ylt yrt _ _ ->          let (li, ri, s) = fbin alg y j (sresult zlt) (sresult zrt)              zlt = update li ylt              zrt = update ri yrt          in  Bin y zlt zrt j s``

Notice the circular programming here. The definition and uses of, for example, `li` and `zlt` show that we could easily loop infinitely. This depends on how the specific algebra functions are implemented. Here is the "counter example":

``newtype CounterI = CI { cntI :: Int } deriving Showdata CounterS = CS { size :: Int, cntS :: Int } deriving ShowcounterAlg :: Alg a CounterI CounterScounterAlg = Alg ft fb  where    ft :: CounterI -> CounterS    ft i = CS { size = 1, cntS = cntI i }    fb :: a -> CounterI -> CounterS -> CounterS -> (CounterI, CounterI, CounterS)    fb _ i ls rs =      ( i                                   -- left      , CI { cntI = 1 + cntI i + size ls }  -- right      , CS { size = 1 + size ls + size rs           , cntS = cntI i + size ls }      )t1 = fromList counterAlg (CI { cntI = 0 }) "azbycx"``

I've relied heavily on record syntax to document the flow of information. Notice in `fb` how the `i` is directly inherited by the left child and how the right child inherits the new count that depends on the size of the left subtree and the inherited count of its parent. As shown in this example, the dependency flow must be unidirectional for one desired result. But there's no reason we can't go up, down, and then up again (for example).

#### Revisiting the diff problem.

As I mentioned, Wouter wrote a good introduction to attribute grammars in Haskell (which I highly recommend that you read). He focuses on the use of the UUAG system to generate code for solving problems that are harder to solve with traditional functional programming techniques. He describes the problem as follows:

Suppose we want to write a function `diff :: [Float] -> [Float]` that given a list `xs`, calculates a new list where every element `x` is replaced with the difference between `x` and the average of `xs`. Similar problems pop up in any library for performing statistical calculations.

Great problem! And we can solve it using incremental attributes in Haskell instead of in UUAG's attribute grammar syntax.

``newtype DiffI = DI { avg :: Float } deriving Showdata DiffS = DS { sumD :: Float, len :: Float, res :: Float } deriving ShowdiffAlg :: Alg Float DiffI DiffSdiffAlg = Alg ft fb  where    ft :: DiffI -> DiffS    ft i =      DS { sumD = 0         , len = 0         , res = 0         }    fb :: Float -> DiffI -> DiffS -> DiffS -> (DiffI, DiffI, DiffS)    fb x i ls sr =      ( i      , i      , DS { sumD = x + sumD ls + sumD sr           , len = 1 + len ls + len sr           , res = x - avg i           }      )``

The implementation is not too much more difficult than the attribute grammar solution. We don't have the clean separation of concerns, but adding another attribute only means adding another field in `DI` or `DS` depending on whether it's inherited or synthesized.

Oh, but we're not done! Where's the actual average generated? Ah right, that's fed to the root inherited attribute.

``t2 = let val = fromList diffAlg (DI { avg = a }) [1,4,1.5,3.5,2,3,2.5]         s = sresult val         a = sumD s / len s     in  val``

Here's another example of circular programming. Due to the way we implemented the application of the algebra, we can take advantage of lazy evaluation to ensure that the sum and length (and thus average) are incrementally computed and, as a result, the difference (`res`) is determined as needed for each node.

## 2009-03-06

### Experiments with EMGM: Emacs org files

I've been meaning to write some things about EMGM for a while, but I hadn't found one of those round tuits as of yet. Until now.

David Miani is working on a Haskell library for interacting with emacs org files. "For those that do not know, an org file is a structured outline style file that has nested headings, text, tables and other elements," says David. He has a collection of datatypes for building and manipulating these files.

David seeks a better way to do what he's doing. (It's a noble goal. I hope you keep doing it.) To return to his words: "While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous." He goes on to give an example that I'll describe more below.

When I read the above statement, I was expecting that generic programming could help him out. When I saw his code, I knew it was a perfect use case. That's what inspired this entry, the first use case for EMGM from Haskell CafĂ©.

First, this is a literate Haskell post, so we run through the usual preliminaries.

``> {-# LANGUAGE TemplateHaskell       #-}> {-# LANGUAGE MultiParamTypeClasses #-}> {-# LANGUAGE FlexibleContexts      #-}> {-# LANGUAGE FlexibleInstances     #-}> {-# LANGUAGE OverlappingInstances  #-}> {-# LANGUAGE UndecidableInstances  #-}> > module Org where> > import Text.Regex.Posix``

We import Generics.EMGM.Derive for the deriving portion of EMGM. This is not exported from the main body of the library, because it has a lot of symbols only needed for building a representation. We'd rather not clog up your symbol list if possible.

``> import Generics.EMGM.Derive``

In general, I'd recommend doing the deriving in a separate module and only export the datatype and generated type class instances. Then, in other modules, you can use EMGM functions or write your own. However, this being a demonstration, we will also import Generics.EMGM here to use the available functions.

``> import qualified Generics.EMGM as G``

The following collection of types are copied from David's post. They describe the structure of an org file.

``> type Line = Int> type Column = Int> > data FilePosition = FilePosition Line Column> > data WithPos a = WithPos { filePos :: FilePosition, innerValue :: a }> > data OrgTableP = OrgTableP [WithPos OrgTableRow]> > data OrgFileElementP>   = TableP OrgTableP>   | ParagraphP String>   | HeadingP OrgHeadingP> > data OrgHeadingP = OrgHeadingP Int String [WithPos OrgFileElementP]> > data OrgFileP = OrgFileP [WithPos OrgFileElementP]> > data OrgTableRow>   = OrgTableRow [String]>   | OrgTableRowSep ``

In order to use EMGM, we must generate the values and instances used by the library. This is simple with one Template Haskell (TH).

``> \$(deriveMany >   [ ''FilePosition>   , ''WithPos>   , ''OrgTableP>   , ''OrgHeadingP>   , ''OrgFileElementP>   , ''OrgFileP>   , ''OrgTableRow>   ])``

Note that in this case, we had to use `deriveMany` for a list of type names. For the most part, we'd probably use `derive`; however, the datatypes `OrgHeadingP` and `OrgFileElementP` are mutually recursive. If we use `derive` for each type, then some values are generated that are naturally also muturally recursive. Apparently, TH expects all symbols to be available on a per-splice basis. This means that we can't `\$(derive ''OrgFileElementP)` and then `\$(derive ''OrgHeadingP)` or vice versa. We have to derive them simultaneously, so that both sets of symbols are available at the same time.

David gives the example of reading "the description line for the project named 'Project14'" in the following file:

``````* 2007 Projects
** Project 1
Description: 1
Tags: None
** Project 2
Tags: asdf,fdsa
Description: hello
* 2008 Projects
* 2009 Projects
** Project14
Tags: RightProject
Description: we want this
``````

He then provides some messy code to perform it. (No offense meant. Mine would've looked no better.) I'll skip the code, since I couldn't get it to compile as provided.

Our solution using EMGM follows:

``> projDesc :: String -> OrgFileP -> Maybe String> projDesc name file = do>   hdg <- G.firstr (headings name file)>   para <- firstPara hdg>   if para =~ "Description" then return para else Nothing> > headings :: String -> OrgFileP -> [OrgHeadingP]> headings name = filter check . G.collect>   where>     check (OrgHeadingP _ possible _) = name == possible> > firstPara :: OrgHeadingP -> Maybe String> firstPara hdg = paraStr =<< G.firstr (G.collect hdg)>   where>     paraStr (ParagraphP str) = Just str>     paraStr _                = Nothing``

Primarily, we take advantage of the functions `collect` and `firstr`. Here, `collect` is the key. It's type is `collect :: (Rep (Collect b) a) => a -> [b]`, and it returns a list of `b`s stored somewhere in a value of type `a`. This allows us to collect the `OrgHeadingP`s in an `OrgFileP` (`headings`) and the `OrgFileElementP`s in an `OrgHeadingP` (`firstPara`). Now, we don't have to build a bunch of functions that break down each of these types to get to their components.

Our use of `firstr` is simply the same as we would use the `Prelude` function `head`, except that `firstr` returns a `Maybe`: unlike `head`, it's a total function.

David's top-level function would now become this:

``> get14 :: OrgFileP -> Maybe String> get14 = projDesc "Project14"``

Well, this was a fun experiment with generic programming. I hope to do more in the future.

I want to thank David for bringing up this problem in the mailing list. Not only did I get to play more with EMGM, I also released an update to the library when I discovered the issue requiring `deriveMany`.

Update 2008-03-30: The source code for this entry is now available at GitHub.

## 2009-03-02

### Template Haskell 2.3 or Cabal 1.2? EMGM can't have both!

I'm about ready to give up. I would like EMGM to support both GHC 6.8 and 6.10 to allow for more potential uses. This effectively means it should build with Cabal version 1.2 (which is distributed with GHC 6.8) and 1.6 (distributed with GHC 6.10). I didn't think this would be a difficult problem, — shows you what little I know — but it has turned into a reasonably sized annoyance.

As of version 0.2, EMGM supports generating code for type representations using Template Haskell (TH). Since TH is now a dependency, I thought it would be handy to provide the representation values and instances for it. And so I do that here. If you look at the linked file, you'll notice an `#ifdef` surrounding `\$(derive ''Loc)`. This is because the `template-haskell` package includes a new `Loc` datatype in version 2.3, and in order to support versions 2.2 (for GHC 6.8) and 2.3 (for GHC 6.10), I needed to do some funky C preprocessor (CPP) stuff.

Well, I followed the advice given in this thread started by Jason Ducek with useful responses from Duncan Coutts. The result was this emgm.cabal file with a flag for determining which version of `template-haskell` was available and whether or not to define the necessary macro. I later learned that this does't work when attempting to cabal-install GHC 6.8, because `template-haskell-2.3` fails to correctly specify the version of base or GHC that it requires.

Now, to work around this problem, Duncan described how to hook into Cabal to get the actual package dependencies and insert the CPP option into the build info. It was not too difficult to figure this out. And in fact, the code for my `Setup.lhs` is in the appendix of this article in case it is later useful for me or someone else.

Unfortunately, as soon as I had this implemented, I discovered it didn't work in GHC 6.8.3/Cabal 1.2. There's a very minor difference in Cabal that simply breaks my code, and I don't know how to work around it. The difference is in `PackageIdentifier`:

``> data PackageIdentifier = PackageIdentifier { pkgName :: String, pkgVersion :: Version }``
``> data PackageIdentifier = PackageIdentifier { pkgName :: PackageName, pkgVersion :: Version }> newtype PackageName = PackageName String``

I need `PackageIdentifier` to determine which version of `template-haskell` is being used as a dependency. But I either use a `String` or a `PackageName` depending on which version of Cabal is used. I don't think there's a way to know which version of Cabal is used when building a `Setup.lhs` file.

As far as I can tell, my options are the following:

1. Hack more on the `Setup.lhs` to figure out a different way of dealing with the `template-haskell` issue.
2. Release for GHC 6.10 only. Note that the problem only occurs when mixing `cabal-install` and `template-haskell`. EMGM builds fine with GHC 6.8 in general.
3. Remove the TH deriving code and the CPP macro.
4. Leave things as they are and warn people about the issue. If/when `template-haskell` gets patched, it may fix the problem.

I'm probably going to go with the last option for now.

### Appendix

This is a literate Haskell `Setup.lhs` containing a build hook for passing a CPP option when a version of the `template-haskell` package greater than or equal to 2.3 is specified as a dependency. You might find it a useful example of using part of the Cabal library.

``> module Main (main) where``
``> import System.Cmd> import System.FilePath``
``> import Data.Version``
``> import Distribution.Simple> import Distribution.Simple.LocalBuildInfo> import Distribution.Simple.Program> import Distribution.Simple.Setup``
``> import Distribution.Package> import Distribution.PackageDescription``
``> main :: IO ()> main = defaultMainWithHooks hooks>   where>     hooks = simpleUserHooks { buildHook = buildHook' }``
``> -- Insert CPP flag for building with template-haskell versions >= 2.3.> buildHook' :: PackageDescription -> LocalBuildInfo -> UserHooks -> BuildFlags -> IO ()> buildHook' pkg lbi hooks flags =>   buildHook simpleUserHooks pkg (lbi { localPkgDescr = newPkgDescr }) hooks flags>   where``
``>     -- Old local package description>     oldPkgDescr = localPkgDescr lbi``
``>     -- New local package description>     newPkgDescr =>       case thVersion of>         Nothing ->>           oldPkgDescr>         Just version ->>           if version >= Version [2,3] []>           then>             oldPkgDescr>               { library = addThCppToLibrary (library oldPkgDescr)>               , executables = map addThCppToExec (executables oldPkgDescr)>               }>           else>             oldPkgDescr``
``>     -- Template Haskell package name>     thPackageName = "template-haskell"``
``>     -- template-haskell version>     thVersion = findThVersion (packageDeps lbi)``
``>     -- CPP options for template-haskell >= 2.3>     thCppOpt = "-DTH_LOC_DERIVEREP"``
``>     -- Find the version of the template-haskell package>     findThVersion []          = Nothing>     findThVersion (PackageIdentifier (PackageName name) version:ps)>       | name == thPackageName = Just version>       | otherwise             = findThVersion ps``
``>     -- Add the template-haskell CPP flag to a BuildInfo>     addThCppToBuildInfo :: BuildInfo -> BuildInfo>     addThCppToBuildInfo bi =>       bi { cppOptions = thCppOpt : cppOptions bi }``
``>     -- Add the template-haskell CPP flag to a library package description>     addThCppToLibrary :: Maybe Library -> Maybe Library>     addThCppToLibrary ml = do>       lib <- ml>       return (lib { libBuildInfo = addThCppToBuildInfo (libBuildInfo lib) })``
``>     -- Add the template-haskell CPP flag to an executable package description>     addThCppToExec :: Executable -> Executable>     addThCppToExec exec =>       exec { buildInfo = addThCppToBuildInfo (buildInfo exec) }``

P.S. I used the recently released pandoc 1.2 for this article. Nice highlighting, eh?

Update #1: Apparently, Blogger's feed and/or Planet Haskell aren't ready for the games I played with pandoc. Thus, I have removed the syntax highlighting. That's unfortunate, because it looked good on the web.

Update #2: Thanks to the magic of the internets and the fact that there are people much smarter than I, there is a solution. In hindsight, it's obvious (of course): dynamically typed programming!

Neil Mitchell sent me this little piece of code which does just the trick:

``````
> mkPackageName :: (Read a) => String -> a
> mkPackageName nm =
>   where
>     shownNm = show nm
``````

So, I plugged this into `thPackageName`, removed `PackageName` from the pattern in `findThVersion`, and—voilĂ !—it worked.