splonderzoek: 2009

2009-11-05

Draft: "Pull-Ups, Push-Downs, and Passing It Around: Exercises in Functional Incrementalization"

Update! The final version is now available.

Andres Löh, Johan Jeuring, and I have submitted a paper to IFL 2009.

Pull-Ups, Push-Downs, and Passing It Around: Exercises in Functional Incrementalization

Abstract Programs in functional programming languages with algebraic datatypes are often datatype-centric and use folds or fold-like functions. Incrementalization of such a program can significantly improve its performance. Functional incrementalization separates the recursion from the calculation and significantly reduces redundant computation. In this paper, we motivate incrementalization with a simple example and present a library for transforming programs using upwards, downwards, and circular incrementalization. We also give a datatype-generic implementation for the library and demonstrate the incremental zipper, a zipper extended with attributes.

This is the result of work that has previously been mentioned on this blog:

We would be happy to have any feedback.

Update! Here's the (completely messy, badly formatted, undocumented, and screwy) code used in the paper if you want to have a go at it.

2009-09-30

"Extensibility and type safety in formatting: the design of xformat" at the Dutch HUG

I'm finally getting around to posting the slides for my talk at the Dutch HUG on 11 September. After that, I was preparing my talk for IFL on incrementalization of datatype-centric programs. I arrived back in Utrecht yesterday morning and crashed mid-day.

You can find the slides on Scribd and on Github. Enjoy!

Update: Thanks to Tom Lokhorst, my talk is also on video.

2009-09-09

"Upwards and downwards accumulations on trees" translated into Haskell

I was reading "Upwards and downwards accumulations on trees" by Jeremy Gibbons, and it's written in the Bird-Meertens formalism (a.k.a. Squiggol) of yesteryear. Not that I have anything against people who can write and understand this stuff (in fact, I have a lot of respect for them), but for me, the inconsistent and seemingly arbitrary notation leaves something to be desired. In an attempt to understand what was actually being done, I translated most of the equations to Haskell code. If you plan on reading this paper, here's hoping I could jump-start your comprehension with this contribution.

The only real issue that I ran into while performing this translation is in the function s_fork_sl2. The second component of the catamorphism, ((⊙ ↟ ≪) ∘ ≪²), does not have the expected type (b, b) -> a -> (b, b) -> (b, b). After attempting to puzzle through how to unify the two, I eventually gave up and just came up with a function that seemed to do what I thought it should do, pairs g (x, _) _ (y, _) = (g x y, x) where g :: b -> b -> b is ⊙. But I'm not sure whether there's a bug in the definition in the paper or the one in my code. The code seems to work for a few tests. If you've already read this paper or you're at all interested in debugging Squiggol, I would be happy to learn how this works.

2009-08-06

Sharing on Google Reader

I share things regularly on Google Reader. They tend to be eclectic but sometimes technical and sometimes PLT- or Haskell-related. I would like to find out who else uses Reader. If you're interested in letting me and other people know about your own shared items, please post the link in the comments. We can perhaps create a cozy little Reader community.

2009-07-10

Fun and generic things to do with EMGM at the London HUG

I just gave a talk about EMGM at the London HUG. The slides can be found on Scribd or on Github.

Thanks to everybody who showed up! I'm very sorry I was so late. Thanks to Neil for filling in until I got there.

2009-06-23

RFC: Extensible, typed scanf- and printf-like functions for Haskell

I recently found myself inspired (and simultaneously frustrated as it usually happens), and I felt there was something truly missing from the collection of available code for Haskell. So, I sought to do something about it. Now, I would like some feedback on that work. But first, the story...

The inspiration came from none other than Oleg Kiselyov. Not too long ago, he sent out an email responding to some comments about a printf with Template Haskell. His safe and generic printf with C-like format string led me to think it would be nice if we had a safe and generic printf and scanf library. So I thought I could do that. I could take Oleg's code and polish it up and publish it.

I did take his code and play around with it for a while. In fact, what I did sits at its current state in Format.hs and FormatTest.hs. It was fun to play around in that area. I created a bunch of different descriptors for a wide variety of formats. And being the researcher of generic programming that I am, I wanted to make it generic. I wanted users to be able to add their own formats. As I got to thinking about it, I realized that this string format approach doesn't scale. There are only so many characters in the alphabet for one thing. And if I wanted to add alignment and spacing for some descriptors, then I need to create parsers for those. This is too much work for users to do for an extension, too.

The frustration then came. How do I improve on this? How do I make it more extensible? So I researched. With Google, of course. Eventually, I came upon Ralf Hinze's function pearl on "Formatting: a class act." That looked good. It's a type-indexed function that provides a safe way to extend for new types using a multiparameter type class with functional dependencies.

More playing with code ensued. I tried the approach using associated type synonyms because I like how they look (superficial, I suppose). Everything worked well enough, but occasionally I would run into a problem and have trouble debugging it. I eventually came to realize that a lot of those problems were due to the lack of visibility in the types. The type family approach hid the types behind unresolved synonyms. Since I couldn't see the final type, I was having trouble figuring out what I should do with the result. I learned that changing my class to use a functional dependency allowed me to see the resolved type. This helped me quite a bit. I still like how associated type synonym looked, but I gained a new appreciation for functional dependencies.

After working on showf, the printf-like function, for a while, I tried my had at a scanf-like function. At first, I tried to make it too much like showf without success. I wanted a variable-sized result for readf in the same way that showf had a variable number of arguments. In fact, that might still be possible. But for now, the input format descriptor directly determines the output's structure.

So, in the end, I came out with xformat. It has one module for showf and one for readf. It also has quite a few format descriptors. To give you an idea of what you can do, let me share a few examples.

Using the Text.XFormat.Show module:


module S where
import Text.XFormat.Show

s1 :: Int -> String
s1 = showf Int

-- Variable number of arguments mixed with constants
s2 :: String
s2 = showf ("Hello, " % String % Char) "World" '!'

-- Use tuples to group a format descriptor
s3 = showf ("The Answer is ", Int, ".") 42

-- Align right in a column width of 37.
s4 = showf (Align R 37 "Hello darkness, my old friend.")

Using the Text.XFormat.Read module:


{-# LANGUAGE TypeOperators #-}
module R where
import Text.XFormat.Read

r1 :: String -> Maybe Int
r1 = readf Int

-- Variable size format and output
r2 :: Maybe (String :%: (String :%: Char))
r2 = readf ("Hello, " % String % Char) "Hello, World!"

-- Use tuples to group a format descriptor
r3 = let Just (_, ans, _) = readf ("The Answer is ", Int, ".") "The Answer is 42."
     in ans

-- Extract the value in parentheses
r4 = readf (Wrap '(' Int ')') "(37)"

Now, finally to my request. I'd like some feedback on this library. Is the basic design reasonable? Can it be improved either aesthetically, performance-wise, or usability-wise? Any other comments on it? I'd like to go through some community improvement before committing it to Hackage.

I greatly appreciate any thoughts you might have.

Update: Soon after I posted this, I realized it didn't make much sense to ask for feedback when it's rather difficult to get access to the library. Thus, you may now find the package on the xformat Hackage page.

2009-04-12

Latest on the incremental fold and attributes

Inspired by Edward Kmett's adaptation of my incremental fold, I have developed new versions of the incremental fold and the incremental attributes. These use a style very much based on Edward's fixed-point representation (i.e. the Mu datatype). Normally, I would discuss the code in more depth, but my available time for this post is limited. So, I will just introduce the code and point to it for a more in-depth perusal if you so desire.

Fixed Point for an Incremental Fold

First, we have the fixed-point incremental fold. It is similar to Edward's, but rather than using a different fixed-point datatype (his (:>)), I use a datatype embedded in the typical Mu.

newtype Mu f = In { out :: f (Mu f) }

data Ext z f r = Ext { tag :: z, fun :: f r }

type EMu z f = Mu (Ext z f)

Also, I have renamed remember and forget to ein and eout, because they are simply the "in" and "out" for this extended fixed-point representation.

ein' :: (Functor f) => (f z -> z) -> f (EMu z f) -> EMu z f
ein' phiz x = emu (phiz (fmap result x)) x

ein :: (Algebra f z) => f (EMu z f) -> EMu z f
ein = ein' alg

eout :: EMu z f -> f (EMu z f)
eout = fun . out

As a learning experience, I implemented the catamorphism, anamorphism, hylomorphism, paramorphism, and zygomorphism for EMu. See the code for details.

To experiment with incremental folds, I implemented functions for three fixed-point datatypes, Nat, Tree, and List. Since we've been talking about trees, here's what the datatype looks like:

data TreeF a r = Bin a r r | Tip deriving (Eq, Ord, Show, Read)
 
instance Functor (TreeF a) where ...
 
type  Tree   a =  Mu   (TreeF a)
type ETree z a = EMu z (TreeF a)

Along with each datatype, there are functions that would potentially be part of a library. Though not necessary, I implemented some of these as catamorphisms, anamorphisms, and paramorphisms. It's actually quite nice to see these in use, especially when you can compare it to the implementation with implicit recursion. For an example, compare insert and insert_rec for trees.

Any of these datatypes and functions now support an incremental fold for a given algebra of type f a -> a where f is the functor of the datatype. I have included a few example algebra implementations.

Again, you can find the code for incremental fold on a fixed point here.

Fixed Point for Incremental Attributes

As I mentioned before, the fold is a specific instance of incremental attributes. Starting with the code for the incremental fixed-point fold, I put together a generalized version of incremental attributes for fixed-point datatypes. Much of the code is the same, so let me highlight the differences from the above.

We first take Ext and extend it further with an inherited attribute. Recall that the incremental catamorphism is strictly synthesized (from the children to the parent), and to generalize, we need to pass attributes from the parent to the children. This gives us Att, and the adaptation of EMu is now AMu

data Att i s f r = Att { itag :: i , stag :: s , fun :: f r }
 
type AMu i s f = Mu (Att i s f)

This causes our "in" and "out" isomorphism to be quite different.

ain' :: (Functor f, Zippable f) => (i -> f s -> (s, Maybe (f i))) -> i -> f (AMu i s f) -> AMu i s f
ain' rho i x = In (Att i s y)
  where
    fs = fmap sresult x
    (s, fi) = rho i fs
    push j = ain' rho j . fun . out
    y = case fi of
          Nothing -> x
          Just fj -> fromMaybe x (zipWith push fj x)
 
ain :: (AAlgebra f i s) => i -> f (AMu i s f) -> AMu i s f
ain = ain' aalg
 
aout :: AMu i s f -> (f (AMu i s f), i)
aout = fork fun itag . out

Looking at the above, we need a zipWith for each datatype, and we have a different algebra as well. The type of the algebra is really the key to understanding what an implementation of incremental attributes does.

class (Functor f, Zippable f) => AAlgebra f i s where
  aalg :: i -> f s -> (s, Maybe (f i))

It says that, given an inherited attribute (from the parent) and a functor of synthesized attributes (from the children), an algebra produces a pair of a synthesized attribute (for the parent) and a functor of inherited attributes (for the children). The Maybe is just an added convenience to allow synthesizing-only algebras the pleasure of not having to produce the inheritable functor.

I have provided much of the same set of examples in this module as in the fold one. Noticeably different, however, is the addition of the float differencing implementation and a counter that ranks in-order nodes. Both were described in the post on incremental attributes. It's also worth pointing out that several of the morphisms and algebras had to change due to the inherited attribute that must be provided as input.

Well, that's the current story on incremental attributes. I'm greatly appreciative to Edward Kmett for his article. I'm also currently working on the generic thoughts behind the idea. Perhaps there will be more to come...

Haskell mode for Vim on a Mac

I set up Claus Reinke's Haskell mode for Vim today. Based on the documentation and code, it doesn't appear to have gotten much exposure to a Mac. And I didn't find anybody else describing what to do. So, this is my little contribution towards helping those Mac+Vim users that want to use it.

As described, it's simplest to set up using the vimball. Simply download the latest vba file linked from the project page, and open it in Vim:


wget http://projects.haskell.org/haskellmode-vim/vimfiles/haskellmode-20090410.vba
vim haskellmode-20090410.vba

Once in Vim, source the file:


:source %

Then, quit and open up your $VIMINIT file (usually .vimrc or _vimrc). Follow the directions on the project page, but use these lines for setting up the Haddock browser variable.


" Configure browser for haskell_doc.vim
let g:haddock_browser = "open"
let g:haddock_browser_callformat = "%s %s"

The Mac OS X open command uses the default browser to open URLs, and the internals of the Vim script use URLs for Haddock pages and such. So, the above settings will tell Haskell mode to open any URL in the default browser.

If you decide you don't want to use your default browser, you can then, for example, use Firefox instead of Safari.


let g:haddock_browser = "open"
let g:haddock_browser_callformat = "%s -a Firefox %s"

Note that we can't change g:haddock_browser here. The script is using executable() to verify g:haddock_browser, so we can't add flags to it.

And that's it! Now, you can quit and open up a Haskell file to play with it. Hopefully, your GHC build has the documentation for the libraries and the user guide, so you can take advantage of the undocumented :Doc command. Unfortunately, I'm using the MacPorts GHC, and it doesn't build the user guide documentation.

If you're not sure whether you should set up Haskell mode for Vim, check out these nice screencasts that demonstrate its power (and the :Doc command).

2009-03-31

Incremental attributes

I previously wrote about a design pattern I called an incremental fold (or catamorphism). I described it as a design pattern, because, as written, it cannot be factored into code. That is, it is a pattern for designing part of a program.

The pattern I presented is a useful way to implement functions that can be expressed as catamorphisms such that the result is incrementally computed for each operation on a value of a datatype. Unlike a fold defined directly as a function, which traverse an entire value, the incremental fold only traverses parts that are updated. For some values, this may provide a performance benefit.

This post shows how we can adapt the above idea to a more general concept that I'm calling incremental attributes. It's more general in that incremental attributes can express the incremental fold as well as other flows of incremental computation.

Review of the incremental fold

First, let's review the implementation of the incremental fold.

[Note: This is not a literate Haskell article, because there's too much duplicated code required; however, all source files are available.]

module IncrementalAttributes1Synthesized where

data Tree a s
  = Tip s
  | Bin a (Tree a s) (Tree a s) s
  deriving Show

data Alg a s
  = Alg { stip :: s, sbin :: a -> s -> s -> s }

result :: Tree a s -> s
result (Tip s)       = s
result (Bin _ _ _ s) = s

tip :: Alg a s -> Tree a s
tip alg = Tip (stip alg)

bin :: Alg a s -> a -> Tree a s -> Tree a s -> Tree a s
bin alg x lt rt = Bin x lt rt (sbin alg x (result lt) (result rt))

empty :: (Ord a) => Alg a s -> Tree a s
empty = tip

singleton :: (Ord a) => Alg a s -> a -> Tree a s
singleton alg x = bin alg x (tip alg) (tip alg)

insert :: (Ord a) => Alg a s -> a -> Tree a s -> Tree a s
insert alg x t =
  case t of
    Tip _ ->
      singleton alg x
    Bin y lt rt _ ->
      case compare x y of
        LT -> bin alg y (insert alg x lt) rt
        GT -> bin alg y lt (insert alg x rt)
        EQ -> bin alg x lt rt

fromList :: (Ord a) => Alg a s -> [a] -> Tree a s
fromList alg = foldr (insert alg) (empty alg)

heightAlg :: Alg a Integer
heightAlg = Alg 0 (\_ x y -> 1 + max x y)

t1 = fromList heightAlg "azbycx"

This will be the starting point for our discussion. We have a basic binary tree with an algebra type that gives the fold functions, stip and sbin. The application of the algebra is blended into the utility functions just as it was with the incremental fold. I have chosen to keep the representation simple, so the algebra is passed around as an argument to the functions. This could, of course, be done with type classes. Lastly, we have an example algebra that determines the height of a tree. To get the height of the example t1, simply type result t1 after loading this file into GHCi.

Incrementally inherited attributes

Suppose that, instead of height, we wanted to incrementally compute the depth of every node. Thus, we would attach a value to each node that stored its distance from the root. We can't do that with the above implementation, because attributes are only fed "upwards" or from the leaves to the root. We have no way of passing information downwards. The solution is to use inherited attributes.

Before presenting the code, here is a little side note. You may notice the use of the words synthesized and inherited here. The terminology comes from the study of attribute grammars, extending context-free grammars to support semantic operations. Wouter Swierstra wrote a great tutorial on attribute grammars in Haskell for The Monad Reader in 2005. In fact, I use an example from there at the end of this article. You can think of synthesized as "produced by the children for the parent" and inherited as "passed down from the parent to the children."

As you can now imagine, inherited attributes will allow us to bring information to the leaves. Many of the changes to the code are trivial, so we ignore them. The relevant changes are the following:

data Alg a i
  = Alg { itip :: i -> i, ibin :: a -> i -> i }

tip :: Alg a i -> i -> Tree a i
tip alg i = Tip (itip alg i)

bin :: Alg a i -> i -> a -> Tree a i -> Tree a i -> Tree a i
bin alg i x lt rt = Bin x (update i lt) (update i rt) i
  where
    update i' t =
      case t of
        Tip _ ->
          tip alg i'
        Bin x lt rt _ ->
          let s = ibin alg x i' in
          Bin x (update s lt) (update s lt) s

The datatype Alg (that I'm still calling the algebra, though that may not be proper use of the category theoretical term) now has functions that take an inherited attribute from a parent and create a new inherited attribute to be stored with the node and passed on to its children. The change to bin is the more complicated of the changes, because once a Bin constructor is constructed, all of its child nodes must be updated with new inherited values.

To implement an algebra for depth, we do the following:

depthAlg :: Alg a Int
depthAlg = Alg (+1) (const (+1))

t1 = fromList depthAlg 0 "azbycx"

Load the code and check the result to see for yourself what it looks like.

One is not enough

Now that we have use cases for synthesized and inherited incremental attributes, we're going to want both. Fortunately, that's not too difficult. The new datatypes are simply a product of the two previous:

data Tree a i s
  = Tip i s
  | Bin a (Tree a i s) (Tree a i s) i s
  deriving Show

data Alg a i s
  = Alg { itip :: i -> i, ibin :: a -> i -> i,
          stip :: s,      sbin :: a -> s -> s -> s }

You can now see why I was using s and i to distinguish the types of the attributes. Again, most of the code modifications are trivial, and the bin function needs special attention.

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt =
  Bin x (update i lt) (update i rt) i (sbin alg x (sresult lt) (sresult rt))
  where
    update i' t =
      case t of
        Tip _ _ ->
          tip alg i'
        Bin y ylt yrt _ s ->
          let j = ibin alg y i' in
          Bin y (update j ylt) (update j yrt) j s

Defining an algebra for both depth and height is no more difficult than defining each alone.

depthAndHeightAlg :: Alg a Int Int
depthAndHeightAlg = Alg (+1) (const (+1)) 1 (\_ x y -> 1 + max x y)

Feedback

You probably know where this is going by now. There's that famous saying, "what goes down must come up." We want more than just two separate directions of information flow. We want to utilize the information flowing toward the leaves to help determine that which flows up to the root or vice versa. A simple example of this is a counter that annotates each node with its rank in an in-order traversal. This can't be done with just synthesized or inherited attributes, because it depends on a combination of input from the parent, children, and siblings for each node.

The code is similar to the previous implementation, but the differences in Alg are important.

data Alg a i s
  = Alg { ftip :: i -> s, fbin :: a -> i -> s -> s -> (i, i, s) }

Each node now has a single inherited attribute, because it has a single parent. We use the synthesized attributes to store a local result, so each constructor only has one as an output. For the Bin constructor, we have a pair of incoming synthesized values and a pair of outgoing inherited values. The left component in each pair is associated with the left child, and the right with the right child. This allows us to have information flow up from the synthesized attribute of the left child and down to the inherited attribute of the right or in the opposite direction.

The bin is again tricky to write correctly.

bin :: Alg a i s -> i -> a -> Tree a i s -> Tree a i s -> Tree a i s
bin alg i x lt rt = update i (Bin x lt rt undefined undefined)
  where
    update j t =
      case t of
        Tip _ _ ->
          tip alg j
        Bin y ylt yrt _ _ ->
          let (li, ri, s) = fbin alg y j (sresult zlt) (sresult zrt)
              zlt = update li ylt
              zrt = update ri yrt
          in  Bin y zlt zrt j s

Notice the circular programming here. The definition and uses of, for example, li and zlt show that we could easily loop infinitely. This depends on how the specific algebra functions are implemented. Here is the "counter example":

newtype CounterI = CI { cntI :: Int } deriving Show
data CounterS = CS { size :: Int, cntS :: Int } deriving Show

counterAlg :: Alg a CounterI CounterS
counterAlg = Alg ft fb
  where

    ft :: CounterI -> CounterS
    ft i = CS { size = 1, cntS = cntI i }

    fb :: a -> CounterI -> CounterS -> CounterS -> (CounterI, CounterI, CounterS)
    fb _ i ls rs =
      ( i                                   -- left
      , CI { cntI = 1 + cntI i + size ls }  -- right
      , CS { size = 1 + size ls + size rs
           , cntS = cntI i + size ls }
      )

t1 = fromList counterAlg (CI { cntI = 0 }) "azbycx"

I've relied heavily on record syntax to document the flow of information. Notice in fb how the i is directly inherited by the left child and how the right child inherits the new count that depends on the size of the left subtree and the inherited count of its parent. As shown in this example, the dependency flow must be unidirectional for one desired result. But there's no reason we can't go up, down, and then up again (for example).

Revisiting the diff problem.

As I mentioned, Wouter wrote a good introduction to attribute grammars in Haskell (which I highly recommend that you read). He focuses on the use of the UUAG system to generate code for solving problems that are harder to solve with traditional functional programming techniques. He describes the problem as follows:

Suppose we want to write a function diff :: [Float] -> [Float] that given a list xs, calculates a new list where every element x is replaced with the difference between x and the average of xs. Similar problems pop up in any library for performing statistical calculations.

Great problem! And we can solve it using incremental attributes in Haskell instead of in UUAG's attribute grammar syntax.

newtype DiffI = DI { avg :: Float } deriving Show
data DiffS = DS { sumD :: Float, len :: Float, res :: Float } deriving Show

diffAlg :: Alg Float DiffI DiffS
diffAlg = Alg ft fb
  where

    ft :: DiffI -> DiffS
    ft i =
      DS { sumD = 0
         , len = 0
         , res = 0
         }

    fb :: Float -> DiffI -> DiffS -> DiffS -> (DiffI, DiffI, DiffS)
    fb x i ls sr =
      ( i
      , i
      , DS { sumD = x + sumD ls + sumD sr
           , len = 1 + len ls + len sr
           , res = x - avg i
           }
      )

The implementation is not too much more difficult than the attribute grammar solution. We don't have the clean separation of concerns, but adding another attribute only means adding another field in DI or DS depending on whether it's inherited or synthesized.

Oh, but we're not done! Where's the actual average generated? Ah right, that's fed to the root inherited attribute.

t2 = let val = fromList diffAlg (DI { avg = a }) [1,4,1.5,3.5,2,3,2.5]
         s = sresult val
         a = sumD s / len s
     in  val

Here's another example of circular programming. Due to the way we implemented the application of the algebra, we can take advantage of lazy evaluation to ensure that the sum and length (and thus average) are incrementally computed and, as a result, the difference (res) is determined as needed for each node.

2009-03-06

Experiments with EMGM: Emacs org files

I've been meaning to write some things about EMGM for a while, but I hadn't found one of those round tuits as of yet. Until now.

David Miani is working on a Haskell library for interacting with emacs org files. "For those that do not know, an org file is a structured outline style file that has nested headings, text, tables and other elements," says David. He has a collection of datatypes for building and manipulating these files.

David seeks a better way to do what he's doing. (It's a noble goal. I hope you keep doing it.) To return to his words: "While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous." He goes on to give an example that I'll describe more below.

When I read the above statement, I was expecting that generic programming could help him out. When I saw his code, I knew it was a perfect use case. That's what inspired this entry, the first use case for EMGM from Haskell Café.

First, this is a literate Haskell post, so we run through the usual preliminaries.

> {-# LANGUAGE TemplateHaskell       #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleContexts      #-}
> {-# LANGUAGE FlexibleInstances     #-}
> {-# LANGUAGE OverlappingInstances  #-}
> {-# LANGUAGE UndecidableInstances  #-}
> 
> module Org where
> 
> import Text.Regex.Posix

We import Generics.EMGM.Derive for the deriving portion of EMGM. This is not exported from the main body of the library, because it has a lot of symbols only needed for building a representation. We'd rather not clog up your symbol list if possible.

> import Generics.EMGM.Derive

In general, I'd recommend doing the deriving in a separate module and only export the datatype and generated type class instances. Then, in other modules, you can use EMGM functions or write your own. However, this being a demonstration, we will also import Generics.EMGM here to use the available functions.

> import qualified Generics.EMGM as G

The following collection of types are copied from David's post. They describe the structure of an org file.

> type Line = Int
> type Column = Int
> 
> data FilePosition = FilePosition Line Column
> 
> data WithPos a = WithPos { filePos :: FilePosition, innerValue :: a }
> 
> data OrgTableP = OrgTableP [WithPos OrgTableRow]
> 
> data OrgFileElementP
>   = TableP OrgTableP
>   | ParagraphP String
>   | HeadingP OrgHeadingP
> 
> data OrgHeadingP = OrgHeadingP Int String [WithPos OrgFileElementP]
> 
> data OrgFileP = OrgFileP [WithPos OrgFileElementP]
> 
> data OrgTableRow
>   = OrgTableRow [String]
>   | OrgTableRowSep

In order to use EMGM, we must generate the values and instances used by the library. This is simple with one Template Haskell (TH).

> $(deriveMany 
>   [ ''FilePosition
>   , ''WithPos
>   , ''OrgTableP
>   , ''OrgHeadingP
>   , ''OrgFileElementP
>   , ''OrgFileP
>   , ''OrgTableRow
>   ])

Note that in this case, we had to use deriveMany for a list of type names. For the most part, we'd probably use derive; however, the datatypes OrgHeadingP and OrgFileElementP are mutually recursive. If we use derive for each type, then some values are generated that are naturally also muturally recursive. Apparently, TH expects all symbols to be available on a per-splice basis. This means that we can't $(derive ''OrgFileElementP) and then $(derive ''OrgHeadingP) or vice versa. We have to derive them simultaneously, so that both sets of symbols are available at the same time.

David gives the example of reading "the description line for the project named 'Project14'" in the following file:

* 2007 Projects
** Project 1
Description: 1
Tags: None
** Project 2
Tags: asdf,fdsa
Description: hello
* 2008 Projects
* 2009 Projects
** Project14
Tags: RightProject
Description: we want this

He then provides some messy code to perform it. (No offense meant. Mine would've looked no better.) I'll skip the code, since I couldn't get it to compile as provided.

Our solution using EMGM follows:

> projDesc :: String -> OrgFileP -> Maybe String
> projDesc name file = do
>   hdg <- G.firstr (headings name file)
>   para <- firstPara hdg
>   if para =~ "Description" then return para else Nothing
> 
> headings :: String -> OrgFileP -> [OrgHeadingP]
> headings name = filter check . G.collect
>   where
>     check (OrgHeadingP _ possible _) = name == possible
> 
> firstPara :: OrgHeadingP -> Maybe String
> firstPara hdg = paraStr =<< G.firstr (G.collect hdg)
>   where
>     paraStr (ParagraphP str) = Just str
>     paraStr _                = Nothing

Primarily, we take advantage of the functions collect and firstr. Here, collect is the key. It's type is collect :: (Rep (Collect b) a) => a -> [b], and it returns a list of bs stored somewhere in a value of type a. This allows us to collect the OrgHeadingPs in an OrgFileP (headings) and the OrgFileElementPs in an OrgHeadingP (firstPara). Now, we don't have to build a bunch of functions that break down each of these types to get to their components.

Our use of firstr is simply the same as we would use the Prelude function head, except that firstr returns a Maybe: unlike head, it's a total function.

David's top-level function would now become this:

> get14 :: OrgFileP -> Maybe String
> get14 = projDesc "Project14"

Well, this was a fun experiment with generic programming. I hope to do more in the future.

I want to thank David for bringing up this problem in the mailing list. Not only did I get to play more with EMGM, I also released an update to the library when I discovered the issue requiring deriveMany.

Update 2008-03-30: The source code for this entry is now available at GitHub.

2009-03-02

Template Haskell 2.3 or Cabal 1.2? EMGM can't have both!

Updates below!

I'm about ready to give up. I would like EMGM to support both GHC 6.8 and 6.10 to allow for more potential uses. This effectively means it should build with Cabal version 1.2 (which is distributed with GHC 6.8) and 1.6 (distributed with GHC 6.10). I didn't think this would be a difficult problem, — shows you what little I know — but it has turned into a reasonably sized annoyance.

As of version 0.2, EMGM supports generating code for type representations using Template Haskell (TH). Since TH is now a dependency, I thought it would be handy to provide the representation values and instances for it. And so I do that here. If you look at the linked file, you'll notice an #ifdef surrounding $(derive ''Loc). This is because the template-haskell package includes a new Loc datatype in version 2.3, and in order to support versions 2.2 (for GHC 6.8) and 2.3 (for GHC 6.10), I needed to do some funky C preprocessor (CPP) stuff.

Well, I followed the advice given in this thread started by Jason Ducek with useful responses from Duncan Coutts. The result was this emgm.cabal file with a flag for determining which version of template-haskell was available and whether or not to define the necessary macro. I later learned that this does't work when attempting to cabal-install GHC 6.8, because template-haskell-2.3 fails to correctly specify the version of base or GHC that it requires.

Now, to work around this problem, Duncan described how to hook into Cabal to get the actual package dependencies and insert the CPP option into the build info. It was not too difficult to figure this out. And in fact, the code for my Setup.lhs is in the appendix of this article in case it is later useful for me or someone else.

Unfortunately, as soon as I had this implemented, I discovered it didn't work in GHC 6.8.3/Cabal 1.2. There's a very minor difference in Cabal that simply breaks my code, and I don't know how to work around it. The difference is in PackageIdentifier:

Cabal 1.2:

> data PackageIdentifier = PackageIdentifier { pkgName :: String, pkgVersion :: Version }

Cabal 1.6:

> data PackageIdentifier = PackageIdentifier { pkgName :: PackageName, pkgVersion :: Version }
> newtype PackageName = PackageName String

I need PackageIdentifier to determine which version of template-haskell is being used as a dependency. But I either use a String or a PackageName depending on which version of Cabal is used. I don't think there's a way to know which version of Cabal is used when building a Setup.lhs file.

As far as I can tell, my options are the following:

Hack more on the Setup.lhs to figure out a different way of dealing with the template-haskell issue.
Release for GHC 6.10 only. Note that the problem only occurs when mixing cabal-install and template-haskell. EMGM builds fine with GHC 6.8 in general.
Remove the TH deriving code and the CPP macro.
Leave things as they are and warn people about the issue. If/when template-haskell gets patched, it may fix the problem.

I'm probably going to go with the last option for now.

Appendix

This is a literate Haskell Setup.lhs containing a build hook for passing a CPP option when a version of the template-haskell package greater than or equal to 2.3 is specified as a dependency. You might find it a useful example of using part of the Cabal library.

> module Main (main) where

> import System.Cmd
> import System.FilePath

> import Data.Version

> import Distribution.Simple
> import Distribution.Simple.LocalBuildInfo
> import Distribution.Simple.Program
> import Distribution.Simple.Setup

> import Distribution.Package
> import Distribution.PackageDescription

> main :: IO ()
> main = defaultMainWithHooks hooks
>   where
>     hooks = simpleUserHooks { buildHook = buildHook' }

> -- Insert CPP flag for building with template-haskell versions >= 2.3.
> buildHook' :: PackageDescription -> LocalBuildInfo -> UserHooks -> BuildFlags -> IO ()
> buildHook' pkg lbi hooks flags =
>   buildHook simpleUserHooks pkg (lbi { localPkgDescr = newPkgDescr }) hooks flags
>   where

>     -- Old local package description
>     oldPkgDescr = localPkgDescr lbi

>     -- New local package description
>     newPkgDescr =
>       case thVersion of
>         Nothing ->
>           oldPkgDescr
>         Just version ->
>           if version >= Version [2,3] []
>           then
>             oldPkgDescr
>               { library = addThCppToLibrary (library oldPkgDescr)
>               , executables = map addThCppToExec (executables oldPkgDescr)
>               }
>           else
>             oldPkgDescr

>     -- Template Haskell package name
>     thPackageName = "template-haskell"

>     -- template-haskell version
>     thVersion = findThVersion (packageDeps lbi)

>     -- CPP options for template-haskell >= 2.3
>     thCppOpt = "-DTH_LOC_DERIVEREP"

>     -- Find the version of the template-haskell package
>     findThVersion []          = Nothing
>     findThVersion (PackageIdentifier (PackageName name) version:ps)
>       | name == thPackageName = Just version
>       | otherwise             = findThVersion ps

>     -- Add the template-haskell CPP flag to a BuildInfo
>     addThCppToBuildInfo :: BuildInfo -> BuildInfo
>     addThCppToBuildInfo bi =
>       bi { cppOptions = thCppOpt : cppOptions bi }

>     -- Add the template-haskell CPP flag to a library package description
>     addThCppToLibrary :: Maybe Library -> Maybe Library
>     addThCppToLibrary ml = do
>       lib <- ml
>       return (lib { libBuildInfo = addThCppToBuildInfo (libBuildInfo lib) })

>     -- Add the template-haskell CPP flag to an executable package description
>     addThCppToExec :: Executable -> Executable
>     addThCppToExec exec =
>       exec { buildInfo = addThCppToBuildInfo (buildInfo exec) }

P.S. I used the recently released pandoc 1.2 for this article. Nice highlighting, eh?

Update #1: Apparently, Blogger's feed and/or Planet Haskell aren't ready for the games I played with pandoc. Thus, I have removed the syntax highlighting. That's unfortunate, because it looked good on the web.

Update #2: Thanks to the magic of the internets and the fact that there are people much smarter than I, there is a solution. In hindsight, it's obvious (of course): dynamically typed programming!

Neil Mitchell sent me this little piece of code which does just the trick:


> mkPackageName :: (Read a) => String -> a
> mkPackageName nm =
>   fst $ head $ reads shownNm ++ reads ("PackageName " ++ shownNm)
>   where
>     shownNm = show nm

So, I plugged this into thPackageName, removed PackageName from the pattern in findThVersion, and—voilà!—it worked.

2009-02-19

The strict danger of switching from pure to monadic Template Haskell

Due to several issues with the Template Haskell (TH) deriving in EMGM (which I knew were there, but needed significant motivation to fix), I realized it was time to convert much of my code from pure to monadic. From the beginning, I developed it to be as non-monadic (i.e. not in Q) as possible for all the reasons one is supposed to. This led to several challenges that were not too difficult to surmount at the time. Namely, reify and newName had to be used at the top level of my function hierarchy, and their values had be passed down. But now I really need reify in a few places in the depths of the code, and it is quite impractical to do otherwise. For example, I need to expand type synonyms, and if I do that at the top level, every name is now either a name or some adaptation of a type declaration.

Most of the refactoring was surprisingly easy. I simply changed uppercase constructors to their lowercase, monadic equivalents (see the docs for the difference). However, I ran into a problem in which a function specific to bifunctor types was getting applied to monomorphic types. This lead to strange errors (compile-time of course, since it's TH) that told me something had changed in the way the functions were used. It was very likely not the programming logic itself, because I was carefully migrating individual functions a few at a time and testing as I went. But somehow, once I reached the point in the code where I needed reify, I started getting these error reports (my code, though it was for unexpected inputs).

After several hours of tracing through with report, I found the problem. Here is the blah-ified (and simplified) original code:

-- Non-monadic:
blah :: ... -> ...
blah ... = ...

-- Monadic wrapper
blahM :: ... -> Q [...]
blahM ... = return [blah ...]

-- Exposed
fun :: ... -> Q ...
fun ... = do
  a <- ...
  b <- blahM ...
  let x =
        case ... of
          1 -> a
          2 -> b
          _ -> []
  return (... ++ x)

Most of the logic lies under blah, so it follows that I monadicized (monadified?) blah which required a minor changed to blahM:

blah :: ... -> Q ...
blah ... = ...

blahM :: ... -> Q [...]
blahM ... = do
  x <- blah ...
  return [x]

Now, this didn't affect anything immediately. I kept converting my functions to the monadic religion, and at some point, I suppose I evangelized too much. Then, I believe that blah, which by the original design was meant to be called lazily, became too strict. It was getting called every time, instead of only when the case expression matched on 2. Once I realized this, it was evident that I needed to change fun.

fun :: ... -> Q ...
fun ... = do
  x <-
    case ... of
      1 -> ...
      2 -> blahM ...
      _ -> return []
  return (... ++ x)

Now, everything is strictly normal again. No strange errors to keep me up at night. I believe now that blahM was always getting evaluated even though blah was not. So, as I pushed the monad down into the code, more and more things were getting strictly evaluated.

I keep learning that I don't understand laziness as well as I think I do. Or maybe it's that I don't understand strictness. Or perhaps I'm too lazy to strictly learn both.

I also resolve never to write non-monadic Template Haske11 code again (excluding small and/or simple functions, of course). The amount of work required is not worth the benefits gained.

splonderzoek