Geeky things and other research from a work in progress

2009-06-23

RFC: Extensible, typed scanf- and printf-like functions for Haskell

I recently found myself inspired (and simultaneously frustrated, as usually happens), and I felt there was something truly missing from the collection of available code for Haskell. So, I sought to do something about it. Now, I would like some feedback on that work. But first, the story...

The inspiration came from none other than Oleg Kiselyov. Not too long ago, he sent out an email responding to some comments about a printf with Template Haskell. His safe and generic printf with a C-like format string led me to think it would be nice if we had a safe and generic printf and scanf library. So I thought I could do that. I could take Oleg's code, polish it up, and publish it.

I did take his code and play around with it for a while. In fact, what I did sits in its current state in Format.hs and FormatTest.hs. It was fun to play around in that area. I created a bunch of different descriptors for a wide variety of formats. And being the researcher of generic programming that I am, I wanted to make it generic. I wanted users to be able to add their own formats. As I got to thinking about it, I realized that this string format approach doesn't scale. For one thing, there are only so many characters in the alphabet. And if I wanted to add alignment and spacing for some descriptors, I would need to write parsers for those. That is also too much work to ask of users who just want to add an extension.

The frustration then came. How do I improve on this? How do I make it more extensible? So I researched. With Google, of course. Eventually, I came upon Ralf Hinze's functional pearl "Formatting: a class act". That looked good. It describes a type-indexed function that can be safely extended to new types using a multiparameter type class with functional dependencies.
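To make the idea concrete, here is a minimal sketch of a type-indexed formatter built on a multiparameter type class with a functional dependency. It is a continuation-passing variant written for illustration, not Hinze's code and not the xformat implementation. The names IntF, StrF, BoolF, :%:, Format, fmt and this showf are all hypothetical and defined only for the sketch.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE TypeOperators, UndecidableInstances #-}

-- Hypothetical descriptors for this sketch (not the xformat ones).
data IntF = IntF
data StrF = StrF
data a :%: b = a :%: b
infixr 5 :%:

-- The descriptor d and the final result r together determine the
-- function type f that the formatter returns.
class Format d r f | d r -> f where
  fmt :: d -> (String -> r) -> f

instance Format IntF r (Int -> r) where
  fmt IntF k = \n -> k (show n)

instance Format StrF r (String -> r) where
  fmt StrF k = \s -> k s

-- A literal string consumes no argument.
instance Format String r r where
  fmt lit k = k lit

-- Sequencing: format a, then b, and concatenate the two pieces.
instance (Format b r g, Format a g f) => Format (a :%: b) r f where
  fmt (a :%: b) k = fmt a (\sa -> fmt b (\sb -> k (sa ++ sb)))

-- Run a descriptor with the identity continuation.
showf :: Format d String f => d -> f
showf d = fmt d id

-- A user-defined extension: one new type and one instance,
-- with no format-string parser to write.
data BoolF = BoolF
instance Format BoolF r (Bool -> r) where
  fmt BoolF k = \b -> k (if b then "yes" else "no")

main :: IO ()
main = do
  putStrLn (showf ("Hello, " :%: StrF :%: "!") "World")  -- Hello, World!
  putStrLn (showf ("n = " :%: IntF) 42)                   -- n = 42
  putStrLn (showf ("ok? " :%: BoolF) True)                -- ok? yes
```

The point of the sketch is the extension story: adding BoolF touches no existing code, and the number and types of arguments are still checked at compile time.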

More playing with code ensued. I tried the approach using associated type synonyms because I like how they look (superficial, I suppose). Everything worked well enough, but occasionally I would run into a problem and have trouble debugging it. I eventually came to realize that a lot of those problems were due to the lack of visibility in the types. The type family approach hid the types behind unresolved synonyms. Since I couldn't see the final type, I was having trouble figuring out what I should do with the result. I learned that changing my class to use a functional dependency allowed me to see the resolved type. This helped me quite a bit. I still like how associated type synonyms look, but I gained a new appreciation for functional dependencies.
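For comparison, a formatter class written with an associated type synonym might look like the following. This is only a sketch under assumed names (IntF, :%:, Format, Fun, fmt and this showf are hypothetical), not the code that was actually tried for xformat.

```haskell
{-# LANGUAGE TypeFamilies, TypeOperators #-}
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- Hypothetical descriptors for this sketch (not the xformat ones).
data IntF = IntF
data a :%: b = a :%: b
infixr 5 :%:

class Format d where
  -- The function type produced for descriptor d and final result r.
  type Fun d r
  fmt :: d -> (String -> r) -> Fun d r

instance Format IntF where
  type Fun IntF r = Int -> r
  fmt IntF k = \n -> k (show n)

-- A literal string consumes no argument.
instance Format String where
  type Fun String r = r
  fmt lit k = k lit

-- The nested synonym on the right-hand side is why
-- UndecidableInstances is enabled above.
instance (Format a, Format b) => Format (a :%: b) where
  type Fun (a :%: b) r = Fun a (Fun b r)
  fmt (a :%: b) k = fmt a (\sa -> fmt b (\sb -> k (sa ++ sb)))

showf :: Format d => d -> Fun d String
showf d = fmt d id

-- The compiler accepts this signature only because
-- Fun (String :%: IntF) String reduces to Int -> String.
answer :: Int -> String
answer = showf ("n = " :%: IntF)

main :: IO ()
main = putStrLn (answer 42)  -- n = 42
```

The class declaration is arguably tidier than the functional dependency version, but the result type of showf is spelled Fun d String. Depending on the compiler version and how the type is queried, it may be reported in that unreduced form rather than as Int -> String, which is the kind of visibility problem described above.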

After working on showf, the printf-like function, for a while, I tried my hand at a scanf-like function. At first, I tried to make it too much like showf, without success. I wanted a variable-sized result for readf in the same way that showf had a variable number of arguments. In fact, that might still be possible. But for now, the input format descriptor directly determines the output's structure.
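One way to picture "the descriptor determines the output's structure" is a parser class in which the descriptor type fixes the result type through a functional dependency. The sketch below is an assumption-laden illustration, not the xformat implementation: IntF, :%:, Parse, parse and this readf are hypothetical names, and the integer parser handles only unsigned decimal digits.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, TypeOperators, UndecidableInstances #-}

import Data.Char (isDigit)
import Data.List (stripPrefix)

-- Hypothetical descriptors for this sketch (not the xformat ones).
data IntF = IntF
data a :%: b = a :%: b deriving (Eq, Show)
infixr 5 :%:

-- The descriptor type d determines the result type a.
class Parse d a | d -> a where
  parse :: d -> String -> Maybe (a, String)

-- Unsigned decimal integers only, to keep the sketch short.
instance Parse IntF Int where
  parse IntF s = case span isDigit s of
    ([], _)    -> Nothing
    (ds, rest) -> Just (read ds, rest)

-- A literal must match exactly and is returned as the result.
instance Parse String String where
  parse lit s = fmap (\rest -> (lit, rest)) (stripPrefix lit s)

-- A sequenced descriptor yields a result with the same nesting.
instance (Parse a x, Parse b y) => Parse (a :%: b) (x :%: y) where
  parse (a :%: b) s = do
    (x, rest1) <- parse a s
    (y, rest2) <- parse b rest1
    Just (x :%: y, rest2)

-- Succeed only if the whole input is consumed.
readf :: Parse d a => d -> String -> Maybe a
readf d s = case parse d s of
  Just (x, "") -> Just x
  _            -> Nothing

main :: IO ()
main =
  case readf ("The Answer is " :%: IntF :%: ".") "The Answer is 42." of
    Just (_ :%: (n :%: _)) -> print n  -- 42
    Nothing                -> putStrLn "no parse"
```

Here the result of a three-part descriptor is a three-part value nested the same way, so the caller pattern matches on the shape of the format rather than receiving a variable number of results.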

So, in the end, I came out with xformat. It has one module for showf and one for readf. It also has quite a few format descriptors. To give you an idea of what you can do, let me share a few examples.

Using the Text.XFormat.Show module:


module S where
import Text.XFormat.Show

s1 :: Int -> String
s1 = showf Int

-- Variable number of arguments mixed with constants
s2 :: String
s2 = showf ("Hello, " % String % Char) "World" '!'

-- Use tuples to group a format descriptor
s3 = showf ("The Answer is ", Int, ".") 42

-- Align right in a column width of 37.
s4 = showf (Align R 37 "Hello darkness, my old friend.")

Using the Text.XFormat.Read module:


{-# LANGUAGE TypeOperators #-}
module R where
import Text.XFormat.Read

r1 :: String -> Maybe Int
r1 = readf Int

-- Variable size format and output
r2 :: Maybe (String :%: (String :%: Char))
r2 = readf ("Hello, " % String % Char) "Hello, World!"

-- Use tuples to group a format descriptor
r3 = let Just (_, ans, _) = readf ("The Answer is ", Int, ".") "The Answer is 42."
     in ans

-- Extract the value in parentheses
r4 = readf (Wrap '(' Int ')') "(37)"

Now, finally, to my request. I'd like some feedback on this library. Is the basic design reasonable? Can it be improved aesthetically, performance-wise, or usability-wise? Any other comments on it? I'd like it to go through some community improvement before committing it to Hackage.

I greatly appreciate any thoughts you might have.

Update: Soon after I posted this, I realized it didn't make much sense to ask for feedback when it's rather difficult to get access to the library. Thus, you may now find the package on the xformat Hackage page.