Geeky things and other research from a work in progress

2009-06-23

RFC: Extensible, typed scanf- and printf-like functions for Haskell

I recently found myself inspired (and, as usually happens, simultaneously frustrated), and I felt that something was truly missing from the collection of available Haskell code. So I set out to do something about it. Now I would like some feedback on that work. But first, the story...

The inspiration came from none other than Oleg Kiselyov. Not too long ago, he sent out an email responding to some comments about a printf built with Template Haskell. His safe and generic printf with a C-like format string led me to think it would be nice to have a safe and generic printf and scanf library. So I thought I would do just that: take Oleg's code, polish it up, and publish it.

I did take his code and play around with it for a while. In fact, what I did sits in its current state in Format.hs and FormatTest.hs. It was fun to play around in that area. I created a bunch of different descriptors for a wide variety of formats. And being the researcher of generic programming that I am, I wanted to make it generic: I wanted users to be able to add their own formats. As I thought about it more, I realized that this format-string approach doesn't scale. For one thing, there are only so many characters in the alphabet. And if I wanted to add alignment and spacing options to some descriptors, I would need to write parsers for those, too. That is also too much work to ask of users who just want to add a new format.

Then came the frustration. How do I improve on this? How do I make it more extensible? So I researched. With Google, of course. Eventually, I came upon Ralf Hinze's functional pearl "Formatting: a class act." That looked good. It describes a type-indexed function that can be safely extended to new types using a multi-parameter type class with functional dependencies.
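To make the idea concrete, here is a minimal sketch of a Hinze-style type-indexed formatter built on a multi-parameter type class with a functional dependency. It is not xformat's code: the descriptor types `IntF`, `StrF`, `Lit`, and `(:%)`, the class `Format`, and the function `format` are hypothetical names chosen for illustration. The descriptor, together with the final result type, determines the function type that `format` returns, which is how a variable number of arguments falls out of ordinary instance resolution.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, UndecidableInstances, TypeOperators #-}

-- Hypothetical format descriptors (not xformat's actual types).
data IntF = IntF            -- consumes an Int argument
data StrF = StrF            -- consumes a String argument
newtype Lit = Lit String    -- a constant piece of text

-- Sequencing of two descriptors.
infixr 5 :%
data d1 :% d2 = d1 :% d2

-- d: descriptor, r: final result type, f: function type that d produces.
-- The dependency d r -> f lets GHC compute f from the descriptor.
class Format d r f | d r -> f where
  -- Continuation-passing style: k receives the accumulated output.
  fmt :: d -> (String -> r) -> String -> f

instance Format IntF r (Int -> r) where
  fmt IntF k acc = \n -> k (acc ++ show n)

instance Format StrF r (String -> r) where
  fmt StrF k acc = \s -> k (acc ++ s)

instance Format Lit r r where
  fmt (Lit s) k acc = k (acc ++ s)

-- Run d1 first and hand its output to d2; m is the intermediate type.
instance (Format d1 m f, Format d2 r m) => Format (d1 :% d2) r f where
  fmt (d1 :% d2) k acc = fmt d1 (fmt d2 k) acc

-- Start with an empty accumulator and return the finished String.
format :: Format d String f => d -> f
format d = fmt d id ""
```

For example, `format (Lit "The Answer is " :% IntF :% Lit ".") 42` evaluates to `"The Answer is 42."`. Accumulating with `(++)` keeps the sketch short but is quadratic in the worst case; a difference list or `ShowS` would be the usual fix.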

More playing with code ensued. I tried the approach using associated type synonyms because I like how they look (superficial, I suppose). Everything worked well enough, but occasionally I would run into a problem and have trouble debugging it. I eventually realized that many of those problems were due to the lack of visibility in the types: the type family approach hid the types behind unresolved synonyms. Since I couldn't see the final type, I had trouble figuring out what to do with the result. I learned that changing my class to use a functional dependency allowed me to see the resolved type, and that helped me quite a bit. I still like how associated type synonyms look, but I gained a new appreciation for functional dependencies.
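For comparison, the same kind of type-indexed formatter can be sketched with an associated type synonym instead of a functional dependency. All names here (`Format`, `Result`, `fmt`, `format`, and the descriptor types) are hypothetical, not xformat's. Because the result type is now computed by the type family `Result`, a polymorphic use is reported as `format :: Format d => d -> Result d String`, with the unreduced `Result d String` standing in for the actual function type; that is the kind of hidden type described above.

```haskell
{-# LANGUAGE TypeFamilies, TypeOperators, UndecidableInstances #-}

-- Hypothetical descriptors, as in a fundep-based design.
data IntF = IntF            -- consumes an Int argument
newtype Lit = Lit String    -- a constant piece of text

infixr 5 :%
data d1 :% d2 = d1 :% d2

class Format d where
  -- The function type produced by d when the final result type is r.
  type Result d r
  fmt :: d -> (String -> r) -> String -> Result d r

instance Format IntF where
  type Result IntF r = Int -> r
  fmt IntF k acc = \n -> k (acc ++ show n)

instance Format Lit where
  type Result Lit r = r
  fmt (Lit s) k acc = k (acc ++ s)

-- The nested family application on the right-hand side is why
-- UndecidableInstances is needed here.
instance (Format d1, Format d2) => Format (d1 :% d2) where
  type Result (d1 :% d2) r = Result d1 (Result d2 r)
  fmt (d1 :% d2) k acc = fmt d1 (fmt d2 k) acc

format :: Format d => d -> Result d String
format d = fmt d id ""
```

Once the descriptor is concrete, GHC reduces `Result` completely, so `format (Lit "The Answer is " :% IntF :% Lit ".")` has type `Int -> String`; the difference in visibility shows up mainly in polymorphic helpers and error messages.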

After working on showf, the printf-like function, for a while, I tried my hand at a scanf-like function. At first, I tried to make it too much like showf, without success. I wanted a variable-sized result for readf in the same way that showf takes a variable number of arguments. That might still be possible, but for now, the input format descriptor directly determines the structure of the output.
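Here is a minimal sketch of that readf direction, assuming ReadP from base as the parsing engine (the comments below mention that Text.XFormat.Read is built on ReadP). The class `ReadFormat`, its method `readp`, the function `readf`, and the descriptors are hypothetical; xformat's real types differ (for instance, it has its own `:%:` result type where this sketch uses pairs). The functional dependency `d -> a` expresses the point above: the input descriptor alone fixes the structure of the output.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, UndecidableInstances, TypeOperators #-}

import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical descriptors (not xformat's actual types).
data IntF = IntF            -- reads a non-negative decimal Int
data CharF = CharF          -- reads any single Char
newtype Lit = Lit String    -- must match this text exactly

infixr 5 :%
data d1 :% d2 = d1 :% d2

-- The descriptor d alone determines the result type a.
class ReadFormat d a | d -> a where
  readp :: d -> ReadP a

instance ReadFormat IntF Int where
  readp IntF = read <$> munch1 isDigit

instance ReadFormat CharF Char where
  readp CharF = get

instance ReadFormat Lit String where
  readp (Lit s) = string s

-- A sequence of descriptors yields a nested pair of results.
instance (ReadFormat d1 a, ReadFormat d2 b) => ReadFormat (d1 :% d2) (a, b) where
  readp (d1 :% d2) = (,) <$> readp d1 <*> readp d2

-- Succeed only if the whole input is consumed.
readf :: ReadFormat d a => d -> String -> Maybe a
readf d s = case readP_to_S (readp d <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing
```

With this sketch, `readf (Lit "(" :% IntF :% Lit ")") "(37)"` gives `Just ("(", (37, ")"))`, while `readf IntF "42x"` gives `Nothing` because the trailing `x` makes `eof` fail.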

So, in the end, I ended up with xformat. It has one module for showf and one for readf. It also has quite a few format descriptors. To give you an idea of what you can do, here are a few examples.

Using the Text.XFormat.Show module:


module S where
import Text.XFormat.Show

s1 :: Int -> String
s1 = showf Int

-- Variable number of arguments mixed with constants
s2 :: String
s2 = showf ("Hello, " % String % Char) "World" '!'

-- Use tuples to group a format descriptor
s3 = showf ("The Answer is ", Int, ".") 42

-- Align right in a column width of 37.
s4 = showf (Align R 37 "Hello darkness, my old friend.")
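The `Align R 37` example comes down to simple padding arithmetic: the string "Hello darkness, my old friend." is 30 characters long, so right-aligning it in a 37-column field prepends 7 spaces. Below is a small stand-alone sketch of that behavior. `Dir` and `align` are hypothetical names, and leaving strings wider than the column untruncated is an assumption, not a description of xformat.

```haskell
-- Hypothetical: illustrates the idea behind an alignment descriptor,
-- not xformat's implementation.
data Dir = L | R

align :: Dir -> Int -> String -> String
align dir width s = case dir of
    L -> s ++ pad
    R -> pad ++ s
  where
    -- replicate with a non-positive count yields "", so a string wider
    -- than the column is returned unchanged (no truncation).
    pad = replicate (width - length s) ' '
```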

Using the Text.XFormat.Read module:


{-# LANGUAGE TypeOperators #-}
module R where
import Text.XFormat.Read

r1 :: String -> Maybe Int
r1 = readf Int

-- Variable size format and output
r2 :: Maybe (String :%: (String :%: Char))
r2 = readf ("Hello, " % String % Char) "Hello, World!"

-- Use tuples to group a format descriptor
r3 = let Just (_, ans, _) = readf ("The Answer is ", Int, ".") "The Answer is 42."
     in ans

-- Extract the value in parentheses
r4 = readf (Wrap '(' Int ')') "(37)"

Now, finally, to my request. I'd like some feedback on this library. Is the basic design reasonable? Can it be improved aesthetically, performance-wise, or usability-wise? Any other comments on it? I'd like it to go through some community improvement before committing it to Hackage.

I greatly appreciate any thoughts you might have.

Update: Soon after I posted this, I realized it didn't make much sense to ask for feedback when it's rather difficult to get access to the library. Thus, you may now find the package on the xformat Hackage page.

6 comments:

  1. I think the interface is very clean and elegant. I'm happy that it doesn't use TH. I like that the types are so predictable and transparent.

    I'm very curious about your new insights into functional dependencies as I'm still struggling to fully appreciate them. Perhaps you can write about this in some more detail in another post? :-)

    I think the modules on hackage could benefit from some good examples at the tops of the pages.

    And a small silly style thingy: "readpf d1 >>= return . Left" can also be written as "Left <$> readpf d1" (or using liftM or fmap) which I think is somewhat prettier. I think this is one of hlint's suggestions too, but I'm not sure.

    I am wondering what the crucial difference is between XFormat.Read and normal parser combinators. Is there something one can't do and the other can? Is one easier to use than the other? Parser combinators are extensible, too.

    You write that EitherF is fully symmetric (probably because ReadP's +++ is advertised as being fully symmetric), but how can this be? If a string can be parsed by both formats, does it yield a Left or a Right?

  2. This is great! I love Oleg's exposés and wish more of them got turned into libraries (e.g. static capabilities) rather than just proofs of concept.

    I must agree with Martijn that the interface looks very clean and elegant. I'll be sure to exercise your library the next time I reach for Text.Printf.

  3. @Martijn:

    Thanks for your comments! I appreciate your review. Here are some responses.


    I think the interface is very clean and elegant. I'm happy that it doesn't use TH. I like that the types are so predictable and transparent.


    Yeah, it worked out well to not use TH, I think. I will try to keep the types predictable.


    I'm very curious about your new insights into functional dependencies as I'm still struggling to fully appreciate them. Perhaps you can write about this in some more detail in another post? :-)


    If I get a chance, I will post something.


    I think the modules on hackage could benefit from some good examples at the tops of the pages.


    Agreed.


    And a small silly style thingy: "readpf d1 >>= return . Left" can also be written as "Left <$> readpf d1" (or using liftM or fmap) which I think is somewhat prettier. I think this is one of hlint's suggestions too, but I'm not sure.


    I like it. Done!


    I am wondering what the crucial difference is between XFormat.Read and normal parser combinators. Is there something one can't do and the other can? Is one easier to use than the other? Parser combinators are extensible, too.


    I don't think one is any more expressive than the other. As you saw, Text.XFormat.Read is built on top of the ReadP combinators. Perhaps it's a bit easier to use readf with format descriptors than combinators, or perhaps it's the other way around. readf does give you an easy way to ensure types in the result.


    You write that EitherF is fully symmetric (probably because ReadP's +++ is advertised as being fully symmetric), but how can this be? If a string can be parsed by both formats, does it yield a Left or a Right?


    If I understand it correctly, (+++) keeps the results of both sides, with the shortest parse first regardless of which side it came from. On the other hand, (<++) returns only the left result if the left parse succeeds, and tries the right side only if the left one fails.

  4. @Bjorn:

    This is great! I love Oleg's exposés and wish more of them got turned into libraries (e.g. static capabilities) rather than just proofs of concept.

    Strangely enough, this work was originally inspired by Oleg, but after some time, it no longer resembled his work very much. Instead, it derives more directly from Hinze's paper referenced above.

    I must agree with Martijn that the interface looks very clean and elegant. I'll be sure to exercise your library the next time I reach for Text.Printf.

    If/when you do use it, let me know how well it works for you and what could be changed or added. I'm sure there are a lot of "standard" formats that could be included. I just haven't spent time looking at that yet.

  5. Just my two cents: relying on the order of arguments is a nice way to drive translators crazy. Extending this, or some other, type-safe printf library to allow for internationalization would be an interesting task to pursue.

  6. @maltem:

    It's true: xformat does not really go beyond the printf/scanf model other than providing for type safety. How do translators typically deal with this?

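Martijn's question about `EitherF` and the reply about `(+++)` and `(<++)` can be checked directly against ReadP from base. The sketch below builds an `EitherF`-like choice from plain ReadP parsers; `eitherSym`, `eitherLeft`, and `complete` are hypothetical helpers, not xformat functions. It also uses the `Left <$> p` style suggested in the first comment, which is the same parser as `p >>= return . Left`.

```haskell
import Text.ParserCombinators.ReadP

-- Symmetric choice: results from both sides are kept.
-- (Left <$> p) is equivalent to (p >>= return . Left).
eitherSym :: ReadP a -> ReadP b -> ReadP (Either a b)
eitherSym p q = (Left <$> p) +++ (Right <$> q)

-- Left-biased choice: q is used only if p produces no result at all.
eitherLeft :: ReadP a -> ReadP b -> ReadP (Either a b)
eitherLeft p q = (Left <$> p) <++ (Right <$> q)

-- Keep only the parses that consume the whole input.
complete :: ReadP a -> String -> [a]
complete p s = [x | (x, "") <- readP_to_S p s]
```

With `eitherSym`, an input that both sides accept, as in `complete (eitherSym (string "ab") (string "ab")) "ab"`, yields both `Left "ab"` and `Right "ab"`, so a symmetric choice does not by itself decide between them; whatever selects a single result has to break the tie. With `eitherLeft`, the same call yields only `Left "ab"`, and the right side is tried only when the left one fails.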