Geeky things and other research from a work in progress

2008-08-30

I git it now!

After a lot of discussion about GHC switching from darcs to git, I was tempted to try it out. Today was an experimentation day for me, and it turned out really well.

I come from the world of Subversion. It has been my primary VCS for a number of years, and that's mostly by circumstance. In my master's program, it was just easiest to get access to. At my last job, SVN was the main repository for everything. I became very familiar with it: it's quirks with moving files, it's problems with merging branches, and so on. Sure, it's not the best, but it worked well enough. In fact, my current place of employment also uses SVN extensively.

As I gradually become more familiar with the Haskell community, I encountered darcs. I never really became comfortable with it, though. It was quite slow for basic things. The large number of commands and the many ways of doing things that are similar but slightly different was, truth be told, somewhat overwhelming. Of course, I only used darcs to pull source repositories for building. I never had my own repository or tried to send a patch to someone else.

Then, this thing with GHC and its libraries came up. They were looking for maintenance of the SYB library, and we volunteered. I presume that the GHC developers are going to set up git repositories for the libraries, so I figured I should get to know git. In my reading, I had learned about several features of git that intrigued me. It was these features that lured me in and have pretty much captured me, too.

First, there's the concept of local branches. I love it! I can't get enough. I generally make copies of directories and files to do my own form of local branching. Perhaps it's naive, but it is the best solution I had found up to now. I often don't want to check unstable code or unused code into the central repository. Either it's more effort than I want to spend at the time or it ends up in a branch that is forgotten about and gets out of date quickly. The git local branches are simple: I don't have to spend time copying. They are quick: git checkout <branch>. And they stay in the same place as the master branch, so I don't have to search for them in my files. Wonderful!

Second, git has a very nice extension, git-svn, for integrating with a Subversion repository. Since we use that at work, I can try out git and using it to actually do work. That's what today's experiment was about.

As for comparing VCS tools, I can't say I'm very qualified. My general (and limited) experience with darcs is that it is very slow and difficult to figure out how to use. My experience with git is that while it has a very large number of commands, it is thoroughly documented. There are a number of tutorials available to point out what I need to get started using. Plus, there's the page on git for Subversion users.

So, I will continue to experiment with git, since I can do it without much difficulty. So far, I'm more happy than not. It's obviously not perfect, but I like the things it does best.

2008-01-25

Smart constructors

[Ed: This being the inaugural post in my technical blog, I felt I should start with a topic that's rather introductory and... constructive as the case may be.]

I recently found myself confronted with the following problem in my Haskell code. I needed to create elements of an algebraic data type; however, in order to store the intermediate results of a computation, part of each constructor was generated from the other parts. It can be difficult to work with such a type, but there is a solution! Let me attempt to provide a scenario and explain both the naïve way and the smart way.

First, let's use the following data type as our pedagogical substrate:

data Expr = Val Int | Add Expr Expr

The above tells me that this expression type may be constructed as a value Val containing an integer Int or as an addition operation Add containing 2 subexpressions. With this fairly simple type, I might want to encode an arithmetic expression e1 such as (4 + 2) + 7. The result might look like so:

e1 = Add (Add (Val 4) (Val 2)) (Val 7)

The canonical follow-up to this would be an evaluation mechanism for expressions:

eval :: Expr -> Int
eval (Val i) = i
eval (Add e1 e2) = eval e1 + eval e2

When I call eval e1, it recursively traverses the tree described in e1 as evidenced by the calls to eval in the Add component.

Now, suppose I'm developing an application using the data type Expr [Ed: What—a calculator?], and I need to deal with very large expressions. The expression trees can be very deep and wide, requiring eval to spend time visiting every node. In order to remove the need for that potentially time-consuming traversal, I want to have expressions evaluated upon construction from the bottom up. So, given that the only real computation involved in our eval function is addition, let's append the result type Int onto the Add constructor for storing the result.

data Expr = Val Int | Add Expr Expr Int

Now, when we build our expression, we do the addition as well.

e2 = Add (Add (Val 4) (Val 2) (4+2)) (Val 7) (4+2+7)

But wait... That doesn't really make sense. Shouldn't the computer be forming the addition operations for us? This seems very error-prone. Let's try breaking down our construction of e2, so that we're building a single expression at a time.

e3 = Add (Val 4) (Val 2) (4+2)
e4 = Add e3 (Val 7) (??+7)

That didn't really help much, either. I suppose we could extract out the 4+2 bit like so...

i = 4+2
e5 = Add (Val 4) (Val 2) i
e6 = Add e3 (Val 7) (i+7)

but then it seems like we're creating extra work. I mean, this is functional programming! There should be an easier way to do this! [Ed: Fine. Now, will you please get to the point soon.]

And indeed there is! It's called a smart constructor. [Ed: Oh really? The title did not give that away at all, I swear!] The process is as simple as replacing the actual Add constructor with a function. This function takes in the same arguments as Add with the exception of the result value, computes the result, and returns a complete Add. Here it is in all its glory.

makeAdd :: Expr -> Expr -> Expr
makeAdd (Val i1) (Val i2) = Add (Val i1) (Val i2) (i1+i2)
makeAdd (Val i1) (Add e2a e2b s2) = Add (Val i1) (Add e2a e2b s2) (i1+s2)
makeAdd (Add e1a e1b s1) (Val i2) = Add (Add e1a e1b s1) (Val i2) (s1+i2)
makeAdd (Add e1a e1b s1) (Add e2a e2b s2) = Add (Add e1a e1b s1) (Add e2a e2b s2) (s1+s2)

As you can see, there are no recursive calls in makeAdd. We are effectively hiding the computation as well as the fact that there is this third argument to the constructor. Consequently, makeAdd is just as easy to use as the constructor of the original Add (before appending the Int):

e7 = makeAdd (makeAdd (Val 4) (Val 2)) (Val 7)

In addition, we can rewrite eval so that it will simply fetch the precomputed value of an Add.

eval :: Expr -> Int
eval (Val i) = i
eval (Add e1 e2 s) = s

And that's the basics for smart constructors. There are a few more things to add, but I think I'll stop for now. Perhaps I'll revisit this subject in a later entry.