Geeky things and other research from a work in progress

2008-01-25

Smart constructors

[Ed: This being the inaugural post in my technical blog, I felt I should start with a topic that's rather introductory and... constructive as the case may be.]

I recently found myself confronted with the following problem in my Haskell code. I needed to create elements of an algebraic data type; however, in order to store the intermediate results of a computation, part of each constructor was generated from the other parts. It can be difficult to work with such a type, but there is a solution! Let me attempt to provide a scenario and explain both the naïve way and the smart way.

First, let's use the following data type as our pedagogical substrate:

data Expr = Val Int | Add Expr Expr

The above tells me that this expression type may be constructed as a value Val containing an integer Int or as an addition operation Add containing 2 subexpressions. With this fairly simple type, I might want to encode an arithmetic expression e1 such as (4 + 2) + 7. The result might look like so:

e1 = Add (Add (Val 4) (Val 2)) (Val 7)

The canonical follow-up to this would be an evaluation mechanism for expressions:

eval :: Expr -> Int
eval (Val i) = i
eval (Add e1 e2) = eval e1 + eval e2

When I call eval e1, it recursively traverses the tree described in e1 as evidenced by the calls to eval in the Add component.

Now, suppose I'm developing an application using the data type Expr [Ed: What—a calculator?], and I need to deal with very large expressions. The expression trees can be very deep and wide, requiring eval to spend time visiting every node. In order to remove the need for that potentially time-consuming traversal, I want to have expressions evaluated upon construction from the bottom up. So, given that the only real computation involved in our eval function is addition, let's append the result type Int onto the Add constructor for storing the result.

data Expr = Val Int | Add Expr Expr Int

Now, when we build our expression, we do the addition as well.

e2 = Add (Add (Val 4) (Val 2) (4+2)) (Val 7) (4+2+7)

But wait... That doesn't really make sense. Shouldn't the computer be forming the addition operations for us? This seems very error-prone. Let's try breaking down our construction of e2, so that we're building a single expression at a time.

e3 = Add (Val 4) (Val 2) (4+2)
e4 = Add e3 (Val 7) (??+7)

That didn't really help much, either. I suppose we could extract out the 4+2 bit like so...

i = 4+2
e5 = Add (Val 4) (Val 2) i
e6 = Add e3 (Val 7) (i+7)

but then it seems like we're creating extra work. I mean, this is functional programming! There should be an easier way to do this! [Ed: Fine. Now, will you please get to the point soon.]

And indeed there is! It's called a smart constructor. [Ed: Oh really? The title did not give that away at all, I swear!] The process is as simple as replacing the actual Add constructor with a function. This function takes in the same arguments as Add with the exception of the result value, computes the result, and returns a complete Add. Here it is in all its glory.

makeAdd :: Expr -> Expr -> Expr
makeAdd (Val i1) (Val i2) = Add (Val i1) (Val i2) (i1+i2)
makeAdd (Val i1) (Add e2a e2b s2) = Add (Val i1) (Add e2a e2b s2) (i1+s2)
makeAdd (Add e1a e1b s1) (Val i2) = Add (Add e1a e1b s1) (Val i2) (s1+i2)
makeAdd (Add e1a e1b s1) (Add e2a e2b s2) = Add (Add e1a e1b s1) (Add e2a e2b s2) (s1+s2)

As you can see, there are no recursive calls in makeAdd. We are effectively hiding the computation as well as the fact that there is this third argument to the constructor. Consequently, makeAdd is just as easy to use as the constructor of the original Add (before appending the Int):

e7 = makeAdd (makeAdd (Val 4) (Val 2)) (Val 7)

In addition, we can rewrite eval so that it will simply fetch the precomputed value of an Add.

eval :: Expr -> Int
eval (Val i) = i
eval (Add e1 e2 s) = s

And that's the basics for smart constructors. There are a few more things to add, but I think I'll stop for now. Perhaps I'll revisit this subject in a later entry.