## Catamorphisms, part five

Posted by Brian on May 31, 2008

Today’s blog entry highlights previous errata and demonstrates a new improvement/generalization.

I made a major error in my previous post: I said that

In real-world applications with large immutable data structures, it is imperative to do this right and "only recreate what you need to". Otherwise, a single change to a huge tree will wind up using O(N) space, when in fact you only needed to use O(log N) space.

but then went on to show code that is still always O(N). This was unintentional, and I’m glad that a friend of mine called me on it. It makes the perfect segue into today’s blog entry.

### The trouble with Fold and XFold

Fold and XFold, as we have seen in previous blog entries in this series, are useful functions. However these functions suffer an important limitation that I haven’t yet discussed. The limitation is that they *always traverse the entire structure*. Traversing the whole structure is often what you want to do – nearly all the example functions I’ve written in terms of folds so far on this blog need to visit the whole structure. For example, if you want to sum all the nodes in a tree, or calculate a tree’s height, you have to visit every node. But sometimes you don’t want to visit every node. For example, when searching for a node in a binary *search* tree, the "search" property of the tree directs you down a certain path, so that you only need to visit the nodes that connect the root of the tree to the node you are looking for. If the tree is balanced, then the search will only take O(log N) time.

Doing a binary search on a tree (or more generally, any kind of "selective" walk over an arbitrary recursively-defined data structure) *cannot *be done efficiently using Fold or XFold, because these functions always walk the entire structure (which takes O(N) time, as well as O(N) space due to the lambdas allocated at each node). When we need to do "selective traversals", we need something different. But what?

### KFold to the rescue

That "something different" is an entity I’ll call KFold. Consider once again the binary tree structure we used last time:

| Node of (*data*)‘a * (*left*)Tree<’a> * (*right*)Tree<’a>

| Leaf

// 4

// 2 6

// 1 3 5 7

let tree7 = Node(4, Node(2, Node(1, Leaf, Leaf), Node(3, Leaf, Leaf)),

Node(6, Node(5, Leaf, Leaf), Node(7, Leaf, Leaf)))

We can define a KFold function over the tree like so:

let rec Loop t =

match t with

| Node(x,left,right) -> nodeF x (fun k -> k (Loop left)) (fun k -> k (Loop right)) t

| Leaf -> leafV t

Loop tree

A KFold is a fold where the recursive portions (e.g. the second and third parameters to "nodeF") use explicit continuations. Continuation variables are sometimes conventionally named "k", and so I have adopted that convention here in the code body, and used the "K" prefix to differentiate this fold from the other folds we’ve already defined.

The use of continuations serves two purposes. First, as we have seen before, continuations are used to help ensure that the function stays tail-recursive, so that regardless of the size and shape of the data structure, we won’t blow the call stack. Second, and more importantly to today’s blog entry, we have shifted the continuation aspect back to the client of the KFold. This puts the client in control of when and if we make a recursive call.

Consider the case where we want to find the ’5′ node in a binary *search *tree and change its value. We only want to recurse left or right (and not both ways) down the tree as we try to find the desired value. Thus I can define:

let Change5to0bst tree =

KFoldTree (fun x kl kr t ->

let (Node(_,oldL,oldR)) = t

if x < 5 then

kr (fun newR ->

Node (x, oldL, newR))

elif x > 5 then

kl (fun newL ->

Node (x, newL, oldR))

else

Node(0,oldL,oldR)

) (fun t -> t) tree

The key bits are "kl" and "kr". These are continuations which will produce the results of the recursive calls on the left and right subtrees, respectively. The person calling KFoldTree can choose if and when to invoke these continuations. So, for example, we only invoke "kl" if we need to go down the left branch. In the code above, we do the usual binary-search-tree logic: if the value of the current node is too small, we recurse right, else if it’s too big, we recurse left, else if we found it, we modify the value. I’m using the words "change" and "modify", but recall that the Tree structure is immutable, and we’re actually just returning a new tree with the 5 replaced by a 0. (Yes, this make the result no longer a search tree; oh well, "Change5to0" is just contrived example function anyway.) As in the previous blog entry, we only create new nodes when necessary, that is, we re-use as much of the original structure as possible. (Compared to the XFold code, the "reuse" here is much more explicit, as we name the "old" subtrees which we explicitly connect into the new Node structures.)

By defining this function using a KFold, we get the desired performance characteristic: running the function on a balanced binary search tree will only consume O(log N) time and space.

### The KFold boilerplate – catamorphisms#

It turns out that XFoldTree can be defined in terms of KFoldTree:

let XFoldTree nodeF leafV tree =

KFoldTree (fun x l r -> l (fun lacc -> r (fun racc -> nodeF x lacc racc))) leafV tree

That is, XFoldTree is just KFoldTree where we *always* call the recursive continuations. KFold is more general than XFold, so XFold can be defined in terms of it.

Furthermore, KFold is entirely boilerplate – given a discriminated union type definition, the definition of a KFold over that type is automatic.

### Revisiting an old problem

Back in part three, I made another minor error. The Eval function I defined was not tail recursive, in that if you had a large expression whose "if" condition was another "if", whose condition was another "if", … and so on, then you’d use a stack frame for each "if". Though it’s unlikely that anyone would ever run that version of "Eval" on a program containing 10000 nested "if"s, I would still sleep a little better at night knowing that I wrote the function tail-recursively, so that arbitrarily large data will not blow the stack. So let’s rewrite it using a KFold. This will also provide an opportunity to show what a KFold looks like on a differently-shaped type.

Here was the original data type from part three:

type Op =

| Plus

| Minus

type Expr =

| Literal of int

| BinaryOp of Expr * Op * Expr // left, op, right

| IfThenElse of Expr * Expr * Expr // cond, then, else; 0=false in cond

| Print of Expr // prints, then returns that value

The KFoldExpr function definition follows directly from the structure of the type:

let rec Loop ex =

match ex with

| Literal(x) -> litF x ex

| BinaryOp(l,op,r) -> binF (fun k -> k (Loop l)) op (fun k -> k (Loop r)) ex

| IfThenElse(c,t,e) -> ifF (fun k -> k (Loop c)) (fun k -> k (Loop t)) (fun k -> k (Loop e)) ex

| Print(e) -> printF (fun k -> k (Loop e)) ex

Loop expr

Note that each recursive portion of the type becomes a "fun k -> …" bit. With the KFold defined, we can write Eval thusly:

let Eval expr =

KFoldExpr (fun x _ -> x)

(fun kl op kr _ -> match op with

| Plus -> kl (fun l -> kr (fun r -> l+r))

| Minus -> kl (fun l -> kr (fun r -> l-r)))

(fun kc kt ke _ -> kc (fun c -> if c <> 0 then

kt (fun t -> t)

else

ke (fun e -> e)))

(fun ke _ -> ke (fun e -> printf "<%d>" e

e))

expr

The problem with the Eval function previously defined (back in part three) was the bit that said

In that code, "c()" could cause a recursive call, but it is not a tail call. Today’s Eval function (defined as a KFold) has every function application as a tail call, so we are safe.

### The source code

Here’s the full source code from today.

// handy operator

let (===) = fun x y -> Object.ReferenceEquals(x,y)

type Tree<’a> =

| Node of (*data*)‘a * (*left*)Tree<’a> * (*right*)Tree<’a>

| Leaf

// 4

// 2 6

// 1 3 5 7

let tree7 = Node(4, Node(2, Node(1, Leaf, Leaf), Node(3, Leaf, Leaf)),

Node(6, Node(5, Leaf, Leaf), Node(7, Leaf, Leaf)))

let KFoldTree nodeF leafV tree =

let rec Loop t =

match t with

| Node(x,left,right) -> nodeF x (fun k -> k (Loop left)) (fun k -> k (Loop right)) t

| Leaf -> leafV t

Loop tree

// Change5to0bst : Tree<int> -> Tree<int>

let Change5to0bst tree =

KFoldTree (fun x kl kr t ->

let (Node(_,oldL,oldR)) = t

if x < 5 then

kr (fun newR ->

Node (x, oldL, newR))

elif x > 5 then

kl (fun newL ->

Node (x, newL, oldR))

else

Node(0,oldL,oldR)

) (fun t -> t) tree

Change5to0bst tree7 // does ‘the right thing’, e.g. O(log N) space and time for a balanced search tree

// XFoldTree : (‘a -> ‘r -> ‘r -> Tree<’a> -> ‘r) -> (Tree<’a> -> ‘r) -> Tree<’a> -> ‘r

let XFoldTree nodeF leafV tree =

KFoldTree (fun x l r -> l (fun lacc -> r (fun racc -> nodeF x lacc racc))) leafV tree

// Other useful Tree boilerplate from previous blogs

let XNode (x,l,r) (Node(xo,lo,ro) as orig) =

if xo = x && lo === l && ro === r then

orig

else

Node(x,l,r)

let XLeaf (Leaf as orig) =

orig

let FoldTree nodeF leafV tree =

XFoldTree (fun x l r _ -> nodeF x l r) (fun _ -> leafV) tree

///////////////////////////////////////////////////////////////////////////////////

// types capable of representing a small integer expression language

type Op =

| Plus

| Minus

type Expr =

| Literal of int

| BinaryOp of Expr * Op * Expr // left, op, right

| IfThenElse of Expr * Expr * Expr // cond, then, else; 0=false in cond

| Print of Expr // prints, then returns that value

let exprs = [Literal(42)

BinaryOp(Literal(1), Plus, Literal(1))

IfThenElse(Literal(1), Print(Literal(42)), Print(Literal(0)))

]

let KFoldExpr litF binF ifF printF expr =

let rec Loop ex =

match ex with

| Literal(x) -> litF x ex

| BinaryOp(l,op,r) -> binF (fun k -> k (Loop l)) op (fun k -> k (Loop r)) ex

| IfThenElse(c,t,e) -> ifF (fun k -> k (Loop c)) (fun k -> k (Loop t)) (fun k -> k (Loop e)) ex

| Print(e) -> printF (fun k -> k (Loop e)) ex

Loop expr

// Eval : Expr -> int

let Eval expr =

KFoldExpr (fun x _ -> x)

(fun kl op kr _ -> match op with

| Plus -> kl (fun l -> kr (fun r -> l+r))

| Minus -> kl (fun l -> kr (fun r -> l-r)))

(fun kc kt ke _ -> kc (fun c -> if c <> 0 then

kt (fun t -> t)

else

ke (fun e -> e)))

(fun ke _ -> ke (fun e -> printf "<%d>" e

e))

expr

exprs |> List.iter (fun expr -> printfn "%d" (Eval expr))

// 42

// 2

// <42>42

## Leave a Reply