Home > FUN > Prefix Trees in Haskell

Prefix Trees in Haskell

As part of an interview, I was recently challenged to define a data type to represent prefix trees, along with lookup and insert functions. I was allowed to use any language of my choice, so decided to use Haskell. This blog post will introduce what a prefix tree is, and then cover my implementation. There may well be better implementations of prefix trees, but I was quite pleased with the functional representation that I ended up using.

Prefix Trees
The best way to define a prefix tree may well be with the aid of the following small diagram:

A Sample Prefix Tree

It is a data structure that stores values associated with certain keys. The key is used to traverse a branch of the tree. Not all nodes of the tree have to contain data, and in this instance, a key is only associated with a single piece of data. For example, in the diagram above, the prefixes (which form the keys) are the lower case letters on each branch, and the data is represented by the capital letters at (some of) the nodes.

In this highlighted example, the key “a” has no data associated with it, the key “ab” is associated with the data ‘N’, the key “abc” has no data associated with it, and the key “abcd” is associated with the data ‘X’.

The Key "abcd" Highlighted

Haskell

The first thing to decide is the types that represent the keys and data in the tree. In fact, we can make this definition polymorphic, and use any arbitrary data types for each of these. However, we’ll see later that we need to be able to compare the elements that form a key in order to define the insert function. A key is then formed by a list of the relevant key data type.

type Key a = [a]

So the key “abcd” used in the example above would have the type: Key Char

Now we can define the actual type that forms the prefix tree. A tree can simply be thought of as a node which may contain some data, along with a function which takes the next prefix and returns the tree below that point (if there is any further tree below that point).

data PFTree a b = Node (Maybe b) (a -> Maybe (PFTree a b))

With this definition we can now define an empty tree. That is, a tree with no data at the root node, and for which any prefix returns no sub tree.

initialTree :: PFTree a b
initialTree = Node Nothing (\_ -> Nothing)

So, we’ve now defined the tree data type, but we want to be able to define the lookup and insert functions. The lookup function is used to see whether a given key has any data associated with it, within the given prefix tree.

lookup :: Key a -> PFTree a b -> Maybe b

If we have got to the end of a key, then the data associated with the current node is the data we are looking up (even if it is Nothing).

lookup [] (Node b _) = b

If we haven’t reached the end of a key, then we need to see whether the relevant sub tree exists, and if it does, we need to lookup the rest of the key within the subtree.

lookup (x:xs) (Node b f) = case f x of
                           Nothing -> Nothing
                           Just pt -> lookup xs pt

The code for the insert function is also very similar. It is used to insert data associated to a given key, into a given prefix tree. However, we now need that the type of the keys fulfils the Eq typeclass, or in other words it defines the function (==) :: a -> a -> Bool that checks for equality between two members of the type.

insert :: Eq a => Key a -> b -> PFTree a b -> PFTree a b

If we have traversed the tree up to the end of the given key, then we have reached the node at which we wish to insert the given data. In this instance, we just swap the new data for whatever was previously in the tree at the current node.

insert [] b (Node _ f) = Node (Just b) f

If we haven’t reached the end of the key, then we are in one of two situations. Either the next prefix will evaluate to Nothing and we can just construct the subtree with the remaining key and an empty tree, or we have to insert the data recursively into the subtree with the remaining key. In either case, we construct the relevant function using the equality function.

insert (x:xs) b (Node mb f) = case f x of
                            Nothing -> Node mb (\x' -> if x' == x then Just (insert xs b initialTree) else f x')
                            Just pt -> Node mb (\x' -> if x' == x then Just (insert xs b pt) else f x')

Everything is now defined such that we could use the prefix trees. The insert function can be used along with the empty tree to construct arbitrary trees, and the lookup function can be used to extract data from them.

I’ll leave it as an exercise for someone to define the tree given as an example above.

Advertisements
Categories: FUN
  1. June 4, 2013 at 08:36

    Thanks for post! Very neat implementation.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: