Prefix Trees in Haskell
As part of an interview, I was recently challenged to define a data type to represent prefix trees, along with lookup and insert functions. I was allowed to use any language of my choice, so decided to use Haskell. This blog post will introduce what a prefix tree is, and then cover my implementation. There may well be better implementations of prefix trees, but I was quite pleased with the functional representation that I ended up using.
Prefix Trees
The best way to define a prefix tree may well be with the aid of the following small diagram:
It is a data structure that stores values associated with certain keys. The key is used to traverse a branch of the tree. Not all nodes of the tree have to contain data, and in this instance, a key is only associated with a single piece of data. For example, in the diagram above, the prefixes (which form the keys) are the lower case letters on each branch, and the data is represented by the capital letters at (some of) the nodes.
In this highlighted example, the key “a” has no data associated with it, the key “ab” is associated with the data ‘N’, the key “abc” has no data associated with it, and the key “abcd” is associated with the data ‘X’.
Haskell
The first thing to decide is the types that represent the keys and data in the tree. In fact, we can make this definition polymorphic, and use any arbitrary data types for each of these. However, we’ll see later that we need to be able to compare the elements that form a key in order to define the insert function. A key is then formed by a list of the relevant key data type.
type Key a = [a]
So the key “abcd” used in the example above would have the type: Key Char
Now we can define the actual type that forms the prefix tree. A tree can simply be thought of as a node which may contain some data, along with a function which takes the next prefix and returns the tree below that point (if there is any further tree below that point).
data PFTree a b = Node (Maybe b) (a -> Maybe (PFTree a b))
With this definition we can now define an empty tree. That is, a tree with no data at the root node, and for which any prefix returns no sub tree.
initialTree :: PFTree a b initialTree = Node Nothing (\_ -> Nothing)
So, we’ve now defined the tree data type, but we want to be able to define the lookup and insert functions. The lookup
function is used to see whether a given key has any data associated with it, within the given prefix tree.
lookup :: Key a -> PFTree a b -> Maybe b
If we have got to the end of a key, then the data associated with the current node is the data we are looking up (even if it is Nothing
).
lookup [] (Node b _) = b
If we haven’t reached the end of a key, then we need to see whether the relevant sub tree exists, and if it does, we need to lookup the rest of the key within the subtree.
lookup (x:xs) (Node b f) = case f x of Nothing -> Nothing Just pt -> lookup xs pt
The code for the insert
function is also very similar. It is used to insert data associated to a given key, into a given prefix tree. However, we now need that the type of the keys fulfils the Eq
typeclass, or in other words it defines the function (==) :: a -> a -> Bool
that checks for equality between two members of the type.
insert :: Eq a => Key a -> b -> PFTree a b -> PFTree a b
If we have traversed the tree up to the end of the given key, then we have reached the node at which we wish to insert the given data. In this instance, we just swap the new data for whatever was previously in the tree at the current node.
insert [] b (Node _ f) = Node (Just b) f
If we haven’t reached the end of the key, then we are in one of two situations. Either the next prefix will evaluate to Nothing
and we can just construct the subtree with the remaining key and an empty tree, or we have to insert the data recursively into the subtree with the remaining key. In either case, we construct the relevant function using the equality function.
insert (x:xs) b (Node mb f) = case f x of Nothing -> Node mb (\x' -> if x' == x then Just (insert xs b initialTree) else f x') Just pt -> Node mb (\x' -> if x' == x then Just (insert xs b pt) else f x')
Everything is now defined such that we could use the prefix trees. The insert function can be used along with the empty tree to construct arbitrary trees, and the lookup function can be used to extract data from them.
I’ll leave it as an exercise for someone to define the tree given as an example above.
Thanks for post! Very neat implementation.