Archive

Archive for the ‘FUN’ Category

Prefix Trees in Haskell

September 13, 2010 1 comment

As part of an interview, I was recently challenged to define a data type to represent prefix trees, along with lookup and insert functions. I was allowed to use any language of my choice, so decided to use Haskell. This blog post will introduce what a prefix tree is, and then cover my implementation. There may well be better implementations of prefix trees, but I was quite pleased with the functional representation that I ended up using.

Prefix Trees
The best way to define a prefix tree may well be with the aid of the following small diagram:

A Sample Prefix Tree

It is a data structure that stores values associated with certain keys. The key is used to traverse a branch of the tree. Not all nodes of the tree have to contain data, and in this instance, a key is only associated with a single piece of data. For example, in the diagram above, the prefixes (which form the keys) are the lower case letters on each branch, and the data is represented by the capital letters at (some of) the nodes.

In this highlighted example, the key “a” has no data associated with it, the key “ab” is associated with the data ‘N’, the key “abc” has no data associated with it, and the key “abcd” is associated with the data ‘X’.

The Key "abcd" Highlighted

Haskell

The first thing to decide is the types that represent the keys and data in the tree. In fact, we can make this definition polymorphic, and use any arbitrary data types for each of these. However, we’ll see later that we need to be able to compare the elements that form a key in order to define the insert function. A key is then formed by a list of the relevant key data type.

type Key a = [a]

So the key “abcd” used in the example above would have the type: Key Char

Now we can define the actual type that forms the prefix tree. A tree can simply be thought of as a node which may contain some data, along with a function which takes the next prefix and returns the tree below that point (if there is any further tree below that point).

data PFTree a b = Node (Maybe b) (a -> Maybe (PFTree a b))

With this definition we can now define an empty tree. That is, a tree with no data at the root node, and for which any prefix returns no sub tree.

initialTree :: PFTree a b
initialTree = Node Nothing (\_ -> Nothing)

So, we’ve now defined the tree data type, but we want to be able to define the lookup and insert functions. The lookup function is used to see whether a given key has any data associated with it, within the given prefix tree.

lookup :: Key a -> PFTree a b -> Maybe b

If we have got to the end of a key, then the data associated with the current node is the data we are looking up (even if it is Nothing).

lookup [] (Node b _) = b

If we haven’t reached the end of a key, then we need to see whether the relevant sub tree exists, and if it does, we need to lookup the rest of the key within the subtree.

lookup (x:xs) (Node b f) = case f x of
                           Nothing -> Nothing
                           Just pt -> lookup xs pt

The code for the insert function is also very similar. It is used to insert data associated to a given key, into a given prefix tree. However, we now need that the type of the keys fulfils the Eq typeclass, or in other words it defines the function (==) :: a -> a -> Bool that checks for equality between two members of the type.

insert :: Eq a => Key a -> b -> PFTree a b -> PFTree a b

If we have traversed the tree up to the end of the given key, then we have reached the node at which we wish to insert the given data. In this instance, we just swap the new data for whatever was previously in the tree at the current node.

insert [] b (Node _ f) = Node (Just b) f

If we haven’t reached the end of the key, then we are in one of two situations. Either the next prefix will evaluate to Nothing and we can just construct the subtree with the remaining key and an empty tree, or we have to insert the data recursively into the subtree with the remaining key. In either case, we construct the relevant function using the equality function.

insert (x:xs) b (Node mb f) = case f x of
                            Nothing -> Node mb (\x' -> if x' == x then Just (insert xs b initialTree) else f x')
                            Just pt -> Node mb (\x' -> if x' == x then Just (insert xs b pt) else f x')

Everything is now defined such that we could use the prefix trees. The insert function can be used along with the empty tree to construct arbitrary trees, and the lookup function can be used to extract data from them.

I’ll leave it as an exercise for someone to define the tree given as an example above.

Advertisements
Categories: FUN

Radix Sort in F#

August 13, 2010 1 comment

I have spent many years programming in Haskell, but have also recently started dabbling in F# and Ocaml. I would count myself as a big proponent of the Functional Programming paradigm.

Just as a practice I thought I would implement a Radix sort algorithm in F# for unsigned 32bit integers. Radix sort algorithms use a Radix in order to sort the elements in a list or array. This means that you must be able to define a Radix for the data type which you are trying to sort, but has the advantage that the time complexity of the algorithm is only linear in the number of elements to be sorted (multiplied by a constant that relates to the size of the radix), compared to O(n log n) for most partition sort algorithms.

For this implementation, I shall use the bits as the Radix. This means that each step of the sort algorithm splits the list into two, depending on the value of a single bit in each element, keeping all the elements in these sublists in the same order they appeared in the original list. The algorithm works by splitting the list depending on the least significant bit of each element, and then joining the resultant sublists, before repeating the process depending on the next least significant bit… this repeats until the last step where the splitting depends on the most significant bit of the elements, and the final joined list is ordered. I am sure there are much better definitions of a radix sort on the web, but I thought this would be an interesting initial experiment with F#.

Some of these functions my well be in the library, but my current knowledge of the library is lacking slightly, so I have implemented everything I don’t know about.

The first definition is a function that simply returns a Boolean value of whether the input integer is equal to zero. This achieved using pattern matching.

let is0 x =
  match x with
    0 -> true
  | n -> false

The next function defines how the list is split depending on the Radix. The first argument is the position of the bit which is the current Radix, and the second argument is the input list. The recursive sub-function actually does the work using a pair of accumulators.

// splits a list of integers in half depending on the value of the nth bit of the integers
let splitBy n xxs =
   let rec splitBy' n' xxs' (oz,iz) =
      match xxs' with
        [] -> (oz,iz)
      |  x :: xs -> splitBy' n' xs (if is0 ((x >>> n') &&& 1) then (x::oz,iz) else (oz,x::iz))
   splitBy' n xxs ([],[])

You’ve probably noticed that the lists returned by the previous function actually contain the elements in the reverse order in which they appear in the input list. So, we will need a function that reverses a list. This can also be achieved using an accumulator.

//simple reverse list implemented using an accumulator e.g. extra list to move elements into
let reverse ls =
   let rec reverse' (xs,scratch) =
      match xs with
        [] -> scratch
      | x :: xs -> reverse' (xs,x :: scratch)
   reverse' (ls,[])

We can use the reverse function when we want to join the two lists that are created at each step of the algorithm. I have used the list concatenation function from the library.

let join (oz,iz) = List.concat [reverse oz;reverse iz]

Each step of the algorithm can now be defined. The first argument still defines which bit you’re working with, and the second argument is the input list.

let sortBy n xs = join (splitBy n xs)

The algorithm is completed by running the algorithm recursively over all 32 bits in the unsigned 32 bit integers.

// recursive algorithm to sort depending on LSB upto MSB e.g. bits 0 to 31
let sort xs =
   let rec sort' n xs' =
      match n with
        0 -> xs'
      | i -> sortBy (i-1) (sort' (i-1) xs')
   sort' 32 xs

and that is it… An implementation of Radix sort in F#.
Comments Please.

Categories: FUN