Skip to content

Latest commit

 

History

History
726 lines (532 loc) · 24 KB

README.markdown

File metadata and controls

726 lines (532 loc) · 24 KB

Binary Search Tree (BST)

A binary search tree is a special kind of binary tree (a tree in which each node has at most two children) that performs insertions and deletions such that the tree is always sorted.

For more information about a tree, read this first.

"Always sorted" property

Here is an example of a valid binary search tree:

A binary search tree

Notice how each left child is smaller than its parent node, and each right child is greater than its parent node. This is the key feature of a binary search tree.

For example, 2 is smaller than 7, so it goes on the left; 5 is greater than 2, so it goes on the right.

Inserting new nodes

When performing an insertion, we first compare the new value to the root node. If the new value is smaller, we take the left branch; if greater, we take the right branch. We work our way down the tree this way until we find an empty spot where we can insert the new value.

Suppose we want to insert the new value 9:

  • We start at the root of the tree (the node with the value 7) and compare it to the new value 9.
  • 9 > 7, so we go down the right branch and repeat the same procedure but this time on node 10.
  • Because 9 < 10, we go down the left branch.
  • We now arrived at a point where there are no more values to compare with. A new node for 9 is inserted at that location.

Here is the tree after inserting the new value 9:

After adding 9

There is only one possible place where the new element can be inserted in the tree. Finding this place is usually quick. It takes O(h) time, where h is the height of the tree.

Note: The height of a node is the number of steps it takes to go from that node to its lowest leaf. The height of the entire tree is the distance from the root to the lowest leaf. Many of the operations on a binary search tree are expressed in terms of the tree's height.

By following this simple rule -- smaller values on the left, larger values on the right -- we keep the tree sorted, so whenever we query it, we can check if a value is in the tree.

Searching the tree

To find a value in the tree, we perform the same steps as with insertion:

  • If the value is less than the current node, then take the left branch.
  • If the value is greater than the current node, take the right branch.
  • If the value is equal to the current node, we've found it!

Like most tree operations, this is performed recursively until either we find what we are looking for or run out of nodes to look at.

Here is an example for searching the value 5:

Searching the tree

Searching is fast using the structure of the tree. It runs in O(h) time. If you have a well-balanced tree with a million nodes, it only takes about 20 steps to find anything in this tree. (The idea is very similar to binary search in an array.)

Traversing the tree

Sometimes you need to look at all nodes rather than only one.

There are three ways to traverse a binary tree:

  1. In-order (or depth-first): first look at the left child of a node then at the node itself and finally at its right child.
  2. Pre-order: first look at a node then its left and right children.
  3. Post-order: first look at the left and right children and process the node itself last.

Traversing the tree also happens recursively.

If you traverse a binary search tree in-order, it looks at all the nodes as if they were sorted from low to high. For the example tree, it would print 1, 2, 5, 7, 9, 10:

Traversing the tree

Deleting nodes

Removing nodes is easy. After removing a node, we replace the node with either its biggest child on the left or its smallest child on the right. That way the tree is still sorted after the removal. In the following example, 10 is removed and replaced with either 9 (Figure 2) or 11 (Figure 3).

Deleting a node with two children

Note the replacement needs to happen when the node has at least one child. If it has no child, you just disconnect it from its parent:

Deleting a leaf node

The code (solution 1)

So much for the theory. Let's see how we can implement a binary search tree in Swift. There are different approaches you can take. First, I will show you how to make a class-based version, but we will also look at how to make one using enums.

Here is an example for a BinarySearchTree class:

public class BinarySearchTree<T: Comparable> {
  private(set) public var value: T
  private(set) public var parent: BinarySearchTree?
  private(set) public var left: BinarySearchTree?
  private(set) public var right: BinarySearchTree?

  public init(value: T) {
    self.value = value
  }

  public var isRoot: Bool {
    return parent == nil
  }

  public var isLeaf: Bool {
    return left == nil && right == nil
  }

  public var isLeftChild: Bool {
    return parent?.left === self
  }

  public var isRightChild: Bool {
    return parent?.right === self
  }

  public var hasLeftChild: Bool {
    return left != nil
  }

  public var hasRightChild: Bool {
    return right != nil
  }

  public var hasAnyChild: Bool {
    return hasLeftChild || hasRightChild
  }

  public var hasBothChildren: Bool {
    return hasLeftChild && hasRightChild
  }

  public var count: Int {
    return (left?.count ?? 0) + 1 + (right?.count ?? 0)
  }
}

This class describes just a single node not the entire tree. It is a generic type, so the node can store any kind of data. It also has references to its left and right child nodes and a parent node.

Here is how you can use it:

let tree = BinarySearchTree<Int>(value: 7)

The count property determines how many nodes are in the subtree described by this node. This does not just count the node's immediate children but also their children and their children's children, and so on. If this particular object is the root node, then it counts how many nodes are in the entire tree. Initially, count = 0.

Note: Because left, right, and parent are optional, we can make good use of Swift's optional chaining (?) and nil-coalescing operators (??). You could also write this sort of thing with if let, but that is less concise.

Inserting nodes

A tree node by itself is useless, so here is how you would add new nodes to the tree:

  public func insert(value: T) {
    if value < self.value {
      if let left = left {
        left.insert(value: value)
      } else {
        left = BinarySearchTree(value: value)
        left?.parent = self
      }
    } else {
      if let right = right {
        right.insert(value: value)
      } else {
        right = BinarySearchTree(value: value)
        right?.parent = self
      }
    }
  }

Like so many other tree operations, insertion is easiest to implement with recursion. We compare the new value to the values of the existing nodes and decide whether to add it to the left branch or the right branch.

If there is no more left or right child to look at, we create a BinarySearchTree object for the new node and connect it to the tree by setting its parent property.

Note: Because the whole point of a binary search tree is to have smaller nodes on the left and larger ones on the right, you should always insert elements at the root to make sure this remains a valid binary tree!

To build the complete tree from the example you can do:

let tree = BinarySearchTree<Int>(value: 7)
tree.insert(2)
tree.insert(5)
tree.insert(10)
tree.insert(9)
tree.insert(1)

Note: For reasons that will become clear later, you should insert the numbers in a random order. If you insert them in a sorted order, the tree will not have the right shape.

For convenience, let's add an init method that calls insert() for all the elements in an array:

  public convenience init(array: [T]) {
    precondition(array.count > 0)
    self.init(value: array.first!)
    for v in array.dropFirst() {
      insert(value: v)
    }
  }

Now you can simply do this:

let tree = BinarySearchTree<Int>(array: [7, 2, 5, 10, 9, 1])

The first value in the array becomes the root of the tree.

Debug output

When working with complicated data structures, it is useful to have human-readable debug output.

extension BinarySearchTree: CustomStringConvertible {
  public var description: String {
    var s = ""
    if let left = left {
      s += "(\(left.description)) <- "
    }
    s += "\(value)"
    if let right = right {
      s += " -> (\(right.description))"
    }
    return s
  }
}

When you do a print(tree), you should get something like this:

((1) <- 2 -> (5)) <- 7 -> ((9) <- 10)

The root node is in the middle. With some imagination, you should see that this indeed corresponds to the following tree:

The tree

You may be wondering what happens when you insert duplicate items? We always insert those in the right branch. Try it out!

Searching

What do we do now that we have some values in our tree? Search for them, of course! To find items quickly is the main purpose of a binary search tree. :-)

Here is the implementation of search():

  public func search(value: T) -> BinarySearchTree? {
    if value < self.value {
      return left?.search(value)
    } else if value > self.value {
      return right?.search(value)
    } else {
      return self  // found it!
    }
  }

I hope the logic is clear: this starts at the current node (usually the root) and compares the values. If the search value is less than the node's value, we continue searching in the left branch; if the search value is greater, we dive into the right branch.

If there are no more nodes to look at -- when left or right is nil -- then we return nil to indicate the search value is not in the tree.

Note: In Swift, that is conveniently done with optional chaining; when you write left?.search(value) it automatically returns nil if left is nil. There is no need to explicitly check for this with an if statement.

Searching is a recursive process, but you can also implement it with a simple loop instead:

  public func search(value: T) -> BinarySearchTree? {
    var node: BinarySearchTree? = self
    while case let n? = node {
      if value < n.value {
        node = n.left
      } else if value > n.value {
        node = n.right
      } else {
        return node
      }
    }
    return nil
  }

Verify that you understand these two implementations are equivalent. Personally, I prefer to use iterative code over recursive code, but your opinion may differ. ;-)

Here is how to test searching:

tree.search(5)
tree.search(2)
tree.search(7)
tree.search(6)   // nil

The first three lines return the corresponding BinaryTreeNode object. The last line returns nil because there is no node with value 6.

Note: If there are duplicate items in the tree, search() returns the "highest" node. That makes sense, because we start searching from the root downwards.

Traversal

Remember there are 3 different ways to look at all nodes in the tree? Here they are:

  public func traverseInOrder(process: (T) -> Void) {
    left?.traverseInOrder(process: process)
    process(value)
    right?.traverseInOrder(process: process)
  }

  public func traversePreOrder(process: (T) -> Void) {
    process(value)
    left?.traversePreOrder(process: process)
    right?.traversePreOrder(process: process)
  }

  public func traversePostOrder(process: (T) -> Void) {
    left?.traversePostOrder(process: process)
    right?.traversePostOrder(process: process)
    process(value)
  }

They all work the same but in different orders. Notice that all the work is done recursively. The Swift's optional chaining makes it clear that the calls to traverseInOrder() etc are ignored when there is no left or right child.

To print out all the values of the tree sorted from low to high you can write:

tree.traverseInOrder { value in print(value) }

This prints the following:

1
2
5
7
9
10

You can also add things like map() and filter() to the tree. For example, here is an implementation of map:

  public func map(formula: (T) -> T) -> [T] {
    var a = [T]()
    if let left = left { a += left.map(formula: formula) }
    a.append(formula(value))
    if let right = right { a += right.map(formula: formula) }
    return a
  }

This calls the formula closure on each node in the tree and collects the results in an array. map() works by traversing the tree in-order.

An extremely simple example of how to use map():

  public func toArray() -> [T] {
    return map { $0 }
  }

This turns the contents of the tree back into a sorted array. Try it out in the playground:

tree.toArray()   // [1, 2, 5, 7, 9, 10]

As an exercise, see if you can implement filter and reduce.

Deleting nodes

We can make the code more readable by defining some helper functions.

  private func reconnectParentToNode(node: BinarySearchTree?) {
    if let parent = parent {
      if isLeftChild {
        parent.left = node
      } else {
        parent.right = node
      }
    }
    node?.parent = parent
  }

Making changes to the tree involves changing a bunch of parent and left and right pointers. This function helps with this implementation. It takes the parent of the current node -- that is self -- and connects it to another node which will be one of the children of self.

We also need a function that returns the minimum and maximum of a node:

  public func minimum() -> BinarySearchTree {
    var node = self
    while case let next? = node.left {
      node = next
    }
    return node
  }

  public func maximum() -> BinarySearchTree {
    var node = self
    while case let next? = node.right {
      node = next
    }
    return node
  }

The rest of the code is self-explanatory:

  @discardableResult public func remove() -> BinarySearchTree? {
    let replacement: BinarySearchTree?

    // Replacement for current node can be either biggest one on the left or
    // smallest one on the right, whichever is not nil
    if let right = right {
      replacement = right.minimum()
    } else if let left = left {
      replacement = left.maximum()
    } else {
      replacement = nil
    }

    replacement?.remove()

    // Place the replacement on current node's position
    replacement?.right = right
    replacement?.left = left
    right?.parent = replacement
    left?.parent = replacement
    reconnectParentTo(node:replacement)

    // The current node is no longer part of the tree, so clean it up.
    parent = nil
    left = nil
    right = nil

    return replacement
  }

Depth and height

Recall that the height of a node is the distance to its lowest leaf. We can calculate that with the following function:

  public func height() -> Int {
    if isLeaf {
      return 0
    } else {
      return 1 + max(left?.height() ?? 0, right?.height() ?? 0)
    }
  }

We look at the heights of the left and right branches and take the highest one. Again, this is a recursive procedure. Since this looks at all children of this node, performance is O(n).

Note: Swift's null-coalescing operator is used as shorthand to deal with left or right pointers that are nil. You could write this with if let, but this is more concise.

Try it out:

tree.height()  // 2

You can also calculate the depth of a node, which is the distance to the root. Here is the code:

  public func depth() -> Int {
    var node = self
    var edges = 0
    while case let parent? = node.parent {
      node = parent
      edges += 1
    }
    return edges
  }

It steps upwards through the tree, following the parent pointers until we reach the root node (whose parent is nil). This takes O(h) time. Here is an example:

if let node9 = tree.search(9) {
  node9.depth()   // returns 2
}

Predecessor and successor

The binary search tree is always "sorted" but that does not mean that consecutive numbers are actually next to each other in the tree.

Example

Note that you cannot find the number that comes before 7 by just looking at its left child node. The left child is 2, not 5. Likewise for the number that comes after 7.

The predecessor() function returns the node whose value precedes the current value in sorted order:

  public func predecessor() -> BinarySearchTree<T>? {
    if let left = left {
      return left.maximum()
    } else {
      var node = self
      while case let parent? = node.parent {
        if parent.value < value { return parent }
        node = parent
      }
      return nil
    }
  }

It is easy if we have a left subtree. In that case, the immediate predecessor is the maximum value in that subtree. You can verify in the above picture that 5 is indeed the maximum value in 7's left branch.

If there is no left subtree, then we have to look at our parent nodes until we find a smaller value. If we want to know what the predecessor is of node 9, we keep going up until we find the first parent with a smaller value, which is 7.

The code for successor() works the same way but mirrored:

  public func successor() -> BinarySearchTree<T>? {
    if let right = right {
      return right.minimum()
    } else {
      var node = self
      while case let parent? = node.parent {
        if parent.value > value { return parent }
        node = parent
      }
      return nil
    }
  }

Both these methods run in O(h) time.

Note: There is a variation called a "threaded" binary tree where "unused" left and right pointers are repurposed to make direct links between predecessor and successor nodes. Very clever!

Is the search tree valid?

If you were intent on sabotage you could turn the binary search tree into an invalid tree by calling insert() on a node that is not the root. Here is an example:

if let node1 = tree.search(1) {
  node1.insert(100)
}

The value of the root node is 7, so a node with value 100must be in the tree's right branch. However, you are not inserting at the root but at a leaf node in the tree's left branch. So the new 100 node is in the wrong place in the tree!

As a result, doing tree.search(100) gives nil.

You can check whether a tree is a valid binary search tree with the following method:

  public func isBST(minValue minValue: T, maxValue: T) -> Bool {
    if value < minValue || value > maxValue { return false }
    let leftBST = left?.isBST(minValue: minValue, maxValue: value) ?? true
    let rightBST = right?.isBST(minValue: value, maxValue: maxValue) ?? true
    return leftBST && rightBST
  }

This verifies the left branch contains values that are less than the current node's value, and that the right branch only contains values that are larger.

Call it as follows:

if let node1 = tree.search(1) {
  tree.isBST(minValue: Int.min, maxValue: Int.max)  // true
  node1.insert(100)                                 // EVIL!!!
  tree.search(100)                                  // nil
  tree.isBST(minValue: Int.min, maxValue: Int.max)  // false
}

The code (solution 2)

We have implemented the binary tree node as a class, but you can also use an enum.

The difference is reference semantics versus value semantics. Making a change to the class-based tree will update that same instance in memory, but the enum-based tree is immutable -- any insertions or deletions will give you an entirely new copy of the tree. Which one is best, totally depends on what you want to use it for.

Here is how you can make a binary search tree using an enum:

public enum BinarySearchTree<T: Comparable> {
  case Empty
  case Leaf(T)
  indirect case Node(BinarySearchTree, T, BinarySearchTree)
}

The enum has three cases:

  • Empty to mark the end of a branch (the class-based version used nil references for this).
  • Leaf for a leaf node that has no children.
  • Node for a node that has one or two children. This is marked indirect so that it can hold BinarySearchTree values. Without indirect you can't make recursive enums.

Note: The nodes in this binary tree do not have a reference to their parent node. It is not a major impediment, but it will make certain operations more cumbersome to implement.

This implementation is recursive, and each case of the enum will be treated differently. For example, this is how you can calculate the number of nodes in the tree and the height of the tree:

  public var count: Int {
    switch self {
    case .Empty: return 0
    case .Leaf: return 1
    case let .Node(left, _, right): return left.count + 1 + right.count
    }
  }

  public var height: Int {
    switch self {
    case .Empty: return 0
    case .Leaf: return 1
    case let .Node(left, _, right): return 1 + max(left.height, right.height)
    }
  }

Inserting new nodes looks like this:

  public func insert(newValue: T) -> BinarySearchTree {
    switch self {
    case .Empty:
      return .Leaf(newValue)

    case .Leaf(let value):
      if newValue < value {
        return .Node(.Leaf(newValue), value, .Empty)
      } else {
        return .Node(.Empty, value, .Leaf(newValue))
      }

    case .Node(let left, let value, let right):
      if newValue < value {
        return .Node(left.insert(newValue), value, right)
      } else {
        return .Node(left, value, right.insert(newValue))
      }
    }
  }

Try it out in a playground:

var tree = BinarySearchTree.Leaf(7)
tree = tree.insert(2)
tree = tree.insert(5)
tree = tree.insert(10)
tree = tree.insert(9)
tree = tree.insert(1)

Notice that for each insertion, you get back a new tree object, so you need to assign the result back to the tree variable.

Here is the all-important search function:

  public func search(x: T) -> BinarySearchTree? {
    switch self {
    case .Empty:
      return nil
    case .Leaf(let y):
      return (x == y) ? self : nil
    case let .Node(left, y, right):
      if x < y {
        return left.search(x)
      } else if y < x {
        return right.search(x)
      } else {
        return self
      }
    }
  }

Most of these functions have the same structure.

Try it out in a playground:

tree.search(10)
tree.search(1)
tree.search(11)   // nil

To print the tree for debug purposes, you can use this method:

extension BinarySearchTree: CustomDebugStringConvertible {
  public var debugDescription: String {
    switch self {
    case .Empty: return "."
    case .Leaf(let value): return "\(value)"
    case .Node(let left, let value, let right):
      return "(\(left.debugDescription) <- \(value) -> \(right.debugDescription))"
    }
  }
}

When you do print(tree), it will look like this:

((1 <- 2 -> 5) <- 7 -> (9 <- 10 -> .))

The root node is in the middle, and a dot means there is no child at that position.

When the tree becomes unbalanced...

A binary search tree is balanced when its left and right subtrees contain the same number of nodes. In that case, the height of the tree is log(n), where n is the number of nodes. That is the ideal situation.

If one branch is significantly longer than the other, searching becomes very slow. We end up checking more values than we need. In the worst case, the height of the tree can become n. Such a tree acts like a linked list than a binary search tree, with performance degrading to O(n). Not good!

One way to make the binary search tree balanced is to insert the nodes in a totally random order. On average that should balance out the tree well, but it not guaranteed, nor is it always practical.

The other solution is to use a self-balancing binary tree. This type of data structure adjusts the tree to keep it balanced after you insert or delete nodes. To see examples, check AVL tree and red-black tree.

See also

Binary Search Tree on Wikipedia

Written for the Swift Algorithm Club by Nicolas Ameghino and Matthijs Hollemans