bfs

Breadth First Search

This can be throught of as being like Dijkstra's algorithm for shortest paths, but with every edge having the same length. However it is a lot simpler and doesn't need any data structures. We just keep a tree (the breadth first search tree), a list of nodes to be added to the tree, and markings (Boolean variables) on the vertices to tell whether they are in the tree or list. 

Algorithm

breadth first search:
  unmark all vertices
  choose some starting vertex x
  mark x
  list L = x
  tree T = x
  while L nonempty
  choose some vertex v from front of list
  visit v
  for each unmarked neighbor w
      mark w
      add it to end of list
      add edge vw to T
      
It's very important that you remove vertices from the other end of the list than the one you add them to, so that the list acts as a queue (fifo storage) rather than a stack (lifo). The "visit v" step would be filled out later depending on what you are using BFS for, just like the tree traversals usually involve doing something at each vertex that is not specified as part of the basic algorithm. If a vertex has several unmarked neighbors, it would be equally correct to visit them in any order. Probably the easiest method to implement would be simply to visit them in the order the adjacency list for v is stored in.

Let's prove some basic facts about this algorithm. First, each vertex is clearly marked at most once, added to the list at most once (since that happens only when it's marked), and therefore removed from the list at most once. Since the time to process a vertex is proportional to the length of its adjacency list, the total time for the whole algorithm is O(m).

Next, let's look at the tree T constructed by the algorithm. Why is it a tree? If you think of each edge vw as pointing "upward" from w to v, then each edge points from a vertex visited later to one visited earlier. Following successive edges upwards can only get stopped at x (which has no edge going upward from it) so every vertex in T has a path to x. This means that T is at least a connected subgraph of G. Now let's prove that it's a tree. A tree is just a connected and acyclic graph, so we need only to show that T has no cycles. In any cycle, no matter how you orient the edges so that one direction is "upward" and the other "downward", there is always a "bottom" vertex having two upward edges out of it. But in T, each vertex has at most one upward edge, so T can have no cycles. Therefore T really is a tree. It is known as a breadth first search tree.

We also want to know that T is a spanning tree, i.e. that if the graph is connected (every vertex has some path to the root x) then every vertex will occur somewhere in T. We can prove this by induction on the length of the shortest path to x. If v has a path of length k, starting v-w-...-x, then w has a path of length k-1, and by induction would be included in T. But then when we visited w we would have seen edge vw, and if v were not already in the tree it would have been added.

Breadth first traversal of G corresponds to some kind of tree traversal on T. But it isn't preorder, postorder, or even inorder traversal. Instead, the traversal goes a level at a time, left to right within a level (where a level is defined simply in terms of distance from the root of the tree). For instance, the following tree is drawn with vertices numbered in an order that might be followed by breadth first search:

        1
      / | \
    2   3   4
      /   \     |
    5       6   7
    |     / | \
    8    9 10 11

The proof that vertices are in this order by breadth first search goes by induction on the level number. By the induction hypothesis, BFS lists all vertices at level k-1 before those at level k. Therefore it will place into L all vertices at level k before all those of level k+1, and therefore so list those of level k before those of level k+1. (This really is a proof even though it sounds like circular reasoning.)

Breadth first search trees have a nice property: Every edge of G can be classified into one of three groups. Some edges are in T themselves. Some connect two vertices at the same level of T. And the remaining ones connect two vertices on two adjacent levels. It is not possible for an edge to skip a level.

Therefore, the breadth first search tree really is a shortest path tree starting from its root. Every vertex has a path to the root, with path length equal to its level (just follow the tree itself), and no path can skip a level so this really is a shortest path.

Breadth first search has several uses in other graph algorithms, but most are too complicated to explain in detail here. One is as part of an algorithm for matching, which is a problem in which you want to pair up the n vertices of a graph by n/2 edges. If you have a partial matching, pairing up only some of the vertices, you can extend it by finding an alternating path connecting two unmatched vertices; this is a path in which every other edge is part of the partial matching. If you remove those edges in the path from the matching, and add the other path edges back into the matching, you get a matching with one more edge. Alternating paths can be found using a version of breadth first search.

A second use of breadth first search arises in certain pattern matching problems. For instance, if you're looking for a small subgraph such as a triangle as part of a larger graph, you know that every vertex in the triangle has to be connected by an edge to every other vertex. Since no edge can skip levels in the BFS tree, you can divide the problem into subproblems, in which you look for the triangle in pairs of adjacent levels of the tree. This sort of problem, in which you look for a small graph as part of a larger one, is known as subgraph isomorphism. In a recent paper, I used this idea to solve many similar pattern-matching problems in linear time.