Graph Traversal Algorithm

Suppose we have a graph and we wish to visit (for example, to print) each vertex exactly once. We don't care about the order, as long as we print everything. How can we do this?

For an adjacency matrix, the task is very simple, because the rows (or the columns) correspond to the vertices: we can simply traverse the rows (or columns) and print out the vertices. However, for an adjacency list, we have to be a bit more clever. A general way to do it is via the following algorithm,
while (there are non-visited vertices)
    Initialize data structure DS
    Add a non-visited vertex v_i to DS
    while (DS is not empty)
        Remove v_j from DS
        If v_j is not visited
            Mark v_j as visited
            Add the non-visited vertices adjacent to v_j to DS
                
where the specific data structure DS we use determines the order in which we traverse the graph. At a high level, we go through the nodes one by one; every time we visit a node, we mark it as visited (so we don't visit it again) and add all of its non-visited neighbours to the data structure. (A vertex may be added to DS more than once before it is removed, which is why we check whether it has been visited when we remove it.)

When a vertex gets removed, we can print out its value. For a connected graph, the outer while loop will only run once. The reason for there being two while loops is that if we start off on an "island", eventually we'll reach a point where our data structure is empty but there are still unvisited nodes on another island.
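The skeleton above can be sketched in Python. This is a minimal sketch under an assumption: the graph is a plain dict adjacency list (vertex to list of neighbours), rather than the Node class used later in these notes, and vertices are marked visited as they are added so that DS never holds duplicates.

```python
def traverse_all(adj):
    """Visit every vertex of a graph given as a dict adjacency list.
    With a plain list as DS, popping from the end gives depth-first
    order; popping from the front would give breadth-first order."""
    visited = set()
    order = []
    for start in adj:              # outer loop: handles "islands"
        if start in visited:
            continue
        ds = [start]               # initialize DS with a non-visited vertex
        visited.add(start)
        while ds:
            v = ds.pop()           # ds.pop(0) here would make DS a queue
            order.append(v)
            for w in adj[v]:
                if w not in visited:
                    visited.add(w) # mark on insertion: no duplicates in DS
                    ds.append(w)
    return order
```

For example, on a two-island graph such as {'A': ['B'], 'B': ['A'], 'C': []}, the outer loop restarts the traversal from C once the first island is exhausted.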

Breadth First Search

We can perform a breadth-first search (BFS) by using a queue (first in, first out) as our data structure. Suppose we have the following graph (it is built explicitly in the Python example later in these notes);
then implementing BFS starting from node \(C\) will give the traversal (i.e. if we print when we remove the node) \[ C \to B \to D \to A \to E \to F \] which can be illustrated below:
----------------------------------------
Queue Contents              Traversal
----------------------------------------
C                           empty
B D                         C
D A E                       C B
A E                         C B D
E                           C B D A
F                           C B D A E
empty                       C B D A E F
----------------------------------------
                
See the course slides for another example. We can use a chess analogy to explain why we call this breadth first.

In this strategy game, I might have several possible moves on my turn. For each of my viable moves, my opponent has several possible moves, and so forth. One way I can approach the game is to look at all my options and for each option, look at all my opponent's options, and so forth. In a sense, my "looking-ahead" depth slowly increases. This particular strategy might be useful in the early game.

Compare this to a depth-first-search (which we will talk about next). Instead of looking at all my options, I look at one particular option, and then look at one specific response from my opponent to that option, and so forth. Each computation I make in my head makes me look one step further (I won't consider the other possible moves I can make, yet!) Eventually, I'll get to the end (perhaps checkmate!) and I can start working backwards slowly. If my opponent moved somewhere else, is it still checkmate?

Depth first allows me to explore a particular line in extreme depth before moving on to consider another line while breadth first allows me to explore all lines simultaneously but slowly. This is a trade-off in graph theory just like in chess. In chess, you might want to use a depth-first-search near the end-game but you might want to use a breadth-first-search in the opening.

Depth First Search

We can perform a depth-first search (DFS) by using a stack (last in, first out) as our data structure. Suppose we have the same graph as above. Then starting from \(C\) we get the traversal \[C \to D \to E \to F \to B \to A\] which can be illustrated below:
----------------------------------------
Stack Contents              Traversal
----------------------------------------
C                           empty
B D                         C
B E                         C D
B F                         C D E
B                           C D E F
A                           C D E F B
empty                       C D E F B A
----------------------------------------
                

Graph Traversal Implementation

BFS and DFS Code

We can set up our graph and nodes as Python classes,
class Node:
    def __init__(self, data):
        self.data = data
        self.neighbours = []

class Graph:
    def __init__(self):
        self.nodes = [] # One representation
and first implement breadth first search (BFS):

def bfs(graph):
    visited = set() # Average insertion/lookup time is O(1)

    for starting_node in graph.nodes:
        queue = [starting_node]

        while len(queue) > 0:
            node = queue.pop(0) # remove from the front: FIFO
            if node not in visited:
                print(node.data)
                visited.add(node)
                queue.extend(node.neighbours)

    return visited
                
Every time we add a node to visited, we print it out, and then add all the neighbours of this node to the queue. Recall that set() creates a set in Python; operations such as a in s run in average constant time, which is better than the \(O(n)\) membership test on a list.
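One caveat: queue.pop(0) on a Python list is \(O(n)\), since every remaining element shifts left by one. Here is a sketch of the same BFS using collections.deque from the standard library, which pops from the front in \(O(1)\); as a small tweak for testing, it returns the traversal order instead of printing it.

```python
from collections import deque

def bfs_deque(graph):
    """BFS as above, but with an O(1) queue; returns the traversal
    as a list of node data instead of printing it."""
    visited = set()
    order = []
    for starting_node in graph.nodes:
        queue = deque([starting_node])
        while queue:
            node = queue.popleft()  # O(1), unlike list.pop(0)
            if node not in visited:
                order.append(node.data)
                visited.add(node)
                queue.extend(node.neighbours)
    return order
```

On the example graph below, bfs_deque produces the same order as bfs; only the cost per dequeue changes.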

We can test the code as follows:
A = Node('A')
B = Node('B')
C = Node('C')
D = Node('D')
E = Node('E')
F = Node('F')

A.neighbours = [B]
B.neighbours = [A, E, D, C]
C.neighbours = [B, D]
D.neighbours = [B, C, E]
E.neighbours = [B, D, F]
F.neighbours = [E]

graph = Graph()
graph.nodes = [C, B, D, A, E, F]

bfs(graph)
which represents the diagram above. Note that the order of graph.nodes (and of each neighbours list) matters here, because there isn't a unique way of doing BFS: the algorithm gives no priority to one neighbour over another when adding to the queue, so different orderings produce different, equally valid traversals. For depth first search (DFS) we can write very similar code,
def dfs(graph):
    visited = set() # Average insertion/lookup time is O(1)

    for starting_node in graph.nodes:
        stack = [starting_node]

        while len(stack) > 0:
            node = stack.pop() # remove from the end: LIFO

            if node not in visited:
                print(node.data)
                visited.add(node)
                stack.extend(node.neighbours)

    return visited
What's the difference? Besides variable names, we've changed .pop(0) to .pop(). This means that instead of removing from the front of the list (which is where the older nodes are), we remove from the end of the list (which is where the newer nodes are), thus implementing a stack.
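To connect the code back to the stack table above, here is a self-contained run on the same six-vertex graph; dfs_order (our naming) is a small variant of dfs that collects the traversal in a list rather than printing it.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.neighbours = []

class Graph:
    def __init__(self):
        self.nodes = []

def dfs_order(graph):
    # same as dfs above, but collecting the traversal instead of printing
    visited, order = set(), []
    for starting_node in graph.nodes:
        stack = [starting_node]
        while len(stack) > 0:
            node = stack.pop()
            if node not in visited:
                order.append(node.data)
                visited.add(node)
                stack.extend(node.neighbours)
    return order

A, B, C, D, E, F = (Node(x) for x in 'ABCDEF')
A.neighbours = [B]
B.neighbours = [A, E, D, C]
C.neighbours = [B, D]
D.neighbours = [B, C, E]
E.neighbours = [B, D, F]
F.neighbours = [E]

graph = Graph()
graph.nodes = [C, B, D, A, E, F]
print(dfs_order(graph))  # ['C', 'D', 'E', 'F', 'B', 'A'], matching the table above
```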

Example: Race to 21

We can represent this game with a graph. (Recall Race to 21: starting from a total of \(0\), players take turns adding \(1\) or \(2\), and the game ends when the total reaches \(21\).)
  • Nodes: game states for Race to 21. For example, a possible state is \(v = [0, 1, 3, 4]\), which corresponds to an incomplete game where the players chose \(+1\), \(+2\), \(+1\), in that order. The first node is \([0]\).
  • Edges: valid moves. For example, game states \(v_1\) and \(v_2\) are connected if and only if there is a valid move from \(v_1\) to \(v_2\).
Thus, the problem reduces to traversing this graph. If we are able to construct the graph, then we are done: we can use the BFS or DFS algorithms we have already written to traverse it, with some minor modifications. It turns out that for this problem the neighbours are easy to predict (at most two possible states stem from any given state), so we can modify our code directly, and the result is more efficient. We have:
def filter_neighbours(neighbours):
    res = []
    for n in neighbours:
        if n[-1] <= 21:
            res.append(n)

    return res

def bfs21():
    # starting node: [0]
    # don't have node.neighbours: need to generate them
    # don't need to worry about returning to visited nodes,
    # since we cannot go back from +1 or +2 to a previous game state

    queue = [[0]]
    while len(queue) > 0:
        node = queue.pop(0)
        if node[-1] == 21:
            print(node)
        neighbours = filter_neighbours([node + [node[-1] + 1], node + [node[-1] + 2]])
        queue.extend(neighbours)

bfs21()
Each node is a list with two candidate neighbours:
  • Neighbour 1: node + [node[-1] + 1]
  • Neighbour 2: node + [node[-1] + 2]
However, some of these neighbours don't actually exist, since we can't go above \(21\). Thus, we need to filter out the neighbours that are greater than \(21\). Here, the function filter_neighbours() is a helper that takes in these neighbours and returns the ones that are actually valid.

After this is done, we can then add these new neighbours to our queue, just like we did in BFS. Note that this graph structure is that of a binary tree, so this also gives us a way of printing out binary trees!
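As with the earlier graph code, swapping .pop(0) for .pop() turns this into a depth-first traversal of the game tree. A sketch under an assumption: the hypothetical helper below counts the complete games instead of printing them, and each stack entry stores only the running total, since the total is all that determines a state's successors.

```python
def count_games(target=21):
    """Count the complete games of Race to `target` by DFS with an
    explicit stack; each entry is just the running total."""
    stack = [0]
    finished = 0
    while stack:
        total = stack.pop()
        if total == target:
            finished += 1      # a complete game: the total hit the target
            continue
        for step in (1, 2):    # the two possible moves
            if total + step <= target:
                stack.append(total + step)
    return finished
```

For small targets this is easy to check by hand: count_games(3) counts the move sequences \(1+1+1\), \(1+2\), \(2+1\), so it returns 3.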

Recursive DFS

In the last section, we were able to print out all possible game states. This is very similar to a recursive problem from ESC180, where we printed out all possible passwords. What we were actually doing with that recursive method was implementing DFS! Here is pseudocode for recursive DFS:
DFS(v_i):
    Mark v_i as visited
    For each non-visited vertex v_j adjacent to v_i
        DFS(v_j)
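The pseudocode translates directly into Python. A minimal sketch, again assuming a dict adjacency list rather than the Node class; the shared visited set plays the role of the marks, and order records the visit sequence.

```python
def dfs_recursive(adj, v, visited=None, order=None):
    """Recursive DFS on a dict adjacency list. The call stack replaces
    the explicit stack: each recursive call is a push, each return a pop."""
    if visited is None:
        visited, order = set(), []
    visited.add(v)          # mark v as visited
    order.append(v)
    for w in adj[v]:        # for each non-visited adjacent vertex
        if w not in visited:
            dfs_recursive(adj, w, visited, order)
    return order
```

Note that the visit order can differ from the stack-based dfs above: the recursion explores neighbours in list order, while the explicit stack pops the most recently pushed vertex first. Both are valid depth-first orders.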

Recursion and Stacks

Recall from last lecture that DFS can be implemented using a stack, or equivalently using recursion. This actually generalizes! We can use the stack data structure to implement any recursion, by storing the local variables on a stack ourselves. For example, if we wanted to compute the factorial recursively, we could write:
def fact(n):
    if n == 1:
        return 1
    else:
        return n * fact(n-1)
                
But we can also write this explicitly using a stack. Effectively, the function becomes
def trace_factorial(n):
    local_n = [n]
    i = n
    while i != 1:
        local_n.append(i-1) # add to stack
        i -= 1
    res = 1
    while len(local_n) > 0:
        res *= local_n.pop() # remove from stack and compute
    return res
Recall that .pop() removes the last element of a list (LIFO).
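As a sanity check that the two versions compute the same thing, here they are side by side with an assertion (restated so the block runs on its own):

```python
def fact(n):
    # recursive version: the interpreter's call stack stores each n
    if n == 1:
        return 1
    return n * fact(n - 1)

def trace_factorial(n):
    # explicit version: we manage the stack of "locals" ourselves
    local_n = [n]
    i = n
    while i != 1:
        local_n.append(i - 1)    # push, mirroring a recursive call
        i -= 1
    res = 1
    while len(local_n) > 0:
        res *= local_n.pop()     # pop, mirroring a return
    return res

# both agree, e.g. fact(5) == trace_factorial(5) == 120
assert all(fact(n) == trace_factorial(n) for n in range(1, 10))
```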