One common problem is that of finding the shortest path between two nodes in a graph. This has applications in transportation and navigation such as GPS systems or Google Maps. It is also a problem faced by routers in a computer network when they decide how to route data across a network.

Below is an undirected, weighted graph representing several cities in Virginia along with approximate distances between them. Given a graph like this, we would like to find the shortest path between any two cities.

The most common algorithm to solve this problem is called "Dijkstra's algorithm". The algorithm works by finding the distance from the start node to every other node in the graph. The algorithm starts by assuming that all of the distances are infinity. It then searches out and replaces these distances with better ones as it finds them. It does this by looping through the nodes, starting with the start node, and setting the neighbors distances to the distance to the current node plus the cost of the edge. When the distance to the end node is found, we are done. The algorithm is given below:

- Set the tentative distance from the start node to all others to infinity.
- Set the tentative distance from the start node to itself to 0.
- Mark all the nodes as unvisited.
- Set the current node to the start one.
- While there is a current node:
- For each neighbor of the current node:
- If the neighbor's tentative distance is more than the sum of the distance to current + the cost of the edge from current, set the neighbors tentative distance to the distance to current + the cost of the edge from current to neighbor, and set the neighbor to unvisited.

- Mark the current node as visited.
- Set the current node to the unvisited node with the least tentative cost.

- For each neighbor of the current node:
- Return the tentative distance to the end node which is now minimal.

Dijkstra's algorithm is implemented in this example. Notice that we insert each edge in both directions because we assume each street is a two-way street.

The Big-O of this algorithm depends on whether we are using an adjacency matrix or incidence list, and also on how it's written. As given above, it's $O(N^2)$, where $N$ is the number of nodes in the graph. We could actually write a faster version using a heap, which is the next data structure we will look at.

A spanning tree is a tree that is made by selecting a subset of the edges of a graph such that all of the nodes of the graph are in the tree. The tree must also be a tree, so there can be no cycles in it. Below is a spanning tree for the cities graph:

Which node is considered to be the root is not important when discussing spanning trees. There can be multiple spanning trees for any one graph. The minimum spanning tree is the one for which the sum of the weights in the tree has the least value. Also note that only a connected graph can have a spanning tree.

There are several applications of minimal spanning trees. For example if an electric company wished to supply electricity for each of the above cities, running power lines along the edges of the minimal spanning tree will be use the least amount of cable.

There are several algorithms to solve this problem, one we will talk about is Prim's algorithm. Prim's algorithm starts with an empty minimal spanning tree. It then adds any one node to the tree. Then it repeatedly adds one node and one edge to the graph until it has a minimal spanning tree. Each time through it chooses the edge that connects a node in the tree with one not yet in the tree. If there are several, it chooses the one with the least weight.

This example implements Prim's algorithm. The minimal spanning tree that is actually found for this graph is depicted below:

Like Dijkstra's algorithm, the Big-O of Prim's algorithm depends on how exactly it is implemented. The simple version above is also $O(N^2)$. Also like Dijkstra's algorithm this can be improved with the use of a heap.

Another graph problem is the travelling salesman problem. In this problem,
there is a salesman who would like to visit all of a set of cities in order to
sell their products. To do this, they need to make a *tour* of the cities
where they visit each city exactly once and end up back where they started.

It is not possible to complete a tour for every graph. For example in the graph of Virginia cities given above it is not possible to complete a full tour and end up back where we started. For the graph below, however it is possible:

One tour is C-F-A-B-D-E-C. Another is from A-B-F-C-E-D-A. However these tours will take different amounts of time to complete based off of the edges we must take on the tour. The first will take 148 and the second will take 115. Clearly the salesman will prefer to take the second tour as it will cost him less time, gas or airfare.

The travelling salesman problem is to find the tour with the least total cost.

The number of possible tours through the graph is the same as the number of
*permutations* of the cities in the graph. A permutation is a specific
order of the cities. There are $N!$ permutations of $N$ cities.

The direct approach to solving the problem is to generate all $N!$
permutations, calculate the cost of each one, and see which is the longest.
This is called a *brute force* approach and is illustrated in this example. This program goes through
the possible permutations recursively.

The Big-O for this algorithm is actually the biggest we will see in this class! Because we try every permutation of $N$ nodes, this algorithm is actually $O(N!)$.

The graph below shows how complexities increase for $N$:

As you can see, $N!$ completely dwarfs even $2^N$ as $N$ increases.

In computer science, we classify problems in terms of how difficult they are
to solve. Algorithms which have a Big-O complexity of the form $O(N^k)$ where $k$
is a constant are called *polynomial* problems. Even if an algorithm is
$O(N^{1000})$ it is considered to be polynomial. Problems that have a polynomial
algorithm are said to be in class $P$.

This means that the shortest path problem, the minimal spanning tree problem and most of the other problems we have looked at are in class $P$.

The travelling salesman problem has no known polynomial algorithm that can
solve it. The best algorithm known is $O(N^2 \cdot 2^N)$. This problem belongs to a
class of very hard programs called "$NP$-Complete" which have no polynomial time
algorithms. Moreover, it's very unlikely that there even *exists* a
polynomial algorithm to any $NP$-Complete problem. If you find one, you'll
receive a $1,000,000 prize from the Clay Mathematics Institute.

Due to their open nature, graphs have several other $NP$-Complete problems associated with them:

- Vertex Cover: Finding the minimum number of nodes that attach to each edge in the graph.
- Graph Coloring: Finding the minimum number of colors we can color the nodes such that no two of the same color are adjacent.
- Finding Cliques: Finding the larges complete subgraph (clique) in a graph.

These may sound esoteric, but many useful problems *reduce* to one of
these difficult problems.

Scheduling problems can reduce to the graph coloring problem, as can the problem of allocating registers in a compiler.

The clique problem comes up in protein folding and gene modelling applications.

Being able to recognize these problems may keep you from spending time trying to solve an infeasible problem!

Copyright © 2019 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.