The Travelling Salesman Problem
The Problem
Another graph problem is the travelling salesman problem (TSP). In this problem, there is a salesman who would like to visit all of a set of cities in order to sell their products. To do this, they need to make a tour of the cities where they visit each city exactly once and end up back where they started.
It is not possible to complete a tour for every graph. For example in the graph of Virginia cities given above it is not possible to complete a full tour and end up back where we started. For the graph below, however it is possible:
One tour is C-F-A-B-D-E-C. Another is from A-B-F-C-E-D-A. However these tours will take different amounts of time to complete based off of the edges we must take on the tour. The first will take 148 and the second will take 115. Clearly the salesman will prefer to take the second tour as it will cost him less time, gas or airfare.
The travelling salesman problem is to find the tour with the least total cost.
The number of possible tours through the graph is the same as the number of permutations of the cities in the graph. A permutation is a specific order of the cities. There are $N!$ permutations of $N$ cities.
Solution
The direct approach to solving the problem is to generate all $N!$ permutations, calculate the cost of each one, and see which is the longest. This is called a brute force approach and is illustrated in this example. This program goes through the possible permutations recursively.
The Big-O for this algorithm is actually the biggest we will see in this class! Because we try every permutation of $N$ nodes, this algorithm is actually $O(N!)$.
The graph below shows how complexities increase for $N$:
As you can see, $N!$ completely dwarfs even $2^N$ as $N$ increases.
Classes of Problems
In computer science, we classify problems in terms of how difficult they are to solve. Algorithms which have a Big-O complexity of the form $O(N^k)$ where $k$ is a constant are called polynomial problems. Even if an algorithm is $O(N^{1000})$ it is considered to be polynomial. Problems that have a polynomial algorithm are said to be in class $P$.
This means that the shortest path problem, the minimal spanning tree problem and most of the other problems we have looked at are in class $P$.
The travelling salesman problem has no known polynomial algorithm that can solve it. The best algorithm known is $O(N^2 \cdot 2^N)$. This problem belongs to a class of very hard programs called "$NP$-Complete" which have no polynomial time algorithms. Moreover, it's very unlikely that there even exists a polynomial algorithm to any $NP$-Complete problem. If you find one, you'll receive a $1,000,000 prize from the Clay Mathematics Institute.
Due to their open nature, graphs have several other $NP$-Complete problems associated with them:
- Vertex Cover: Finding the minimum number of nodes that attach to each edge in the graph.
- Graph Coloring: Finding the minimum number of colors we can color the nodes such that no two of the same color are adjacent.
- Finding Cliques: Finding the larges complete subgraph (clique) in a graph.
These may sound esoteric, but many useful problems reduce to one of these difficult problems.
Scheduling problems can reduce to the graph coloring problem, as can the problem of allocating registers in a compiler.
The clique problem comes up in protein folding and gene modelling applications.
Being able to recognize these problems may keep you from spending time trying to solve an infeasible problem!
Approximation Algorithms
So what do we do when we have a problem which falls into the class of $NP$-Complete ones? We simply need to look for a solution which is "pretty good" even though it is likely not optimal. An algorithm which doesn't seek an exact solution to a problem is called an approximation algorithm. For instance the exact solution to the travelling salesman problem is a tour with the minimum total cost. An approximation algorithm to this problem will instead just search for a tour with as low a cost as we can manage.
A common class of approximation algorithms are hill climbing algorithms. In these, we start with any potential solution to the problem, and then iteratively improve the solution until we cannot improve it any further.
Hill climbing algorithms produce better results more quickly if the starting solution is reasonably good. So in the case of TSP, we don't want to start with a completely random tour. Instead we can use the following algorithm to generate a starting tour:
- Pick any node as the starting point of the tour, called current.
- While there are cities not in the tour:
- Pick the closest node to current, which is not yet part of the tour
- Add this node to the tour, and make it the current
At the end of this, we should have a starting tour which is at least somewhat reasonable.
A common hill climbing approximation algorithm for TSP is the 2-opt method. Here, we will repeatedly look for pairs of edges in the tour which can be swapped to improve the tour. For example, in the following tour:
We could reduce the total cost of the tour by swapping edges BE and CF, replacing them with edges BC and EF intead:
The algorithm in pseudocode is as follows:
- Start with any valid tour (such as one constructed along the lines given above)
- For each edge in the tour $(a,b)$:
- For each other edge in the tour $(c,d)$:
- Compute the cost of the tour we would achieve by swapping these edges out for $(a,c)$ and $(b,d)$.
- If this cost is less than the current cost of the tour, make the change
- For each other edge in the tour $(c,d)$:
We then repeat this process until we cannot make any further changes to improve the cost of the tour.
It may initially seem that this will result in an optimal tour, however that isn't the case. The reason is that hill climbing algorithms like this are liable to fall into a local optimum. That means that no single local change (like swapping 2 edges) can improve our solution. However it is possible a larger change (such as re-arranging larger sections of the tour) could result in a better solution.