Sorting is a fundamental problem in computer science. It is also a problem with a limit on algorithmic efficiency. As such, it a good candidate for being parallelized.
Some sorting algorithms are easier to parallelize than others. Some, like insertion sort and bubble sort are not really parallelizable at all. One algorithm that is especially easy to parallelize is merge sort.
The basic approach of merge sort is to split the array into two halves, a left half and a right half. Then we sort each half recursively. Then to finish up, we merge the two sorted halve back together. Because we recursively sort the sub-arrays separately, those halves can be sorted in parallel without changing the behavior of the algorithm.
The serial merge sort algorithm is a recursive algorithm which can be given as:
Below is an implementation of this algorithm:
void mergeSort(int* array, int start, int end) {
if(start < end) {
int middle = (start + end) / 2;
/* sort left half */
mergeSort(array, start, middle);
/* sort right half */
mergeSort(array, middle + 1, end);
/* merge the two halves */
merge(array, start, end);
}
}
Which part of this algorithm can be parallelized?
The two recursive calls to sort each half of the array can be done in parallel. This is most easily done with OpenMP tasks. The following program does this:
void mergeSort(int* array, int start, int end) {
if(start < end) {
printf("Thread %d is sorting %d through %d\n", omp_get_thread_num(), start, end);
int middle = (start + end) / 2;
/* sort both halves in parallel */
#pragma omp parallel
{
#pragma omp single
{
#pragma omp task
mergeSort(array, start, middle);
mergeSort(array, middle + 1, end);
}
}
/* merge the two halves */
merge(array, start, end);
}
}
The simplest way to spawn a single thread is with the OpenMP task construct which creates one thread for the statement following the line. These must be in a parallel block. In order to prevent multiple of each task starting, we also use a single block.
So the effect of this code is that we launch the first recursive call in a separate thread, then carry on and do the second call. Then at the end of the parallel block all threads launched are joined.
In order to test the performance of the parallel merge sort implementation compared to the sequential one, we can use the following two programs:
Each of these sort 50 million integers using merge sort, either sequentially or in parallel.
However, as written the parallel one will probably not complete. Even if it did complete, it would be much slower than the sequential version. Why is this? How can it be fixed?
Copyright © 2025 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.