Sorting is a fundamental problem in computer science. It is also a problem with a limit on algorithmic efficiency. As such, it a good candidate for being parallelized.

Today we will look at two parallel sorting algorithms. The first is a shared memory version of merge sort, and the second is a message passing version of the odd-even transposition sort.

The serial merge sort algorithm is a receive algorithm which can be given as:

- If the array has 0 or 1 items, we're done.
- Mergesort the left half.
- Mergesort the right half.
- Merge the two halves together to make a new, sorted, array.

Below is an implementation
of this algorithm:
```
```

```
void mergeSort(int* array, int start, int end) {
if(start < end) {
int middle = (start + end) / 2;
/* sort left half */
mergeSort(array, start, middle);
/* sort right half */
mergeSort(array, middle + 1, end);
/* merge the two halves */
merge(array, start, end);
}
}
```

Which part of this algorithm can be parallelized?

The two recursive calls to sort each half of the array can be done in parallel. This is most easily done with OpenMP tasks. The following program does this:

```
```

```
void mergeSort(int* array, int start, int end) {
if(start < end) {
printf("Thread %d is sorting %d through %d\n", omp_get_thread_num(), start, end);
int middle = (start + end) / 2;
/* sort both halves in parallel */
#pragma omp parallel
{
#pragma omp single
{
#pragma omp task
mergeSort(array, start, middle);
#pragma omp task
mergeSort(array, middle + 1, end);
}
}
/* merge the two halves */
merge(array, start, end);
}
}
```

The simplest way to spawn a single thread is with the OpenMP task construct which creates one thread for the statement following the line. These must be in a parallel block. In order to prevent multiple of each task starting, we also use a single block.

In order to test the performance of the parallel merge sort implementation compared to the sequential one, we can use the following two programs:

Each of these sort 50 million integers using merge sort, either sequentially or in parallel.

However, the parallel one will run much slower! Why is it running slower? How can it be fixed?

Copyright © 2018 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.