Distributed Sorting

Overview

Sorting in a distributed environment is complicated by the fact we will not have all of the data available on one processor.

In some cases, the amount of data would not even fit into the memory of a single machine at once.

To handle this, we can sort our data according to the following rules:

All of the data on each process is locally sorted.
All of the data stored on process $i$ is less than all of the data stored on process $j$ if $i \lt j$.

The goal of our parallel sorting algorithm will be to make these two true. This can be done without ever loading all of the data onto any one process.

Odd-Even Transposition Sort

The basis for one method of distributed sorting is the "odd-even transposition sort". The serial version of this algorithm is given below:

For each phase:

If this is an even phase:
1. Compare and swap each even/odd pair of elements.
Else:
1. Compare and swap each odd/even pair of elements.

For example, if our data is as follows:

6, 9, 3, 1

Then we would start with phase 0. This is an even phase, so we will compare/swap each even/odd pair. This means pairs where the first element is an even index and the second is odd:

(6, 9) (3, 1)

Becomes:

6, 9, 1, 3

Now we move to phase 1. This is an odd phase so we will compare swap the odd/even pairs:

6, (9, 1), 3

Becomes:

6, 1, 9, 3

Next, we have phase 2, which is an even phase:

(6, 1) (9, 3)

Becomes:

1, 6, 3, 9,

Lastly, we have phase 3 which is an odd phase:

1, (6, 3), 9

Becomes:

1, 3, 6, 9

How many phases will we need in the worst case with N values?

The Parallel Odd-Even Transposition

The odd-even transposition algorithm can be extended to work in a distributed system.

In this scenario, each process will store its own portion of the data. It will keep its own data sorted.

It will then share its data with either the process to its left or right based on the phase we are in.

This algorithm is given below:

For each phase:

Sort our local data.
Find our partner based on the phase number and our rank.
If we have a partner:
1. Send our data to our partner.
2. Receive our partner's data.
3. If our rank is smaller, keep the smallest items from our data + our partners.
4. Else, our rank is larger, so keep the largest items from our data + our partners.

This is an extension of the serial algorithm, where we are comparing and swapping a subset of the data rather than a single element.

How could we implement this algorithm with an MPI program?