# Lab 9: CUDA

## Objective

To gain experience using CUDA.

For this lab, you will modify a CPU program so that a portion of it runs on the GPU instead. The program tests something known as the Collatz conjecture which states:

Take any natural number $n$. If $n$ is even, divide it by 2. If $n$ is odd, multiply it by 3 and add 1. If you repeat this process long enough, $n$ will eventually be equal to 1.

This conjecture has never been proven, but a counter-example has never been found. Different numbers take a different amount of steps to reach 1, but they always do.

The following program tries the Collatz conjecture on all numbers up to 50,000. It then prints the one which took the largest number of steps to reach 1.


#include <stdio.h>
#include <stdlib.h>

#define N 50000

/* run the collatz conjecture and return the number of steps */
unsigned long collatz(unsigned long x) {
unsigned long step = 0;

/* do the iterative process */
while (x != 1) {
if ((x % 2) == 0) {
x = x / 2;
} else {
x = 3 * x + 1;
}
step++;
}

return step;
}

int main() {
/* store the number of steps for each number up to N */
unsigned long steps[N];

/* run the collatz conjecture on all N items */
unsigned long i;
for (i = 0; i < N; i++) {
steps[i] = collatz(i + 1);
}

/* find the largest */
unsigned int largest = steps, largest_i = 0;
for (i = 1; i < N; i++) {
if (steps[i] > largest) {
largest = steps[i];
largest_i = i;
}
}

/* report results */
printf("The longest collatz chain up to %d is %d with %d steps.\n",
N, largest_i + 1, largest);

return 0;
}


You should modify this program so that the collatz function is run on multiple values in parallel on the GPU.

## Details

\$ wget "http://ianfinlayson.net/class/cpsc425/labs/collatz.c"
3. Change the collatz function so that it is a CUDA __global__function.
4. Change the main function so that it sets up memory for each CUDA core to work with. Do this by storing the values 1 to N in an array and then transfer that array to the GPU.
5. Change the collatz function so that it takes the array of values as a parameter and have it use blockIdx.x to index the array. Have it write the number of steps back over its input value.
6. Transfer the array back to the CPU which should now contain the number of Collatz steps of each value.
7. Test your program to make sure it produces the same output as the regular C version.
8. The GPU version of your program will probably be a bit slower than the CPU version because N is only 50,000. Unfortunately increasing N past 65,535 is not directly possible due to a CUDA limitation. We will talk about how to work around this next class.

## Submitting

Email me your CUDA version of the Collatz program.