Introduction to OpenMP

Overview

OpenMP is an alternative to writing shared memory parallel programs. It is higher-level than Pthreads. It allows us to specify that a portion of our code should be executed in parallel.

OpenMP can create and join threads automatically. It also allows for programs to be parallelized incrementally.

Hello World

A parallel hello world in OpenMP is quite simple:


#include <stdio.h>
#include <omp.h>

void hello() {
    /* get thread id */
    int id = omp_get_thread_num();
    printf("Hello from thread %d!\n", id);
}

int main() {
#pragma omp parallel num_threads(4)
    hello();
    return 0;
}

Compiling OpenMP

Like Pthreads, we have to specify a compiler flag to compile OpenMP code:

$ gcc hello.c -fopenmp

Without the -fopenmp flag, we will get compiler errors.

OpenMP and C++

OpenMP can be used with C++ programs as well:


#include <iostream>
#include <omp.h>

using namespace std;

void hello() {
    /* get thread id */
    int id = omp_get_thread_num();
    cout << "Hello from thread " << id << "\n";
}

int main() {
#pragma omp parallel num_threads(4)
    hello();
    return 0;
}

To compile:

$ g++ hello.cpp -fopenmp

pragma parallel

OpenMP relies on #pragma directives. These are not normal function calls, but are special directions to the compiler.

When gcc sees the line "#pragma omp parallel num_threads(4)" it needs to do something special: insert code to spawn four threads.

The form of a pragma parallel is as follows:


#pragma omp paralell num_threads(N)
code_block

Where N is the number of threads to start, and code_block is a block of code. This can be a single statement, or a block enclosed in curly braces.

Pragmas are one line in length, so the following will not work:


# pragma parallel
num_threads(4)

If you want a multi-line pragma, you can use the line continuation character:


# pragma parallel \
num_threads(4)

pragmas are ignored by compilers that do not support them. It is possible to write an OpenMP program such that it will compile as plain C on compilers that don't support OpenMP.

Under the Hood

When compiling the hello world program above, gcc will produce code as follows (with most lines removed):


main:
	movl	$4, %edx
	movl	$main.omp_fn.0, %edi
	call	GOMP_parallel_start
	call	main.omp_fn.0
	call	GOMP_parallel_end

main.omp_fn.0:
	call	hello

It creates a new function "main.omp_fn" that contains the code block for the #pragma parallel.

It changes your main function so that it passes the number of threads and the address of the "main.omp_fn" into a function called GOMP_parallel_start. This function spawns the threads, and calls the "main.omp_fn" function from each.

It then calls GOMP_parallel_end which waits for all threads to join. On POSIX systems, OpenMP is built on top of Pthreads. On other systems it uses the native threading system available.

Thread IDs

Unlike Pthreads, OpenMP keeps track of the number of threads, and individual thread id's for us. We can get these values with the functions:


int omp_get_thread_num();
int omp_get_num_threads();

Example: Code Breaking

One area of parallel programming is code breaking. The basic idea is as follows:

Pick an encryption key.
Use it on the encrypted text.
Check if the decrypted text is readable.
If not, pick another key.

This is easily parallelizeable since multiple keys can be checked independently.

The program below implements this for the very simple Caesar cipher. It tries to decrypt the text by each of the 26 possible keys. It then searches the decrypted text for common English words.


#include <regex.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char ignores [] = ",.'!?";
char common [][8] = {"THE", "BE", "TO", "OF", "AND", "A", "IN", "THAT", "HAVE", "I",
    "IT", "FOR", "NOT", "ON", "WITH", "HE", "AS", "YOU", "DO", "AT"};

unsigned char rotate(unsigned char c, int amount) {
    /* skip everything non-alphabetic */
    if (!isalpha(c)) {
        return c;
    }

    /* do the shift */
    unsigned char next = c + amount;

    /* wrap if needed */
    unsigned char last = isupper(c) ? 'Z' : 'z';
    while (next > last) {
        next -= 26;
    }

    return next;
}

void break_code(char* encrypted, int length, int amount) {
    /* make space for decrypted string */
    char* decrypted = malloc(length + 1);

    /* rotate each character that amount */
    int j;
    for (j = 0; j < length; j++) {
        decrypted[j] = rotate(encrypted[j], amount);
    }
    decrypted[j] = '\0';

    /* check how many common words are in the string */
    int count = 0;
    for (j = 0; j < (sizeof(common) / sizeof(char*)); j++) {
        /* build a regex to search for the word exactly */
        char regex_string[16];
        sprintf(regex_string, "\\<%s\\>", common[j]);
        regex_t regex;
        regcomp(&regex, regex_string, REG_ICASE);

        /* search the string for our matches if present */
        regmatch_t match;
        int offset = 0;
        /* while we have a match */
        while (!regexec(&regex, decrypted + offset, 1, &match, 0)) {
            /* count it an search the rest of the string */
            count++;
            offset += match.rm_eo;
        }
    }

    /* if the ratio of common words to the length of the text is over .01, it
     * is likely the correct decrypted text that we are looking for */
    if (((double)count / (double)strlen(decrypted)) >= .01) {
        printf("The decrypted text is:\n%s\n", decrypted);
    }
}


int main(int argc, char** argv) {
    /* be strict */
    if (argc < 2) {
        printf("Usage: code cipherfile\n");
        return 0;
    }

    /* open the file */
    FILE* file = fopen(argv[1], "r");
    if (!file) {
        printf("File '%s' could not be opened.\n", argv[1]);
        return 0;
    }

    /* will store the encrypted string */
    char* encrypted = NULL;

    /* find how big the file is and allocate that much space */
    fseek(file, 0, SEEK_END);
    long length = ftell(file);
    encrypted = malloc(length + 1);

    /* go back to the start and read it all in */
    fseek (file, 0, SEEK_SET);
    fread (encrypted, 1, length, file);
    fclose (file);

    /* try each possible shift amount */
    for (int i = 1; i <= 26; i++) {
        break_code(encrypted, length, i);
    }

    return 0;
}

How can we parallelize this with OpenMP?

Critical Sections

Critical sections are easily handled with OpenMP. Below is an OpenMP version of the sum program that currently is non-deterministic:


#include <stdlib.h>
#include <omp.h>
#include <stdio.h>

#define THREADS 32
#define START 0
#define END 100000

/* a global variable */
unsigned long sum = 0;

/* the function called for each thread */
void sum_part() {
    /* get our thread id */
    int id = omp_get_thread_num();

    /* calculate the start and end points by evenly dividing the range */
    unsigned long start = ((END - START) / THREADS) * id;
    unsigned long end = start + ((END - START) / THREADS) - 1;

    /* the last thread needs to do all remaining ones */
    if (id == (THREADS - 1)) {
        end = END;
    }

    /* do the calculation */
    unsigned long i;
    for (i = start; i <= end; i++) {
        sum += i;
    }
}

int main () {
#pragma omp parallel num_threads(THREADS)
    sum_part();

    /* now all results are in */
    printf("Final answer = %lu.\n", sum);
    return 0;
}

The problem is that all the threads are accessing the global variable sum. This can be remedied by placing the critical section in a #pragma omp critical directive:


    /* do the calculation */
    unsigned long i;
    for (i = start; i <= end; i++) {
    #pragma omp critical
        sum += i;
    }

OpenMP will create a mutex and lock/unlock it automatically.

OpenMP vs. Pthreads

OpenMP and Pthreads both allow for shared-memory multi-threaded programs. Pthreads is a low-level library which gives you direct control over how and when threads are started, and access to low-level communication primitives.

OpenMP is a higher level system which makes the most common multi-threading tasks much easier, but is not as flexible as using Pthreads or other native thread systems.