# Introduction to OpenMP

## Overview

OpenMP is an alternative to writing shared memory parallel programs. It is higher-level than Pthreads. It allows us to specify that a portion of our code should be executed in parallel.

OpenMP can create and join threads automatically. It also allows for programs to be parallelized incrementally.

## Hello World

A parallel hello world in OpenMP is quite simple:


#include <stdio.h>
#include <omp.h>

void hello() {
}

int main() {
hello();
return 0;
}


## Compiling OpenMP

Like Pthreads, we have to specify a compiler flag to compile OpenMP code:



## pragma parallel

OpenMP relies on #pragma directives. These are not normal function calls, but are special directions to the compiler.

When gcc sees the line "#pragma omp parallel num_threads(4)" it needs to do something special: insert code to spawn four threads.

The form of a pragma parallel is as follows:


code_block


Where N is the number of threads to start, and code_block is a block of code. This can be a single statement, or a block enclosed in curly braces.

Pragmas are one line in length, so the following will not work:


# pragma parallel

If you want a multi-line pragma, you can use the line continuation character:

# pragma parallel \


pragmas are ignored by compilers that do not support them. It is possible to write an OpenMP program such that it will compile as plain C on compilers that don't support OpenMP.

## Under the Hood

When compiling the hello world program above, gcc will produce code as follows (with most lines removed):


main:
movl	$4, %edx movl$main.omp_fn.0, %edi
call	GOMP_parallel_start
call	main.omp_fn.0
call	GOMP_parallel_end

main.omp_fn.0:
call	hello


It creates a new function "main.omp_fn" that contains the code block for the #pragma parallel.

It changes your main function so that it passes the number of threads and the address of the "main.omp_fn" into a function called GOMP_parallel_start. This function spawns the threads, and calls the "main.omp_fn" function from each.

It then calls GOMP_parallel_end which waits for all threads to join. On POSIX systems, OpenMP is built on top of Pthreads. On other systems it uses the native threading system available.

Unlike Pthreads, OpenMP keeps track of the number of threads, and individual thread id's for us. We can get these values with the functions:




## Example: Code Breaking

One area of parallel programming is code breaking. The basic idea is as follows:

1. Pick an encryption key.
2. Use it on the encrypted text.
3. Check if the decrypted text is readable.
4. If not, pick another key.

This is easily parallelizeable since multiple keys can be checked independently.

The program below implements this for the very simple Caesar cipher. It tries to decrypt the text by each of the 26 possible keys. It then searches the decrypted text for common English words.


#include <regex.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char ignores [] = ",.'!?";
char common [][8] = {"THE", "BE", "TO", "OF", "AND", "A", "IN", "THAT", "HAVE", "I",
"IT", "FOR", "NOT", "ON", "WITH", "HE", "AS", "YOU", "DO", "AT"};

unsigned char rotate(unsigned char c, int amount) {
/* skip everything non-alphabetic */
if (!isalpha(c)) {
return c;
}

/* do the shift */
unsigned char next = c + amount;

/* wrap if needed */
unsigned char last = isupper(c) ? 'Z' : 'z';
while (next > last) {
next -= 26;
}

return next;
}

void break_code(char* encrypted, int length, int amount) {
/* make space for decrypted string */
char* decrypted = malloc(length + 1);

/* rotate each character that amount */
int j;
for (j = 0; j < length; j++) {
decrypted[j] = rotate(encrypted[j], amount);
}
decrypted[j] = '\0';

/* check how many common words are in the string */
int count = 0;
for (j = 0; j < (sizeof(common) / sizeof(char*)); j++) {
/* build a regex to search for the word exactly */
char regex_string[16];
sprintf(regex_string, "\\<%s\\>", common[j]);
regex_t regex;
regcomp(&regex, regex_string, REG_ICASE);

/* search the string for our matches if present */
regmatch_t match;
int offset = 0;
/* while we have a match */
while (!regexec(&regex, decrypted + offset, 1, &match, 0)) {
/* count it an search the rest of the string */
count++;
offset += match.rm_eo;
}
}

/* if the ratio of common words to the length of the text is over .01, it
* is likely the correct decrypted text that we are looking for */
if (((double)count / (double)strlen(decrypted)) >= .01) {
printf("The decrypted text is:\n%s\n", decrypted);
}
}

int main(int argc, char** argv) {
/* be strict */
if (argc < 2) {
printf("Usage: code cipherfile\n");
return 0;
}

/* open the file */
FILE* file = fopen(argv[1], "r");
if (!file) {
printf("File '%s' could not be opened.\n", argv[1]);
return 0;
}

/* will store the encrypted string */
char* encrypted = NULL;

/* find how big the file is and allocate that much space */
fseek(file, 0, SEEK_END);
long length = ftell(file);
encrypted = malloc(length + 1);

/* go back to the start and read it all in */
fseek (file, 0, SEEK_SET);
fclose (file);

/* try each possible shift amount */
for (int i = 1; i <= 26; i++) {
break_code(encrypted, length, i);
}

return 0;
}



How can we parallelize this with OpenMP?





## Critical Sections

Critical sections are easily handled with OpenMP. Below is an OpenMP version of the sum program that currently is non-deterministic:


#include <stdlib.h>
#include <omp.h>
#include <stdio.h>

#define START 0
#define END 100000

/* a global variable */
unsigned long sum = 0;

/* the function called for each thread */
void sum_part() {
/* get our thread id */

/* calculate the start and end points by evenly dividing the range */
unsigned long start = ((END - START) / THREADS) * id;
unsigned long end = start + ((END - START) / THREADS) - 1;

/* the last thread needs to do all remaining ones */
if (id == (THREADS - 1)) {
end = END;
}

/* do the calculation */
unsigned long i;
for (i = start; i <= end; i++) {
sum += i;
}
}

int main () {
sum_part();

/* now all results are in */
return 0;
}


The problem is that all the threads are accessing the global variable sum. This can be remedied by placing the critical section in a #pragma omp critical directive:


/* do the calculation */
unsigned long i;
for (i = start; i <= end; i++) {
#pragma omp critical
sum += i;
}


OpenMP will create a mutex and lock/unlock it automatically.