
Parallel Computing

Thread programming with Pthreads


Thorsten Grahs, 20. April 2015

Table of contents

Basics
Processes & threads
Posix threads
Managing threads
Mutually exclusive threads


Parallel programming patterns


Parallel programs consist of a
collection of tasks.
In general, these tasks should be executed
as processes or threads on multiple
processors.
To handle such processes/tasks,
special coordination and control
structures are needed.
The simplest, lowest-level structure is a thread.


Shared Memory Programming


Shared memory programming can be done on a system
which has more than one computing unit (core) sharing
the same physical memory.
Data is shared between the different computing units in
the form of shared variables.
There are many tools (Application Programming
Interfaces, APIs) supporting this type of programming
model, e.g.
OpenMP
pthreads
ITBB (Intel Threading Building Blocks)


Today we focus on pthreads

Threads
The smallest unit of processing
that a scheduler works on.
Threads executing under
the same process share
the address space.
They are light-weight, i.e.
easy to create.
Context switching between
threads is very fast.


Processes & threads


The building blocks of a Linux system are processes.
Each process has its own data, instructions and memory
space.
Threads are sub-units of processes; they are easy to create
(because no data protection is needed) and share the
process's memory space.
The ability of threads to run simultaneously can be used to
perform many independent tasks concurrently.
Threading APIs provide tools to assign IDs to threads,
to share data and instructions, and for synchronization.
In general, multi-threading programs follow the fork-join
model.

Programming model


Linux processes
A process on a Linux system is an object through which
the resources used by a program, like memory, processor
time and I/O, are managed and monitored.
Processes are the building blocks of any Linux system and
can run in kernel space or in user space.
A process consists of an address space
(a set of memory pages) and a set of data structures.
The address space of a process contains the code and
libraries that the process is executing, the process
variables, its stacks, and various extra information needed
by the kernel while the process is running.
Some common Linux commands to monitor and
manage processes are ps -ef, top, strace, kill etc.

What are threads?


Threads are light-weight
sub-processes within a process.
E.g. opening a new tab in a
browser launches a new thread.
Threads inherit many attributes of
the parent process
(such as the process address
space).

Attempt at a definition
A thread is a sequence of related instructions executed
independently of other instruction sequences.

Threads and parallel programming


Why are threads important for parallel computing?
They are light-weight, independent streams of
instructions which can easily create concurrency.
They offer a natural programming model
for shared address spaces.
They are one of the basic building blocks of
parallel APIs.
OpenMP is based on cooperating threads running
simultaneously on multiple processors or cores.
Multi-threading is a building block for modern computer
architectures.

Multi-threading
If we run multiple threads concurrently,
we are doing multi-threading, i.e. we run
several tasks in parallel.
In a multi-threaded process the processor
can switch execution resources
between threads, resulting in
concurrent execution.
Concurrency on a single-processor (single-core) system
means that more than one thread is making progress,
but the threads are not actually running simultaneously.
On a multi-core system each thread in the process can
run concurrently on a separate core, i.e. true parallelism.

Posix Threads (PThreads)


Portable Operating System Interface Threads
pthreads (POSIX threads) is an API or library which can be
used for shared memory/address space programming.
A low-level threading library for C programming on Linux.
Provides more than 100 functions to manage threads.
However, fewer than 10 are commonly used.
When a thread is created, a new thread of control is
added to the current process.
Each thread in the process runs simultaneously and has
access to the calling process's global data.
In addition, each thread has its own private attributes and
call stack.

Managing pthreads
To create a new thread, a running thread calls the
pthread_create() function and passes a pointer to a
function for the new thread to run.
One argument for the new thread's function can also be
passed, along with thread attributes.
The execution of a thread begins with the successful
return from the pthread_create() function.
The thread ends when the function that was called with
the thread completes normally.
A thread can also be terminated if the thread calls the
pthread_exit() routine, or if any other thread calls
pthread_cancel() to explicitly terminate it.

Example: Creating threads | pt_join.c


// EXAMPLE pthread: create & join
// Shows how two threads are created and joined
#include <stdio.h>
#include <pthread.h>

void *print_char(void *ch) {
    int i;
    for (i = 0; i < 10; i++)
        printf("%c", *(char *)ch);  /* print the character 10 times */
    return NULL;
}

int main() {
    char ch1 = '-', ch2 = '*';
    pthread_t p1, p2;
    pthread_create(&p1, NULL, print_char, &ch1);
    pthread_create(&p2, NULL, print_char, &ch2);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("\n");
    return 0;
}

Compile with

gcc pt_join.c -lpthread -o ptjoin


Example: Creating threads


In this program the main thread, which is started when
the program is executed, creates two additional threads:
p1
p2

Both threads loop through the function print_char,
where they print - and * respectively.
When each thread has printed 10 characters, it returns.
The main thread waits for them to return, prints a newline
and finishes:
./ptjoin
----------**********


pthread_create
int pthread_create(pthread_t *th, pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

The first argument points to an allocated memory space for the
thread (here: &p1 of type pthread_t).
The second argument attr may hold attributes for the created
thread; passing NULL yields default attributes.
The third argument void *(*start_routine)(void*) is a
pointer to a function, which will be executed by the thread.
The fourth argument arg is passed as an argument to this
function.
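To illustrate the second argument, here is a minimal sketch (not from the original slides) that creates a detached thread through an attribute object instead of passing NULL:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *hello(void *arg) {
    printf("hello from a detached thread\n");
    return NULL;
}

int main() {
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&tid, &attr, hello, NULL);  /* attr instead of NULL */
    pthread_attr_destroy(&attr);
    sleep(1);  /* a detached thread cannot be joined; give it time to run */
    return 0;
}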


Example: Creating threads


/*
 * EXAMPLE pthread: return
 * This program shows how data can be returned by a subprogram
 * which every thread executes in parallel.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *child_thread(void *param) {
    long id, jd;
    id = (long)param;
    jd = id * id;           /* compute the square of the thread id */
    return (void *)jd;
}

int main(int argc, char *argv[]) {
    long i;
    pthread_t *thread;
    long *return_value;
    int num_threads;

Listing 1: pt_return.c

Example: Creating threads


    if (argc < 2) {
        fprintf(stderr, "./return_pthreads <# threads>\n");
        return 1;
    }

    num_threads = atoi(argv[1]);
    thread = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
    return_value = (long *)malloc(num_threads * sizeof(long));
    for (i = 0; i < num_threads; i++) {
        pthread_create(&thread[i], NULL, &child_thread, (void *)i);
    }
    for (i = 0; i < num_threads; i++) {
        pthread_join(thread[i], (void **)&return_value[i]);
        printf("input = %ld output=%ld\n", i, return_value[i]);
    }
    return 0;
}

Listing 2: pt_return.c (cont.)



Example: Creating threads


Compile with
gcc pt_return.c -lpthread -o ptreturn
$ ./ptreturn 10
input = 0 output=0
input = 1 output=1
input = 2 output=4
input = 3 output=9
input = 4 output=16
input = 5 output=25
input = 6 output=36
input = 7 output=49
input = 8 output=64
input = 9 output=81


pthread_exit
Terminate the calling thread
void pthread_exit (void *retval);

The pthread_exit() function terminates the calling thread.
It returns a value via retval.
If the thread is joinable, the return value is available to
another thread in the same process.
A thread can also be canceled by another thread (covered later).
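A minimal sketch (not from the original slides) of how a value passed to pthread_exit() is picked up by a joining thread:

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    long v = 2 * (long)arg;
    pthread_exit((void *)v);   /* same effect as: return (void *)v; */
}

int main() {
    pthread_t t;
    void *res;
    pthread_create(&t, NULL, worker, (void *)21);
    pthread_join(t, &res);     /* res receives the value given to pthread_exit */
    printf("result = %ld\n", (long)res);  /* prints: result = 42 */
    return 0;
}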


Join with a terminated thread


Joining is one way to accomplish synchronization between
threads.
This is done with the routine pthread_join:


pthread_join
int pthread_join(pthread_t th, void **returnval);

The function waits for the thread th to terminate.
pthread_join() suspends execution of the calling thread
until the target thread completes execution.
The thread specified by th must be joinable.
If returnval != NULL, returnval holds the return value
of thread th.
If multiple threads simultaneously try to join with the same
thread, the results are undefined.


Shared memory model

All threads have access to the same global, shared
memory.
Threads also have their own private data.
Programmers are responsible for synchronizing
(protecting) access to globally shared data.

Sharing data
Threads can collaborate on the same data
(shared memory).
Suppose that two (or more) concurrently running threads
both try to increment an integer variable I:
thread 1: I=I+2
thread 2: I=I+3

This is a legitimate activity if the variable is an
accumulator for values computed by independent
processes.
Could this be problematic?


Race condition
It is problematic!
The result of these updates depends on the order of
execution, as the following illustration shows:
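(An illustrative trace, not from the original slides; assume I starts at 0.)

Interleaving A (lost update):
  thread 1 reads I (0)
  thread 2 reads I (0)
  thread 1 writes I = 0+2   -> I = 2
  thread 2 writes I = 0+3   -> I = 3
Interleaving B (correct):
  thread 1 reads I (0), writes I = 0+2   -> I = 2
  thread 2 reads I (2), writes I = 2+3   -> I = 5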

This situation is called a race condition.
Could we avoid this behaviour?

Spawning threads
We have already mentioned that threads share the same
memory space.
This can cause some trouble.
The following examples illustrate this
(the details are not so important).

Spawning threads
In the following example
1. some threads are created,
2. they update a shared variable,
3. the threads join again.

[Figure: fork-join diagram — main forks, each thread runs adder(), threads join]

Spawning threads
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int sum = 0;

void *adder(void *arg) {
    sum = sum + 1;          /* unprotected update of the shared variable */
    return NULL;
}

#define NTHREADS 50

int main() {
    int i;
    pthread_t th[NTHREADS];
    printf("forking\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&th[i], NULL, &adder, NULL) != 0) return i+1;
    printf("joining\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_join(th[i], NULL) != 0) return NTHREADS+i+1;
    printf("Sum computed: %d\n", sum);
    return 0;
}

Listing 3: spawnThreads.c

Spawning threads
$ gcc -o spawnThreads -pthread spawnThreads.c
$ ./spawnThreads
forking
joining
Sum computed: 50

This expected result is a coincidence.
It only happens because updating the variable is much
quicker than creating threads.
If we artificially increase the time for the update, we will no
longer get the right result.


Spawning threads modified


We change the adder function:

/* needs #include <unistd.h> for sleep() */
void *adder(void *arg) {
    int t = sum;     /* read the shared value */
    sleep(1);        /* wait, while other threads read the same old value */
    sum = t + 1;     /* write back the stale value + 1 */
    return NULL;
}

Listing 4: spawnThreadsMod.c (adder)


$ gcc -o spawnThreadsMod -pthread spawnThreadsMod.c
$ ./spawnThreadsMod
forking
joining
Sum computed: 1

Now all threads read out the value of sum, wait a while
(presumably calculating something) and only then update it.

Mutually exclusive
We can fix this behaviour by putting a lock around the code
region that should be mutually exclusive:
pthread_mutex_t lock;

void *adder(void *arg) {
    int t;
    pthread_mutex_lock(&lock);     /* enter the critical section */
    t = sum;
    sleep(1);
    sum = t + 1;
    pthread_mutex_unlock(&lock);   /* leave the critical section */
    return NULL;
}

/* ... in main(): */
pthread_t th[NTHREADS];
pthread_mutex_init(&lock, NULL);

Listing 5: spawnThreadsMutex.c
$ gcc -o mutex -pthread spawnThreadsMutex.c
$ ./mutex
forking
joining
Sum computed: 50

Mutex variables
Mutex
Mutex is an abbreviation for mutual exclusion.
Mutexes can be used to prevent race conditions.
Mutex variables are one of the primary means of
implementing thread synchronization and of protecting
shared data when multiple writes occur.
A mutex variable acts like a lock protecting access to a
shared data resource.
The basic concept is that only one thread can lock
(or own) a mutex variable at any given time.
Thus, even if several threads try to lock a mutex, only one
thread will be successful.

Mutex sequence
A typical sequence in the use of a mutex is as follows (see the sketch after this list):
Create and initialize a mutex variable
Several threads attempt to lock the mutex
Only one succeeds and that thread owns the mutex
The owner thread performs some set of actions
The owner unlocks the mutex
Another thread acquires the mutex and repeats the
process
Finally the mutex is destroyed
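A minimal sketch of this sequence (an illustration, not from the original slides):

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t m;
int shared = 0;

void *work(void *arg) {
    pthread_mutex_lock(&m);      /* several threads attempt; one owns the mutex */
    shared++;                    /* the owner performs its actions */
    pthread_mutex_unlock(&m);    /* the owner unlocks; another thread may acquire */
    return NULL;
}

int main() {
    pthread_t t[4];
    int i;
    pthread_mutex_init(&m, NULL);   /* create and initialize the mutex */
    for (i = 0; i < 4; i++) pthread_create(&t[i], NULL, work, NULL);
    for (i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("shared = %d\n", shared);
    pthread_mutex_destroy(&m);      /* finally the mutex is destroyed */
    return 0;
}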


Locks
When your code accesses some memory, you lock it up:
mutex: the lock.
critical section: the code locked with a mutex.
Now if a thread wants to run this code, it needs the key,
so only one thread can run the code at a time.


Critical section
[Figure: code in a critical section — "Have some privacy!"]



Mutex creation
Declaration
Mutex variables must be declared with type
pthread_mutex_t.
They must be initialized before they can be used.
Initialization
There are two ways to initialize a mutex variable:
Statically, when it is declared, e.g.
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
Dynamically, with the pthread_mutex_init() routine.


Mutex locking
pthread_mutex_lock()
This routine is used by a thread to acquire a lock on the
specified mutex variable.
If the mutex is already locked by another thread, this call
will block the calling thread until the mutex is unlocked.
pthread_mutex_trylock()
This routine will attempt to lock a mutex.
However, if the mutex is already locked, the routine will
return immediately with a busy error code.
Useful in preventing deadlock conditions,
e.g. in a priority-inversion situation.
pthread_mutex_unlock()
This routine unlocks a mutex if called by the owning thread.
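A small sketch of the trylock pattern (an illustration, not from the original slides):

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int main() {
    if (pthread_mutex_trylock(&m) == 0) {
        /* lock acquired: work on the shared data, then release */
        pthread_mutex_unlock(&m);
    } else {
        /* mutex was busy (EBUSY): do other useful work instead of blocking */
        printf("mutex busy\n");
    }
    return 0;
}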

Example: Scalar product


//
// This program computes the scalar product of two vectors using pthreads.
//
#include <pthread.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>

int time_estimate(struct timeval t1, struct timeval t2) {
    float t;
    t = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec)/1000000.0;
    printf("Time [seconds]= %12.6f \n", t);
    return 0;
}

typedef struct {
    int thread_id;
    int num_threads;
    int nsize;
} thread_data;

float *vecA, *vecB, global_sum;
int vec_length;
pthread_mutex_t mutex_sum = PTHREAD_MUTEX_INITIALIZER;

Listing 6: scalarProd.c

Example: Scalar product


/* This is the function which every thread computes in parallel */

void *VecSum(void *arg) {
    thread_data *p = (thread_data *)arg;
    int myid = p->thread_id;
    int nthreads = p->num_threads;
    int n = p->nsize;
    int i;
    float local_sum = 0.0;

    for (i = myid*n/nthreads; i < (myid+1)*n/nthreads; i++)
        local_sum += vecA[i] * vecB[i];

    fprintf(stdout, "local_sum=%.2f\n", local_sum);

    /* updating global sum using mutex lock */
    pthread_mutex_lock(&mutex_sum);
    global_sum += local_sum;
    pthread_mutex_unlock(&mutex_sum);

    return NULL;
}

Listing 7: scalarProd (cont.)



Example: Scalar product | main


int main(int argc, char *argv[]) {
    int ret_count;
    pthread_t *threads;
    pthread_attr_t pta;
    int i, num_threads, vec_length;
    thread_data *arg;
    struct timeval t1, t2;
    struct timezone tzp;

    if (argc < 3) {
        fprintf(stderr, "./scalar_prod_pthreads <vect size> <num threads>\n");
        return 1;
    }

    vec_length = abs(atoi(argv[1]));
    num_threads = abs(atoi(argv[2]));

    vecA = (float *)malloc(sizeof(float) * vec_length);
    vecB = (float *)malloc(sizeof(float) * vec_length);
    arg = (thread_data *)malloc(num_threads * sizeof(thread_data));

    pthread_attr_init(&pta);

    threads = (pthread_t *)malloc(sizeof(pthread_t) * num_threads);

    for (i = 0; i < vec_length; i++) {
        vecA[i] = (float)i + 1;
        vecB[i] = 1.0;
    }

Listing 8: scalarProd (cont.)



Example: Scalar product


    /* Now find out the starting time */
    gettimeofday(&t1, &tzp);

    /* Thread creation */
    for (i = 0; i < num_threads; i++) {
        arg[i].thread_id = i;
        arg[i].num_threads = num_threads;
        arg[i].nsize = vec_length;
        ret_count = pthread_create(&threads[i], NULL, VecSum, (void *)(arg+i));
    }

    /* Joining threads */
    for (i = 0; i < num_threads; i++)
        ret_count = pthread_join(threads[i], NULL);

    /* Find out the final time */
    gettimeofday(&t2, &tzp);

    printf("A = ");
    for (i = 0; i < vec_length; i++)
        fprintf(stdout, " %.2f ", vecA[i]);
    printf("\n");

Listing 9: scalarProd (cont.)



Example: Scalar product


    printf("B = ");
    for (i = 0; i < vec_length; i++)
        printf(" %.2f ", vecB[i]);
    printf("\n");

    printf("A.B = %.6f\n", global_sum);

    ret_count = pthread_attr_destroy(&pta);
    time_estimate(t1, t2);
    return 0;
}

Listing 10: scalarProd (cont.)


$ gcc -o scalar -pthread scalarProd.c
$ ./scalar 10 3
local_sum=6.00
local_sum=15.00
local_sum=34.00
A = 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00
B = 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
A.B = 55.000000
Time [seconds]= 0.000171

Summary
Why are threads important for parallel computing?
Threads are independent streams of instructions which
can easily create concurrency.
They are light-weight, i.e. easy to create with little
overhead.
They offer a natural programming model for shared
address spaces.
OpenMP is based on cooperating threads running
simultaneously on multiple processors or cores.
OpenMP will be covered in the next lectures.


Further readings
POSIX Threads Programming
Blaise Barney, Lawrence Livermore National Laboratory
https://computing.llnl.gov/tutorials/pthreads
Advanced Linux Programming
CodeSourcery
Chapter 4: Threads
http://www.advancedlinuxprogramming.com
Parallel Programming
T. Rauber & G. Rünger
Chapter 6: Thread programming
Springer 2007.

