
Fundamentals of Distributed Systems
Master study program: Distributed Systems and Web Technologies

Laboratory no. 1

MPI & OpenMP recap

MPI (Message Passing Interface)


MPI offers:
- point-to-point communications (send, receive and their variants)
- collective operations (involving a group of processes)
- process groups and communication contexts
- different process topologies

An MPI process group is a collection of n processes, ranked from 0 to n-1. These processes can be mapped onto topological models (e.g. a mesh with 2 or 3 dimensions, a graph etc.).

Execution model
Multiple MPI processes run simultaneously. These processes can be identical or different, and they can communicate collectively or in a point-to-point fashion. A communicator defines the communication domain. Inside this domain, each process has a rank. The communicator identifies a group of processes that can communicate with each other and also provides communication protection.
Communicator types:
- intra-communicator: for communication between processes belonging to the same group
- inter-communicator: for communication between process groups

Core functions:
- MPI_Init: initializes the MPI environment
- MPI_Finalize: terminates the MPI environment
- MPI_Comm_size: determines the number of processes in a communicator
- MPI_Comm_rank: determines the rank of the calling process
- MPI_Send: sends a message
- MPI_Recv: receives a message

Examples:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int count, myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &count);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("I am process %d of %d\n", myrank, count);
    MPI_Finalize();
    return 0;
}

#include <mpi.h>

int main(int argc, char **argv) {
    int count, myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &count);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        // master process code
    } else {
        // slave process code
    }
    MPI_Finalize();
    return 0;
}


Communications
Communication types:
- point-to-point
- collective
Point-to-point communications

Blocking communications

Send:

Standard blocking send:
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

Buffered mode blocking send:
int MPI_Bsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

Synchronous mode blocking send:
int MPI_Ssend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

Ready send:
int MPI_Rsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

<parameters>:
[IN] buf - initial address of send buffer (choice)
[IN] count - number of elements to send (nonnegative integer)
[IN] datatype - datatype of each send buffer element (handle)
[IN] dest - rank of destination (integer)
[IN] tag - message tag (integer)
[IN] comm - communicator (handle)

Receive:

Blocking receive:
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

<parameters>:
[IN] count - maximum number of elements to receive (integer)
[IN] datatype - datatype of each receive buffer entry (handle)
[IN] source - rank of source (integer)
[IN] tag - message tag (integer)
[IN] comm - communicator (handle)
[OUT] buf - initial address of receive buffer (choice)
[OUT] status - status object (status)

Non-blocking communications

Send:
int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)

<parameters>:
[IN] buf - initial address of send buffer (choice)
[IN] count - number of elements in send buffer (integer)
[IN] datatype - datatype of each send buffer element (handle)
[IN] dest - rank of destination (integer)
[IN] tag - message tag (integer)
[IN] comm - communicator (handle)
[OUT] request - communication request (handle)

Receive:
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)

<parameters>:
[IN] buf - initial address of receive buffer (choice)
[IN] count - number of elements in receive buffer (integer)
[IN] datatype - datatype of each receive buffer element (handle)
[IN] source - rank of source (integer)
[IN] tag - message tag (integer)
[IN] comm - communicator (handle)
[OUT] request - communication request (handle)

Note: in non-blocking mode, the receive operation returns immediately, even if no message has been received yet. To wait for or test completion, use the MPI_Wait() and MPI_Test() functions.

Example:
MPI_Status status;
MPI_Request req;
int msgtag = 1;   /* message tag */

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    int a;
    MPI_Isend(&a, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);
} else if (myrank == 1) {
    int b;
    MPI_Recv(&b, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
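For comparison, a minimal blocking version of the same exchange could look like the sketch below (myrank and msgtag are assumed to be declared as in the example above; the value 42 is purely illustrative):

MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    int a = 42;   /* illustrative payload */
    /* returns only once the send buffer can be safely reused */
    MPI_Send(&a, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    int b;
    /* blocks until a matching message from rank 0 has been received */
    MPI_Recv(&b, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}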

Other functions: MPI_Waitall(), MPI_Waitany(), MPI_Testany(), MPI_Testall(), MPI_Waitsome(), MPI_Testsome().

Collective communications

Collective communications involve a group of participating processes. The core functions provided by MPI for collective communication are:
- barrier: all group members synchronize at the barrier (MPI_Barrier)
- broadcast: a group member broadcasts a message to all the other group members (MPI_Bcast)
- gather: a group member collects data from all group members (MPI_Gather)
- scatter: a group member spreads data to all group members (MPI_Scatter)
- all gather: a gather variant in which all group members receive the data (MPI_Allgather)
- all-to-all scatter/gather: collects/spreads data from all group members to all group members (MPI_Alltoall); an extension of MPI_Allgather in which each process may send distinct data to every other process
- reduce operations: sum, multiply, min, max or other user-defined operations, whose result is sent either to all group members or only to one of them

Notes:
- these functions are executed collectively by all processes that belong to the specified communicator; all processes must call them, with matching parameters
- no synchronization is guaranteed, except when using the barrier
- there are no non-blocking collective communications (in MPI 2.2)
- there are no message tags
- the size of the receive buffers must match exactly the size of the send buffers
int MPI_Barrier(MPI_Comm comm)

<parameters>:
[IN] comm - communicator (handle)

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

<parameters>:
[IN/OUT] buffer - starting address of buffer (choice)
[IN] count - number of entries in buffer (integer)
[IN] datatype - data type of buffer (handle)
[IN] root - rank of broadcast root (integer)
[IN] comm - communicator (handle)

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

<parameters>:
[IN] sendbuf - address of send buffer (choice, significant only at root)
[IN] sendcount - number of elements sent to each process (integer, significant only at root)
[IN] sendtype - datatype of send buffer elements (handle, significant only at root)
[IN] recvcount - number of elements in receive buffer (integer)
[IN] recvtype - datatype of receive buffer elements (handle)
[IN] root - rank of sending process (integer)
[IN] comm - communicator (handle)
[OUT] recvbuf - address of receive buffer (choice)

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

<parameters>:
[IN] sendbuf - starting address of send buffer (choice)
[IN] sendcount - number of elements in send buffer (integer)
[IN] sendtype - datatype of send buffer elements (handle)
[IN] recvcount - number of elements for any single receive (integer, significant only at root)
[IN] recvtype - datatype of recvbuffer elements (handle, significant only at root)
[IN] root - rank of receiving process (integer)
[IN] comm - communicator (handle)
[OUT] recvbuf - address of receive buffer (choice, significant only at root)

Examples:
// broadcast example
int rank;
double param;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 5) {
    param = 23.0;
}
MPI_Bcast(&param, 1, MPI_DOUBLE, 5, MPI_COMM_WORLD);   // root is rank 5
printf("Process: %d, param: %f\n", rank, param);
MPI_Finalize();

// scatter example
MPI_Comm comm;
int gsize, *sendbuf;
int root, rbuf[100];
...
MPI_Comm_size(comm, &gsize);
sendbuf = (int *)malloc(gsize * 100 * sizeof(int));
...
MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

// gather example
MPI_Comm comm;
int gsize, sendarray[100];
int root, *rbuf;
...
MPI_Comm_size(comm, &gsize);
rbuf = (int *)malloc(gsize * 100 * sizeof(int));
MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
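The reduce operations mentioned earlier are typically invoked through MPI_Reduce (the result is delivered only to the root) or MPI_Allreduce (the result is delivered to every process). A minimal sketch of a sum reduction, assuming MPI_Init has already been called; the per-process contribution is illustrative:

// reduce example: every process contributes a value, rank 0 receives the sum
int rank, local, total = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local = rank + 1;   /* illustrative partial value */
MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("Sum of contributions: %d\n", total);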

MPI Data types


Note: the length of a message is specified as a number of entries, not as a number of bytes.

MPI datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE              (no C equivalent)
MPI_PACKED            (no C equivalent)
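For example, sending an array of 100 double values uses a count of 100 entries, not 100 * sizeof(double) bytes (dest and tag are assumed to be declared elsewhere; this fragment is illustrative only):

double values[100];
/* count is 100 entries of type MPI_DOUBLE, not sizeof(values) bytes */
MPI_Send(values, 100, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);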

MPI compilation and running


LAM/MPI compilation:
mpicc -o prog prog.c
OR
mpic++ -o prog prog.cpp

LAM/MPI running:

Starting the MPI environment (localhost is implicit):


lamboot [-v]

To run the application on multiple systems, create a file named lamhosts and add one line per workstation containing its IP address (or hostname) and, if needed, its number of CPUs. Example:

calc1.domeniu.ro/192.168.247.10 cpu=2
calc2.domeniu.ro/192.168.247.11 cpu=3
calc3.domeniu.ro/192.168.247.12 cpu=2
calc4.domeniu.ro/192.168.247.13


Use this command line:


lamboot [-v] lamhosts

To verify that LAM was started, use:


recon [-v] lamhosts
OR
lamnodes

Running programs:
mpirun -np 4 prog

If an error occurs, use the following command to clean up:


lamclean [-v]

To shut down LAM:


lamwipe [-v] lamhosts

Exercises:
1. Implement an application that creates two processes: one that sends data and one that receives data. The data is an array of double values (at least 2000 elements). Write two versions of the application: one using blocking communications and one using non-blocking communications. Compare the results.
2. Implement a producer/consumer application using non-blocking communications. Consider N processes, where N-1 processes are producers and one process is the consumer.

Bibliography:
Open MPI 1.5.4 API docs: http://www.open-mpi.org/doc/v1.5/
MPI 2.2 standard: http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf

OpenMP (Open Multi-Processing)


OpenMP is an API used to develop parallel (multi-threaded) applications based on shared memory. It offers support for both task-level and loop-level parallelism.

OpenMP model

OpenMP programs have the following characteristics:
- all threads have access to the same shared memory
- there are two types of data: shared and private
- shared data is available to every thread in the working group
- private data can only be accessed by the owning thread
- data transfer is transparent to the programmer
- each execution has at least one synchronization mechanism

OpenMP programming model


- the model is based on shared memory
- parallelism is provided by the threading model
- parallelism is explicit: the programmer has a set of control mechanisms to fully manage the parallelization process
- the model is based on the fork/join paradigm:
  - every OpenMP program begins with a single thread, the master thread
  - the master thread executes sequentially until a parallel region is encountered; at this point it spawns a set of worker threads using fork-type calls
  - the code in the parallel region is executed in parallel by all threads (master thread included)
  - a synchronization operation takes place at the end of the parallel region, followed by join-type operations, after which only the master thread remains in execution
- OpenMP parallelism is obtained by using compiler directives

Execution model

The OpenMP API offers support for:
- nested parallel regions (a parallel region inside another parallel region)
- dynamic management of the number of threads used to execute parallel regions
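As a small illustration of both features, the sketch below enables nesting and dynamic thread adjustment through the standard runtime calls omp_set_nested() and omp_set_dynamic(); whether the inner region actually receives extra threads depends on the implementation and the environment, and the thread counts of 2 are arbitrary choices:

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_nested(1);    /* allow nested parallel regions */
    omp_set_dynamic(1);   /* let the runtime adjust the number of threads */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(2)   /* nested parallel region */
        {
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
    }
    return 0;
}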

OpenMP compiler directives


General syntax:
#pragma omp construct [clause [clause] ]

Using compiler directives, one can specify:
- how the work is partitioned among threads
- how threads are synchronized
- variable visibility (private, shared etc.)

Runtime libraries
- provide a set of functions used to query and modify the execution environment
- the function signatures are located in <omp.h> (for C/C++)
#include <omp.h>

main () {
    int var1, var2, *var3;

    // serial code
    ...

    // beginning of parallel section: fork a team of threads
    #pragma omp construct [clause [clause]]
    {
        int var4, var5;
        // parallel section executed by all threads
        ...
        // all threads join the master thread and are terminated
    }

    // resume serial code
    ...
}

General OpenMP program structure


OpenMP base directives


omp parallel

#pragma omp parallel
{
    // instructions
    ...
}   // end of parallel region

Directive syntax:
#pragma omp parallel [clause ...] newline
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)
    num_threads (integer_expression)

    structured_block

Example:
#include <stdio.h>

int main(void) {
    printf("Master thread before omp parallel\n\n");
    #pragma omp parallel
    {
        printf("Hello from team thread\n");
    }
    printf("\nMaster thread after omp parallel\n");
    return 0;
}

omp for

#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < MAX; ++i) {
        res[i] = huge();
    }
}

#pragma omp parallel for
for (i = 0; i < MAX; ++i) {
    res[i] = huge();
}

Directive syntax:
#pragma omp for [clause ...] newline
    schedule (type [,chunk])
    ordered
    private (list)
    firstprivate (list)
    lastprivate (list)
    shared (list)
    reduction (operator: list)
    collapse (n)
    nowait

    for_loop


Example:
#define N 12
...
#pragma omp parallel
#pragma omp for
for (i = 1; i < N + 1; ++i)
    c[i] = a[i] + b[i];
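As a complement to the example above, the sketch below illustrates the schedule and nowait clauses from the directive syntax; the array size and the chunk size of 4 are arbitrary choices:

#include <omp.h>
#include <stdio.h>
#define N 100

int main(void) {
    int i, a[N], b[N], c[N];
    for (i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel
    {
        /* iterations are handed out in chunks of 4 as threads become free;
           nowait removes the implicit barrier at the end of the loop */
        #pragma omp for schedule(dynamic, 4) nowait
        for (i = 0; i < N; ++i)
            c[i] = a[i] + b[i];
    }
    printf("c[N-1] = %d\n", c[N - 1]);
    return 0;
}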

Variable visibility
A variable defined outside a parallel block is implicitly shared. A variable defined inside a parallel block is implicitly private to the declaring thread. Variable visibility can be changed using the shared and private clauses:
- shared (variable list): if this clause is present in a pragma omp directive, all listed variables are shared among the threads spawned by the master thread. OpenMP controls access to these variables but does NOT perform any implicit synchronization.
- private (variable list): if this clause is present in a pragma omp directive, all listed variables are private to every thread spawned by the master thread. Each listed variable is redeclared in the private memory space of each thread with the same name and data type, WITHOUT copying its value from shared memory. All modifications to these private variables are visible only to the owning thread.
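As a small illustration of the two clauses, the sketch below uses one shared counter and one private temporary; the critical directive (described later in this laboratory) is used only to avoid a race on the shared variable, and all names and values are illustrative:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int counter = 0;   /* shared: one copy, visible to all threads */
    int tmp = 100;     /* private: each thread gets its own, uninitialized copy */

    #pragma omp parallel shared(counter) private(tmp)
    {
        tmp = omp_get_thread_num();   /* initialize the private copy */
        #pragma omp critical
        counter += tmp;               /* unprotected access would be a data race */
    }
    /* tmp still holds 100 here: the private copies were discarded */
    printf("tmp outside: %d, counter: %d\n", tmp, counter);
    return 0;
}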


Collective operations

reduction (operator: variable list): this clause can be applied ONLY to shared variables. Each listed variable is redeclared inside each thread as a private variable and is initialized according to the table below. These private copies are modified by each thread, and at the end of the parallel region they are combined, using the specified operator, into a single value that is assigned to the shared variable.

Operator    Initial value
+           0
-           0
*           1
^           0
&           ~0
|           0
&&          1
||          0
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}

The variable list cannot contain pointers, arrays, references or const variables. In C++, the operators cannot be overloaded.

Synchronization mechanisms
barrier

Syntax:
#pragma omp barrier new-line

Example:
#pragma omp parallel shared (A, B, C)
{
    DoSomeWork(A, B);    // processes A into B
    #pragma omp barrier
    DoSomeWork(B, C);    // processes B into C
}

critical

Syntax:
#pragma omp critical [(lock_name)]

Examples:
#include <omp.h>

int main() {
    int x = 0;
    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x = x + 1;
    }   // end of parallel section
    return 0;
}

float dot_prod(float *a, float *b, int N)
{
    float sum = 0.0;
    #pragma omp parallel for shared(sum)
    for (int i = 0; i < N; ++i) {
        #pragma omp critical
        sum += a[i] * b[i];
    }
    return sum;
}


Querying/modifying the working environment


int omp_get_thread_num(void)
int omp_get_num_threads(void)
void omp_set_num_threads(int)
int omp_get_num_procs(void)
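A short sketch using these calls (the requested thread count of 4 is arbitrary; the runtime may provide a different number):

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_num_threads(4);   /* request 4 threads for the next parallel region */
    printf("Available processors: %d\n", omp_get_num_procs());

    #pragma omp parallel
    {
        /* per-thread id and actual team size inside the region */
        printf("Thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}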

OpenMP compilation and running


On Linux:
gcc -fopenmp source.c -o file.bin
OR
g++ -fopenmp source.cpp -o file.bin

Exercises:
Implement an application that computes the product of an NxN matrix and a vector of dimension N, with N at least 1000. Run the application multiple times using a different number of threads and measure the running time of each run.

Bibliography:
OpenMP 3.1 specification: http://www.openmp.org/mp-documents/OpenMP3.1.pdf
Ruud van der Pas, An Introduction Into OpenMP, IWOMP 2005, University of Oregon, Eugene, Oregon, USA, June 1-4, 2005
Blaise Barney, OpenMP, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/openMP/
Intel Software College, Programming with OpenMP
GNU libgomp Runtime Library Routines: http://gcc.gnu.org/onlinedocs/libgomp/Runtime-Library-Routines.html#Runtime-Library-Routines

