Seyed Hosein Attarzadeh Niaki (KTH) Topic 4: Current Software Methodologies and Languages February 23, 2010 1 / 61
Outline
1 Parallel Programming
OpenMP
Message-Passing Interface
Erlang
Haskell
2 Real-Time Programming
RTOS
Introduction to Real-Time Java
Ada
Solve a Problem in Parallel
Problems with Parallelization
no program can run more quickly than its longest chain of dependent
calculations; Bernstein's conditions say that program fragments Pi and Pj
can run in parallel if (Ii and Oi are the inputs and outputs of fragment Pi):
Ij ∩ Oi = ∅ (no flow dependency)
Ii ∩ Oj = ∅ (no anti-dependency)
Oi ∩ Oj = ∅ (no output dependency)
race conditions happen when multiple threads need to update a
shared variable
locks are used to provide mutual exclusion
locks can greatly slow down a program
locking multiple variables without atomic locks can produce deadlock
barriers are used when subtasks of a program need to act in synchrony
the overhead of communication between threads may dominate the time
spent on solving the problem
Classifications
Flynn’s taxonomy distinguishes parallel computer architectures using
two independent dimensions of Instruction and Data
SISD an entirely sequential computer
SIMD processor arrays, vector pipelines, GPUs, etc.
MISD few application examples exist (multiple parallel filters)
MIMD most common type of modern computers
applications are classified according to how often their subtasks need
to synchronize:
Fine-grained many times per second
Coarse-grained only a few times per second
Embarrassingly parallel rarely or never need to communicate
parallelism can occur at different levels:
Bit-level wider machine words
Instruction-level pipelines, superscalar processors
Data-level inherent in program loops
Task-level entirely different calculations on the same or different data
Memory Architectures
Shared Memory all processors can access all memory as global address
space; does not scale well
Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
Distributed Memory distributed-memory systems require a
communication network to connect inter-processor memory;
the programmer must express communication explicitly
Hybrid the shared-memory component is usually a cache-coherent
SMP machine; the distributed-memory component is the
networking of multiple SMPs
Parallel Programming Models
The best-known and most widely used models for programming parallel
systems, based on the assumptions they make about the underlying
architecture, are:
Shared memory threads communicate through shared variables by means of
synchronization facilities; implemented in POSIX threads
and OpenMP
Message passing a set of tasks, each with its own local memory, exchange
information and synchronize by sending and receiving
messages; implemented in MPI
Hybrid models combining the two approaches above are commonly used
Some approaches need explicit declaration of parallelism, but there are
parallelizing compilers that can generate parallel code:
Fully automatic candidates are loops and independent sections; limited
success so far
Programmer-directed the programmer provides directives or compiler
flags to assist the compiler
Designing Parallel Programs
Partitioning
domain decomposition
Designing Parallel Programs (contd.)
Communication
cost: latency vs. bandwidth
synchronous vs. asynchronous (blocking vs. non-blocking)
scope of communication (point-to-point, collective)
Synchronization
barrier
lock/semaphore
synchronous communication operations
Introduction to OpenMP
Directives format:
#pragma omp directive-name [clause, ...] newline
    #pragma omp: required for all OpenMP C/C++ directives
    directive-name: a valid OpenMP directive; must appear after the
    pragma and before any clauses
    [clause, ...]: optional; clauses can be in any order, and repeated
    as necessary unless otherwise restricted
    newline: required; precedes the structured block which is enclosed
    by this directive

#pragma omp parallel [clause ...] newline
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)
    num_threads (integer-expression)
structured_block

a team of threads will be created
implied barrier at the end of the region
nested regions are supported
it is illegal to branch into or out of the region
a region shouldn't span multiple routines or files
Parallel Region Example
every thread executes all the code enclosed in the parallel region
OpenMP library routines are used to obtain thread identifiers and
the total number of threads

#include <omp.h>
#include <stdio.h>

int main() {
    int nthreads, tid;
    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        if (tid == 0) {  /* only the master thread does this */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }  /* All threads join the master thread and terminate */
    return 0;
}
Work Sharing Constructs
Synchronization Constructs
Data Scope Attribute Clauses
Since shared memory programming is all about shared variables, data scoping is a
very important concept
the private clause declares variables to be private to each thread
shared declares variables to be shared among all threads in the team
default allows specifying a default scope for all variables in a parallel region
the firstprivate clause combines the behavior of private clause with
automatic initialization of the variables in a provided list
the lastprivate clause combines the behavior of the private clause
with a copy from the last loop iteration or section to the original
variable object
the copyin clause provides a means for assigning the same value to
threadprivate variables for all threads in the team
the copyprivate clause can be used to broadcast values acquired by a single
thread directly to all instances of the private variables in the other threads
the reduction clause performs a reduction on the variables that appear in
its list
Example: Vector Dot Product
iterations of the parallel loop will be distributed in equal-sized
blocks to each thread in the team
at the end of the parallel loop construct, all threads will add their
values of result to update the master thread's global copy

#include <omp.h>
#include <stdio.h>

int main () {
    int i, n, chunk;
    float a[100], b[100], result;
    /* Some initializations */
    n = 100;
    chunk = 10;
    result = 0.0;
    for (i = 0; i < n; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }
    #pragma omp parallel for       \
        default(shared) private(i) \
        schedule(static,chunk)     \
        reduction(+:result)
    for (i = 0; i < n; i++)
        result = result + (a[i] * b[i]);
    printf("Final result = %f\n", result);
    return 0;
}
Introduction to MPI
Background
is a standard for a message-passing library, jointly developed by vendors,
researchers, library developers, and users, which claims to be portable,
efficient, and flexible
by itself, it is a specification, not a library
Programming Model
lends itself to virtually any distributed-memory parallel
programming model
it is also used on shared-memory architectures (SMP/NUMA)
behind the scenes
all parallelism is explicit
the number of tasks dedicated to run a parallel program is static
(relaxed in MPI-2)
Program Environment
Point-to-Point Communication
occurs between exactly two MPI tasks and can be:
    synchronous send
    blocking send / blocking receive
    non-blocking send / non-blocking receive
    buffered send
    combined send/receive
any type of send can be combined with any type of receive
buffering in system buffer space deals with storing data when two
tasks are out of sync; its behavior is implementation-defined
a blocking send/receive only returns when it is safe to modify/use the
application buffer
non-blocking operations return immediately, and it is the user's duty
to check/wait for completion of the operation before manipulating
the buffers (this introduces the possibility to overlap communication
and computation)
MPI guarantees the correct ordering of messages but not fairness
Point-to-Point Communication Routines
Example: Nearest Neighbor Exchange in Ring Topology
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, prev, next;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* determine left and right neighbours in the ring */
    prev = rank - 1;
    next = rank + 1;
    if (rank == 0) prev = numtasks - 1;
    if (rank == (numtasks - 1)) next = 0;
    /* { do some work: exchange messages with prev and next } */
    MPI_Finalize();
    return 0;
}
Collective Communication
Collective Communication Routines
More in MPI
using derived data types the user can define customized data types
(contiguous, vector, indexed, and struct)
dynamically manage groups and communicator objects to organize
tasks and enable collective operations on subsets of tasks
virtual topologies describe a mapping/ordering of MPI processes onto
a geometric shape (Cartesian, graph, etc.)
Added in MPI-2:
dynamic processes supported
one-sided communication for shared-memory operations (put/get)
and remote accumulate operations
extended collective operations (non-blocking supported)
parallel I/O
external interfaces, such as for debuggers and profilers
Erlang
Sequential Erlang
Concurrent Erlang
Abstracting Protocols
Standard Behaviours
Worker Processes
Some notes on Erlang
Parallel and Concurrent Programming in Haskell
The GHC Runtime
GHC runtime supports millions of lightweight threads
they are multiplexed onto real OS threads (approximately one per CPU)
automatic thread migration and load balancing (work-stealing)
parallel garbage collector since GHC 6.12
runtime settings:
    compile with -threaded -O2
    run with +RTS -N2, +RTS -N4, ..., +RTS -N64, ...
Semi-Explicit Parallelism with Sparks
par  :: a → b → b
pseq :: a → b → b
Putting it Together
f `par` e `pseq` f + e
Explicit Parallelism with Threads and Shared Memory
forkIO :: IO () → IO ThreadId

threads are scheduled preemptively
non-deterministic scheduling: random interleaving
threads may be preempted when they allocate memory
communicate via messages or shared memory

import Control.Concurrent
import System.Directory

main = do
  forkIO (writeFile "xyz" "thread was here")
  v ← doesFileExist "xyz"
  print v

Non-Determinism!
Shared Memory Communication: MVars and Chans
MVars are boxes: they are either full or empty
a put on a full MVar causes the thread to sleep until the MVar is empty
a take on an empty MVar blocks until it is full
the runtime will wake you up when you're needed

do box <- newEmptyMVar
   forkIO (f `pseq` putMVar box f)
   e `pseq` return ()
   f <- takeMVar box
   print (e + f)

MVars can deadlock!

Chans are good for unbounded numbers of shared messages; send and
receive messages over a pipe-like structure

main = do
  ch <- newChan
  forkIO (worker ch)
  -- convert future msgs to list
  xs <- getChanContents ch
  -- lazily print as msgs arrive
  mapM_ print xs

worker ch = forever $ do
  v <- readFile "/proc/loadavg"
  -- send msg back to receiver
  writeChan ch v
  threadDelay (10^5)
Transactional Memory
An optimistic model:
transactions run inside atomic blocks assuming no conflicts
the system checks consistency at the end of the transaction
and retries if there are conflicts
requires control of side effects (handled in the type system)
each atomic block appears to run in complete isolation

data STM a
atomically :: STM a → IO a
retry      :: STM a
orElse     :: STM a → STM a → STM a

data TVar a
newTVar   :: a → STM (TVar a)
readTVar  :: TVar a → STM a
writeTVar :: TVar a → a → STM ()

STM a is used to build up atomic blocks
transaction code can only run inside atomic blocks
orElse lets us compose atomic blocks into larger pieces
TVars are the variables the runtime watches for contention
Atomic Bank Transfers
Note
transactions cannot have any visible side effects; the type system
enforces this (atomically :: STM a → IO a)
if side effects are really needed, use unsafeIOToSTM :: IO a → STM a
Data Parallelism
Simple Idea
Do the same thing in parallel to every element of a large collection
Flat vs. Nested Data Parallelism
Flat data parallelism Nested data parallelism
Real-Time Computing
Definition
real-time computing (RTC) is the study of hardware and software systems
that are subject to a "real-time constraint", i.e., operational deadlines
from event to system response.
Real-Time Operating System (RTOS) contd.
Real-Time Java
Real-Time additions to Java
Direct memory access similar to J2ME; more secure compared to C
Asynchronous communication comes in two forms:
    Asynchronous event handling can schedule responses to
    events coming from outside the JVM
    Asynchronous transfer of control a controlled way of safely
    interrupting another thread
High-resolution timing
Memory areas that help prevent unpredictable delays from GC:
    Immortal memory never garbage-collected; freed at the end of
    the program
    Scoped memory used only while a process works within a
    particular section of the program (e.g., a method)
Real-time threads cannot be interrupted by GC; 28 levels of strictly
enforced priorities; real-time threads are synchronized (no priority
inversion)
No-heap real-time threads may immediately preempt any GC;
no reference or allocation to/in the heap is allowed
Ada
Ada Syntax
Ada Tasks
task mailbox is
  entry put(m: message);
  entry get(m: out message);
end mailbox;

accept put(m: message) do
  buffer_store(m);
end;

a task can request that a rendezvous take place immediately, or not at all
delay statements can be used to program timed entry calls
selective wait can be used to accept multiple entries depending on the
internal state of the server; to put a time limit on the arrival of a
call; or to shut down in case no callers are active
Ada Concurrency Model
A Generic Bounded Buffer
A Typical Producer/Consumer
Ada 95 real-time foundation
The core language does not define a notion of priority, nor of
priority-based queuing or scheduling. Thus, the Real-Time Systems
Annex defines
    additional semantics and facilities
    integrated priority-based interrupt handling
    run-time library behavior
that support deterministic tasking via fixed-priority, preemptive
scheduling
priority inheritance and the immediate ceiling priority protocol (ICPP)
are included to limit blocking
a high-resolution monotonic clock provides both absolute and
relative delays
Priorities
Priorities are assigned to tasks using a pragma directive
2005 real-time enhancements (contd.)
The Ravenscar Profile