
Topic 4: Current Software Methodologies and Languages

Seyed Hosein Attarzadeh Niaki

KTH

February 23, 2010

Outline

1 Parallel Programming
OpenMP
Message-Passing Interface
Erlang
Haskell

2 Real-Time Programming
RTOS
Introduction to Real-Time Java
Ada

Solve a Problem in Parallel

Problems with Parallelization

no program can run more quickly than the longest chain of dependent
calculations; Bernstein's conditions state that fragments Pi and Pj can run
in parallel if (where Ii and Oi are the inputs and outputs of program fragment Pi):
Ij ∩ Oi = ∅ (no flow dependency)
Ii ∩ Oj = ∅ (no anti-dependency)
Oi ∩ Oj = ∅ (no output dependency)
race conditions happen when multiple threads need to update a
shared variable (see the sketch below)
locks are used to provide mutual exclusion
locks can greatly slow down a program
locking multiple variables without atomic locks can produce deadlock
barriers are used when subtasks of a program need to act in synchrony
the overhead of communication between threads may dominate the time
spent on solving the problem
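A minimal sketch, assuming POSIX threads, of the race condition and lock discussion above: two threads increment a shared counter, and the mutex makes the read-modify-write atomic.

#include <pthread.h>
#include <stdio.h>

/* shared counter and the lock that protects it */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        /* without the lock, this read-modify-write is a race */
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* always 200000 with the lock */
    return 0;
}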

Classifications
Flynn's taxonomy distinguishes parallel computer architectures along
two independent dimensions, Instruction and Data:
SISD an entirely sequential computer
SIMD processor arrays, vector pipelines, GPUs, etc.
MISD few application examples exist (e.g., multiple parallel filters)
MIMD the most common type of modern computer
applications are classified according to how often their subtasks need
to synchronize:
Fine-grained subtasks synchronize many times per second
Coarse-grained subtasks synchronize less frequently
Embarrassingly parallel subtasks rarely need to communicate
parallelism can occur at different levels:
Bit-level increasing processor word width
Instruction-level pipelines, superscalar processors
Data parallelism inherent in program loops
Task entirely different calculations on the same or different data
Memory Architectures

Shared memory all processors can access all memory as a global address
space; does not scale well
Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
Distributed memory requires a communication network to connect
inter-processor memory; the programmer must perform
communication explicitly
Hybrid the shared-memory component is usually a cache-coherent
SMP machine; the distributed-memory component is the
networking of multiple SMPs

Parallel Programming Models
The best-known and most widely used models for programming parallel systems,
based on the assumptions they make about the underlying architecture, are:
Shared memory threads communicate using shared variables by means of
synchronization facilities; implemented in POSIX threads
and OpenMP
Message passing a set of tasks, each with its own local memory, exchange
information and synchronize by sending and receiving
messages; implemented in MPI
Hybrid models combining the two approaches above are commonly used
Some approaches need explicit declaration of parallelism, but there are
parallelizing compilers that can generate parallel code:
Fully automatic candidates are loops and independent sections; limited
success
Programmer directed the programmer provides directives or compiler
flags to assist the compiler
Designing Parallel Programs
Understand the problem and the program
can it be parallelized?
where are the hotspots and bottlenecks?
identify inhibitors to parallelism
investigate other algorithms
Partitioning
domain decomposition
functional decomposition

Designing Parallel Programs (contd.)
Communication
cost
latency vs. bandwidth
synchronous vs. asynchronous (blocking vs. non-blocking)
scope of communications (point-to-point, collective)
Synchronization
barrier
lock/semaphore
synchronous communication operations

Introduction to OpenMP

an API defined jointly by a group of hardware and software vendors,
consisting of compiler directives, runtime library routines, and
environment variables
provides a portable, scalable model for explicit multi-threaded,
shared-memory parallelism
provides the capability to incrementally parallelize a program
is based on a fork/join model
supports nested parallelism
supports dynamic threads
it is NOT:
meant for distributed memory systems
a guarantee of I/O synchronization
required to check for deadlocks, races, dependencies, or conflicts
OpenMP Directives

Directive format:
#pragma omp directive-name [clause, ...] newline

#pragma omp is required for all OpenMP C/C++ directives
directive-name must be a valid OpenMP directive; it appears after the pragma and before any clauses
clauses are optional; they can be in any order and repeated as necessary unless otherwise restricted
the newline is required and precedes the structured block enclosed by the directive

each directive applies to at most one succeeding statement, which
must be a structured block
a PARALLEL region is a block of code that will be executed by
multiple threads:

#pragma omp parallel [clause ...] newline
                     if (scalar_expression)
                     private (list)
                     shared (list)
                     default (shared | none)
                     firstprivate (list)
                     reduction (operator: list)
                     copyin (list)
                     num_threads (integer-expression)
  structured_block

when a thread reaches a PARALLEL directive, a team of threads is created
there is an implied barrier at the end of the region
nested regions are supported
it is illegal to branch into or out of the region
a region should not span multiple routines or files

Parallel Region Example

every thread executes all the code enclosed in the parallel section
OpenMP library routines are used to obtain thread identifiers and
total number of threads

#include <omp.h>
#include <stdio.h>

int main () {
int nthreads, tid;

/* Fork a team of threads with each thread having a private tid variable */
#pragma omp parallel private(tid)
{
/* Obtain and print thread id */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);

/* Only master thread does this */


if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
} /* All threads join master thread and terminate */
}

Work Sharing Constructs

a work-sharing construct divides the execution of the enclosed region
among the members of a team of threads
there is no implied barrier at the beginning, but an implied barrier at the end
DO/for: shares the iterations of a loop across the team (data parallelism)
SECTIONS: breaks work into separate, discrete sections, each run by one thread
SINGLE: serializes a section of code, so just one thread runs it (see the sketch below)
Synchronization Constructs

the master directive specifies a region of code that is to be executed
only by the master thread of the team
the critical directive specifies a region that is to be executed by only
one thread at a time (see the sketch below)
the barrier directive synchronizes all the threads in a team
the flush directive identifies a synchronization point at which the
implementation must provide a consistent view of memory
the ordered directive specifies that the iterations of the enclosed
loop will be executed in the same order as in the serial loop (used within
a DO/for region that has the ordered clause)
the threadprivate directive makes global file-scope variables local
to each executing thread (by making multiple copies)
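A minimal sketch of the critical, barrier, and master directives working together; the shared variable is illustrative.

#include <omp.h>
#include <stdio.h>

int main () {
  int sum = 0;

  #pragma omp parallel shared(sum)
  {
    /* only one thread at a time may update the shared variable */
    #pragma omp critical
    sum += omp_get_thread_num();

    /* no thread proceeds past this point until all have arrived */
    #pragma omp barrier

    /* only the master thread prints the final value */
    #pragma omp master
    printf("sum of thread ids = %d\n", sum);
  }
  return 0;
}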

Data Scope Attribute Clauses
Since shared memory programming is all about shared variables, data scoping is a
very important concept
the private clause declares variables to be private to each thread
the shared clause declares variables to be shared among all threads in the team
the default clause allows specifying a default scope for all variables in a parallel region
the firstprivate clause combines the behavior of the private clause with
automatic initialization of the variables in its list
the lastprivate clause combines the behavior of the private clause
with a copy from the last loop iteration or section to the original variable
object (see the sketch below)
the copyin clause provides a means for assigning the same value to
threadprivate variables for all threads in the team
the copyprivate clause can be used to broadcast values acquired by a single
thread directly to all instances of the private variables in the other threads
the reduction clause performs a reduction on the variables that appear in
its list
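A minimal sketch of firstprivate and lastprivate on a parallel loop; the variable names and values are illustrative.

#include <omp.h>
#include <stdio.h>

int main () {
  int offset = 100;   /* firstprivate: copied into each thread's private offset */
  int last = 0;       /* lastprivate: receives the value from the last iteration */
  int i;

  #pragma omp parallel for firstprivate(offset) lastprivate(last)
  for (i = 0; i < 8; i++)
    last = offset + i;            /* after the loop, last == 107 */

  printf("last = %d\n", last);
  return 0;
}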
Example: Vector Dot Product

iterations of the parallel loop will be distributed in equal-sized blocks to
each thread in the team
at the end of the parallel loop construct, all threads will add their values of
result to update the master thread's global copy

#include <omp.h>
#include <stdio.h>

int main () {
  int i, n, chunk;
  float a[100], b[100], result;

  /* Some initializations */
  n = 100;
  chunk = 10;
  result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  #pragma omp parallel for      \
    default(shared) private(i)  \
    schedule(static,chunk)      \
    reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
  return 0;
}

Introduction to MPI
Background
MPI is a standard for a message-passing library, jointly developed by vendors,
researchers, library developers, and users, and designed to be portable,
efficient, and flexible
by itself, it is a specification, not a library
Programming Model
lends itself to virtually any distributed-memory parallel
programming model
it is also used on shared-memory architectures (SMP/NUMA)
behind the scenes
all parallelism is explicit
the number of tasks dedicated to run a parallel program is static
(relaxed in MPI-2)
Program Environment

MPI uses objects called communicators and groups to define which
collections of processes communicate together
within a communicator, every process has its own unique integer
identifier, called its rank, which is used to
specify the source and destination of messages
control program execution
MPI_Init and MPI_Finalize initialize and terminate the execution
environment (see the sketch below)
MPI_Comm_size and MPI_Comm_rank determine the number of
processes in the group and the rank of the calling process
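A minimal sketch using only the environment-management routines above:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int numtasks, rank;

  MPI_Init(&argc, &argv);                      /* start the MPI environment */
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);    /* how many processes? */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* which one am I? */

  printf("Hello from task %d of %d\n", rank, numtasks);

  MPI_Finalize();                              /* shut down the environment */
  return 0;
}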

Point-to-Point Communication
occurs between exactly two MPI tasks and can be a
synchronous send
blocking send / blocking receive
non-blocking send / non-blocking receive
buffered send
combined send/receive
any type of send can be combined with any type of receive
buffering in system buffer space deals with storing data when the two
tasks are out of sync; its behavior is implementation defined
a blocking send/receive only returns when it is safe to modify/use the
application buffer
non-blocking operations return immediately, and it is the user's duty
to check/wait for completion of the operation before manipulating
the buffers (this introduces the possibility of overlapping communication
and computation)
MPI guarantees the correct ordering of messages but not fairness
Point-to-Point Communication Routines

MPI_Send and MPI_Recv are the basic blocking routines (see the sketch below)
MPI_Ssend is a synchronous blocking send: it waits for the receiving task
to start receiving the message
MPI_Bsend is a buffered blocking send, where the user can allocate the
required space for the message before it is delivered
MPI_Sendrecv sends a message and posts a receive before blocking
MPI_Isend, MPI_Irecv, MPI_Issend, and MPI_Ibsend are the
non-blocking versions of the above routines
MPI_Wait blocks until a specified non-blocking operation completes
MPI_Test checks the status of a non-blocking operation
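A minimal sketch of blocking point-to-point communication between two tasks; the tag and message value are arbitrary.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int rank, value, tag = 1;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    value = 42;
    /* returns once the buffer may safely be reused */
    MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
  } else if (rank == 1) {
    /* blocks until the message has arrived in 'value' */
    MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    printf("task 1 received %d\n", value);
  }

  MPI_Finalize();
  return 0;
}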

Example: Nearest Neighbor Exchange in Ring Topology
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])


{
int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2;
MPI_Request reqs[4];
MPI_Status stats[4];

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

prev = rank-1;
next = rank+1;
if (rank == 0) prev = numtasks - 1;
if (rank == (numtasks - 1)) next = 0;

MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);


MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);

MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);


MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);

{ do some work }

MPI_Waitall(4, reqs, stats);

MPI_Finalize();
}

Collective Communication

collective communication involves all processes in the scope of the
communicator, in the form of
synchronization
data movement
collective computation
collective operations are blocking
they can only be done using MPI predefined data types

Collective Communication Routines

MPI_Barrier creates a barrier synchronization across a group
MPI_Bcast sends a message from the root process to all other processes
(see the sketch below)
MPI_Scatter distributes distinct messages from a single source task to the other tasks
MPI_Gather gathers distinct messages from the other tasks into a single task
MPI_Allgather performs a concatenation of data to all tasks in a group
MPI_Reduce applies a reduction across the tasks and puts the result in one
task (e.g., MAX, SUM, OR, a user-defined operation, etc.)
MPI_Allreduce applies a reduction and puts the result on all tasks
MPI_Reduce_scatter does an element-wise reduction on a vector across
the tasks and splits the result across all tasks
MPI_Alltoall each task performs a scatter operation, sending a distinct message
to all the tasks in the group in order by index
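A minimal sketch of two collective routines, a broadcast followed by a sum reduction; the broadcast value is illustrative.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int rank, n, sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) n = 10;                 /* the root chooses a value */

  /* every task receives the root's value of n */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

  /* the sum of all ranks ends up in 'sum' on the root task */
  MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) printf("n = %d, sum of ranks = %d\n", n, sum);

  MPI_Finalize();
  return 0;
}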

More in MPI

using derived data types, the user can define customized data types
(contiguous, vector, indexed, and struct)
groups and communicator objects can be managed dynamically to organize
tasks and enable collective operations on subsets of tasks
virtual topologies describe a mapping/ordering of MPI processes onto
a geometric shape (Cartesian, graph, etc.)
Added in MPI-2:
dynamic process support
one-sided communication for shared-memory-style operations (put/get)
and remote accumulate operations
extended collective operations (non-blocking supported)
parallel I/O
external interfaces, such as for debuggers and profilers

Erlang

Developed at Ericsson in the late 1980s as a platform for developing soft
real-time software for managing phone switches
They needed a high-level symbolic language to achieve a productivity
gain, one which
contains primitives for concurrency
supports error recovery
has an execution model without back-tracking
has a granularity of concurrency such that one asynchronous telephony
process is represented by one process in the language

Sequential Erlang
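A minimal hedged sketch of sequential Erlang (the module and function names are illustrative): pattern matching selects a clause, guards constrain it, and recursion replaces loops.

-module(seq_demo).
-export([fact/1, len/1]).

%% factorial via pattern matching and a guard
fact(0) -> 1;
fact(N) when N > 0 -> N * fact(N - 1).

%% length of a list via recursion over its structure
len([]) -> 0;
len([_ | Tail]) -> 1 + len(Tail).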

Concurrent Erlang
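A minimal hedged sketch of concurrent Erlang (names are illustrative): spawn/1 creates a process, ! sends an asynchronous message, and receive pattern-matches on the mailbox.

-module(conc_demo).
-export([start/0, echo/0]).

%% a process that echoes any message back to its sender
echo() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},
            echo()
    end.

start() ->
    Pid = spawn(fun echo/0),
    Pid ! {self(), hello},
    receive
        {Pid, Reply} -> Reply
    end.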

Abstracting Protocols
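A hedged sketch of the common Erlang idiom this slide refers to: the message protocol is wrapped behind a functional interface, so callers never touch ! or receive directly (module and message shapes are illustrative).

-module(counter).
-export([start/0, increment/1, value/1]).

%% client API: hides the message protocol
start()        -> spawn(fun() -> loop(0) end).
increment(Pid) -> Pid ! increment, ok.
value(Pid)     ->
    Pid ! {value, self()},
    receive {Pid, Count} -> Count end.

%% server loop: keeps the count in its argument
loop(Count) ->
    receive
        increment     -> loop(Count + 1);
        {value, From} -> From ! {self(), Count},
                         loop(Count)
    end.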

Standard Behaviours

Erlang's abstraction of a protocol pattern is called a behaviour
large Erlang applications make heavy use of behaviours;
direct use of message sending or receiving is uncommon
Erlang's OTP standard library provides three main behaviours:
generic server the most common behaviour; responses can be
delayed or delegated, calls have optional timeouts, etc.
(see the sketch below)
generic finite state machine
generic event handler an event manager receives events as
incoming messages and dispatches them to an arbitrary
number of event handlers, each with its own
module of callbacks and its own state
the behaviour libraries provide functionality for dynamic debugging,
inspecting state, producing traces of messages, and gathering statistics
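A minimal gen_server sketch (a callback module of my own naming) that puts the counter protocol above behind the standard behaviour; the optional callbacks are omitted.

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, increment/0, value/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% client API
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
increment()  -> gen_server:cast(?MODULE, increment).
value()      -> gen_server:call(?MODULE, value).

%% callbacks: the behaviour owns the receive loop and the state
init([])                     -> {ok, 0}.
handle_cast(increment, N)    -> {noreply, N + 1}.
handle_call(value, _From, N) -> {reply, N, N}.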

Worker Processes

many applications need to create concurrent activities on the fly
suppose a client needs to send calls to multiple servers; hacking
OTP's generic server to do this is a pain
using worker processes, clients can use receive expressions
without worrying about being blocked

Some notes on Erlang

All the classic errors of concurrent programming have their equivalents in Erlang:
races, deadlock, livelock, starvation, etc.
Erlang is a safe language: all run-time faults result in clearly defined
behavior, usually an exception
Erlang provides two primitives for one process to notice the failure of
another:
monitoring another process creates a one-way notification of
failure
linking two processes establishes mutual notification
when a fault notification is delivered to a linked process, it causes that
process to fail as well; but it can be configured to be delivered as a message
receivable by a receive expression
robust server deployments include an external "nanny" that monitors a
running operating-system process and restarts it if it fails; in Erlang this
is done with the supervisor behaviour
Implementation, Performance, and Scalability

Erlang's concurrency is built upon process spawning and message
passing, so these must have low overhead
to be portable, the Erlang emulator takes care of scheduling, memory
management, and message passing at the user level
the Erlang process stack has no minimal size or required granularity
the Erlang emulator interprets the intermediate code produced by the
compiler
The Erlang emulator can
create a new Erlang process in less than a microsecond
run millions of processes simultaneously
run each process with less than a kilobyte of space
do message passing and context switching in hundreds of nanoseconds

Parallel and Concurrent Programming in Haskell

purity (inherent parallelism), laziness (no specific evaluation order), and types
(faster parallelism) mean we can find more parallelism in the code
Haskell distinguishes between:
Parallelism exploiting parallel computing hardware to improve
performance of a single task
Concurrency logically independent tasks as a structuring technique
several approaches are available in Haskell:
sparks and parallel strategies
threads, messages, and shared memory
transactional memory
data parallelism

The GHC Runtime
the GHC runtime supports millions of lightweight threads
they are multiplexed onto real OS threads (approximately one per CPU)
automatic thread migration and load balancing (work stealing)
parallel garbage collector since GHC 6.12

runtime settings:
compile with -threaded -O2
run with +RTS -N2, +RTS -N4, ..., +RTS -N64, ...

Semi-Explicit Parallelism with Sparks

the lack of side effects makes parallelism easy:

f x y = (x * y) + (y ^ 2)

almost everything could be done in parallel → too much parallelism
the idea is to let the user annotate the code for potential parallelism
it is a deterministic approach

par :: a → b → b
a `par` b creates a spark for a
the runtime sees a potential to convert the spark into a thread
it is semantically equal to b
there are no restrictions on its usage

pseq :: a → b → b
a `pseq` b evaluates a in the current thread
it ensures work is run in the right thread
it is semantically equal to b

Putting it Together

f `par` e `pseq` f + e

one spark is created for f
the f spark is converted to a thread and executed
e is evaluated in the current thread, in parallel with f
ThreadScope helps reason about and evaluate spark code
(a complete example follows below)
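A minimal, self-contained sketch of the par/pseq idiom, assuming the parallel package's Control.Parallel module; compile with -threaded -O2 and run with +RTS -N2. The function is the usual naive Fibonacci.

import Control.Parallel (par, pseq)

-- naive Fibonacci, parallelized by sparking one recursive call
nfib :: Int -> Int
nfib n
  | n < 2     = 1
  | otherwise = f `par` (e `pseq` f + e)
  where
    f = nfib (n - 1)
    e = nfib (n - 2)

main :: IO ()
main = print (nfib 30)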

Explicit Parallelism with Threads and Shared Memory

For stateful or imperative programs, we need explicit threads, not
speculative sparks.

forkIO :: IO () → IO ThreadId

forkIO takes a block of code to run and executes it in a new Haskell thread
threads are scheduled preemptively
scheduling is non-deterministic: random interleaving
threads may be preempted when they allocate memory
threads communicate via messages or shared memory

import Control.Concurrent
import System.Directory

main = do
  forkIO (writeFile "xyz" "thread was here")
  v <- doesFileExist "xyz"
  print v   -- non-determinism: may print True or False
Shared Memory Communication: MVars and Chans
MVars are boxes: they are either full or empty
a put on a full MVar causes the thread to sleep until the MVar is empty
a take on an empty MVar blocks until it is full
the runtime will wake you up when you're needed
MVars can deadlock!

do box <- newEmptyMVar
   forkIO (f `pseq` putMVar box f)
   e `pseq` return ()
   f <- takeMVar box
   print (e + f)

Chans are good for unbounded numbers of shared messages:
send and receive messages through a pipe-like structure

main = do
  ch <- newChan
  forkIO (worker ch)
  -- convert future msgs to list
  xs <- getChanContents ch
  -- lazily print as msgs arrive
  mapM_ print xs

worker ch = forever $ do
  v <- readFile "/proc/loadavg"
  -- send msg back to receiver
  writeChan ch v
  threadDelay (10^5)

Transactional Memory
An optimistic model:
transactions run inside atomic blocks assuming no conflicts
the system checks consistency at the end of the transaction
and retries if there are conflicts
requires control of side effects (handled in the type system)
each atomic block appears to work in complete isolation

data STM a
atomically :: STM a → IO a
retry      :: STM a
orElse     :: STM a → STM a → STM a

data TVar a
newTVar   :: a → STM (TVar a)
readTVar  :: TVar a → STM a
writeTVar :: TVar a → a → STM ()

STM a is used to build up atomic blocks
transactional code can only run inside atomic blocks
orElse lets us compose atomic blocks into larger pieces
TVars are the variables the runtime watches for contention
Atomic Bank Transfers

transfer :: TVar Int -> TVar Int -> Int -> IO ()


transfer from to amount =
atomically $ do
balance <- readTVar from
if balance < amount
then retry
else do
writeTVar from (balance - amount)
tobalance <- readTVar to
writeTVar to (tobalance + amount)

Note
transactions cannot have any visible side effects; the type system enforces
this (atomically :: STM a → IO a)
in case side effects are really needed, use unsafeIOToSTM :: IO a → STM a
Data Parallelism

Simple Idea
Do the same thing in parallel to every element of a large collection

If a program can be expressed this way, then
there are no explicit threads or communication (simplicity)
there is a clear cost model (unlike `par`)
good locality, easy partitioning
It adds parallel array syntax:
[: e :]
along with many parallel combinators (mapP, filterP, zipP, ...)

Flat vs. Nested Data Parallelism
Flat data parallelism

sumsq :: [: Float :] -> Float
sumsq a = sumP [: x*x | x <- a :]

dotp :: [:Float:] -> [:Float:] -> Float
dotp v w = sumP (zipWithP (*) v w)

break the array into N chunks (for N cores)
run a sequential loop to apply f to each chunk element
run that loop on each core
combine the results

Nested data parallelism

type Vector = [: Float :]
type Matrix = [: Vector :]

matMul :: Matrix -> Vector -> Vector
matMul m v = [: vecMul r v | r <- m :]

each element of a parallel computation may in turn be a
nested parallel computation
GHC implements a vectorizer that flattens nested data,
changing representations automatically
Real-Time Computing

Definition
real-time computing (RTC), is the study of hardware and software systems
that are subject to a “real-time constraint” – i.e., operational deadlines
from event to system response.

Real-time computing is often addressed in the context of real-time operating
systems and synchronous programming languages.

Definition
A system is said to be real-time if the total correctness of an operation
depends not only upon its logical correctness, but also upon the time in
which it is performed.

in a hard real-time system, the completion of an operation after its
deadline is useless
a soft real-time system, on the other hand, will tolerate such lateness
Real-Time Operating System (RTOS)

real-time operating systems offer programmers more control over
process priorities
the variability in the amount of time it takes to accept and complete
an application's task is called jitter; it must be near zero for
hard real-time systems
two approaches:
Event-driven switches tasks only when an event of higher priority
needs service (priority scheduling; see the POSIX sketch below)
Time-sharing switches tasks on a regular clock interrupt, and on events
(round robin)
some algorithms used in RTOS scheduling are:
cooperative multitasking
preemptive scheduling
earliest deadline first (EDF)
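A minimal sketch, assuming a POSIX system where the SCHED_FIFO real-time class is available and the program has permission to use it, of creating a fixed-priority thread; the priority value is illustrative.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *control_task(void *arg)
{
    /* periodic real-time work would go here */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_attr_t attr;
    struct sched_param param;

    pthread_attr_init(&attr);
    /* do not inherit the creator's scheduling attributes */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    /* fixed-priority, preemptive, FIFO within a priority level */
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = 50;            /* illustrative priority */
    pthread_attr_setschedparam(&attr, &param);

    if (pthread_create(&t, &attr, control_task, NULL) != 0)
        perror("pthread_create");         /* e.g., insufficient privileges */
    else
        pthread_join(t, NULL);
    return 0;
}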

Real-Time Operating System (RTOS) contd.

interprocess communication must be provided among tasks:
shared memory using locks and semaphores (danger of priority
inversion and deadlock!)
message passing using queues (danger of priority inversion!)
memory allocation speed is important; it usually needs to take a fixed,
predictable time

Real-Time Java

Enabling real-time programming in Java requires addressing:
the behavior of the garbage collector, which may introduce unpredictable
delays
the lack of a strict priority-based threading model
(without priorities there is no way to implement priority-inversion
avoidance protocols)
high-resolution timing management
As a result, the Java community defined the Real-Time Specification for
Java (RTSJ):
JVM enhancements and a new API set
intended only for a suitable underlying OS (e.g., QNX)
existing J2SE applications can still run under Java RTS
already being used by the U.S. Navy, Boeing, and others

Real-Time additions to Java
Direct memory access similar to J2ME, with more security compared to C
Asynchronous communication comes in two forms:
Asynchronous event handling can schedule a response to
events coming from outside the JVM
Asynchronous transfer of control a controlled way of safely
interrupting another thread
High-resolution timing management
Memory areas that help prevent unpredictable delays from the GC:
Immortal memory never garbage collected; freed at the end of the program
Scoped memory used only while a process works within a
particular section of the program (e.g., a method)
Real-time threads cannot be interrupted by the GC; 28 levels of strictly
enforced priorities; synchronization avoids priority inversion
(see the sketch below)
No-heap real-time threads may immediately preempt any
GC; no reference to, or allocation in, the heap is
allowed
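A minimal, hedged sketch of creating an RTSJ real-time thread; it assumes an RTSJ implementation (e.g., Java RTS) is on the classpath, and the priority value is illustrative rather than taken from the specification.

import javax.realtime.PriorityParameters;
import javax.realtime.RealtimeThread;

public class RtHello {
    public static void main(String[] args) {
        // a real-time thread scheduled with a fixed priority
        RealtimeThread rt = new RealtimeThread(new PriorityParameters(30)) {
            public void run() {
                System.out.println("running under the real-time scheduler");
            }
        };
        rt.start();
    }
}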
Ada

Ada is an imperative, strongly typed, block-structured language
designed for the US DoD (1983) for the construction of large, complex,
mission-critical software
the language is designed to avoid expensive implicit storage-manipulation
operations; heap storage allocation is explicit
the definition of Ada includes a precise description of compilation issues,
and of the interaction between applications and libraries (not left as a
part of the environment/OS)
Ada 95 provided a concurrent programming environment for
real-time systems using fixed-priority, preemptive scheduling
Ada 2005 adds new dispatching policies (e.g., non-preemptive,
round-robin, and EDF), timing events, the Ravenscar profile, and more

Ada Syntax

Ada is strongly typed
subtypes can be used to set constraints on types
scalar types include integer, enumerated, floating-point, and fixed-point types
composite types are arrays and records
access types are Ada's strongly typed pointers
expressions are based on standard arithmetic and boolean operations
Ada's statements include assignments, case statements, loops, exit
statements, blocks, and gotos
subprograms have three parameter-passing modes, designated in,
in out, and out; subprograms can be overloaded
packages separate the definition of interfaces from their implementation,
support abstract data types and information hiding, and are the basic
structuring mechanism for large systems

Ada Tasks

Ada tasks are threads that describe concurrent computations
a task specification provides the public interface in the form of task entries
an entry names an action that a task will perform on behalf of the caller;
entries also act as a synchronization mechanism (rendezvous)
communication is asymmetric: the caller names the server explicitly in the
call, while the server accepts calls from any caller

task mailbox is
   entry put(m: message);
   entry get(m: out message);
end mailbox;

accept put(m: message) do
   buffer_store(m);
end;

a task can request that a rendezvous take place immediately, or not at all
delay statements can be used to program timed entry calls
a selective wait can be used to accept multiple entries depending on the
internal state of the server, to put a time limit on the arrival of a call, or to
shut down in case no callers are active
Ada Concurrency Model

Ada has a core and several annexes:
The Core is required of all implementations and contains the definition of
all language constructs
The Annexes define additional facilities in the form of packages and pragmas,
but never new syntax
The definition of the concurrency model is included in the core, in the form of:
Tasks representing threads of control
Protected objects providing mutual exclusion and condition synchronization;
they are passive and do not have a separate thread of control

A Generic Bounded Buffer
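A hedged sketch of a generic bounded buffer built from a protected type; the names and the default size are illustrative, not the original listing.

generic
   type Item is private;
   Size : Positive := 10;
package Bounded_Buffers is
   subtype Index is Positive range 1 .. Size;
   type Item_Array is array (Index) of Item;

   protected type Buffer is
      entry Put (X : in Item);
      entry Get (X : out Item);
   private
      Data  : Item_Array;
      First : Index   := 1;
      Last  : Index   := 1;
      Count : Natural := 0;
   end Buffer;
end Bounded_Buffers;

package body Bounded_Buffers is
   protected body Buffer is
      --  the barrier keeps callers queued while the buffer is full
      entry Put (X : in Item) when Count < Size is
      begin
         Data (Last) := X;
         Last  := (Last mod Size) + 1;
         Count := Count + 1;
      end Put;

      --  the barrier keeps callers queued while the buffer is empty
      entry Get (X : out Item) when Count > 0 is
      begin
         X := Data (First);
         First := (First mod Size) + 1;
         Count := Count - 1;
      end Get;
   end Buffer;
end Bounded_Buffers;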

A Typical Producer/Consumer
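A matching sketch of producer and consumer tasks that communicate only through the protected buffer above; again an illustrative reconstruction, not the original listing.

with Ada.Text_IO;
with Bounded_Buffers;

procedure Producer_Consumer is
   package Int_Buffers is new Bounded_Buffers (Item => Integer, Size => 8);
   Buf : Int_Buffers.Buffer;

   task Producer;
   task Consumer;

   task body Producer is
   begin
      for I in 1 .. 20 loop
         Buf.Put (I);   --  blocks while the buffer is full
      end loop;
   end Producer;

   task body Consumer is
      X : Integer;
   begin
      for I in 1 .. 20 loop
         Buf.Get (X);   --  blocks while the buffer is empty
         Ada.Text_IO.Put_Line (Integer'Image (X));
      end loop;
   end Consumer;

begin
   null;   --  the main procedure waits here until both tasks terminate
end Producer_Consumer;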

Ada 95 real-time foundation

The core does not define a notion of priority, nor of priority-based queuing
or scheduling. Thus, the Real-Time Systems Annex defines
additional semantics and facilities
integrated priority-based interrupt handling
run-time library behavior
that support deterministic tasking via fixed-priority, preemptive scheduling
priority inheritance and the immediate ceiling priority protocol (ICPP) are
included to limit blocking
a high-resolution monotonic clock provides both absolute and
relative delays

These facilities enable off-line schedulability analysis

Priorities
Priorities are assigned to tasks using a pragma directive.

assigning priorities to tasks:

task Producer is
   pragma Priority (10);
end Producer;

task Consumer is
   pragma Priority (10);
end Consumer;

configuring the behavior of the run-time library:

pragma Task_Dispatching_Policy (FIFO_Within_Priorities);
pragma Locking_Policy (Ceiling_Locking);
pragma Queuing_Policy (Priority_Queuing);

priorities for protected objects are assigned in accordance with
the ceiling priority protocol
low-level tasking control and synchronization (semaphore-like objects,
asynchronous suspension/resumption of tasks, etc.) are available for
extreme needs
2005 real-time enhancements

multiple inheritance support added
new dispatching policies (specified via the pragma
Task_Dispatching_Policy):

Non_Preemptive_FIFO_Within_Priorities
Round_Robin_Within_Priorities
EDF_Across_Priorities

dispatching policies can be combined based on priority bands:

pragma Priority_Specific_Dispatching (
   FIFO_Within_Priorities, 9, 20);
pragma Priority_Specific_Dispatching (
   Round_Robin_Within_Priorities, 1, 8);

2005 real-time enhancements (contd.)

timing events are conceptually lightweight interrupts generated by the
arrival of points in time
execution-time monitoring is used to monitor the execution
time (CPU time) of tasks
execution-time events are similar in concept and interface to timing
events, except that they use execution time instead of wall-clock time
facilities exist to allocate and monitor a budgeted execution time for a
group of tasks as a whole

The Ravenscar Profile

an analyzable subset of Ada tasking, suitable for hard real-time and
high-integrity applications
the tasking model is suitable for a single processor using fixed-priority,
preemptive dispatching
there is a fixed number of tasks, which never terminate
there are two kinds of tasks: time-triggered (periodic) and
event-triggered (sporadic)
tasks do not communicate directly (via rendezvous) and do not
interact with the control flow of other tasks
communication is done indirectly, using shared variables encapsulated
within protected objects

