
Topic 4: Current Software Methodologies and Languages

Seyed Hosein Attarzadeh Niaki

KTH

February 23, 2010

Outline

1 Parallel Programming
OpenMP
Message-Passing Interface
Erlang
Haskell

2 Real-Time Programming
RTOS
Introduction to Real-Time Java
Ada

Solve a Problem in Parallel

Problems with Parallelization

no program can run more quickly than the longest chain of dependent
calculations; Bernstein's conditions state that fragments Pi and Pj can run
in parallel if (where Ii and Oi are the inputs and outputs of program fragment Pi):
Ij ∩ Oi = ∅ (no flow dependency)
Ii ∩ Oj = ∅ (no anti-dependency)
Oi ∩ Oj = ∅ (no output dependency)
race conditions happen when multiple threads need to update a
shared variable (see the sketch below)
locks are used to provide mutual exclusion
locks can greatly slow down a program
locking multiple variables without atomic locks can produce deadlock
barriers are used when subtasks of a program need to act in synchrony
the overhead of communication between threads may dominate the time
spent on solving the problem
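A minimal sketch, assuming POSIX threads, of the race condition and lock discussion above: two threads increment a shared counter, and the mutex makes the read-modify-write atomic.

#include <pthread.h>
#include <stdio.h>

/* shared counter and the lock that protects it */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        /* without the lock, this read-modify-write is a race */
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* always 200000 with the lock */
    return 0;
}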

Classifications
Flynn's taxonomy distinguishes parallel computer architectures along
two independent dimensions, Instruction and Data:
SISD an entirely sequential computer
SIMD processor arrays, vector pipelines, GPUs, etc.
MISD few application examples exist (e.g., multiple parallel filters)
MIMD the most common type of modern computer
applications are classified according to how often their subtasks need
to synchronize:
Fine-grained subtasks synchronize many times per second
Coarse-grained subtasks synchronize less frequently
Embarrassingly parallel subtasks rarely need to communicate
parallelism can occur at different levels:
Bit-level increasing processor word width
Instruction-level pipelines, superscalar processors
Data parallelism inherent in program loops
Task entirely different calculations on the same or different data
Memory Architectures

Shared memory all processors can access all memory as a global address
space; does not scale well
Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
Distributed memory requires a communication network to connect
inter-processor memory; the programmer must perform
communication explicitly
Hybrid the shared-memory component is usually a cache-coherent
SMP machine; the distributed-memory component is the
networking of multiple SMPs

Parallel Programming Models
The best-known and most widely used models for programming parallel systems,
based on the assumptions they make about the underlying architecture, are:
Shared memory threads communicate using shared variables by means of
synchronization facilities; implemented in POSIX threads
and OpenMP
Message passing a set of tasks, each with its own local memory, exchange
information and synchronize by sending and receiving
messages; implemented in MPI
Hybrid models combining the two approaches above are commonly used
Some approaches need explicit declaration of parallelism, but there are
parallelizing compilers that can generate parallel code:
Fully automatic candidates are loops and independent sections; limited
success
Programmer directed the programmer provides directives or compiler
flags to assist the compiler
Designing Parallel Programs
Understand the problem and the program
can it be parallelized?
where are the hotspots and bottlenecks?
identify inhibitors to parallelism
investigate other algorithms
Partitioning
domain decomposition
functional decomposition

Designing Parallel Programs (contd.)
Communication
cost
latency vs. bandwidth
synchronous vs. asynchronous (blocking vs. non-blocking)
scope of communications (point-to-point, collective)
Synchronization
barrier
lock/semaphore
synchronous communication operations

Introduction to OpenMP

an API defined jointly by a group of hardware and software vendors,
consisting of compiler directives, runtime library routines, and
environment variables
provides a portable, scalable model for explicit multi-threaded,
shared-memory parallelism
provides the capability to incrementally parallelize a program
is based on a fork/join model
supports nested parallelism
supports dynamic threads
it is NOT:
meant for distributed memory systems
a guarantee of I/O synchronization
required to check for deadlocks, races, dependencies, or conflicts
OpenMP Directives

Directive format:
#pragma omp directive-name [clause, ...] newline

#pragma omp is required for all OpenMP C/C++ directives
directive-name must be a valid OpenMP directive; it appears after the pragma and before any clauses
clauses are optional; they can be in any order and repeated as necessary unless otherwise restricted
the newline is required and precedes the structured block enclosed by the directive

each directive applies to at most one succeeding statement, which
must be a structured block
a PARALLEL region is a block of code that will be executed by
multiple threads:

#pragma omp parallel [clause ...] newline
                     if (scalar_expression)
                     private (list)
                     shared (list)
                     default (shared | none)
                     firstprivate (list)
                     reduction (operator: list)
                     copyin (list)
                     num_threads (integer-expression)
  structured_block

when a thread reaches a PARALLEL directive, a team of threads is created
there is an implied barrier at the end of the region
nested regions are supported
it is illegal to branch into or out of the region
a region should not span multiple routines or files

Parallel Region Example

every thread executes all the code enclosed in the parallel section
OpenMP library routines are used to obtain thread identifiers and
total number of threads

#include <omp.h>
#include <stdio.h>

int main () {
int nthreads, tid;

/* Fork a team of threads with each thread having a private tid variable */
#pragma omp parallel private(tid)
{
/* Obtain and print thread id */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);

/* Only master thread does this */


if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
} /* All threads join master thread and terminate */
}

Work Sharing Constructs

a work-sharing construct divides the execution of the enclosed region
among the members of a team of threads
there is no implied barrier at the beginning, but an implied barrier at the end
DO/for: shares the iterations of a loop across the team (data parallelism)
SECTIONS: breaks work into separate, discrete sections, each run by one thread
SINGLE: serializes a section of code, so just one thread runs it (see the sketch below)
Synchronization Constructs

the master directive specifies a region of code that is to be executed
only by the master thread of the team
the critical directive specifies a region that is to be executed by only
one thread at a time (see the sketch below)
the barrier directive synchronizes all the threads in a team
the flush directive identifies a synchronization point at which the
implementation must provide a consistent view of memory
the ordered directive specifies that the iterations of the enclosed
loop will be executed in the same order as in the serial loop (used within
a DO/for region that has the ordered clause)
the threadprivate directive makes global file-scope variables local
to each executing thread (by making multiple copies)
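A minimal sketch of the critical, barrier, and master directives working together; the shared variable is illustrative.

#include <omp.h>
#include <stdio.h>

int main () {
  int sum = 0;

  #pragma omp parallel shared(sum)
  {
    /* only one thread at a time may update the shared variable */
    #pragma omp critical
    sum += omp_get_thread_num();

    /* no thread proceeds past this point until all have arrived */
    #pragma omp barrier

    /* only the master thread prints the final value */
    #pragma omp master
    printf("sum of thread ids = %d\n", sum);
  }
  return 0;
}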

Data Scope Attribute Clauses
Since shared memory programming is all about shared variables, data scoping is a
very important concept
the private clause declares variables to be private to each thread
the shared clause declares variables to be shared among all threads in the team
the default clause allows specifying a default scope for all variables in a parallel region
the firstprivate clause combines the behavior of the private clause with
automatic initialization of the variables in its list
the lastprivate clause combines the behavior of the private clause
with a copy from the last loop iteration or section to the original variable
object (see the sketch below)
the copyin clause provides a means for assigning the same value to
threadprivate variables for all threads in the team
the copyprivate clause can be used to broadcast values acquired by a single
thread directly to all instances of the private variables in the other threads
the reduction clause performs a reduction on the variables that appear in
its list
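A minimal sketch of firstprivate and lastprivate on a parallel loop; the variable names and values are illustrative.

#include <omp.h>
#include <stdio.h>

int main () {
  int offset = 100;   /* firstprivate: copied into each thread's private offset */
  int last = 0;       /* lastprivate: receives the value from the last iteration */
  int i;

  #pragma omp parallel for firstprivate(offset) lastprivate(last)
  for (i = 0; i < 8; i++)
    last = offset + i;            /* after the loop, last == 107 */

  printf("last = %d\n", last);
  return 0;
}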
Example: Vector Dot Product

iterations of the parallel loop will be distributed in equal-sized blocks to
each thread in the team
at the end of the parallel loop construct, all threads will add their values of
result to update the master thread's global copy

#include <omp.h>
#include <stdio.h>

int main () {
  int i, n, chunk;
  float a[100], b[100], result;

  /* Some initializations */
  n = 100;
  chunk = 10;
  result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  #pragma omp parallel for      \
    default(shared) private(i)  \
    schedule(static,chunk)      \
    reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
  return 0;
}

Introduction to MPI
Background
MPI is a standard for a message-passing library, jointly developed by vendors,
researchers, library developers, and users, and designed to be portable,
efficient, and flexible
by itself, it is a specification, not a library
Programming Model
lends itself to virtually any distributed-memory parallel
programming model
it is also used on shared-memory architectures (SMP/NUMA)
behind the scenes
all parallelism is explicit
the number of tasks dedicated to run a parallel program is static
(relaxed in MPI-2)
Program Environment

MPI uses objects called communicators and groups to define which
collections of processes communicate together
within a communicator, every process has its own unique integer
identifier, called its rank, which is used to
specify the source and destination of messages
control program execution
MPI_Init and MPI_Finalize initialize and terminate the execution
environment (see the sketch below)
MPI_Comm_size and MPI_Comm_rank determine the number of
processes in the group and the rank of the calling process
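A minimal sketch using only the environment-management routines above:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int numtasks, rank;

  MPI_Init(&argc, &argv);                      /* start the MPI environment */
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);    /* how many processes? */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* which one am I? */

  printf("Hello from task %d of %d\n", rank, numtasks);

  MPI_Finalize();                              /* shut down the environment */
  return 0;
}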

Point-to-Point Communication
occurs between exactly two MPI tasks and can be a
synchronous send
blocking send / blocking receive
non-blocking send / non-blocking receive
buffered send
combined send/receive
any type of send can be combined with any type of receive
buffering in system buffer space deals with storing data when the two
tasks are out of sync; its behavior is implementation defined
a blocking send/receive only returns when it is safe to modify/use the
application buffer
non-blocking operations return immediately, and it is the user's duty
to check/wait for completion of the operation before manipulating
the buffers (this introduces the possibility of overlapping communication
and computation)
MPI guarantees the correct ordering of messages but not fairness
Point-to-Point Communication Routines

MPI_Send and MPI_Recv are the basic blocking routines (see the sketch below)
MPI_Ssend is a synchronous blocking send: it waits for the receiving task
to start receiving the message
MPI_Bsend is a buffered blocking send, where the user can allocate the
required space for the message before it is delivered
MPI_Sendrecv sends a message and posts a receive before blocking
MPI_Isend, MPI_Irecv, MPI_Issend, and MPI_Ibsend are the
non-blocking versions of the above routines
MPI_Wait blocks until a specified non-blocking operation completes
MPI_Test checks the status of a non-blocking operation
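A minimal sketch of blocking point-to-point communication between two tasks; the tag and message value are arbitrary.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int rank, value, tag = 1;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    value = 42;
    /* returns once the buffer may safely be reused */
    MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
  } else if (rank == 1) {
    /* blocks until the message has arrived in 'value' */
    MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    printf("task 1 received %d\n", value);
  }

  MPI_Finalize();
  return 0;
}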

Example: Nearest Neighbor Exchange in Ring Topology
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])


{
int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2;
MPI_Request reqs[4];
MPI_Status stats[4];

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

prev = rank-1;
next = rank+1;
if (rank == 0) prev = numtasks - 1;
if (rank == (numtasks - 1)) next = 0;

MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);


MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);

MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);


MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);

{ do some work }

MPI_Waitall(4, reqs, stats);

MPI_Finalize();
}

Collective Communication

collective communication involves all processes in the scope of the
communicator, in the form of
synchronization
data movement
collective computation
collective operations are blocking
they can only be done using MPI predefined data types

Collective Communication Routines

MPI_Barrier creates a barrier synchronization across a group
MPI_Bcast sends a message from the root process to all other processes
(see the sketch below)
MPI_Scatter distributes distinct messages from a single source task to the other tasks
MPI_Gather gathers distinct messages from the other tasks into a single task
MPI_Allgather performs a concatenation of data to all tasks in a group
MPI_Reduce applies a reduction across the tasks and puts the result in one
task (e.g., MAX, SUM, OR, a user-defined operation, etc.)
MPI_Allreduce applies a reduction and puts the result on all tasks
MPI_Reduce_scatter does an element-wise reduction on a vector across
the tasks and splits the result across all tasks
MPI_Alltoall each task performs a scatter operation, sending a distinct message
to all the tasks in the group in order by index
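A minimal sketch of two collective routines, a broadcast followed by a sum reduction; the broadcast value is illustrative.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int rank, n, sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) n = 10;                 /* the root chooses a value */

  /* every task receives the root's value of n */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

  /* the sum of all ranks ends up in 'sum' on the root task */
  MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) printf("n = %d, sum of ranks = %d\n", n, sum);

  MPI_Finalize();
  return 0;
}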

More in MPI

using derived data types, the user can define customized data types
(contiguous, vector, indexed, and struct)
groups and communicator objects can be managed dynamically to organize
tasks and enable collective operations on subsets of tasks
virtual topologies describe a mapping/ordering of MPI processes onto
a geometric shape (Cartesian, graph, etc.)
Added in MPI-2:
dynamic process support
one-sided communication for shared-memory-style operations (put/get)
and remote accumulate operations
extended collective operations (non-blocking supported)
parallel I/O
external interfaces, such as for debuggers and profilers

Erlang

Developed at Ericsson in the late 1980s as a platform for developing soft
real-time software for managing phone switches
They needed a high-level symbolic language to achieve a productivity
gain, one which
contains primitives for concurrency
supports error recovery
has an execution model without back-tracking
has a granularity of concurrency such that one asynchronous telephony
process is represented by one process in the language

Sequential Erlang
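A minimal hedged sketch of sequential Erlang (the module and function names are illustrative): pattern matching selects a clause, guards constrain it, and recursion replaces loops.

-module(seq_demo).
-export([fact/1, len/1]).

%% factorial via pattern matching and a guard
fact(0) -> 1;
fact(N) when N > 0 -> N * fact(N - 1).

%% length of a list via recursion over its structure
len([]) -> 0;
len([_ | Tail]) -> 1 + len(Tail).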

Concurrent Erlang
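A minimal hedged sketch of concurrent Erlang (names are illustrative): spawn/1 creates a process, ! sends an asynchronous message, and receive pattern-matches on the mailbox.

-module(conc_demo).
-export([start/0, echo/0]).

%% a process that echoes any message back to its sender
echo() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},
            echo()
    end.

start() ->
    Pid = spawn(fun echo/0),
    Pid ! {self(), hello},
    receive
        {Pid, Reply} -> Reply
    end.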

Abstracting Protocols
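A hedged sketch of the common Erlang idiom this slide refers to: the message protocol is wrapped behind a functional interface, so callers never touch ! or receive directly (module and message shapes are illustrative).

-module(counter).
-export([start/0, increment/1, value/1]).

%% client API: hides the message protocol
start()        -> spawn(fun() -> loop(0) end).
increment(Pid) -> Pid ! increment, ok.
value(Pid)     ->
    Pid ! {value, self()},
    receive {Pid, Count} -> Count end.

%% server loop: keeps the count in its argument
loop(Count) ->
    receive
        increment     -> loop(Count + 1);
        {value, From} -> From ! {self(), Count},
                         loop(Count)
    end.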

Standard Behaviours

Erlang's abstraction of a protocol pattern is called a behaviour
large Erlang applications make heavy use of behaviours;
direct use of message sending or receiving is uncommon
Erlang's OTP standard library provides three main behaviours:
generic server the most common behaviour; responses can be
delayed or delegated, calls have optional timeouts, etc.
(see the sketch below)
generic finite state machine
generic event handler an event manager receives events as
incoming messages and dispatches them to an arbitrary
number of event handlers, each with its own
module of callbacks and its own state
the behaviour libraries provide functionality for dynamic debugging,
inspecting state, producing traces of messages, and gathering statistics
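A minimal gen_server sketch (a callback module of my own naming) that puts the counter protocol above behind the standard behaviour; the optional callbacks are omitted.

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, increment/0, value/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% client API
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
increment()  -> gen_server:cast(?MODULE, increment).
value()      -> gen_server:call(?MODULE, value).

%% callbacks: the behaviour owns the receive loop and the state
init([])                     -> {ok, 0}.
handle_cast(increment, N)    -> {noreply, N + 1}.
handle_call(value, _From, N) -> {reply, N, N}.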

Worker Processes

many applications need to create concurrent activities on the fly
suppose a client needs to send calls to multiple servers; hacking
OTP's generic server to do this is a pain
using worker processes, clients can use receive expressions
without worrying about being blocked

Some notes on Erlang

All the classic errors of concurrent programming have their equivalents in Erlang:
races, deadlock, livelock, starvation, etc.
Erlang is a safe language: all run-time faults result in clearly defined
behavior, usually an exception
Erlang provides two primitives for one process to notice the failure of
another:
monitoring another process creates a one-way notification of
failure
linking two processes establishes mutual notification
when a fault notification is delivered to a linked process, it causes that
process to fail as well; but it can be configured to be delivered as a message
receivable by a receive expression
robust server deployments include an external "nanny" that monitors a
running operating-system process and restarts it if it fails; in Erlang this
is done with the supervisor behaviour
Implementation, Performance, and Scalability

Erlang's concurrency is built upon process spawning and message
passing, so these must have low overhead
to be portable, the Erlang emulator takes care of scheduling, memory
management, and message passing at the user level
the Erlang process stack has no minimal size or required granularity
the Erlang emulator interprets the intermediate code produced by the
compiler
The Erlang emulator can
create a new Erlang process in less than a microsecond
run millions of processes simultaneously
run each process with less than a kilobyte of space
do message passing and context switching in hundreds of nanoseconds

Parallel and Concurrent Programming in Haskell

purity (inherent parallelism), laziness (no specific evaluation order), and types
(faster parallelism) mean we can find more parallelism in the code
Haskell distinguishes between:
Parallelism exploiting parallel computing hardware to improve
performance of a single task
Concurrency logically independent tasks as a structuring technique
several approaches are available in Haskell:
sparks and parallel strategies
threads, messages, and shared memory
transactional memory
data parallelism

The GHC Runtime
the GHC runtime supports millions of lightweight threads
they are multiplexed onto real OS threads (approximately one per CPU)
automatic thread migration and load balancing (work stealing)
parallel garbage collector since GHC 6.12

runtime settings:
compile with -threaded -O2
run with +RTS -N2, +RTS -N4, ..., +RTS -N64, ...

Semi-Explicit Parallelism with Sparks

the lack of side effects makes parallelism easy:

f x y = (x * y) + (y ^ 2)

almost everything could be done in parallel → too much parallelism
the idea is to let the user annotate the code for potential parallelism
it is a deterministic approach

par :: a → b → b
a `par` b creates a spark for a
the runtime sees a potential to convert the spark into a thread
it is semantically equal to b
there are no restrictions on its usage

pseq :: a → b → b
a `pseq` b evaluates a in the current thread
it ensures work is run in the right thread
it is semantically equal to b

Putting it Together

f `par` e `pseq` f + e

one spark is created for f
the f spark is converted to a thread and executed
e is evaluated in the current thread, in parallel with f
ThreadScope helps reason about and evaluate spark code
(a complete example follows below)
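A minimal, self-contained sketch of the par/pseq idiom, assuming the parallel package's Control.Parallel module; compile with -threaded -O2 and run with +RTS -N2. The function is the usual naive Fibonacci.

import Control.Parallel (par, pseq)

-- naive Fibonacci, parallelized by sparking one recursive call
nfib :: Int -> Int
nfib n
  | n < 2     = 1
  | otherwise = f `par` (e `pseq` f + e)
  where
    f = nfib (n - 1)
    e = nfib (n - 2)

main :: IO ()
main = print (nfib 30)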

Explicit Parallelism with Threads and Shared Memory

For stateful or imperative programs, we need explicit threads, not
speculative sparks.

forkIO :: IO () → IO ThreadId

forkIO takes a block of code to run and executes it in a new Haskell thread
threads are scheduled preemptively
scheduling is non-deterministic: random interleaving
threads may be preempted when they allocate memory
threads communicate via messages or shared memory

import Control.Concurrent
import System.Directory

main = do
  forkIO (writeFile "xyz" "thread was here")
  v <- doesFileExist "xyz"
  print v   -- non-determinism: may print True or False
Shared Memory Communication: MVars and Chans
MVars are boxes: they are either full or empty
a put on a full MVar causes the thread to sleep until the MVar is empty
a take on an empty MVar blocks until it is full
the runtime will wake you up when you're needed
MVars can deadlock!

do box <- newEmptyMVar
   forkIO (f `pseq` putMVar box f)
   e `pseq` return ()
   f <- takeMVar box
   print (e + f)

Chans are good for unbounded numbers of shared messages:
send and receive messages through a pipe-like structure

main = do
  ch <- newChan
  forkIO (worker ch)
  -- convert future msgs to list
  xs <- getChanContents ch
  -- lazily print as msgs arrive
  mapM_ print xs

worker ch = forever $ do
  v <- readFile "/proc/loadavg"
  -- send msg back to receiver
  writeChan ch v
  threadDelay (10^5)

Transactional Memory
An optimistic model:
transactions run inside atomic blocks assuming no conflicts
the system checks consistency at the end of the transaction
and retries if there are conflicts
requires control of side effects (handled in the type system)
each atomic block appears to work in complete isolation

data STM a
atomically :: STM a → IO a
retry      :: STM a
orElse     :: STM a → STM a → STM a

data TVar a
newTVar   :: a → STM (TVar a)
readTVar  :: TVar a → STM a
writeTVar :: TVar a → a → STM ()

STM a is used to build up atomic blocks
transactional code can only run inside atomic blocks
orElse lets us compose atomic blocks into larger pieces
TVars are the variables the runtime watches for contention
Atomic Bank Transfers

transfer :: TVar Int -> TVar Int -> Int -> IO ()


transfer from to amount =
atomically $ do
balance <- readTVar from
if balance < amount
then retry
else do
writeTVar from (balance - amount)
tobalance <- readTVar to
writeTVar to (tobalance + amount)

Note
transactions cannot have any visible side effects; the type system enforces
this (atomically :: STM a → IO a)
in case side effects are really needed, use unsafeIOToSTM :: IO a → STM a
Data Parallelism

Simple Idea
Do the same thing in parallel to every element of a large collection

If a program can be expressed this way, then
there are no explicit threads or communication (simplicity)
there is a clear cost model (unlike `par`)
good locality, easy partitioning
It adds parallel array syntax:
[: e :]
along with many parallel combinators (mapP, filterP, zipP, ...)

Flat vs. Nested Data Parallelism
Flat data parallelism

sumsq :: [: Float :] -> Float
sumsq a = sumP [: x*x | x <- a :]

dotp :: [:Float:] -> [:Float:] -> Float
dotp v w = sumP (zipWithP (*) v w)

break the array into N chunks (for N cores)
run a sequential loop to apply f to each chunk element
run that loop on each core
combine the results

Nested data parallelism

type Vector = [: Float :]
type Matrix = [: Vector :]

matMul :: Matrix -> Vector -> Vector
matMul m v = [: vecMul r v | r <- m :]

each element of a parallel computation may in turn be a
nested parallel computation
GHC implements a vectorizer that flattens nested data,
changing representations automatically
Real-Time Computing

Definition
real-time computing (RTC), is the study of hardware and software systems
that are subject to a “real-time constraint” – i.e., operational deadlines
from event to system response.

Real-time computing is often addressed in the context of real-time operating
systems and synchronous programming languages.

Definition
A system is said to be real-time if the total correctness of an operation
depends not only upon its logical correctness, but also upon the time in
which it is performed.

in a hard real-time system, the completion of an operation after its
deadline is useless
a soft real-time system, on the other hand, will tolerate such lateness
Real-Time Operating System (RTOS)

real-time operating systems offer programmers more control over
process priorities
the variability in the amount of time it takes to accept and complete
an application's task is called jitter; it must be near zero for
hard real-time systems
two approaches:
Event-driven switches tasks only when an event of higher priority
needs service (priority scheduling; see the POSIX sketch below)
Time-sharing switches tasks on a regular clock interrupt, and on events
(round robin)
some algorithms used in RTOS scheduling are:
cooperative multitasking
preemptive scheduling
earliest deadline first (EDF)
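A minimal sketch, assuming a POSIX system where the SCHED_FIFO real-time class is available and the program has permission to use it, of creating a fixed-priority thread; the priority value is illustrative.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *control_task(void *arg)
{
    /* periodic real-time work would go here */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_attr_t attr;
    struct sched_param param;

    pthread_attr_init(&attr);
    /* do not inherit the creator's scheduling attributes */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    /* fixed-priority, preemptive, FIFO within a priority level */
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = 50;            /* illustrative priority */
    pthread_attr_setschedparam(&attr, &param);

    if (pthread_create(&t, &attr, control_task, NULL) != 0)
        perror("pthread_create");         /* e.g., insufficient privileges */
    else
        pthread_join(t, NULL);
    return 0;
}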

Real-Time Operating System (RTOS) contd.

interprocess communication must be provided among tasks:
shared memory using locks and semaphores (danger of priority
inversion and deadlock!)
message passing using queues (danger of priority inversion!)
memory allocation speed is important; it usually needs to take a fixed,
predictable time

Real-Time Java

Enabling real-time programming in Java requires addressing:
the behavior of the garbage collector, which may introduce unpredictable
delays
the lack of a strict priority-based threading model
(without priorities there is no way to implement priority-inversion
avoidance protocols)
high-resolution timing management
As a result, the Java community defined the Real-Time Specification for
Java (RTSJ):
JVM enhancements and a new API set
intended only for a suitable underlying OS (e.g., QNX)
existing J2SE applications can still run under Java RTS
already being used by the U.S. Navy, Boeing, and others

Real-Time additions to Java
Direct memory access similar to J2ME, with more security compared to C
Asynchronous communication comes in two forms:
Asynchronous event handling can schedule a response to
events coming from outside the JVM
Asynchronous transfer of control a controlled way of safely
interrupting another thread
High-resolution timing management
Memory areas that help prevent unpredictable delays from the GC:
Immortal memory never garbage collected; freed at the end of the program
Scoped memory used only while a process works within a
particular section of the program (e.g., a method)
Real-time threads cannot be interrupted by the GC; 28 levels of strictly
enforced priorities; synchronization avoids priority inversion
(see the sketch below)
No-heap real-time threads may immediately preempt any
GC; no reference to, or allocation in, the heap is
allowed
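A minimal, hedged sketch of creating an RTSJ real-time thread; it assumes an RTSJ implementation (e.g., Java RTS) is on the classpath, and the priority value is illustrative rather than taken from the specification.

import javax.realtime.PriorityParameters;
import javax.realtime.RealtimeThread;

public class RtHello {
    public static void main(String[] args) {
        // a real-time thread scheduled with a fixed priority
        RealtimeThread rt = new RealtimeThread(new PriorityParameters(30)) {
            public void run() {
                System.out.println("running under the real-time scheduler");
            }
        };
        rt.start();
    }
}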
Ada

Ada is an imperative, strongly typed, block-structured language
designed for the US DoD (1983) for the construction of large, complex,
mission-critical software
the language is designed to avoid expensive implicit storage-manipulation
operations; heap storage allocation is explicit
the definition of Ada includes a precise description of compilation issues,
and of the interaction between applications and libraries (not left as a
part of the environment/OS)
Ada 95 provided a concurrent programming environment for
real-time systems using fixed-priority, preemptive scheduling
Ada 2005 adds new dispatching policies (e.g., non-preemptive,
round-robin, and EDF), timing events, the Ravenscar profile, and more

Ada Syntax

Ada is strongly typed
subtypes can be used to set constraints on types
scalar types include integer, enumerated, floating-point, and fixed-point types
composite types are arrays and records
access types are Ada's strongly typed pointers
expressions are based on standard arithmetic and boolean operations
Ada's statements include assignments, case statements, loops, exit
statements, blocks, and gotos
subprograms have three parameter-passing modes, designated in,
in out, and out; subprograms can be overloaded
packages separate the definition of interfaces from their implementation,
support abstract data types and information hiding, and are the basic
structuring mechanism for large systems

Ada Tasks

Ada tasks are threads that describe concurrent computations
a task specification provides the public interface in the form of task entries
an entry names an action that a task will perform on behalf of the caller;
entries also act as a synchronization mechanism (rendezvous)
communication is asymmetric: the caller names the server explicitly in the
call, while the server accepts calls from any caller

task mailbox is
   entry put(m: message);
   entry get(m: out message);
end mailbox;

accept put(m: message) do
   buffer_store(m);
end;

a task can request that a rendezvous take place immediately, or not at all
delay statements can be used to program timed entry calls
a selective wait can be used to accept multiple entries depending on the
internal state of the server, to put a time limit on the arrival of a call, or to
shut down in case no callers are active
Ada Concurrency Model

Ada has a core and several annexes:
The Core is required of all implementations and contains the definition of
all language constructs
The Annexes define additional facilities in the form of packages and pragmas,
but never new syntax
The definition of the concurrency model is included in the core, in the form of:
Tasks representing threads of control
Protected objects providing mutual exclusion and condition synchronization;
they are passive and do not have a separate thread of control

A Generic Bounded Buffer
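A hedged sketch of a generic bounded buffer built from a protected type; the names and the default size are illustrative, not the original listing.

generic
   type Item is private;
   Size : Positive := 10;
package Bounded_Buffers is
   subtype Index is Positive range 1 .. Size;
   type Item_Array is array (Index) of Item;

   protected type Buffer is
      entry Put (X : in Item);
      entry Get (X : out Item);
   private
      Data  : Item_Array;
      First : Index   := 1;
      Last  : Index   := 1;
      Count : Natural := 0;
   end Buffer;
end Bounded_Buffers;

package body Bounded_Buffers is
   protected body Buffer is
      --  the barrier keeps callers queued while the buffer is full
      entry Put (X : in Item) when Count < Size is
      begin
         Data (Last) := X;
         Last  := (Last mod Size) + 1;
         Count := Count + 1;
      end Put;

      --  the barrier keeps callers queued while the buffer is empty
      entry Get (X : out Item) when Count > 0 is
      begin
         X := Data (First);
         First := (First mod Size) + 1;
         Count := Count - 1;
      end Get;
   end Buffer;
end Bounded_Buffers;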

A Typical Producer/Consumer
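A matching sketch of producer and consumer tasks that communicate only through the protected buffer above; again an illustrative reconstruction, not the original listing.

with Ada.Text_IO;
with Bounded_Buffers;

procedure Producer_Consumer is
   package Int_Buffers is new Bounded_Buffers (Item => Integer, Size => 8);
   Buf : Int_Buffers.Buffer;

   task Producer;
   task Consumer;

   task body Producer is
   begin
      for I in 1 .. 20 loop
         Buf.Put (I);   --  blocks while the buffer is full
      end loop;
   end Producer;

   task body Consumer is
      X : Integer;
   begin
      for I in 1 .. 20 loop
         Buf.Get (X);   --  blocks while the buffer is empty
         Ada.Text_IO.Put_Line (Integer'Image (X));
      end loop;
   end Consumer;

begin
   null;   --  the main procedure waits here until both tasks terminate
end Producer_Consumer;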

Ada 95 real-time foundation

The core does not define a notion of priority, nor of priority-based queuing
or scheduling. Thus, the Real-Time Systems Annex defines
additional semantics and facilities
integrated priority-based interrupt handling
run-time library behavior
that support deterministic tasking via fixed-priority, preemptive scheduling
priority inheritance and the immediate ceiling priority protocol (ICPP) are
included to limit blocking
a high-resolution monotonic clock provides both absolute and
relative delays

These facilities enable off-line schedulability analysis

Priorities
Priorities are assigned to tasks using a pragma directive.

assigning priorities to tasks:

task Producer is
   pragma Priority (10);
end Producer;

task Consumer is
   pragma Priority (10);
end Consumer;

configuring the behavior of the run-time library:

pragma Task_Dispatching_Policy (FIFO_Within_Priorities);
pragma Locking_Policy (Ceiling_Locking);
pragma Queuing_Policy (Priority_Queuing);

priorities for protected objects are assigned in accordance with
the ceiling priority protocol
low-level tasking control and synchronization (semaphore-like objects,
asynchronous suspension/resumption of tasks, etc.) are available for
extreme needs
2005 real-time enhancements

multiple inheritance support added
new dispatching policies (specified via the pragma
Task_Dispatching_Policy):

Non_Preemptive_FIFO_Within_Priorities
Round_Robin_Within_Priorities
EDF_Across_Priorities

dispatching policies can be combined based on priority bands:

pragma Priority_Specific_Dispatching (
   FIFO_Within_Priorities, 9, 20);
pragma Priority_Specific_Dispatching (
   Round_Robin_Within_Priorities, 1, 8);

2005 real-time enhancements (contd.)

timing events are conceptually lightweight interrupts generated by the
arrival of points in time
execution-time monitoring is used to monitor the execution
time (CPU time) of tasks
execution-time events are similar in concept and interface to timing
events, except that they use execution time instead of wall-clock time
facilities exist to allocate and monitor a budgeted execution time for a
group of tasks as a whole

The Ravenscar Profile

an analyzable subset of Ada tasking, suitable for hard real-time and
high-integrity applications
the tasking model is suitable for a single processor using fixed-priority,
preemptive dispatching
there is a fixed number of tasks, which never terminate
there are two kinds of tasks: time-triggered (periodic) and
event-triggered (sporadic)
tasks do not communicate directly (via rendezvous) and do not
interact with the control flow of other tasks
communication is done indirectly, using shared variables encapsulated
within protected objects

