UNIT I
Introduction: Need of high speed computing - increasing the speed of computers - history of parallel computers and recent parallel computers; solving problems in parallel - temporal parallelism - data parallelism - comparison of temporal and data parallel processing - data parallel processing with specialized processors - inter-task dependency. The need for parallel computers - models of computation - analyzing algorithms - expressing algorithms.
2 MARKS
1. What is high performance computing?
(APRIL 2014)
High Performance Computing most generally refers to the practice of aggregating computing power in a way that
delivers much higher performance than one could get out of a typical desktop computer or workstation in order to
solve large problems in science, engineering, or business.
2. Define Parallel Computing.
The simultaneous use of more than one processor or computer to solve a problem; the use of multiple processors or computers working together on a common task:
Each processor works on its section of the problem.
Processors can exchange information, either through shared memory (where access times vary from CPU to CPU, as in NUMA systems) or over a network.
Data parallelism is a form of parallelization of computing across multiple processors in parallel computing
environments. Data parallelism focuses on distributing the data across different parallel computing nodes. It
contrasts with task parallelism, which is another form of parallelism.
(November 2014)
The traditional scientific paradigm is first to do theory (say, on paper), and then lab experiments to confirm or deny the theory. The traditional engineering paradigm is first to do a design (say, on paper), and then build a laboratory prototype. Both paradigms are being replaced by numerical experiments and numerical prototyping. There are several reasons for this:
Real phenomena are too complicated to model on paper (e.g., climate prediction).
Real experiments are too hard, too expensive, too slow, or too dangerous for a laboratory (e.g., oil reservoir simulation, large wind tunnels, overall aircraft design, galactic evolution, whole factory or product life cycle design and optimization, etc.).
Advantages of data parallelism:
No synchronization required between the processors.
No bubbles, as in a pipeline.
More fault tolerance.
No communication needed between the processors.
Disadvantages:
The assignment of jobs is static.
The job must be partitionable into independent jobs, and the time to divide the job is assumed to be small.
(April 2013)
Parallel computing is a form of computation in which many calculations are carried out
simultaneously, operating on the principle that large problems can often be divided into smaller
ones, which are then solved concurrently.
27. Comparison Between Temporal and Data Parallelism. (November 2014)
TEMPORAL PARALLELISM                       DATA PARALLELISM
Tasks of a job are dependent               Tasks are independent
Bubbles in the pipeline cause idling       No bubbles
Assignment of tasks is static              Assignment may be static, dynamic or quasi-dynamic
Does not tolerate processor faults         Tolerates processor faults
Efficient with fine-grained tasks          Efficient with coarse-grained tasks
28. Specify the types of parallelism that can be seen in software. (November 2014)
Task Parallelism
Data Parallelism
11 MARKS
1. What is Parallel Computing? Explain.
Traditionally, software has been written for serial computation: a problem is solved by a series of instructions, executed one after the other by the CPU, and only one instruction may be executed at any moment in time.
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources
to solve a computational problem.
The compute resources can include:
A single computer with multiple processors;
An arbitrary number of computers connected by a network;
A combination of both.
The computational problem usually demonstrates characteristics such as the ability to be:
Broken apart into discrete pieces of work that can be solved simultaneously;
Executed as multiple program instructions at any moment in time;
Solved in less time with multiple compute resources than with a single compute resource.
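As a minimal sketch of this idea, the following Python fragment splits a toy job (summing squares) into discrete pieces and solves them simultaneously with the standard multiprocessing module; the task, the data and the worker count are illustrative assumptions, not taken from the text.

# Sketch: break a job into discrete pieces and solve them simultaneously.
# Assumes a CPU-bound toy task (summing squares of a block of numbers).
from multiprocessing import Pool

def sum_of_squares(block):
    # one "discrete piece of work"
    return sum(x * x for x in block)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # split the data into 4 blocks, one per worker process
    blocks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, blocks)  # solved concurrently
    print(sum(partial_sums))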
Parallel computing is an evolution of serial computing that attempts to emulate what has always been
the state of affairs in the natural world: many complex, interrelated events happening at the same
time, yet within a sequence. Some examples:
Planetary and galactic orbits
Weather and ocean patterns
Tectonic plate drift
Rush hour traffic in LA
Automobile assembly line
Daily operations within a business
Building a shopping mall
Ordering a hamburger at the drive through.
Traditionally, parallel computing has been considered to be "the high end of computing" and has been
motivated by numerical simulations of complex systems and "Grand Challenge Problems" such as:
weather and climate
chemical and nuclear reactions
biological, human genome
geological, seismic activity
Today, commercial applications are providing an equal or greater driving force in the development of
faster computers. These applications require the processing of large amounts of data in sophisticated
ways. Example applications include:
parallel databases, data mining
oil exploration
web search engines, web based business services
computer-aided diagnosis in medicine
management of national and multi-national corporations
advanced graphics and virtual reality, particularly in the entertainment industry
networked video and multi-media technologies
collaborative work environments
Ultimately, parallel computing is an attempt to maximize the infinite but seemingly scarce
commodity called time.
(April 2013)
Limits to serial computing - both physical and practical reasons pose significant constraints to simply
building ever faster serial computers:
Transmission speeds - the speed of a serial computer is directly dependent upon how fast data
can move through hardware. Absolute limits are the speed of light (30 cm/nanosecond) and
the transmission limit of copper wire (9 cm/nanosecond). Increasing speeds necessitate
increasing proximity of processing elements.
Real phenomena are too complicated to model on paper (e.g., climate prediction).
Real experiments are too hard, too expensive, too slow, or too dangerous for a laboratory (e.g., oil reservoir simulation, large wind tunnels, overall aircraft design, galactic evolution, whole factory or product life cycle design and optimization, etc.).
Scientific and engineering problems requiring the most computing power to simulate are commonly called "Grand Challenges". Such problems, like predicting the climate 50 years hence, are estimated to require computers computing at the rate of 1 Tflop = 1 Teraflop = 10^12 floating point operations per second, and with a memory size of 1 TB = 1 Terabyte = 10^12 bytes.
The speed of a computer can be increased in two ways:
By increasing the speed of the processing element using faster semiconductor technology (advanced device technology).
By architectural methods, which increase the speed of the computer by applying parallelism:
Use parallelism within a single processor: overlap the execution of a number of instructions by pipelining, or use multiple functional units.
Overlap the operation of different units.
4. Discuss the history of past and present parallel computers. (November 2014)
Vector supercomputers, characterized by:
The fastest clock rates, because vector pipelines can be very simple.
Vector processing.
Quite good vectorizing compilers.
High price tag; small market share.
Not always scalable because of the shared-memory bottleneck (vector processors need more data per cycle than conventional processors). Vector processing is back in various forms: SIMD
extensions of commodity microprocessors (e.g. Intel's SSE), vector processors for game consoles
(Cell), multithreaded vector processors (Cray), etc.
Vector processors declined temporarily because of:
Market issues: price/performance, the microprocessor revolution, commodity microprocessors.
Not enough parallelism for the biggest problems; hard to vectorize/parallelize automatically.
They didn't scale down.
MPPs
Glory days: 1990-96
Famous examples: Intel hypercubes and Paragon, TMC Connection Machine, IBM SP, Cray/SGI
T3E.
Characterized by:
Scalable interconnection networks, scaling up to thousands of processors.
Commodity (or at least, modest) microprocessors.
Message passing programming paradigm.
Killed by:
Small market niche, especially as a modest number of processors can do more and more.
Programming paradigm too hard.
Relatively slow communication (especially latency) compared to ever-faster processors (this is
actually no more and no less than another example of the memory wall).
Today
A state of flux in hardware.
But more stability in software, e.g., MPI and OpenMP.
Machines are being sold, and important problems are being solved, on all of the following:
Vector SMPs, e.g., Cray X1, Hitachi, Fujitsu, NEC.
SMPs and ccNUMA, e.g., Sun, IBM, HP, SGI, Dell, hundreds of custom boxes.
Distributed memory multiprocessors, e.g., Cray XT3, IBM Blue Gene.
Clusters: Beowulf (Linux) and many manufacturers and assemblers.
A complete top-down view: At the highest level you have either a distributed memory
architecture with a scalable interconnection network, or an SMP architecture with a bus.
A distributed memory architecture may or may not provide support for a global memory
consistency model (such as cache coherence, software distributed shared memory, coherent
RDMA, etc.). On an SMP architecture you expect hardware support for cache coherence.
A distributed memory architecture can be built from SMP or even (rarely) ccNUMA boxes. Each
box is treated as a tightly coupled node (with local processors and uniformly accessed shared
memory). Boxes communicate via message passing, or (less frequently) with hardware or
software memory coherence schemes. Both on distributed and on shared memory architectures,
the processors themselves may support an internal form of task or data parallelism. Processors
may be vector processors, commodity microprocessors with multiple cores, or multiple threads
multiplexed over a single core, heterogeneous multicore processors, etc.
Programming: Typically MPI is supported over both distributed and shared-memory substrates
for portability (large existing base of code written and optimized in MPI). OpenMP and POSIX
threads are almost always available on SMPs and ccNUMA machines. OpenMP implementations
over distributed memory machines with software support for cache coherence also exist, but
scaling these implementations is hard and is a subject of ongoing research.
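As a small illustration of the message-passing style mentioned above, here is a hedged sketch using the mpi4py Python binding of MPI; the choice of mpi4py (rather than C MPI) and the toy summation are assumptions made only for illustration.

# Sketch: message-passing style with MPI (via mpi4py), run e.g. with
#   mpiexec -n 4 python sum_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's ID
size = comm.Get_size()      # total number of processes

# each process computes a partial sum over its own slice of 0..999999
local = sum(range(rank, 1_000_000, size))
# combine the partial sums on process 0
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print("total =", total)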
Future
The end of Moore's Law?
Nanoscale electronics
Exotic architectures? Quantum, DNA/molecular.
5. Explain various methods for solving problems in parallel.
(November 2014)
Temporal parallelism (pipeline processing), assuming every task takes the same time:
Let the number of jobs = n
Time to do a job = p
Each job is divided into k tasks
Time for each task = p/k
Time to complete n jobs with no pipeline processing = np
Time to complete n jobs with pipeline processing by k teachers = p + (n-1)(p/k) = p(k+n-1)/k
Speedup due to pipeline processing = np / [p(k+n-1)/k] = k / [1 + (k-1)/n]
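A quick numerical check of this speedup formula (a minimal sketch; the values of n, k and p below are illustrative assumptions):

# Pipeline (temporal parallelism) speedup: nk / (n + k - 1)
n, k, p = 1000, 4, 20.0   # papers, pipeline stages (teachers), minutes per paper

serial_time   = n * p
pipeline_time = p + (n - 1) * p / k          # = p * (k + n - 1) / k
speedup       = serial_time / pipeline_time  # = k / (1 + (k - 1) / n)

print(round(speedup, 3))   # approaches k = 4 as n grows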
Problems encountered in temporal parallelism:
Synchronization: every task should take an identical amount of time, so the pipeline stages must be synchronized.
Bubbles in pipeline: if some tasks are absent (e.g., unanswered questions), bubbles are formed and processors idle.
Fault tolerance: the method does not tolerate faults; if one stage fails, the whole pipeline stalls.
Scalability: the number of pipeline stages cannot be increased beyond the number of tasks into which each job is divided.
Data parallelism (static assignment of jobs):
Advantages:
No synchronization needed between the teachers.
No bubbles, as in a pipeline.
No communication needed between the teachers.
Disadvantages:
The assignment of jobs is static.
The job must be partitionable into independent jobs.
Data parallelism with dynamic assignment of jobs:
Advantages:
No bubbles; the load is balanced among the teachers as they become free.
Disadvantages:
Each teacher must wait (time q) to receive the next paper, which reduces the speedup, as the derivation below shows.
If the speedup of a method is directly proportional to the number of processors used, the method is said to scale well.
Let the total number of papers = n
Let there be k teachers
Time a teacher waits to get a paper = q
Time for each teacher to get, grade and return a paper = (q + p)
Total time to grade n papers with k teachers = n(q + p)/k
Speedup due to parallel processing = np / [n(q + p)/k] = k / [1 + (q/p)]
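A corresponding check for the data-parallel case, showing how the waiting time q keeps the speedup below k (the values below are illustrative assumptions):

# Data parallelism with dynamic assignment: speedup = k / (1 + q/p)
n, k = 1000, 4          # papers, teachers
p, q = 20.0, 1.0        # grading time and waiting time per paper

serial_time   = n * p
parallel_time = n * (q + p) / k
speedup       = serial_time / parallel_time   # = k / (1 + q/p)

print(round(speedup, 3))   # 3.81, slightly below k because of the waiting time q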
Comparison between temporal and data parallelism (April 2013):
TEMPORAL PARALLELISM                       DATA PARALLELISM
Tasks of a job are dependent               Tasks are independent
Bubbles in the pipeline cause idling       No bubbles
Assignment of tasks is static              Assignment may be static, dynamic or quasi-dynamic
Does not tolerate processor faults         Tolerates processor faults
Efficient with fine-grained tasks          Efficient with coarse-grained tasks
Data parallel processing with specialized processors: there is a head examiner who dispatches answer papers to the teachers. We assume that teacher 1 (T1) grades answer A1, teacher 2 (T2) grades answer A2 and, in general, teacher i (Ti) grades answer Ai to question Qi.
Procedure:
1. Give one answer book each to T1, T2, T3 and T4.
2. When a corrected answer paper is returned, check if all questions are graded. If yes, add up the marks and put the paper in the output pile.
3. If not, check which questions are not yet graded.
4. For each i, if Ai is ungraded, send the paper to teacher Ti if Ti is idle, or to any other idle teacher Tp.
5. Repeat steps 2, 3 and 4 until no answer paper remains in the input pile.
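A hedged Python sketch of this dispatch loop is given below; the number of papers, the round-based timing and the exact tie-breaking rule for idle teachers are assumptions made only to keep the example short and runnable.

# Sketch: head examiner dispatching papers to 4 specialist teachers.
# Each paper needs answers A1..A4 graded; teacher Ti prefers question Qi
# but may grade any ungraded question when idle (an assumed tie-break).
from collections import deque

K = 4
papers = deque({"id": n, "graded": set()} for n in range(10))   # input pile
busy = {}            # teacher index -> paper currently being graded
output = []          # fully graded papers

def dispatch(paper):
    """Give the paper to its specialist if idle, else to any idle teacher."""
    ungraded = [q for q in range(K) if q not in paper["graded"]]
    for q in ungraded:                       # prefer the matching specialist
        if q not in busy:
            busy[q] = paper
            return True
    for t in range(K):                       # otherwise any idle teacher
        if t not in busy:
            busy[t] = paper
            return True
    return False

while papers or busy:
    # steps 1 and 4: hand out papers while teachers are idle
    while papers and dispatch(papers[0]):
        papers.popleft()
    # steps 2 and 3: teachers finish one answer each and return their papers
    for t, paper in list(busy.items()):
        ungraded = [q for q in range(K) if q not in paper["graded"]]
        paper["graded"].add(t if t in ungraded else ungraded[0])
        del busy[t]
        if len(paper["graded"]) == K:
            output.append(paper["id"])       # all questions graded
        else:
            papers.append(paper)             # back to the pile, steps repeat

print(len(output), "papers fully graded")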
Another method: the answer papers are divided into 4 equal piles and put in the in-trays of the 4 teachers. Each teacher simultaneously repeats steps 1 to 5 below four times.
For teachers Ti (i = 1 to 4) do in parallel:
1. Take an answer paper from the in-tray.
2. Grade answer Ai to question Qi and put the paper in the out-tray.
3. Repeat steps 1 and 2 till no papers are left in the in-tray.
4. Check if teacher ((i+1) mod 4)'s in-tray is empty.
5. As soon as it is empty, empty your own out-tray into the in-tray of that teacher.
METHOD 8: Agenda Parallelism
The answer book is thought of as an agenda of questions to be graded. All teachers are asked to work on the first item on the agenda, namely to grade the answer to the first question in all papers. The head examiner gives one paper to each teacher and asks him to grade the answer A1 to Q1. When a teacher finishes this, he is given another paper. This is a data parallel method with dynamic scheduling and fine-grained tasks.
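A hedged sketch of this agenda-style dynamic scheduling with a shared work queue; the use of Python threads and the stand-in grading function are assumptions for illustration.

# Sketch: agenda parallelism - every worker takes the next item of the agenda
# (grade answer A1 of some paper) from a shared queue until the agenda is done.
import queue
import threading

agenda = queue.Queue()
for paper_id in range(100):          # agenda item: grade A1 of this paper
    agenda.put(paper_id)

marks = {}
lock = threading.Lock()

def teacher():
    while True:
        try:
            paper_id = agenda.get_nowait()   # dynamic, fine-grained scheduling
        except queue.Empty:
            return
        grade = (paper_id * 7) % 10          # stand-in for actual grading work
        with lock:
            marks[paper_id] = grade
        agenda.task_done()

threads = [threading.Thread(target=teacher) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(marks), "answers to Q1 graded")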
6. Briefly explain Inter-Task Dependency with an example.
The following assumptions were made in assigning tasks to the teachers:
The answer to a question is independent of the answers to the other questions.
Teachers do not have to interact.
The same instructions are used to grade all answer books.
In general, tasks are inter-related: some tasks can be done independently and simultaneously, while others have to wait for the completion of previous tasks. The inter-relations of the various tasks of a job may be represented graphically as a task graph.
Procedure: Recipe for Chinese vegetable fried rice:
T1: Clean and wash rice
T2: Boil water in a vessel with 1 teaspoon salt
T3: Put rice in boiling water with some oil and cook till soft
T4: Drain rice and cool
T5: Wash and scrape carrots
T6: Wash and string French beans
T7: Boil water with teaspoon salt in 2 vessels
T8: Drop carrots and French beans in boiling water
T9: Drain and cool carrots and French beans
T10: Dice carrots
T11: Dice French beans
T12: Peel onions and dice into small pieces
T13: Clean cauliflower and cut into small pieces
T14: Heat oil in an iron pan and fry the diced onion and cauliflower for 1 min in the heated oil
T15: Add diced carrots and French beans to above and fry for 2 min.
T16: Add cooled cooked rice, chopped onions and soya sauce to the above and stir and fry for 5 min.
There are 16 tasks in this recipe; some of them have to be carried out in sequence, while others can be carried out simultaneously. The relationships among the tasks may be represented as a task graph.
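To make the task graph concrete, the sketch below encodes one plausible reading of the dependencies among T1..T16 (the exact edges are an assumption, not the book's figure) and prints which tasks can be carried out in parallel at each step.

# Sketch: task graph for the fried-rice recipe and a level-by-level schedule.
# The dependency edges below are an assumed reading of tasks T1..T16.
deps = {
    "T1": [], "T2": [], "T3": ["T1", "T2"], "T4": ["T3"],
    "T5": [], "T6": [], "T7": [], "T8": ["T5", "T6", "T7"],
    "T9": ["T8"], "T10": ["T9"], "T11": ["T9"],
    "T12": [], "T13": [], "T14": ["T12", "T13"],
    "T15": ["T10", "T11", "T14"], "T16": ["T4", "T15"],
}

done = set()
step = 0
while len(done) < len(deps):
    # every task whose predecessors are all finished can run in parallel now
    ready = [t for t in deps if t not in done and all(d in done for d in deps[t])]
    step += 1
    print(f"step {step}: run in parallel ->", ready)
    done.update(ready)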
7. Explain about the models of Computation. (April 2013)
Models of computation include:
RAM (Random Access Machine)
Interconnection Networks
Combinational Circuits
Example systems: the Connection Machine (a parallel computer) and the Internet (a distributed system) illustrate the difference between parallel and distributed computing.
Parallel vs. Distributed computing:
Technical aspects:
Parallel computers (usually) work in tight synchrony, share memory to a large extent, and have a very fast and reliable communication mechanism between them.
Distributed computers are more independent; communication is less frequent and less synchronous, and the cooperation is limited.
Purposes:
Parallel computers cooperate to solve a single problem as fast as possible.
Distributed computers have individual goals and private activities; sometimes communication with other computers is needed (e.g., distributed database operations).
The RAM (Random Access Machine) model consists of a single processor attached to a memory with M locations. Each step of the RAM consists of:
A READ phase, in which the processor reads a datum from a memory location and copies it into a register.
A COMPUTE phase, in which the processor performs a basic operation on the data from one or two of its registers.
A WRITE phase, in which the processor copies the contents of an internal register into a memory location.
8. Explain the PRAM Model of Computation. (April 2014)
The PRAM (Parallel Random Access Machine) consists of a number of processors, each with its own local memory, attached to a common shared memory.
The processors communicate using m shared (or global) memory locations, U1, U2, ..., Um.
Allowing both local and global memory is typical in studies of such models.
All processors operate synchronously (i.e. using same clock), but can execute a different sequence
of instructions.
Some authors inaccurately restrict PRAM to simultaneously executing the same sequence
of instructions (i.e., SIMD fashion)
Each processor has a unique index, called the processor ID, which can be referenced by the processor's program.
This is often an unstated assumption of a parallel model.
Each PRAM step consists of three phases, executed in the following order:
A read phase in which each processor may read a value from shared memory
A compute phase in which each processor may perform basic arithmetic/logical operations
on their local data.
A write phase where each processor may write a value to shared memory.
Note that this prevents reads and writes from being simultaneous.
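A hedged simulation of synchronous PRAM steps, using the classic parallel-sum example; the sample data and the number of simulated processors are assumptions. Each iteration performs the read phase for all processors before any result is written back, mirroring the three-phase step described above.

# Sketch: simulating log(n) synchronous PRAM steps that sum n numbers.
# Shared memory M holds the data; processor i handles index pairs at stride d.
M = [3, 1, 4, 1, 5, 9, 2, 6]          # shared memory (assumed sample data)
n = len(M)

d = 1
while d < n:
    # READ phase: every active processor reads its two operands first
    reads = [(M[i], M[i + d]) for i in range(0, n, 2 * d)]
    # COMPUTE + WRITE phase: results are written back only after all reads
    for k, i in enumerate(range(0, n, 2 * d)):
        a, b = reads[k]
        M[i] = a + b
    d *= 2

print(M[0])   # 31, the sum of the original values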
Access to the shared memory can be restricted in the following ways:
Exclusive Read (ER): Two or more processors cannot simultaneously read the same memory location.
Concurrent Read (CR): Any number of processors can read the same memory location simultaneously.
Exclusive Write (EW): Two or more processors cannot write to the same memory location simultaneously.
Concurrent Write (CW): Any number of processors can write to the same memory location simultaneously.
Priority CW: The processor with the highest priority writes its value into a memory location.
Common CW: Processors writing to a common memory location succeed only if they write the
same value.
Arbitrary CW: When more than one value is written to the same location, any one of these values
(e.g., one with lowest processor ID) is stored in memory.
Random CW: One of the processors is randomly selected to write its value into memory.
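A small hedged sketch of how these concurrent-write rules resolve a set of simultaneous writes to one location; the competing (processor, value) pairs and the priority ordering are assumptions for illustration.

# Sketch: resolving concurrent writes to one shared location under the
# Priority, Common, Arbitrary and Random CW rules (values are assumed).
import random

# (processor_id, value) pairs all trying to write the same location in one step
writes = [(3, 7), (1, 7), (5, 7)]

def priority_cw(ws):  # assumed priority order: lowest processor ID wins
    return min(ws)[1]

def common_cw(ws):    # succeeds only if every processor writes the same value
    values = {v for _, v in ws}
    if len(values) != 1:
        raise ValueError("Common CW requires identical values")
    return values.pop()

def arbitrary_cw(ws): # any one value may be stored, e.g. lowest processor ID
    return min(ws)[1]

def random_cw(ws):    # a randomly selected processor's value is stored
    return random.choice(ws)[1]

for rule in (priority_cw, common_cw, arbitrary_cw, random_cw):
    print(rule.__name__, "->", rule(writes))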
11 Marks:
1. Explain about the models of Computation. (April 2013) (Q. No. 7, Ref.Pg.No.22)
2. Write a comparison of Temporal and Data Parallel Processing. (April 2013) (Q. No. 5, Ref.Pg.No.17)
3. Consider an examination paper that has 4 questions to be answered and there are 1000 answer books.
Illustrate how data parallel processing with specialized processors is done for the above problem.
(November 2013)
4. Discuss the various abstract machine models for parallel computers in detail. (November 2013)
5. A. Compare Temporal and Data Parallelism. (April 2014) (Q. No. 5, Ref.Pg.No.17)
B. Compare BSP and PRAM.
6. Explain the PRAM Model of Computation. (April 2014) (Q. No. 8, Ref.Pg.No.20)
7. Discuss the history of past and present parallel Computers. (November 2014) (Q. No. 4,
Ref.Pg.No.11)
8. Discuss the various parallel computing models. (November 2014) (Q. No. 5, Ref.Pg.No.13)