
PRESENTATION ON PARALLEL PROCESSORS

INTRODUCTION
A parallel processor is a processor that performs concurrent data-processing tasks, which reduces execution time.


Parallel processing involves simultaneous computations in the CPU for the purpose of increasing its computational speed. Instead of processing each instruction sequentially as in a conventional computer, parallel processing distributes the data among multiple functional units that operate concurrently.
For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The arithmetic, logic, and shift operations can be separated into three units, and the operands diverted to each unit under the supervision of a control unit.

Fig. : Processor with multiple functional units. The processor registers (connected to memory) feed eight functional units:
Adder-subtractor
Integer multiply
Logic unit
Shift unit
Incrementer
Floating point add-subtract
Floating point multiply
Floating point divide

The figure shows one possible way of separating the execution unit into eight functional units. The operands in the registers are applied to one of the units depending on the operation specified by the instruction.


The adder-subtractor and the integer multiplier perform arithmetic operations on integer numbers. The floating point operations are separated into three circuits operating in parallel. The logic, shift, and increment operations can be performed concurrently on different data. All units are independent of each other, so one number can be incremented while another number is being shifted.

ADVANTAGES: Lesser execution time and hence higher throughput, where throughput is the maximum number of results a processor can generate per unit time. Parallel processing is much faster than sequential processing for repetitive calculations on vast amounts of data, because a parallel processor is capable of multithreading on a large scale and can therefore process several streams of data simultaneously. This makes parallel processors suitable for graphics cards, since the calculations required for generating millions of pixels per second are all repetitive.

DISADVANTAGES: More hardware and more power are required, so parallel processors are not well suited to low-power and mobile devices.

CLASSIFICATION
There are a variety of ways in which parallel processing can be classified. The classification can be based on:
the internal organization of the processor
the interconnection structure between processors
the flow of information through the system

Michael J. Flynn's classification

One of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy, it organizes computer systems by the number of instruction streams and the number of data streams that are manipulated simultaneously. Flynn's classification divides computers into four major groups:
Single Instruction, Single Data (SISD)
Single Instruction, Multiple Data (SIMD)
Multiple Instruction, Single Data (MISD)
Multiple Instruction, Multiple Data (MIMD)
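As a rough sketch (an illustration for these notes, not part of Flynn's paper), the taxonomy reduces to whether a machine has one or many instruction streams and one or many data streams:

```python
# Toy classifier for Flynn's taxonomy: the class name is built from
# whether the instruction-stream and data-stream counts are Single or Multiple.
def flynn_class(instruction_streams, data_streams):
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return i + "I" + d + "D"

print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 8))  # MIMD
```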

Single Instruction, Single Data (SISD)


SISD represents a serial (non-parallel) computer, containing a control unit, a processor unit, and a memory unit. Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle. Single data: only one data stream is being used as input during any one clock cycle. Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities; parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing. This is the oldest type of computer organization. Examples: older generation mainframes, minicomputers, and workstations; single-core, single-processor PCs.

Single Instruction, Multiple Data (SIMD)


A type of parallel computer, SIMD represents an organization that includes many processing units under the supervision of a common control unit. Single instruction: all processing units execute the same instruction at any given clock cycle. Multiple data: each processing unit can operate on a different data element. The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously. SIMD is best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing. Examples: processor arrays (Connection Machine CM-2, MasPar MP-1 and MP-2, ILLIAC IV) and vector pipelines (IBM 9000, Cray X-MP, Y-MP and C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10). Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
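The lockstep behaviour can be sketched in a few lines of Python (a toy model, not any real SIMD instruction set): one instruction is broadcast by the control unit, and every processing lane applies it to its own data element.

```python
# Toy SIMD model: one instruction, applied by every lane to its own data.
def simd_execute(instruction, lane_data):
    # All lanes execute the same instruction in the same "clock cycle",
    # each on a different data element.
    return [instruction(x) for x in lane_data]

# Single instruction (add 10) over multiple data (four lane-local values).
print(simd_execute(lambda x: x + 10, [1, 2, 3, 4]))  # [11, 12, 13, 14]
```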

Multiple Instruction, Single Data (MISD)


The MISD structure is largely of theoretical interest, since almost no practical system has been constructed with this organization. A single data stream is fed into multiple processing units, and each processing unit operates on the data independently via its own instruction stream. Few actual examples of this class of parallel computer have ever existed; one is the experimental Carnegie-Mellon C.mmp computer (1971). Some conceivable uses: multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message.

Multiple Instruction, Multiple Data (MIMD)


The MIMD organization refers to a computer system capable of processing several programs at the same time. Most multiprocessor and multicomputer systems fall into this category; it is currently the most common type of parallel computer, and most modern computers belong here. Multiple instruction: every processor may be executing a different instruction stream. Multiple data: every processor may be working with a different data stream. Execution can be synchronous or asynchronous, deterministic or nondeterministic. Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, and multi-core PCs. Note: many MIMD architectures also include SIMD execution subcomponents.
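A minimal sketch of the MIMD idea using Python threads (an illustration only; real MIMD machines use independent processors): each worker executes a different instruction stream on a different data stream, asynchronously.

```python
import threading

# Each worker runs its own "instruction stream" (function) on its own
# "data stream" (argument list), concurrently with the others.
results = {}

def worker(name, func, data):
    results[name] = func(data)

tasks = [("sum", sum, [1, 2, 3]), ("max", max, [4, 1, 2])]
threads = [threading.Thread(target=worker, args=t) for t in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # {'sum': 6, 'max': 4} (completion order may vary)
```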

A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently; the processor can issue multiple instructions during the same clock cycle.

A superscalar architecture consists of a number of pipelines that work in parallel.

PIPELINE
A pipeline is a set of data processing elements connected

in series, so that the output of one element is the input of the next one.

PIPELINING
Pipelining allows the processor to read a new instruction from memory before it has finished processing the current one. As an instruction goes through each stage, the next instruction follows it; an instruction does not need to wait for the previous one to finish completely. Pipelining saves time by ensuring that the microprocessor can start executing a new instruction before completing the current or previous ones. However, a simple pipeline can still complete at most one instruction per clock cycle.
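The saving can be made concrete with the standard cycle-count model, sketched here: a k-stage pipeline needs k cycles to fill, after which one instruction completes per cycle.

```python
def pipelined_cycles(n_instructions, n_stages):
    # k cycles to fill the pipeline, then one completion per cycle.
    return n_stages + (n_instructions - 1)

def sequential_cycles(n_instructions, n_stages):
    # Without pipelining, each instruction occupies all stages in turn.
    return n_instructions * n_stages

print(pipelined_cycles(100, 5))   # 104 cycles
print(sequential_cycles(100, 5))  # 500 cycles
```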

ADVANTAGES
Pipelining allows faster CPU throughput than would otherwise be possible at the same clock rate; combined with superscalar issue, it allows the instruction execution rate to exceed the clock rate (a CPI of less than 1).

THROUGHPUT: the maximum number of instructions that can be completed in a given period of time.

Superscalar Architectures
A typical Superscalar processor fetches and decodes the

incoming instruction stream several instructions at a time.

Superscalar Execution

Instruction-Level Parallelism
Superscalar processors are designed to exploit more instruction-level parallelism in user programs. For example,
    load  R1, R2            add   R3, R3, 1
    add   R3, R3, 1         add   R4, R3, R2
    add   R4, R4, R2        store [R4], R0

The three instructions on the left are independent, and in theory all three could be executed in parallel. The three instructions on the right cannot be executed in parallel because the second instruction uses the result of the first, and the third instruction uses the result of the second.
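A superscalar issue unit must detect such dependences before issuing instructions together. A minimal sketch of the read-after-write check (the (destination, sources) tuple format is an assumption for illustration):

```python
# An instruction is modeled as (destination_register, [source_registers]).
def has_raw_dependence(earlier, later):
    # Read-after-write: the later instruction reads what the earlier writes.
    dest, _ = earlier
    _, sources = later
    return dest in sources

# Left-hand sequence: no destination feeds a later instruction's source.
left = [("R1", ["R2"]), ("R3", ["R3"]), ("R4", ["R4", "R2"])]
# Right-hand sequence: each instruction reads the previous result.
right = [("R3", ["R3"]), ("R4", ["R3", "R2"]), (None, ["R4", "R0"])]

print(has_raw_dependence(left[0], left[1]))    # False -> may issue together
print(has_raw_dependence(right[0], right[1]))  # True  -> must serialize
```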

Fetching and dispatching two instructions per cycle

(degree 2)

One floating point and two integer operations are issued and executed simultaneously; each unit is pipelined and executes several operations in different pipeline stages.

Hardware Organization of a superscalar processor

Some Architectures
PowerPC 604
six independent execution units:
branch execution unit
load/store unit
3 integer units
floating-point unit
in-order issue, register renaming

PowerPC 620
provides, in addition to the 604's features, out-of-order issue

Pentium
three independent execution units:
2 integer units
floating point unit
in-order issue

Intel P5 Microarchitecture

Used in the initial Pentium processor; it could execute up to 2 instructions simultaneously.

PIPELINING:
Pipelining is a technique of decomposing a sequential process (instruction) into suboperations, each of which is executed in a special dedicated segment that operates concurrently with all other segments. Each segment performs the partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline.

SUPERPIPELINING:
Superpipelining is the breaking of the longer stages of a pipeline into smaller stages, which shortens the clock period. Therefore more instructions can be executed in the same time compared with an ordinary pipelined structure. Breaking up the stages increases efficiency because the clock period is determined by the longest stage.
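Because the clock period is set by the slowest stage, splitting that stage directly shortens the cycle. A toy model (the stage delays are made-up numbers for illustration):

```python
def clock_period(stage_delays_ns):
    # The cycle time must accommodate the slowest pipeline stage.
    return max(stage_delays_ns)

plain = [2, 5, 2, 2]                  # 4-stage pipe; memory stage dominates
superpipelined = [2, 2.5, 2.5, 2, 2]  # the 5 ns stage split into two halves

print(clock_period(plain))           # 5 ns clock
print(clock_period(superpipelined))  # 2.5 ns clock
```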

A normal pipelined case :

Fig. : pipe latency governed by memory access.

A super pipelined case :

Fig. : pipe latency where time requirement of each stage is same

TIMING DIAGRAM :

Comparison of clock time per cycle:

Some processors which have a superpipelined architecture are the MIPS R4000, Intel NetBurst, and the ARM11 core.
ARM cores are famous for their simple and cost-effective design. However, ARM cores have also evolved to show superpipelining characteristics, with architectural features to hide possible long pipeline stalls. The ARM11 (specifically, the ARM1136JF) is a high-performance, low-power processor equipped with an eight-stage pipeline. The core consists of two fetch stages, one decode stage, one issue stage, and four stages for the integer pipeline.

Fig. : The eight stages of the ARM11 core.

DIFFERENCE BETWEEN SUPERSCALING AND SUPER PIPELINING


SUPERSCALING: It creates multiple pipelines within a processor, allowing the CPU to execute multiple instructions simultaneously.


SUPERPIPELINING: It breaks the instruction pipeline into smaller pipeline stages, allowing the CPU to start executing the next instruction before completing the previous one. The processor can run multiple instructions simultaneously, with each instruction at a different stage of completion.

ASPECT                       SUPERSCALING                                 SUPERPIPELINING
1. Approach                  Dynamically issues multiple                  Divides the long-latency stages of the
                             instructions per cycle.                      pipeline into shorter stages.
2. Instruction issue rate    Multiple.                                    Multiple (different instructions at
                                                                          different stages of completion).
3. Effects                   Affects the clocks-per-instruction (CPI)     Affects the clock cycle time term of
                             term of the performance equation.            the performance equation.
4. Difficulty of design      Complex design issues.                       Relatively easier design.
5. Additional aids           Additional hardware units required,          No additional hardware units required.
                             such as the fetch units.

*The performance equation of a microprocessor: Execution Time = IC * CPI * clock cycle time
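Plugging illustrative (made-up) numbers into this equation shows where each technique helps: superscaling lowers the CPI term, while superpipelining shortens the clock cycle time term.

```python
def execution_time(ic, cpi, cycle_time_ns):
    # Execution Time = IC * CPI * clock cycle time
    return ic * cpi * cycle_time_ns

base        = execution_time(1_000_000, 1.0, 2.0)  # plain pipeline
superscalar = execution_time(1_000_000, 0.5, 2.0)  # superscaling: CPI halved
superpipe   = execution_time(1_000_000, 1.0, 1.0)  # superpipelining: cycle time halved

print(base, superscalar, superpipe)  # 2000000.0 1000000.0 1000000.0
```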

INSTRUCTION ISSUE STYLE:


Superscalar (and superpipelined superscalar) processors commonly use dynamic instruction scheduling. In dynamic scheduling, instructions are fetched sequentially in program order, but they are decoded and stored in a scheduling window in the processor's execution core. After decoding the instructions, the processor core obtains the dependency information between them and can identify the instructions that are ready for execution.
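The scheduling-window idea can be sketched as follows (a toy model using a hypothetical (destination, sources) instruction format; real cores track many more hazards): an instruction is ready when none of its sources is written by an older instruction still in the window.

```python
def ready_instructions(window):
    # window holds decoded (dest, sources) entries in program order.
    ready = []
    pending_dests = set()
    for dest, sources in window:
        if not (set(sources) & pending_dests):
            ready.append((dest, sources))  # no RAW hazard: ready to issue
        pending_dests.add(dest)            # older writes block younger reads
    return ready

window = [("R3", ["R1"]), ("R4", ["R3"]), ("R5", ["R2"])]
print(ready_instructions(window))  # first and third entries are ready
```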

CONCLUSION:
From all this we can conclude that parallel processing, superscaling, and superpipelining are different architectural improvements introduced to increase the efficiency of modern computers.
