INTRODUCTION
A parallel processor performs concurrent data processing to achieve faster execution time. While one instruction is being executed in the ALU, the next
instruction can be read from memory. The arithmetic, logic, and shift operations
can be separated into three units and the operands diverted to each unit under the supervision of a control unit.
[Figure: processor registers feeding separate functional units, including an integer multiply unit, logic unit, shift unit, incrementer, and floating-point add-subtract unit, with results returned to memory]
The figure shows one possible way of separating the execution unit into eight
functional units.
The operands in the registers are applied to one of the units depending on the
operation specified by the instruction.
The floating-point operations are separated into three circuits operating in parallel. The logic, shift, and increment operations can be performed concurrently on
different data.
All units are independent of each other, so, for example, one number can be incremented while another number is being shifted.
ADVANTAGES: Shorter execution time and hence higher throughput, where throughput is the maximum number of results that can be generated per unit time by a processor. Parallel processing is much faster than sequential processing for repetitive calculations on vast amounts of data, because a parallel processor can multithread on a large scale and therefore process several streams of data simultaneously. This makes parallel processors suitable for graphics cards, since the calculations required to generate millions of pixels per second are all repetitive.
DISADVANTAGES: More hardware is required, along with more power, so parallel processing is not a good fit for low-power and mobile devices.
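As a loose illustration of the point about repetitive calculations (a sketch of ours, not part of the notes; the function and the chunking scheme are invented), the same computation can be run sequentially or split across worker threads:

```python
# Illustrative sketch: one repetitive calculation, run whole and then split
# across workers. (In CPython, threads illustrate the structure rather than
# deliver a true speedup; real parallel hardware runs the chunks at once.)
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    # the repetitive per-element work a parallel processor would spread out
    return sum(x * x for x in chunk)

data = list(range(1000))
sequential = sum_of_squares(data)

# split the data stream into 4 independent chunks, one per worker
chunks = [data[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = sum(pool.map(sum_of_squares, chunks))

assert sequential == parallel
```

Because each chunk is independent, the partial results can simply be summed at the end, which is what makes the workload a good fit for parallel hardware.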
CLASSIFICATION
There are a variety of ways that parallel processing can
be classified. It can be based on:
- the internal organization of the processor
- the interconnection structure between processors
- the flow of information through the system
FLYNN'S CLASSIFICATION
In 1966, Michael J. Flynn proposed a classification of (parallel and sequential) computers and programs, now known as Flynn's taxonomy. It organizes computer systems by:
- the number of instruction streams, and
- the number of data sets that are manipulated simultaneously.
Flynn's classification divides computers into four major groups:
- Single Instruction, Single Data (SISD)
- Single Instruction, Multiple Data (SIMD)
- Multiple Instruction, Single Data (MISD)
- Multiple Instruction, Multiple Data (MIMD)
SINGLE INSTRUCTION, SINGLE DATA (SISD)
An SISD computer is a serial (non-parallel) computer with a control unit, a processor unit, and a memory unit.
Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
Single data: only one data stream is being used as input during any one clock cycle.
The system may or may not have internal parallel processing capabilities; parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing. This is the oldest and, even today, the most common type of computer.
Examples: older-generation mainframes, minicomputers, and workstations; most modern-day PCs.
SINGLE INSTRUCTION, MULTIPLE DATA (SIMD)
In an SIMD computer, all processing units execute the same instruction at any given clock cycle, and each processing unit can operate on a different data element. SIMD is best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
Examples:
Processor arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV.
Vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10.
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
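The SIMD idea, one instruction applied in lockstep to many data elements, can be sketched in a few lines (our own illustration, not from the notes; real SIMD hardware does this in a single machine instruction over vector lanes):

```python
# Minimal SIMD sketch: the same "instruction" (an add) is applied to every
# element of the data streams in lockstep, one element per lane.
def simd_add(lanes_a, lanes_b):
    # single instruction, multiple data: elementwise add across all lanes
    return [a + b for a, b in zip(lanes_a, lanes_b)]

pixels = [10, 20, 30, 40]        # multiple data elements
brightness = [5, 5, 5, 5]        # one operation: add 5 to each lane
print(simd_add(pixels, brightness))  # → [15, 25, 35, 45]
```

This is exactly the regular, repetitive pattern (e.g. brightening every pixel of an image) that makes graphics workloads such a good match for SIMD units.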
MULTIPLE INSTRUCTION, SINGLE DATA (MISD)
In an MISD computer, a single data stream is fed into multiple processing units, and each processing unit operates on the data independently via independent instruction streams. Hardly any practical system has been constructed using this organization, and few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971). Some conceivable uses might be:
- multiple frequency filters operating on a single signal stream
- multiple cryptography algorithms attempting to crack a single coded message
MULTIPLE INSTRUCTION, MULTIPLE DATA (MIMD)
An MIMD organization refers to a computer system capable of processing several programs at the same time. Most multiprocessor and multicomputer systems can be classified in this category; it is currently the most common type of parallel computer, and most modern computers fall into it.
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution subcomponents.
In superscalar processors, multiple instructions can be initiated simultaneously and executed independently; such processors have the ability to initiate multiple instructions during the same clock cycle.
PIPELINE
A pipeline is a set of data processing elements connected
in series, so that the output of one element is the input of the next one.
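The definition above can be sketched directly (an illustration of ours; the stages are arbitrary toy functions):

```python
# A pipeline as a series of processing elements: the output of each element
# is the input of the next one.
def pipeline(data, stages):
    for stage in stages:          # output of one element feeds the next
        data = stage(data)
    return data

stages = [lambda x: x + 1,        # element 1
          lambda x: x * 2,        # element 2
          lambda x: x - 3]        # element 3
print(pipeline(5, stages))        # → 9
```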
PIPELINING
Pipelining allows the processor to read a new instruction from memory before it has finished processing the current one. As an instruction goes through each stage, the next instruction follows it; an instruction does not need to wait until the previous one completely finishes. Pipelining saves time by ensuring that the microprocessor can start the execution of a new instruction before completing the current or previous
ones. However, it can still complete at most one instruction per clock cycle.
ADVANTAGES
Allows the instruction completion rate to approach one instruction per clock cycle; combined with superscalar issue, the execution rate can even exceed the clock rate (a CPI of less than 1).
It thereby allows higher CPU throughput than a non-pipelined processor at the same clock rate.
Superscalar Architectures
A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time.
Superscalar Execution
Instruction-Level Parallelism
Superscalar processors are designed to exploit more instruction-level parallelism in user programs. For example,
load  R1, R2              add   R3, R3, 1
add   R3, R3, 1           add   R4, R3, R2
add   R4, R4, R2          store [R4], R0
The three instructions on the left are independent, and in theory all three could be executed in parallel. The three instructions on the right cannot be executed in parallel because the second instruction uses the result of the first, and the third instruction uses the result of the second.
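The dependency check described above can be sketched as follows (a simplified illustration of ours: instructions are modeled as (destination, sources) pairs, and only read-after-write hazards are checked):

```python
# A sequence is independent (parallelizable) if no later instruction reads
# a register written by an earlier one (read-after-write dependency).
def independent(seq):
    for i, (dest, _) in enumerate(seq):
        for _, srcs in seq[i + 1:]:
            if dest in srcs:      # later instruction needs an earlier result
                return False
    return True

# the two instruction sequences from the example above
left  = [("R1", ["R2"]), ("R3", ["R3"]), ("R4", ["R4", "R2"])]
right = [("R3", ["R3"]), ("R4", ["R3", "R2"]), (None, ["R4", "R0"])]
print(independent(left), independent(right))  # → True False
```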
(degree 2)
One floating-point and two integer operations are issued and executed simultaneously; each unit is pipelined and executes several operations in different pipeline stages.
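A rough sketch of how such an issue restriction limits throughput (ours, not from the notes; the issue policy and operation labels are assumptions, and data dependencies are ignored):

```python
# Per cycle, at most one floating-point and two integer operations are
# issued to the (pipelined) functional units.
def cycles_to_issue(ops):                 # ops: list of "fp" / "int"
    fp = sum(1 for o in ops if o == "fp")
    intg = len(ops) - fp
    cycles = 0
    while fp > 0 or intg > 0:
        fp = max(0, fp - 1)               # one FP issue slot per cycle
        intg = max(0, intg - 2)           # two integer issue slots per cycle
        cycles += 1
    return cycles

print(cycles_to_issue(["fp", "int", "int", "fp", "int"]))  # → 2
```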
Some Architectures
PowerPC 604
six independent execution units:
PowerPC 620
provides, in addition to the 604's units, out-of-order issue
Pentium
three independent execution units:
in-order issue
Intel P5 Microarchitecture
PIPELINING:
Pipelining is a technique of decomposing a sequential process
(instruction) into suboperations, each of which is executed in a special dedicated segment that operates concurrently with all other segments. Each segment performs the partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline.
SUPERPIPELINING:
Superpipelining is the breaking of longer stages of a
pipeline into smaller stages, which shortens the clock period. Therefore more instructions can be executed in the same time than in an ordinary pipelined structure. Breaking up the stages increases efficiency because the clock period is determined by the longest stage.
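The clock-period claim above can be checked with a small sketch (ours; the stage delays are made-up numbers in nanoseconds):

```python
# The clock period of a pipeline is set by its slowest stage, so splitting
# the longest stage into shorter ones shortens the period.
def clock_period(stage_delays):
    return max(stage_delays)          # slowest stage limits the clock

stages = [2, 5, 2, 3]                 # a 5 ns stage dominates
superpipelined = [2, 2.5, 2.5, 2, 3]  # the 5 ns stage split into two halves

print(clock_period(stages))           # → 5
print(clock_period(superpipelined))   # → 3
```

After the split, a different stage (3 ns) becomes the limiter, which is why superpipelined designs keep subdividing until the stages are roughly balanced.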
TIMING DIAGRAM :
Some processors that have a superpipelined architecture are the MIPS R4000, the Intel NetBurst microarchitecture, and the ARM11 core.
ARM cores are famous for their simple and cost-effective
design. However, ARM cores have also evolved to show superpipelining characteristics in their architectures, with architectural features to hide possible long pipeline stalls. The ARM11 (specifically, the ARM1136JF) is a high-performance, low-power processor equipped with an eight-stage pipeline. The core consists of two fetch stages, one decode stage, one issue stage, and four stages for the integer pipeline.
Superpipelining thus breaks the pipeline into smaller pipeline stages, allowing the CPU to start executing the next instruction before completing the previous one. The processor can run multiple instructions simultaneously, with each instruction at a different stage of completion.
ASPECTS

1. APPROACH
SUPERPIPELINING: Divides the long-latency stages of the pipeline into shorter stages. SUPERSCALAR: Issues multiple instructions per cycle.

3. EFFECTS
Superpipelining shortens the clock-cycle-time term of the performance equation*, while superscalar issue lowers the clocks-per-instruction (CPI) term; both introduce complex design issues.
*The performance equation of a microprocessor: Execution Time = IC * CPI * Clock Cycle Time, where IC is the instruction count.
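A worked sketch of the performance equation (ours; the instruction count, CPI, and cycle-time figures are made-up numbers):

```python
# Execution Time = IC * CPI * clock cycle time
def execution_time(ic, cpi, cycle_time_ns):
    return ic * cpi * cycle_time_ns   # result in nanoseconds

# 1,000,000 instructions, CPI of 2, 1 ns clock (1 GHz)
plain = execution_time(1_000_000, 2.0, 1.0)
# superpipelining shortens the cycle time; superscalar issue lowers the CPI
improved = execution_time(1_000_000, 0.8, 0.5)
print(plain, improved)                # → 2000000.0 400000.0
```

The two improved factors multiply, which is why the techniques are usually combined.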
Superscalar processors depend on effective instruction scheduling. In dynamic scheduling, instructions are fetched sequentially in program order. However, those instructions are decoded and stored in a scheduling window in the processor's execution core. After decoding the instructions, the processor core obtains the dependency information between them and can identify the instructions that are ready for execution.
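The scheduling-window idea can be sketched as follows (a simplified illustration of ours: instructions are (destination, sources) pairs, and an instruction is ready when none of its sources is produced by an earlier instruction still in the window; only read-after-write hazards are modeled):

```python
# Scan the window in program order and collect the instructions whose source
# registers do not depend on a still-pending earlier result.
def ready_instructions(window):
    ready = []
    pending_dests = set()
    for dest, srcs in window:             # program order
        if not (set(srcs) & pending_dests):
            ready.append((dest, srcs))    # can be issued this cycle
        pending_dests.add(dest)           # its result is not available yet
    return ready

window = [("R1", ["R2"]),                 # load R1, [R2]   -> ready
          ("R3", ["R1"]),                 # add  R3, R1, 1  -> waits on R1
          ("R4", ["R5"])]                 # add  R4, R5, 2  -> ready
print(ready_instructions(window))         # → [('R1', ['R2']), ('R4', ['R5'])]
```

Note that the third instruction is issued ahead of the second, which is exactly the out-of-order execution that the scheduling window enables.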
CONCLUSION:
From all of this we can conclude that parallel processing,
superscalar execution, and superpipelining are different architectural improvements introduced to increase the efficiency of modern-day computers.