Lecture 2: Fundamentals of Computer Design
Kai Bu
kaibu@zju.edu.cn
http://list.zju.edu.cn/kaibu/comparch

Chapter 1

Transition from a single processor to multiple processors
Quantitative approach: empirical observations of programs, experimentation, and simulation as its tools

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

5 Classes of Computers

PMD: Personal Mobile Device


Wireless devices with multimedia user
interfaces
cell phones, tablet computers, etc.
a few hundred dollars

PMD Characteristics
Cost effectiveness
less expensive packaging;
absence of fan for cooling

Responsiveness & Predictability
real-time performance: a maximum execution time for each app segment;
soft real-time: an average time constraint is enforced, and occasionally missing the time constraint on an event is tolerated.

Memory efficiency
optimize code size

Energy efficiency
battery power, heat dissipation

Desktop Computing
Largest market share
low-end netbooks: $x00

high-end workstations: $x000

Desktop Characteristics
Price-Performance
combination of performance and price;
compute performance
graphics performance
The most important to customers,
and hence to computer designers

Servers
Provide large-scale and reliable file and
computing services (to desktops)
Constitute the backbone of large-scale
enterprise computing

Server Characteristics
Availability
guard against server failure
Scalability
respond to increasing demand by scaling up computing capacity, memory, storage, and I/O bandwidth
Efficient throughput
more requests handled per unit time

Why Server Availability

Clusters/WSCs
Warehouse-Scale Computers
collections of desktop computers or servers
connected by local area networks
to act as a single larger computer
Characteristics
price-performance, power, availability

Embedded Computers
hide everywhere

Embedded vs Nonembedded
Dividing line
the ability to run third-party software
Embedded computers' primary goal
meet the performance need at a minimum price,
rather than achieve higher performance at a higher price

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

Application Parallelism
DLP: Data-Level Parallelism
many data items being operated on at
the same time
TLP: Task-Level Parallelism
tasks of work are created that can operate independently and largely in parallel

Hardware Parallelism
Computer hardware exploits two kinds
of application parallelism in four major
ways:
Instruction-Level Parallelism
Vector Architectures and GPUs
Thread-Level Parallelism
Request-Level Parallelism

Hardware Parallelism
Instruction-Level Parallelism
exploits data-level parallelism
at modest levels: pipelining;
at medium levels: speculative execution

Hardware Parallelism
Vector Architectures & GPUs (Graphics Processing Units)
exploit data-level parallelism
apply a single instruction to a collection
of data in parallel

Hardware Parallelism
Thread-Level Parallelism
exploits either DLP or TLP
in a tightly coupled hardware model
that allows for interaction among
parallel threads

Hardware Parallelism
Request-Level Parallelism
exploits parallelism among largely
decoupled tasks specified by the
programmer or the OS

Classes of Parallel Architectures
by Michael Flynn
according to the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor:
SISD, SIMD, MISD, MIMD

SISD
Single instruction stream, single data stream: the uniprocessor
Can exploit instruction-level parallelism

SIMD
Single instruction stream, multiple data streams
The same instruction is executed by multiple processors using different data streams.
Exploits data-level parallelism
Each processor has its own data memory, whereas there is a single instruction memory and control processor.

MISD
Multiple instruction streams, single
data stream
No commercial multiprocessor of this
type yet

MIMD
Multiple instruction streams, multiple
data streams
Each processor fetches its own
instructions and operates on its own
data.
Exploits task-level parallelism

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

Instruction Set Architecture


ISA
actual programmer-visible instruction
set
the boundary between software and
hardware
7 major dimensions

ISA: Class
Most are general-purpose register architectures whose operands are either registers or memory locations
Two popular versions
register-memory ISA: e.g., 80x86
many instructions can access
memory
load-store ISA: e.g., ARM, MIPS
only load or store instructions can
access memory

ISA: Memory Addressing


Byte addressing
Aligned address
object width: s bytes
address: A
aligned if A mod s = 0

Each misaligned object requires two memory accesses
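
The alignment rule can be checked mechanically. The following is a minimal sketch, assuming (this is not stated on the slide) that memory is read in naturally aligned chunks exactly as wide as the object, s bytes; the function names are hypothetical.

```python
def is_aligned(address, size):
    """True if an object of `size` bytes at byte address `address` is aligned."""
    return address % size == 0


def chunks_touched(address, size):
    """Number of aligned size-byte chunks the object overlaps.

    Assumption: memory is accessed in aligned chunks of `size` bytes,
    so each chunk touched costs one memory access.
    """
    first_chunk = address // size
    last_chunk = (address + size - 1) // size
    return last_chunk - first_chunk + 1


# A 4-byte word at address 8 is aligned and fits in one chunk;
# the same word at address 6 straddles two chunks.
print(is_aligned(8, 4), chunks_touched(8, 4))
print(is_aligned(6, 4), chunks_touched(6, 4))
```

Under this assumption an aligned object always costs one access and a misaligned one costs two, which matches the statement above.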

ISA: Addressing Modes


Specify the address of a memory
object
Register, Immediate, Displacement

ISA: Types and Sizes of Operands

Type                                          Size in bits
ASCII character                               8
Unicode character, half word                  16
Integer, word                                 32
Double word, long integer                     64
IEEE 754 floating point, single precision     32
IEEE 754 floating point, double precision     64
Floating point, extended double precision     80

MIPS64 Operations
Data transfer

MIPS64 Operations
Arithmetic Logical

MIPS64 Operations
Control

MIPS64 Operations
Floating point

ISA: Control Flow Instructions


Types:
conditional branches
unconditional jumps
procedure calls
returns
Branch address: add an address field to the PC (program counter), i.e., PC-relative addressing

ISA: Encoding an ISA


Fixed length: ARM, MIPS (32 bits)
Variable length: 80x86 (1 to 18 bytes)

http://en.wikipedia.org/wiki/MIPS_architecture
Start with a 6-bit opcode.
R-type:
three registers,
a shift amount field,
and a function field;
I-type:
two registers,
a 16-bit immediate value;
J-type:
a 26-bit jump target.
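
The three formats can be illustrated by pulling fields out of a 32-bit instruction word. This is a sketch only: the 5-bit widths for the register and shift-amount fields and the field names (rs, rt, rd, shamt, funct) follow the conventional MIPS32 layout and are assumptions beyond what is listed above; the function names are hypothetical.

```python
def decode_r_type(word):
    """Split a 32-bit word into R-type fields (6/5/5/5/5/6 bits)."""
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode
        "rs": (word >> 21) & 0x1F,      # first source register
        "rt": (word >> 16) & 0x1F,      # second source register
        "rd": (word >> 11) & 0x1F,      # destination register
        "shamt": (word >> 6) & 0x1F,    # shift amount
        "funct": word & 0x3F,           # function field
    }


def decode_i_type(word):
    """Split a 32-bit word into I-type fields (6/5/5/16 bits)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,           # 16-bit immediate, left unsigned here
    }


def decode_j_type(word):
    """Split a 32-bit word into J-type fields (6/26 bits)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "target": word & 0x3FFFFFF,     # 26-bit jump target
    }


# 0x012A4020 is the usual encoding of `add $t0, $t1, $t2`.
print(decode_r_type(0x012A4020))
```

All three decoders read the same top 6 bits first, which is why a fixed-length encoding is simple to decode.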

Computer Architecture
ISA
actual programmer-visible instruction set;
boundary between sw and hw;
Organization
high-level aspects of computer design:
memory system,
memory interconnect,
design of internal processor or CPU;
Hardware
computer specifics:
logic design,
packaging tech;

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

Five Critical
Implementation Technologies
Integrated circuit logic technology
Semiconductor DRAM
Semiconductor flash
Magnetic disk technology
Network technology

Integrated circuit logic technology
Moore's Law: transistor count on a chip grows by about 40% to 55% per year,
i.e., doubles every 18 to 24 months
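
The link between an annual growth rate and a doubling time can be checked with a short calculation. This sketch assumes steady compound growth, which is an idealization; the function name is hypothetical.

```python
import math


def doubling_time_months(annual_growth):
    """Months needed to double at a compound annual growth rate.

    `annual_growth` is a fraction, e.g. 0.40 for 40% per year.
    Assumes steady compounding, which real technology trends only approximate.
    """
    return 12 * math.log(2) / math.log(1 + annual_growth)


for growth in (0.40, 0.55):
    print(f"{growth:.0%} per year -> doubles in {doubling_time_months(growth):.1f} months")
```

Compound growth of 40% to 55% per year gives a doubling time of roughly 19 to 25 months, close to the quoted 18 to 24.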

Semiconductor DRAM
Capacity per DRAM chip doubles
roughly every 2 or 3 years

Semiconductor Flash
Electrically erasable programmable read-only memory (EEPROM)
Capacity per Flash chip doubles roughly
every two years
In 2011, 15 to 20 times cheaper per bit
than DRAM

Magnetic Disk Technology


Since 2004, density doubles every
three years
15 to 20 times cheaper per bit than
Flash
300 to 500 times cheaper per bit than
DRAM
For server and warehouse scale storage

Network Technology
Switches
Transmission systems

Performance Trends
Bandwidth/Throughput
the total amount of work done in a
given time;
Latency/Response Time
the time between the start and the
completion of an event;

Bandwidth over Latency

Trends in Power and Energy


Power = Energy per unit time
1 watt = 1 joule per second
energy to execute a workload =
avg power x execution time
Three primary concerns
the max power for a processor
sustained power consumption
energy and energy efficiency

Trends in Power and Energy


Sustained power consumption
Metric: TDP
Thermal Design Power
determines cooling requirement
Heat management
1. reduce the clock rate, and hence power, as the temperature approaches the junction temperature limit;
2. if that is not enough, power down the chip.

Trends in Power and Energy


Energy and Energy Efficiency
energy to execute a workload =
avg power x execution time
Example
processor A has 20% higher avg power consumption than processor B,
but A executes the task in 70% of the time needed by B;
which is more energy efficient, A or B?

Trends in Power and Energy


Answer
EnergyConsumptionA
= 1.2 x 0.7 x EnergyConsumptionB
= 0.84 x EnergyConsumptionB
so A uses less energy for the task and is more energy efficient
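
The same comparison as a small calculation, using energy = avg power x execution time. The function and parameter names are hypothetical.

```python
def relative_energy(power_ratio, time_ratio):
    """Energy of A relative to B.

    power_ratio: A's average power divided by B's (1.2 means 20% higher).
    time_ratio:  A's execution time divided by B's (0.7 means 70% of B's time).
    """
    return power_ratio * time_ratio


ratio = relative_energy(power_ratio=1.2, time_ratio=0.7)
print(f"A uses {ratio:.2f}x the energy of B")
```

A ratio below 1 means A is the more energy-efficient processor for this task, even though it draws more power while running.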

Trends in Power and Energy


Primary energy consumption within a microprocessor is for switching transistors: dynamic energy
logic transition: 0->1->0 or 1->0->1
The energy of a single transition

Trends in Power and Energy


The power required per transistor

For a fixed task, slowing clock rate (frequency) reduces power, but not energy.
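
One way to see why is the standard first-order CMOS model, taken here as an assumption since no formula is written out above. With C the capacitive load, V the supply voltage, and f the switching frequency:

```latex
\begin{align*}
E_{\text{dynamic}} &\propto C \cdot V^{2}
  && \text{(pulse } 0 \to 1 \to 0 \text{ or } 1 \to 0 \to 1\text{)} \\
E_{\text{dynamic}} &\propto \tfrac{1}{2}\, C \cdot V^{2}
  && \text{(single transition)} \\
P_{\text{dynamic}} &\propto \tfrac{1}{2}\, C \cdot V^{2} \cdot f
  && \text{(per transistor)}
\end{align*}
```

Frequency appears only in the power expression, so at a fixed voltage a slower clock lowers power while the task takes proportionally longer and the energy stays the same.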

Trends in Power and Energy


Example
some microprocessors have adjustable voltage;
a 15% reduction in voltage may result in a 15% reduction in frequency;
what is the impact on dynamic energy and dynamic power?

Trends in Power and Energy


Answer
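
One way to work the example, assuming the first-order model in which dynamic energy scales with voltage squared and dynamic power scales with voltage squared times frequency (capacitive load unchanged). The function name is hypothetical.

```python
def dvfs_ratios(voltage_scale, frequency_scale):
    """Return (energy_ratio, power_ratio) of new vs. old operating point.

    Assumed model: E_dynamic ~ C * V^2 and P_dynamic ~ 1/2 * C * V^2 * f,
    with the capacitive load C held constant.
    """
    energy_ratio = voltage_scale ** 2
    power_ratio = voltage_scale ** 2 * frequency_scale
    return energy_ratio, power_ratio


# 15% lower voltage and 15% lower frequency -> both scaled by 0.85.
energy_ratio, power_ratio = dvfs_ratios(0.85, 0.85)
print(f"energy: {energy_ratio:.2f} of original, power: {power_ratio:.2f} of original")
```

Under this model dynamic energy drops to about 72% of the original and dynamic power to about 61%.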

Trends in Power and Energy


Challenges
distributing the power
removing the heat
preventing hot spots
potential research topics

Trends in Power and Energy


Energy-efficiency improvement
techniques
1. do nothing well
turn off the clock of inactive modules
2. DVFS: dynamic voltage-frequency
scaling
scale down clock frequency and voltage
during periods of low activity

DVFS

Trends in Power and Energy


Energy-efficiency improvement
techniques
3. design for typical case
PMDs, laptops often idle
memory and storage with low power
modes to save energy
4. overclocking
the chip runs at a higher clock rate for
a short time until temperature rises

Trends in Cost
Cost of an Integrated Circuit
wafer for test; chopped into dies for
packaging

Trends in Cost
Cost of an Integrated Circuit

yield: the percentage of manufactured devices that survive the testing procedure

Trends in Cost
Cost of an Integrated Circuit

Trends in Cost
Cost of an Integrated Circuit

Intel Core i7 Die

Trends in Cost
Example

Trends in Cost
Example

Trends in Cost
Cost of an Integrated Circuit

N: process-complexity factor for measuring manufacturing difficulty
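
A sketch of how N enters a die-cost estimate, assuming the commonly used textbook model: dies per wafer from wafer and die area with an edge-loss term, and die yield falling with defect density, die area, and N. The formulas, function names, and all numeric inputs are assumptions chosen for illustration, not values from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Estimated whole dies on a round wafer (area term minus edge loss)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Fraction of good dies; n is the process-complexity factor."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, n):
    """Wafer cost spread over the good dies only."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2, n
    )
    return wafer_cost / good_dies


# Illustrative inputs (hypothetical): 30 cm wafer, 1.5 cm x 1.5 cm die,
# 0.031 defects per cm^2, N = 13.5, wafer cost of 5000.
print(round(dies_per_wafer(30, 2.25)))
print(round(die_yield(0.031, 2.25, 13.5), 2))
print(round(die_cost(5000, 30, 2.25, 0.031, 13.5), 2))
```

Because die area appears both in dies per wafer and inside the yield term raised to N, cost grows much faster than linearly with die size.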

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

Dependability
SLA: service level agreements
System states: up or down
Service states
service accomplishment
failure
restoration
service interruption

Dependability
Two measures of dependability
Module reliability
Module availability

Dependability
Two measures of dependability
Module reliability
continuous service accomplishment
from a reference initial instant
MTTF: mean time to failure
MTTR: mean time to repair
MTBF: mean time between failures
MTBF = MTTF + MTTR

Dependability
Two measures of dependability
Module reliability
FIT: failures in time
failures per billion hours
MTTF of 1,000,000 hours
= 10^9/10^6
= 1000 FIT
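
The conversion between MTTF and FIT is a single division. A minimal sketch, with hypothetical function names:

```python
HOURS_PER_BILLION = 10 ** 9  # FIT counts failures per billion hours


def mttf_to_fit(mttf_hours):
    """Failures per billion hours for a given mean time to failure."""
    return HOURS_PER_BILLION / mttf_hours


def fit_to_mttf(fit):
    """Mean time to failure in hours for a given FIT rate."""
    return HOURS_PER_BILLION / fit


print(mttf_to_fit(1_000_000))  # MTTF of 1,000,000 hours -> 1000.0 FIT
```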

Dependability
Two measures of dependability
Module availability
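
Module availability is commonly defined as MTTF / (MTTF + MTTR), the fraction of time the service is accomplished. The sketch below assumes that definition; the function name and the input values are hypothetical.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time in service: MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: fails every 1,000,000 hours on average, 24 hours to repair.
print(f"{availability(1_000_000, 24):.6f}")
```

Shortening the repair time raises availability without changing reliability, which is why the two measures are tracked separately.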

Dependability
Example

Dependability
Answer

Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement

Measuring Performance
Execution time
the time between the start and the
completion of an event
Throughput
the total amount of work done in a
given time

Measuring Performance
Computer X and Computer Y
X is n times faster than Y
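
"X is n times faster than Y" is usually read as n = execution time of Y / execution time of X. A minimal sketch under that reading, with a hypothetical function name and made-up timings:

```python
def times_faster(exec_time_x, exec_time_y):
    """n such that X is n times faster than Y (a ratio of execution times)."""
    return exec_time_y / exec_time_x


# Hypothetical timings: X finishes in 10 s, Y in 15 s.
print(times_faster(10, 15))  # 1.5
```

Since performance is the reciprocal of execution time, the same n is also performance of X divided by performance of Y.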

Quantitative Principles
Parallelism
Locality
temporal locality: recently accessed
items are likely to be accessed in the
near future;
spatial locality: items whose
addresses are near one another tend to
be referenced close together in time

Quantitative Principles
Amdahl's Law

Quantitative Principles
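Amdahl's Law: two factors
1. Fraction_enhanced: the fraction of the original execution time that can use the enhancement
e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
2. Speedup_enhanced: the gain from running in enhanced mode
e.g., 5/2 if the enhanced portion takes 2 seconds while it originally took 5 seconds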
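
The two factors combine into the overall speedup as 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). A sketch using the two example values above together, which is an illustrative pairing rather than an example from the lecture:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    if not 0 <= fraction_enhanced <= 1:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1 - fraction_enhanced
    enhanced = fraction_enhanced / speedup_enhanced
    return 1 / (unenhanced + enhanced)


# One third of the time enhanced by a factor of 2.5.
print(round(amdahl_speedup(20 / 60, 5 / 2), 2))  # 1.25
```

Even an unbounded speedup of the enhanced third could not push the overall speedup past 1 / (1 - 1/3) = 1.5.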

Quantitative Principles
Example

Quantitative Principles
The Processor Performance Equation
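
The equation is usually written as CPU time = instruction count x cycles per instruction (CPI) x clock cycle time. A sketch under that form, with hypothetical names and input values:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instructions x cycles per instruction / clock rate.

    Dividing by the clock rate is the same as multiplying by the cycle time.
    """
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1e9 instructions, CPI of 2, on a 2 GHz clock.
print(cpu_time(1e9, 2.0, 2e9))  # 1.0
```

The three terms depend on different things (instruction count on the ISA and compiler, CPI on organization and ISA, cycle time on hardware technology and organization), so an improvement to one has to be checked against its effect on the others.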

Quantitative Principles
Example

Quantitative Principles
Example

Reading
Chapter 1.8, 1.10 1.13