Lec02 Fundamentals

Lecture 2: Fundamentals
of Computer Design
Kai Bu
kaibu@zju.edu.cn
http://list.zju.edu.cn/kaibu/comparch
Chapter 1
Transition from single processor to
multiple processors;
Quantitative approach: empirical
observations of programs,
experimentation, and simulation as its
tools;
Outline
Classes of computers
Parallelism
Instruction Set Architecture
Trends
Dependability
Performance Measurement
5 Classes of Computers
PMD: Personal Mobile Device

Wireless devices with multimedia user
interfaces
cell phones, tablet computers, etc.
a few hundred dollars
PMD Characteristics
Cost effectiveness
less expensive packaging;
absence of fan for cooling
Responsiveness & Predictability
real-time performance: a maximum execution time for each
app segment;
soft real-time: the average time for a task is constrained, but
occasionally missing the time constraint on an event is tolerated.
Memory efficiency
optimize code size
Energy efficiency
battery power, heat dissipation
Desktop Computing
Largest market share
low-end netbooks: $x00
high-end workstations: $x000
Desktop Characteristics
Price-Performance
combination of performance and price;
compute performance
graphics performance
The most important to customers,
and hence to computer designers
Servers
Provide large-scale and reliable file and
computing services (to desktops)
Constitute the backbone of large-scale
enterprise computing
Server Characteristics
Availability
against server failure
Scalability
scale up computing capacity,
memory, storage, and I/O bandwidth
in response to increasing demand
Efficient throughput
more requests handled per unit time
Why Server Availability
Clusters/WSCs
Warehouse-Scale Computers
collections of desktop computers or servers
connected by local area networks
to act as a single larger computer
Characteristics
price-performance, power, availability
Embedded Computers
hide everywhere
Embedded vs Nonembedded
Dividing line
the ability to run third-party software
Embedded computers' primary goal
meet the performance need at a
minimum price;
rather than achieve higher performance
at a higher price
Application Parallelism
DLP: Data-Level Parallelism
many data items being operated on at
the same time
TLP: Task-Level Parallelism
tasks of work are created that can operate
independently and largely in parallel
Hardware Parallelism
Computer hardware exploits two kinds
of application parallelism in four major
ways:
Instruction-Level Parallelism
Vector Architectures and GPUs
Thread-Level Parallelism
Request-Level Parallelism
Instruction-Level Parallelism
exploits data-level parallelism
at modest levels using pipelining;
at medium levels using speculative execution;
Vector Architectures &
GPUs (Graphics Processing Units)
exploit data-level parallelism
apply a single instruction to a collection
of data in parallel
Thread-Level Parallelism
exploits either DLP or TLP
in a tightly coupled hardware model
that allows for interaction among
parallel threads
Request-Level Parallelism
exploits parallelism among largely
decoupled tasks specified by the
programmer or the OS
Classes of Parallel Architectures

by Michael Flynn
according to the parallelism
in the instruction and data
streams called for by the
instructions at the most
constrained component of
the multiprocessor:
SISD, SIMD, MISD, MIMD
SISD
Single instruction stream, single data
stream: uniprocessor
Can exploit instruction-level parallelism
SIMD
Single instruction stream, multiple data
streams
The same instruction is executed by
multiple processors using different
data streams.
Exploits data-level parallelism
Each processor has its own data memory,
but there is a single instruction memory
and control processor.
MISD
Multiple instruction streams, single
data stream
No commercial multiprocessor of this
type yet
MIMD
Multiple instruction streams, multiple
data streams
Each processor fetches its own
instructions and operates on its own
data.
Exploits task-level parallelism
ISA
actual programmer-visible instruction
set
the boundary between software and
hardware
7 major dimensions
ISA: Class
Most are general-purpose register
architectures with operands of either
registers or memory locations
Two popular versions
register-memory ISA: e.g., 80x86
many instructions can access
memory
load-store ISA: e.g., ARM, MIPS
only load or store instructions can
access memory
ISA: Memory Addressing

Byte addressing
Aligned address
object width: s bytes
address: A
aligned if A mod s = 0
Each misaligned object
requires two memory accesses
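The alignment rule above (aligned if A mod s = 0) can be checked mechanically. The following is a minimal Python sketch; the helper names `is_aligned` and `accesses_needed` are hypothetical, and the access count assumes that memory is read in aligned chunks of the object's own width s, which is a simplification for illustration.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if an object of `size` bytes at byte address `address` is aligned."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


def accesses_needed(address: int, size: int) -> int:
    """Memory accesses for one object, assuming memory is read in aligned size-byte chunks."""
    # A misaligned object straddles two adjacent aligned chunks.
    return 1 if is_aligned(address, size) else 2


if __name__ == "__main__":
    for addr in (0, 4, 6, 8):
        print(addr, is_aligned(addr, 4), accesses_needed(addr, 4))
```

For a 4-byte word, addresses 0, 4, and 8 are aligned, while address 6 is not and would cost two accesses under this model.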
ISA: Addressing Modes

Specify the address of a memory
object
Register, Immediate, Displacement
ISA: Types and Sizes of Operands

Type                                           Size in bits
ASCII character                                8
Unicode character, half word                   16
Integer, word                                  32
Double word, long integer                      64
IEEE 754 floating point, single precision      32
IEEE 754 floating point, double precision      64
Floating point, extended double precision      80
MIPS64 Operations
Data transfer
Arithmetic/Logical
Control
Floating point
ISA: Control Flow Instructions

Types:
conditional branches
unconditional jumps
procedure calls
returns
Branch address: add an address field to
PC (program counter)
ISA: Encoding an ISA

Fixed length: ARM, MIPS, 32 bits
Variable length: 80x86, 1 to 18 bytes
http://en.wikipedia.org/wiki/MIPS_architecture
Start with a 6-bit opcode.
R-type:
three registers,
a shift amount field,
and a function field;
I-type:
two registers,
a 16-bit immediate value;
J-type:
a 26-bit jump target.
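The three formats above can be illustrated by splitting a 32-bit word into fields. This is a minimal Python sketch, not a full decoder: the function name `decode_mips` is hypothetical, the 5-bit register and shift-amount widths and the 6-bit function field follow the MIPS32 encoding, and the function returns every field without deciding which format the opcode selects.

```python
def decode_mips(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R-, I-, and J-type fields."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26, common to all formats
        "rs": (word >> 21) & 0x1F,       # R- and I-type first register
        "rt": (word >> 16) & 0x1F,       # R- and I-type second register
        "rd": (word >> 11) & 0x1F,       # R-type destination register
        "shamt": (word >> 6) & 0x1F,     # R-type shift amount
        "funct": word & 0x3F,            # R-type function field
        "immediate": word & 0xFFFF,      # I-type 16-bit immediate (unsigned view)
        "target": word & 0x3FFFFFF,      # J-type 26-bit jump target
    }


if __name__ == "__main__":
    # 0x012A4020 encodes the R-type instruction: add $t0, $t1, $t2
    print(decode_mips(0x012A4020))
```

Which fields are meaningful depends on the opcode; for the R-type word above, only opcode, rs, rt, rd, shamt, and funct apply.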
Computer Architecture
ISA
actual programmer-visible instruction set;
boundary between sw and hw;
Organization
high-level aspects of computer design:
memory system,
memory interconnect,
design of internal processor or CPU;
Hardware
computer specifics:
logic design,
packaging tech;
Five Critical
Implementation Technologies
Integrated circuit logic technology
Semiconductor DRAM
Semiconductor flash
Magnetic disk technology
Network technology
Integrated circuit logic technology
Moore's Law: transistor count on a chip
grows by about 40% to 55% per year,
i.e., it doubles every 18 to 24 months
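The link between the annual growth rate and the doubling period is compound growth. A minimal Python sketch follows, with the hypothetical helper `doubling_time_months`; it assumes growth compounds once per year at a constant rate.

```python
import math


def doubling_time_months(annual_growth: float) -> float:
    """Months needed to double at a constant compound annual growth rate (0.40 means 40%)."""
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    # Solve (1 + g) ** years == 2 for years, then convert to months.
    return 12 * math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for g in (0.40, 0.55):
        print(f"{g:.0%} per year -> doubles in about {doubling_time_months(g):.1f} months")
```

Under this assumption, 55% per year gives roughly 19 months and 40% per year roughly 25 months, which is consistent with the 18 to 24 month range quoted above.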
Semiconductor DRAM
Capacity per DRAM chip doubles
roughly every 2 or 3 years
Semiconductor Flash
Electrically erasable programmable
read-only memory (EEPROM)
Capacity per Flash chip doubles roughly
every two years
In 2011, 15 to 20 times cheaper per bit
than DRAM
Magnetic Disk Technology

Since 2004, density doubles every
three years
15 to 20 times cheaper per bit than
Flash
300 to 500 times cheaper per bit than
DRAM
For server and warehouse scale storage
Network Technology
Switches
Transmission systems
Performance Trends
Bandwidth/Throughput
the total amount of work done in a
given time;
Latency/Response Time
the time between the start and the
completion of an event;
Bandwidth over Latency
Trends in Power and Energy

Power = Energy per unit time
1 watt = 1 joule per second
energy to execute a workload =
avg power x execution time
Three primary concerns
the max power for a processor
sustained power consumption
energy and energy efficiency

Sustained power consumption
Metric: TDP
Thermal Design Power
determines cooling requirement
Heat management
1. reduce clock rate and hence power
as the thermal temperature approaches
the junction temperature limit;
2. if 1 is not working, power down the
chip.

Energy and Energy Efficiency
energy to execute a workload =
avg power x execution time
Example
processor A has 20% higher avg
power consumption than processor B;
but A executes the task in 70% of
the time needed by B;
which is more energy efficient, A or B?
EnergyConsumptionA
= 1.2 x 0.7 x EnergyConsumptionB
= 0.84 x EnergyConsumptionB
so A is more energy efficient.
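The comparison above applies energy = avg power x execution time to both processors and takes the ratio. A minimal Python sketch, with the hypothetical helper `relative_energy`:

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy of A relative to B, given A's average power and execution time relative to B."""
    # energy = average power x execution time, so the ratios multiply.
    return power_ratio * time_ratio


if __name__ == "__main__":
    ratio = relative_energy(power_ratio=1.2, time_ratio=0.7)
    better = "A" if ratio < 1 else "B"
    print(f"Energy A / Energy B = {ratio:.2f}; {better} uses less energy")
```

A ratio below 1 means A consumes less energy for the task even though it draws more power while running.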

Primary energy consumption within a
microprocessor is for switching
transistors: dynamic energy
logic transition: 0->1->0 or 1->0->1

The energy of a single transition

The power required per transistor
For a fixed task, slowing clock rate
(frequency) reduces power, but not
energy.

Example
some microprocessors with adjustable
voltage;
15% reduction in voltage -> 15%
reduction in frequency;
the impact on dynamic energy and
dynamic power?

Answer
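The worked answer is not present in this text. One way to approach the question is sketched below, assuming the usual first-order CMOS model: dynamic energy is proportional to capacitive load x voltage^2, and dynamic power is proportional to 1/2 x capacitive load x voltage^2 x switching frequency. The helper name `dynamic_scaling` is hypothetical, and the capacitive load is assumed unchanged.

```python
from typing import Tuple


def dynamic_scaling(voltage_scale: float, frequency_scale: float) -> Tuple[float, float]:
    """Return (energy_ratio, power_ratio) of new to old under the assumed CMOS model."""
    # Energy per transition scales with voltage squared (capacitive load held fixed).
    energy_ratio = voltage_scale ** 2
    # Power additionally scales with how often transitions happen.
    power_ratio = energy_ratio * frequency_scale
    return energy_ratio, power_ratio


if __name__ == "__main__":
    energy, power = dynamic_scaling(voltage_scale=0.85, frequency_scale=0.85)
    print(f"dynamic energy ratio: {energy:.2f}")
    print(f"dynamic power ratio:  {power:.2f}")
```

Under these assumptions, a 15% reduction in both voltage and frequency leaves about 72% of the dynamic energy and about 61% of the dynamic power. Calling `dynamic_scaling(1.0, 0.5)` shows the earlier point that slowing only the clock lowers power but leaves the energy for a fixed task unchanged.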

Challenges
distributing the power
removing the heat
preventing hot spots
potential research topics

Energy-efficiency improvement
techniques
1. do nothing well
turn off the clock of inactive modules
2. DVFS: dynamic voltage-frequency
scaling
scale down clock frequency and voltage
during periods of low activity
DVFS

Energy-efficiency improvement
techniques
3. design for typical case
PMDs, laptops often idle
memory and storage with low power
modes to save energy
4. overclocking
the chip runs at a higher clock rate for
a short time until temperature rises
Trends in Cost
Cost of an Integrated Circuit
wafer for test; chopped into dies for
packaging
Yield: percentage of
manufactured devices
that survives the
testing procedure
Intel Core i7 Die
Example
N: process-complexity factor for
measuring manufacturing difficulty
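The cost formulas themselves are not present in this text. The sketch below uses a commonly taught textbook model as an assumption: dies per wafer = wafer area / die area minus an edge-loss term, die yield = wafer yield / (1 + defect density x die area)^N, and cost per die = wafer cost / (dies per wafer x die yield). All function names and all input values are hypothetical and chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated die count: wafer area / die area, minus dies lost along the round edge."""
    radius = wafer_diameter_cm / 2
    whole = math.pi * radius ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return whole - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Fraction of good dies; n is the process-complexity factor N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_per_good_die(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
                      defects_per_cm2: float, n: float) -> float:
    """Wafer cost spread over the dies expected to pass test."""
    good = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2, n)
    return wafer_cost / good


if __name__ == "__main__":
    # Illustrative inputs only: 30 cm wafer, 1.5 cm x 1.5 cm die.
    print(round(dies_per_wafer(30, 2.25)))
    print(round(die_yield(0.031, 2.25, 13.5), 2))
    print(round(cost_per_good_die(5500, 30, 2.25, 0.031, 13.5), 2))
```

Because yield falls off steeply with die area and N, larger dies cost disproportionately more per good die under this model.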
Dependability
SLA: service level agreements
System states: up or down
Service states
service accomplishment
failure
restoration
service interruption
Dependability
Two measures of dependability
Module reliability
Module availability
Dependability
Module reliability
continuous service accomplishment
from a reference initial instant
MTTF: mean time to failure
MTTR: mean time to repair
MTBF: mean time between failures
MTBF = MTTF + MTTR
Dependability
Module reliability
FIT: failures in time
failures per billion hours
MTTF of 1,000,000 hours
= 10^9/10^6
= 1000 FIT
Dependability
Module availability
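The two measures can be computed from MTTF and MTTR. The sketch below assumes the standard definition of module availability as MTTF / (MTTF + MTTR), which matches MTBF = MTTF + MTTR above; the function names and the MTTR value in the usage lines are hypothetical.

```python
def fit_from_mttf(mttf_hours: float) -> float:
    """Failures in time: expected failures per billion (10^9) hours of operation."""
    return 1e9 / mttf_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    print(fit_from_mttf(1_000_000))               # MTTF of 1,000,000 hours
    print(f"{availability(1_000_000, 24):.6f}")   # assumed 24-hour repair time
```

The first call reproduces the 1000 FIT figure for an MTTF of 1,000,000 hours.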
Dependability
Example
Dependability
Answer
Measuring Performance
Execution time
the time between the start and the
completion of an event
Throughput
the total amount of work done in a
given time
Measuring Performance
Computer X and Computer Y
X is n times faster than Y
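"X is n times faster than Y" is conventionally read as n = execution time of Y / execution time of X, since performance is the reciprocal of execution time. A minimal Python sketch, with the hypothetical helper `times_faster` and made-up timings:

```python
def times_faster(exec_time_y: float, exec_time_x: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


if __name__ == "__main__":
    # Hypothetical timings: Y takes 15 s and X takes 10 s on the same program.
    print(times_faster(exec_time_y=15.0, exec_time_x=10.0))
```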
Quantitative Principles
Parallelism
Locality
temporal locality: recently accessed
items are likely to be accessed in the
near future;
spatial locality: items whose
addresses are near one another tend to
be referenced close together in time
Amdahl's Law
Amdahl's Law: two factors
1. Fraction_enhanced:
e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
2. Speedup_enhanced:
e.g., 5/2 if enhanced to 2 seconds
while originally 5 seconds
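The two factors combine in the usual statement of the law: overall speedup = 1 / ((1 - fraction enhanced) + fraction enhanced / speedup enhanced). The Python sketch below uses the hypothetical helper `amdahl_speedup`; plugging in both sample values from above together is only an illustration, since they come from separate examples.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time benefits from an enhancement."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced               # part that runs at the old speed
    enhanced = fraction_enhanced / speedup_enhanced    # part that runs faster
    return 1.0 / (unenhanced + enhanced)


if __name__ == "__main__":
    print(amdahl_speedup(20 / 60, 5 / 2))
    # Even an enormous speedup of the enhanced part is capped by the rest.
    print(amdahl_speedup(20 / 60, 1e12))
```

With one third of the time enhanced by 2.5x, the overall speedup is 1.25; no enhancement of that third alone can push the overall speedup beyond 1.5.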
Example
The Processor Performance Equation
Example
Example
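The processor performance equation named above is conventionally written as CPU time = instruction count x cycles per instruction (CPI) x clock cycle time. The equation and the example values are not present in this text, so the Python sketch below uses that standard form as an assumption, with a hypothetical helper `cpu_time_seconds` and made-up inputs.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (cycle time = 1 / clock rate)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, average CPI of 2.0, 2 GHz clock.
    print(cpu_time_seconds(1e9, 2.0, 2e9))
```

The three terms are interdependent in practice: a change that lowers CPI may raise the instruction count or lengthen the clock cycle.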
Reading
Chapter 1.8, 1.10 1.13
