Who is responsible for your learning of Computer Organization?
Some aphorisms on teaching philosophy:
"I do not teach my pupils. I provide conditions in which they can learn." -- Albert Einstein
"I hear and I forget. I see and I remember. I do and I understand." -- Chinese proverb
"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime." -- Chinese proverb
Even if you are fascinating, people only remember the first 15 minutes of what you say.
[Figure: percent of students paying attention, on a scale up to 100]
Approach
Proactive
Anatomy of a computer and our focus here
Computer Organization versus Architecture
Instruction sets
Different components of a computer and their interworking
Computer Performance Issues

Desktop Applications
  Emphasis on performance of integer and Floating Point (FP) data types
  Little regard for program (code) size and power consumption
Server Applications
  Database, file system, web applications, time-sharing
  FP (Floating Point) performance is much less important than integer and character-string performance
  Little regard for program (code) size and power consumption
Embedded Applications
  Digital Signal Processors (DSPs), media processors, control
  High value placed on program size and power consumption
  Less memory is cheaper and lower power
  Reduce chip costs: FP instructions may be optional
Software Hardware
Assembler Processor Memory I/O system Datapath Control Digital Design Circuit Design
transistors
Why a Compiler?
Why High Level Language
  Ease of thinking and coding in an English/Math-like language
  Enhanced productivity because of the ease to debug and validate
  Maintainability
  Target independent development
  Availability of optimizing compilers

"In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their own language." -- Mark Twain, The Innocents Abroad, 1869
What is in a Computer?
Components:
  processor (datapath, control)
  input (mouse, keyboard)
  output (display, printer)
  memory (cache (SRAM), main memory (DRAM))
Our
Output
Display, Printer
Motherboard LayOut
Processor
  Logic capacity: about 30% ~ 35% per year
  Clock rate: about 30% per year
Memory
  DRAM: Dynamic Random Access Memory
  Capacity: about 60% per year (4x every 3 years)
  Memory speed: about 10% per year
  Cost per bit: improves about 25% per year
Disk
  Capacity: about 60% ~ 100% per year
  Speed: about 10% per year
Network Bandwidth
  10 Mb ---(10 years)--> 100 Mb ---(5 years)--> 1 Gb
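The "4x every 3 years" figure for DRAM follows from compounding 60% annual growth. A minimal Python sketch of that arithmetic (the function name growth_factor is illustrative and not from the slides):

```python
def growth_factor(annual_rate, years):
    """Total multiplicative growth after compounding annual_rate over the given years."""
    return (1.0 + annual_rate) ** years

# DRAM capacity: about 60% per year compounds to roughly 4x in 3 years.
dram_3yr = growth_factor(0.60, 3)
print(f"DRAM capacity after 3 years: {dram_3yr:.2f}x")
```

The same function can be applied to the other rates in the list, for example growth_factor(0.30, 3) for clock rate.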
K = 1024 (2^10)
Year
Course Outline
Topic                                               # weeks
Introduction to Computer Organization               1
Computer Instructions                               2
Arithmetic and Logic Unit                           1
Performance Analysis                                1
Data Path and Control                               2
Performance Enhancement with Pipelining             2
Memory Hierarchy and Virtual Memory Concepts        2
Storage, Networks, and other Peripherals            1
Engineering Design with Microcomputers              2
Course Objectives
Know about the different software and hardware components of a digital computer.
Comprehend how different components of the digital computer collaborate to produce the end result in an application development process.
Apply principles of logic design to digital computer design.
Analyze a digital computer and decompose it into modules and lower level logical blocks involving both combinational and sequential circuit elements.
Synthesize various components of a computer's Arithmetic Logic Unit, Control Units, and Data Paths.
Understand and Assess (evaluate) computer CPU performance, and learn methods to enhance computer performance.

We will have a quick look at the MIPS language
MIPS: not to be confused with million instructions per second
MIPS: Microprocessor without Interlocked Pipeline Stages, a RISC (Reduced Instruction Set Computer) processor developed by MIPS Technologies
By 1990, 1 out of 3 RISC processors was using MIPS; the architecture is also called MIPS
Cisco routers, Nintendo 64, Sony PlayStation, PlayStation 2, etc. use MIPS designs
The difference between mediocre and star programmers is that star programmers understand assembly language, whether or not they use it on a daily basis. Assembly language is the language of the computer itself. To be a programmer without ever learning assembly language is like being a professional race car driver without understanding how your carburetor works. To be a truly successful programmer, you have to understand exactly what the computer sees when it is running a program. Nothing short of learning assembly language will do that for you. Assembly language is often seen as a black art among today's programmers - with those knowing this art being more productive, more knowledgeable, and better paid, even if
many registers? How big a memory could be supported? What is memory word size? How to handle data in RAM? Non-architectural design/implementation issue that vary from design to design: Roles of registers
Instructions
The words of a computer's language are called instructions
Instruction set
The vocabulary of a computer's language is called the instruction set
Instruction Set Architecture (ISA)
The set of instructions a particular CPU implements is an Instruction Set Architecture.
hardware
ISA Sales
Early trend was to add more and more instructions to new CPUs to do elaborate operations
CISC (Complex Instruction Set Computer)
  The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible.
  VAX architecture had an instruction to multiply polynomials!
RISC philosophy (Cocke at IBM, Patterson, Hennessy, 1980s): Reduced Instruction Set Computer
  Keep the instruction set small and simple; this makes it easier to build fast hardware.
  Let software do complicated operations by composing simpler ones.

Instruction Categories
  Load/Store
  Computational
  Jump and Branch
  Floating Point coprocessor
  Memory Management
  Special
Registers R0 - R31
PC HI LO funct
immediate
jump target
Name        Register number   Usage                                                   Preserved on call?
$zero       0                 The constant value 0                                    N.A.
$at         1                 Assembler Temporary                                     No
$v0 - $v1   2-3               Values for function results and expression evaluation   No
$a0 - $a3   4-7               Arguments                                               No
$t0 - $t7   8-15              Temporaries                                             No
$s0 - $s7   16-23             Saved Temporaries                                       YES
$t8 - $t9   24-25             Temporaries                                             No
$k0 - $k1   26-27             Reserved for OS kernel                                  No
$gp (28) global pointer, $sp (29) stack pointer, $fp (30) frame pointer, and $ra (31) return address are all preserved across a call
Simple operations
Compute f = (a+b)-(c-d), assuming these variables are in some $s registers
Memory operation: base register concept
Why is a multiplication factor of 4 required for the nth array element? Answer: memory addresses in MIPS are byte addresses, and each 32-bit word occupies 4 bytes.
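The factor of 4 can be checked with a small address calculation. This is a minimal Python sketch; the base address used below is only an assumed example value, and element_address is an illustrative name:

```python
WORD_SIZE = 4  # bytes per 32-bit MIPS word

def element_address(base, n):
    """Byte address of word A[n] when A[0] starts at byte address base."""
    return base + WORD_SIZE * n

base = 0x10010000  # assumed example address for A[0]
for n in range(4):
    print(f"A[{n}] is at {element_address(base, n):#010x}")
```

In assembly the same offset would be the constant in a load such as lw with offset 4*n from the base register.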
Two steps in this dotted area can be Output : merged together Machine X into a single step Assembly Program to sort 10 numbers Input
Assembler (Machine-X code to translate any Machine X Assembly Program into Machine X code)
Machine X
Output
Machine X
Machine X
Output
Input
language- expansion of the acronym No of registers and architecture in general The 3 Instruction formats and the various fields (e.g. rs, rt, rd, shamt, etc.) Now, we proceed along with
MIPS Assembly Instruction formats Coding simple problems and translating into MIPS machine code
Simple Statements
C
Code: d = (a + b) (c + d)
Machine
code assuming a, b, c, and d are in MIPS registers code assuming that a, b, c, and d are in consecutive memory locations from a given starting address (use lw, sw)
Machine
assembly code for a typical C-code to add 100 numbers as follows:
    // Read 100 numbers into an array A
    sum = 0;
    for (i = 0; i < 100; i++) {
        sum = sum + A[i];
    }
    // Print sum
Procedure Calls
Caller and callee: who should preserve which registers?
Leaf and recursive procedure examples for explaining the conventions, and the jal and jr instructions.
SPIM
Courtesy: Prof. Jerry Breecher Clark University Appendix A
MIPS Simulation
SPIM is a simulator. Reads a MIPS assembly language program. Simulates each instruction. Displays values of registers and memory. Supports breakpoints and single stepping. Provides simple I/O for interacting with user.
SPIM Versions
SPIM is the command line version. XSPIM is the X Window version (Unix workstations). There is also a Windows version. You can use this at home and it can be downloaded from: http://www.cs.wisc.edu/~larus/spim.html.
In fact, there's a tutorial for a good chunk of the ISA portion of this course at: http://chortle.ccsu.edu/AssemblyTutorial/tutorialContents.html
Here are a couple of other good references you can look at: Patterson_Hennessy_AppendixA.pdf
And http://babbage.clarku.edu/~jbreecher/comp_org/labs/Introduction_To_SPIM.pdf
SPIM Program
MIPS assembly language.
Must include a label main; this will be called by the SPIM startup code (allows you to have command line arguments).
Can include named memory locations, constants and string literals in a data segment.
General Layout
Data definitions start with the .data directive. Code definition starts with the .text directive. "Text" is the traditional name for the memory that holds a program. Usually there are a bunch of subroutine definitions and a main.
Simple Example
        .data               # data memory
foo:    .word 0             # 32 bit variable
        .text               # program memory
        .align 2            # word alignment
        .globl main         # main is global
main:   lw $a0,foo
Data Definitions
You can define variables/constants with:
  .word: defines 32 bit quantities.
  .byte: defines 8 bit quantities.
  .asciiz: zero-delimited ASCII strings.
  .space: allocate some bytes.
Data Examples
        .data
prompt: .asciiz "Hello World\n"
msg:    .asciiz "The answer is"
x:      .space 4
y:      .word 4
str:    .space 100
Simple I/O
SPIM provides some simple I/O using the syscall instruction. The specific I/O done depends on some registers.
I/O Functions
System call is used to communicate with the system and do simple I/O.
  Load the system call code into register $v0.
  Load arguments (if any) into registers $a0, $a1 or $f12 (for floating point).
  do: syscall
  Results returned in registers $v0 or $f0.
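The calling sequence above can be mimicked in a few lines of Python. This is only a sketch of the idea, not SPIM itself: it models three SPIM service codes (1 = print integer, 4 = print string, 5 = read integer), and the dictionaries and lists standing in for registers, console and string memory are assumptions made for illustration.

```python
def syscall(regs, inputs, output, strings):
    """Dispatch on the service code in $v0, as the SPIM syscall instruction does."""
    code = regs["$v0"]
    if code == 1:            # print the integer held in $a0
        output.append(str(regs["$a0"]))
    elif code == 4:          # print the string whose address is in $a0
        output.append(strings[regs["$a0"]])
    elif code == 5:          # read an integer; the result is returned in $v0
        regs["$v0"] = inputs.pop(0)
    else:
        raise ValueError(f"syscall code {code} is not modeled in this sketch")

regs = {"$v0": 5, "$a0": 0}
inputs, output = [42], []            # 42 plays the role of the number typed by the user
syscall(regs, inputs, output, {})    # li $v0,5 ; syscall   (read integer)
regs["$a0"] = regs["$v0"]            # move $a0,$v0
regs["$v0"] = 1                      # li $v0,1
syscall(regs, inputs, output, {})    # syscall              (print integer)
print(output)
```

The point of the sketch is the protocol: the service is chosen by $v0, arguments travel in $a0, and a read returns its result in $v0.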
# Upon return from the syscall, $v0 has the integer typed by
# a human in the SPIM console
# Now print that same integer
        move $a0,$v0    # Get the number to be printed into register
        li   $v0,1      # Indicate we're doing a write-integer
        syscall
Printing A String
        .data
msg:    .asciiz "SPIM IS FUN"
        .text
        .globl main
main:   li $v0,4        # pseudoinstruction: load immediate
        la $a0,msg      # pseudoinstruction: load address
        syscall
        jr $ra
The MIPS equivalent of the CProgram with Read and Sum Loops
        .data
A:      .word 0              #Create space for the first word A[0] and initialize it to 0
        .space 16            #Create space for 4 more words A[1] .. A[4]
msg:    .asciiz "The sum of 5 numbers is: "
        .text
main:   la   $t0, A          #Store in $t0 the address of A[0], the first of five words
        li   $t1, 0          #Store in $t1, the initial value of loop variable
        li   $t2, 4          #Store in $t2, the final value of loop variable
        li   $t3, 0          #Initialize $t3 that increments by 4 with each word read
loop:   add  $t4, $t0, $t3   #Put in $t4 the address of next word
        li   $v0, 5          #Initialize $v0 for Read syscall
        syscall              #Read an integer from the console into $v0
        sw   $v0, ($t4)      #Put the new integer read into the word location pointed by $t4
        addi $t3, $t3, 4     #Increment $t3 by 4 for calculation of next word address
        addi $t1, $t1, 1     #Increment the loop variable
        ble  $t1, $t2, loop
(continued on next slide)
The MIPS equivalent of the CProgram with Read and Sum Loops
Continued from previous slide.
        li   $t1, 0          #Do the same initialization for identical loop at addloop
        li   $t2, 4
        li   $t3, 0
        li   $s0, 0
addloop: add $t4, $t0, $t3
        lw   $t5, ($t4)      #Read the integer at address in $t4 into $t5
        add  $s0, $s0, $t5   #Update the partial sum in $s0 by adding the new integer
        addi $t3, $t3, 4
        addi $t1, $t1, 1
        ble  $t1, $t2, addloop
        li   $v0, 4          #Make System Ready to print String
        la   $a0, msg        #Load starting address (msg) of the string into $a0, the argument register
        syscall
        li   $v0, 1          #Make System Ready to print the integer (sum)
        move $a0, $s0
        syscall
SPIM Subroutines
The stack is set up for you; just use $sp.
You can view the stack in the data window.
main is called as a subroutine (have it return using jr $ra).
For now, don't worry about details. The next few pages give some examples of how stacks work.
Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS)
print:  move $t0, $v0
        li   $v0, 1          #Prepare for print
        move $a0, $t0
        syscall
        j    last
Processor Design - 1
Adapted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley
Outline of Slides
Overview
Design a processor: step-by-step
  Requirements of the instruction set
  Components and clocking
  Assembling an adequate Data path
  Controlling the data path
The Big Picture: Where Are We Now? The five classic components of a computer
Processor Input Control Datapath Memory Output
Todays
machine design
Arithmetic technology
Chapter 5.1 - Processor Design 1
The CPU
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
Datapath: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn)
Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)
Performance of a machine is determined by:
  Instruction count
  Clock cycle time
  Clock cycles per instruction (CPI)
Processor design (datapath and control) will determine:
  Clock cycle time
  Clock cycles per instruction
What we will do today: Single cycle processor
  Advantage: One clock cycle per instruction
  Disadvantage: long cycle time
[Figure: Inst. Count, CPI and Cycle Time]
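The three factors multiply to give execution time. A minimal Python sketch of that relation follows; the instruction count and cycle time used in the example are made-up numbers chosen only for illustration:

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time = instruction count x clock cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical single-cycle machine: CPI is 1, but the cycle is long (10 ns assumed here).
single_cycle = cpu_time(1_000_000, 1.0, 10e-9)
print(f"single-cycle run time: {single_cycle * 1e3:.1f} ms")
```

For the single cycle processor the CPI term is fixed at 1, so the only lever the design has is the cycle time.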
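All MIPS instructions are 32 bits long. The three instruction formats:

Bits      31-26    25-21    20-16    15-11    10-6     5-0
R-type    op       rs       rt       rd       shamt    funct
          6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

Bits      31-26    25-21    20-16    15-0
I-type    op       rs       rt       address / immediate
          6 bits   5 bits   5 bits   16 bits

Bits      31-26    25-0
J-type    op       target address
          6 bits   26 bits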
The different fields are:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
  shamt: shift amount
  funct: selects the variant of the operation in the op field
  address / immediate: address offset or immediate value
  target address: target address of the jump instruction
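Extracting these fields from a 32-bit instruction word is a matter of shifting and masking. The Python sketch below assumes the standard MIPS field positions (op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, funct in 5-0); the function name decode and the dictionary layout are illustrative choices.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op":     (word >> 26) & 0x3F,   # 6 bits
        "rs":     (word >> 21) & 0x1F,   # 5 bits
        "rt":     (word >> 16) & 0x1F,   # 5 bits
        "rd":     (word >> 11) & 0x1F,   # 5 bits (R-type)
        "shamt":  (word >> 6) & 0x1F,    # 5 bits (R-type)
        "funct":  word & 0x3F,           # 6 bits (R-type)
        "imm16":  word & 0xFFFF,         # 16 bits (I-type)
        "target": word & 0x3FFFFFF,      # 26 bits (J-type)
    }

# 0x02324020 encodes add $t0, $s1, $s2 (rd = 8, rs = 17, rt = 18, funct = 0x20).
fields = decode(0x02324020)
print(fields)
```

Which of the extracted fields are meaningful depends on the format, which the hardware determines from op.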
31
26 op 6 bits
21 rs 5 bits
16 rt 5 bits
6 funct 6 bits
OR Immediate:
- ori
31
26 op 6 bits
BRANCH:
31
26 op 6 bits
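All start by fetching the instruction
  op | rs | rt | rd | shamt | funct = MEM[ PC ]
  op | rs | rt | Imm16 = MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
SUBU     R[rd] <- R[rs] - R[rt]; PC <- PC + 4
ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) x 4 ) else PC <- PC + 4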
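Register transfers of this kind translate almost directly into code. The following Python sketch executes the three register-only transfers (ADDU, SUBU, ORi) on a 32-entry register array; the tuple encoding of instructions and the name step are assumptions made for illustration, not part of the slides.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values

def zero_ext(imm16):
    """Zero-extend a 16-bit immediate to 32 bits."""
    return imm16 & 0xFFFF

def step(inst, R, pc):
    """Apply one register transfer to register list R and return the next PC."""
    name, dst, src, last = inst
    if name == "ADDU":       # (ADDU, rd, rs, rt): R[rd] <- R[rs] + R[rt]
        R[dst] = (R[src] + R[last]) & MASK32
    elif name == "SUBU":     # (SUBU, rd, rs, rt): R[rd] <- R[rs] - R[rt]
        R[dst] = (R[src] - R[last]) & MASK32
    elif name == "ORI":      # (ORI, rt, rs, imm16): R[rt] <- R[rs] | zero_ext(imm16)
        R[dst] = R[src] | zero_ext(last)
    else:
        raise ValueError(f"{name} is not modeled in this sketch")
    return (pc + 4) & MASK32  # PC <- PC + 4

R = [0] * 32
R[1], R[2] = 5, 7
pc = 0
pc = step(("ADDU", 3, 1, 2), R, pc)
pc = step(("SUBU", 4, 1, 2), R, pc)
pc = step(("ORI", 5, 1, 0x00F0), R, pc)
print(R[3], hex(R[4]), hex(R[5]), pc)
```

Note that the subtraction wraps around modulo 2^32, which is how an unsigned 32-bit register holds a negative difference.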
Registers
(32 x 32)
PC Extender Add
Add
Elements
Elements
Clocking
methodology
OP Sum Carry A B 32 32
A Y B
32 32
Adder
32
ALU
32
Result
MUX
Select
ALU
A B
32 32
MUX
32
Register File consists of 32 registers:
  Two 32-bit output busses: busA and busB
  One 32-bit input bus: busW
Register is selected by:
  RA (number) selects the register to put on busA (data)
  RB (number) selects the register to put on busB (data)
  RW (number) selects the register to be written via busW (data) when Write Enable is 1
Clock input (CLK)
  The CLK input is a factor ONLY during write operation
  During read operation, behaves as a combinational logic block: RA or RB valid => busA or busB valid after access time.
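The read/write asymmetry described above can be captured in a small behavioral model. This Python sketch is an assumption-laden illustration: the class name, method names and the idea of calling clock_edge once per clock are all choices made here, and access time is not modeled.

```python
class RegisterFile:
    """Behavioral model: combinational reads, writes only on a clock edge."""

    def __init__(self):
        self.regs = [0] * 32  # 32 registers of 32 bits each

    def read(self, ra, rb):
        """Combinational read: busA and busB follow RA and RB without any clock."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, write busW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=99, write_enable=1)   # written on the clock edge
rf.clock_edge(rw=4, bus_w=77, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(3, 4)
print(bus_a, bus_b)
```

The clock appears only in the write method, mirroring the statement that CLK is a factor only during write operation.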
Memory (idealized)
  One input bus: Data In
  One output bus: Data Out
Memory word is selected by:
  Address selects the word to put on Data Out
  Write Enable = 1: address selects the memory word to be written via the Data In bus
Clock input (CLK)
  The CLK input is a factor ONLY during write operation
  During read operation, behaves as a combinational logic block: Address valid => Data Out valid after access time.
[Figure: idealized memory block with 32-bit Data In and Clk inputs]
Goal: a single main memory, both large and fast
Problem 1: large memories are slow while fast memories are small
Solution: memory hierarchy
Cache: a small, fast memory; holds a copy of part of a larger, slower memory
Imem, Dmem are really separate cache memories
circuits: no memory
Output
Sequential
How to ensure memory element is updated neither too soon, nor too late?
Recall hardware multiplier
  Product/multiplier register is the writable memory element
  Gate propagation delay means ALU result takes time to stabilize; delay varies with inputs
  Must wait until result is stable before writing to the product/multiplier register, else get garbage
How to be certain ALU output is stable?
Clock:
  level-triggered: store while clock is high (low)
  edge-triggered: store only on clock edge
We will use negative (falling) edge-triggered methodology
instruction
datapath instruction execution stable step 1/step 2/step 3/step 4/step 5 register(s) written
SR-Latches
SR-latch with NOR Gates S = 1 and R = 1 not allowed
SR-Latches
SR-latch with NAND Gates, also known as SR -latch S = 0 and R = 0 not allowed
C = 0, no change of state;
C = 1, change is allowed;
D-Latches
Design/synthesis based on pulsed-sequential circuits
  All combinational inputs remain at constant levels and only the clock signal appears as a pulse with a fixed period Tcc
  All storage elements are clocked by the same clock edge
  Cycle time Tcc = CLK-to-q + longest delay path + setup time + clock skew
  (CLK-to-q + shortest delay path - clock skew) > hold time
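Both timing constraints are simple sums and can be checked numerically. In this Python sketch all delay values are invented example numbers (in nanoseconds), and the function names are illustrative:

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Tcc = CLK-to-q + longest delay path + setup time + clock skew."""
    return clk_to_q + longest_path + setup + skew

def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold constraint: (CLK-to-q + shortest delay path - clock skew) > hold time."""
    return clk_to_q + shortest_path - skew > hold

# Assumed example delays in nanoseconds.
tcc = min_cycle_time(clk_to_q=1, longest_path=10, setup=1, skew=1)
print(f"minimum cycle time: {tcc} ns")
print("hold time met:", hold_ok(clk_to_q=1, shortest_path=2, skew=1, hold=1))
```

The first check bounds how fast the clock may run; the second guards against a short path changing a register input too soon after the clock edge.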
Register
Instruction Read
Smaller stages are easier to design Easy to optimize (change) one stage without touching the others
No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction; byte addressing, so + 4)
upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data) first, read the Opcode to determine instruction type and field lengths second, read in data from all necessary registers
-for
real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
what about loads and stores?
  - lw $t0, 40($t1)
  - the address we are accessing in memory = the value in $t1 + the value 40
  - so we do this addition in this stage
4: Memory Access
actually only the load and store instructions do anything during this stage; the others remain idle
since these instructions have a unique step, we need this extra stage to account for them
as a result of the cache system, this stage is expected to be just as fast (on average) as the others
5: Register Write
most instructions write the result of some computation into a register
examples: arithmetic, logical, shifts, loads, slt
what about stores, branches, jumps?
  - don't write anything into a register at the end
  - these remain idle during this fifth stage
+4
imm
Data memory
rd rs rt
registers
PC
# r3 = r1+r2
Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's an add, then read registers $r1 and $r2
Stage 3: add the two values retrieved in Stage 2
Stage 4: idle (nothing to write to memory)
Stage 5: write result of Stage 3 into register $r3
registers
PC
+4
slti $r3, $r1, 17
Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's an slti, then read register $r1
Stage 3: compare value retrieved in Stage 2 with the integer 17
Stage 4: go idle
Stage 5: write the result of Stage 3 in register $r3
registers
reg[1] ALU
PC
+4
17
Data memory
reg[1]-17
sw $r3, 17($r1)
Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's a sw, then read registers $r1 and $r3
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3
Stage 5: go idle (nothing to write into a register)
Example: sw Instruction
instruction memory Data memory MEM[r1+17]<=r3
x 1 3 imm
registers
PC
+4
17
SW r3, 17(r1)
lw $r3, 17($r1)
Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's a lw, then read register $r1
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: read value from memory address computed in Stage 3
Stage 5: write value found in Stage 4 into register $r3
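The four walk-throughs (add, slti, sw, lw) differ only in which stages sit idle. The Python sketch below tabulates that; the stage names for stages 1 to 3 and the table layout are labels chosen here for illustration.

```python
STAGES = ["fetch", "decode/register read", "ALU", "memory access", "register write"]

# True where the instruction does work in that stage, per the walk-throughs above.
ACTIVE = {
    "add":  [True, True, True, False, True],
    "slti": [True, True, True, False, True],
    "sw":   [True, True, True, True,  False],
    "lw":   [True, True, True, True,  True],
}

def idle_stages(inst):
    """Return the names of the stages in which the instruction does nothing."""
    return [stage for stage, busy in zip(STAGES, ACTIVE[inst]) if not busy]

for inst in ACTIVE:
    print(f"{inst:5s} idle in: {idle_stages(inst) or 'none'}")
```

Only lw uses all five stages, which is why the load sets the cycle time of a single cycle design.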
Example: lw Instruction
instruction memory
x 1 3 imm
registers
reg[1]
Data memory
+4
17
LW r3, 17(r1)
MEM[r1+17]
reg[1]+17 ALU
PC
Datapath Summary
The A
instruction memory
ALU
+4
imm
opcode, funct
Controller
Data memory
rd rs rt
registers
PC
The common operations
  Fetch the Instruction: mem[PC]
  Update the program counter:
    Sequential Code: PC <- PC + 4
    Branch and Jump: PC <- "something else"
Clk
31 op 6 bits
26
21
rs rt rd shamt funct 5 bits 5 bits 5 bits 5 bits 6 bits Rd Rs Rt ALU RegWr 5 5 5 ctr busA Rw Ra Rb busW 32 Result 32 32-bit 32 3 Registers busB Clk 2 32
ALU
Clk-to-Q New Value Old Value Old Value Old Value Old Value Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value New Value Register File Access Time New Value ALU Delay New Value
busA 32 busB 32
ALUct r 3 2 Result
ALU
31
immediate 0000000000000000 16 bits 16 bits R[rt] R[rs] op ZeroExt[ imm16 ] Rd Rt RegDst Mux Rs Rt? ALUct RegWr 5 5 5 r busA Rw Ra Rb busW 32 Result 32 32-bit 32 32 Registers busB Clk 32
ALU
Mux
ZeroExt
imm16
16
32
ALUSrc
Load Operations
R[rt] <- Mem[ R[rs] + SignExt[imm16] ]
Example: lw rt, rs, imm16
31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 11 immediate 16 bits rd 0
ALU ctr
W_Src 32 MemWr
ALU
M ux
imm16 16
Mux
Extender
ExtOp
Store Operations
Mem[ R[rs] + SignExt[imm16] ] <- R[rt]
Example: sw rt, rs, imm16
31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits immediate 16 bits ALU ctr MemWr W_Src 0
ALU
32
M ux
Mux
imm16
16
32
Extender
ExtOp
21 rs 5 bits
mem[PC]
Equal <- R[rs] == R[rt]          Calculate the branch condition
if (Equal)                       Calculate the next instruction's address
    PC <- PC + 4 + ( SignExt(imm16) x 4 )
else
    PC <- PC + 4
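The next-address calculation for beq can be sketched in Python as follows. The function names and the example PC value are assumptions for illustration; the arithmetic follows the transfer PC + 4 + (SignExt(imm16) x 4) given above.

```python
def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: taken if the two register values are equal."""
    if rs_val == rt_val:                                    # Equal
        return (pc + 4 + sign_ext16(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF

pc = 0x00400000  # assumed example address of the beq instruction
print(hex(next_pc(pc, 3, 3, 2)))        # taken, forward by 2 instructions past PC + 4
print(hex(next_pc(pc, 1, 1, 0xFFFF)))   # taken, offset -1: branches back to itself
print(hex(next_pc(pc, 1, 2, 0xFFFF)))   # not taken: falls through to PC + 4
```

The multiplication by 4 converts the word offset stored in the instruction into a byte offset, for the same byte-addressing reason as array indexing.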
21 rs 5 bits
Inst Address 4 nPC_sel RegWr 5 busW Clk Rs Rt 5 5 busA Rw Ra Rb 32 32 32-bit Registers busB 32
Cond
imm16
Clk
Equal?
00 Mux PC
32
<21:25>
Rt Rd Rt 0
busW 32 Clk
<16:20>
<11:15>
<0:15>
Adder Adder
00
ALU
32
Mux
PC
Mux
Mux
Clk
imm16 16
32
imm16
Extender
PC Ext
ExtOp ALUSrc
Register file and ideal memory:
  The CLK input is a factor ONLY during write operation
  During read operation, behave as combinational logic: Address valid => Output valid after access time.
Next Address
Imm 1 6 A 32 B 32
Critical Path (Load Operation) =
  PC's Clk-to-Q +
  Instruction Memory's Access Time +
  Register File's Access Time +
  ALU to Perform a 32-bit Add +
  Data Memory Access Time +
  Setup Time for Register File Write +
  Clock Skew
32
PC
ALU
Clk
Clk
Clk
Control
Instruction Rd Rs Rt 5 5 5 Rw Ra Rb 32 32-bit Registers Clk
Next Address
Instruction Address
A 32 B 32
PC
32
32
Data Out
ALU
Clk
Datapath
executed
Easy
slower
than implementation that allows instructions to take different numbers of clock cycles
fast instructions: (beq) fewer clock cycles slow instructions (mult?): more cycles
multicycle,
Next
Summary
5 steps to design a processor
  1. Analyze instruction set => datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer
  5. Assemble the control logic
MIPS makes it easier
  Instructions same size
  Source registers always in same place
  Immediates same size, location
  Operations always on registers/immediates
Single cycle datapath: CPI = 1, Tcc long
Next time: implementing control
Processor Design - 2
Adapted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley
Rs nPC_sel RegDst
Rd Rt 1 0 5 Rs 5 Rt
<21:25>
Rt
<16:20>
Rd
<11:15>
<0:15>
RegWr 5
00
busW 32 Clk
32 0
ALU
imm16
Clk
PC
Mux
Mux
1 32
imm16
32 Data In Clk
Extender
16
ExtOp
ALUSrc
An Abstract View of the Critical Path
Register file and ideal memory:
  The CLK input is a factor ONLY during write operation
  During read operation, behave as combinational logic: Address valid => Output valid after access time.
Ideal Instruction Memory Instruction Address
Instruction Rd Rs 5 5 Rt 5 Imm 16 A
Critical Path (Load Operation) =
  PC's Clk-to-Q +
  Instruction Memory's Access Time +
  Register File's Access Time +
  ALU to Perform a 32-bit Add +
  Data Memory Access Time +
  Setup Time for Register File Write +
  Clock Skew
Chapter 5.2 - Processor Design 2
Next Address
32
Rw Ra Rb 32 32-bit Registers
32 B
ALU
PC
Clk
Clk
32
The Big Picture: Where are We Now? The Five Classic Components of a Computer
Processor Input Control Memory Datapath
Output
Next
Control
Instruction Rd Rs 5 5 Rt 5 A 32 Data Address Data In Clk Data Out Control Signals Conditions
Next Address
32
Rw Ra Rb 32 32-bit Registers
32 B
ALU
PC
Clk
Clk
32
Datapath
Rt, Rd and Imm16 are hardwired into the datapath from the Fetch Unit
We have everything except the control signals (underlined)
Instruction<31:0> nPC_sel Rd Rt Rs Rt busA 32 0 Instruction Fetch Unit
Today's lecture will show you how to generate the control signals
<21:25> <16:20> <11:15> <0:15>
RegDst
1 Mux 0 RegWr 5 5 5
Clk
ALUctr
Rt Zero
Rs
Rd
Imm16 MemtoReg 0
busW 32 Clk
MemWr
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc
ExtOp
Adr
imm16
Clk
00 PC
PC Ext
ALUsrc: ALUctr:
MemWr: 1 => write memory
MemtoReg: 0 => ALU; 1 => Mem
RegDst: 0 => rt; 1 => rd
RegWr: 1 => write register
RegDst
MemtoReg
32 0 1
ALU
0 1
Mux
Mux
Extender
imm16
16
32
32 Data In Clk
ExtOp
ALUSrc
31 op 6 bits
add
26
11 shamt 5 bits
6 funct 6 bits
rd, rs, rt
mem[PC]                    Fetch the instruction from memory
R[rd] <- R[rs] + R[rt]     The actual operation
PC <- PC + 4               Calculate the next instruction's address
Fetch the instruction from Instruction memory: Instruction <- mem[PC] (this is the same for all instructions)
Adder Adder
1 Clk
imm16
00 PC
Mux
31
0 funct
R[rd] <- R[rs] + R[rt]
Instruction<31:0>
<21:25>
<16:20>
<11:15>
<0:15>
RegDst = 1
Rd
Rt Rs Rt
1 Mux 0 5 5 5
Clk
ALUctr = Add
Rt Zero
Rs
Rd
Imm16 MemtoReg = 0
MemWr = 0 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = 0
ExtOp = x
Instruction<31:0>
1 Clk
imm16
00 PC
Adder Adder
Mux
R[rs] or ZeroExt(Imm16)
nPC_sel = Instruction<31:0> Instruction Fetch Unit
<21:25>
<16:20>
<11:15>
<0:15>
Rd
Rt Rs Rt
1 Mux 0 5 5 5
Clk
ALUctr =
Rt Zero
Rs
Rd
Imm16 MemtoReg =
MemWr = 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc =
ExtOp =
R[rs]
or
ZeroExt(Imm16)
nPC_sel= +4 Instruction<31:0> Instruction Fetch Unit
<21:25>
<16:20>
<11:15>
<0:15>
RegDst = 0
Rd
Rt Rs Rt
1 Mux 0 5 5
Clk
ALUctr = Or
Rt Zero
Rs
Rd
Imm16 MemtoReg = 0
MemWr = 0 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = 1
ExtOp = 0
31
R[rt]
<0:15>
RegDst = 0
Rd
Rt Rs Rt
1 Mux 0 5 5
Clk
ALUctr = Add
Rt Zero
Rs
Rd
Imm16 MemtoReg = 1
MemWr = 0 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = 1 ExtOp = 1
<21:25>
<16:20>
<11:15>
<0:15>
RegDst =
Rd
Rt Rs Rt
1 Mux 0 5 5 5
Clk
ALUctr = busA 32 0
Rt Zero
Rs
Rd
Imm16 MemtoReg = 0
MemWr =
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = ExtOp =
31
Data
<21:25>
<16:20>
<11:15>
<0:15>
RegDst = x
Rd
Rt Rs Rt
1 Mux 0 5 5 5
Clk
ALUctr = Add
Rt Zero
Rs
Rd
Imm16 MemtoReg = x
MemWr = 1 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = 1 ExtOp = 1
<21:25>
<16:20>
<11:15>
<0:15>
RegDst = x
Rd
Rt Rs Rt
1 Mux 0 5 5 5
Clk
ALUctr =Sub
Rt Zero
Rs
Rd
Imm16 MemtoReg = x
MemWr = 0 0
ALU
Mux
Mux
Extender
1 32
imm16
Data In 32 Clk
16
ALUSrc = 0
ExtOp = x
nPC_sel Zero
Adr
1 Clk
nPC_sel   zero?   MUX
0         x       0
1         0       0
1         1       1
Mux
PC
imm16
Fun
nPC_sel
<21:25>
Rt
<21:25>
Rs
Control
DATA PATH
<16:20>
Rd
<11:15>
Imm16
<0:15>
MemWr MemtoReg Zero
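inst     Register Transfer
ADD      R[rd] <- R[rs] + R[rt]; PC <- PC + 4
         ALUsrc = RegB, ALUctr = add, RegDst = rd, RegWr, nPC_sel = +4
SUB      R[rd] <- R[rs] - R[rt]; PC <- PC + 4
         ALUsrc = RegB, ALUctr = sub, RegDst = rd, RegWr, nPC_sel = +4
ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
         ALUsrc = Im, Extop = Z, ALUctr = or, RegDst = rt, RegWr, nPC_sel = +4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
         ALUsrc = Im, Extop = Sn, ALUctr = add, MemtoReg, RegDst = rt, RegWr, nPC_sel = +4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
         ALUsrc = Im, Extop = Sn, ALUctr = add, MemWr, nPC_sel = +4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
         nPC_sel = Br, ALUctr = sub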
See Appendix A
The Concept of Local Decoding

op           00 0000   00 1101   10 0011   10 1011   00 0100    00 0010
             R-type    ori       lw        sw        beq        jump
RegDst       1         0         0         x         x          x
ALUSrc       0         1         1         1         0          x
MemtoReg     0         0         1         x         x          x
RegWrite     1         1         1         0         0          0
MemWrite     0         0         0         1         0          0
Branch       0         0         0         0         1          0
Jump         0         0         0         0         0          1
ExtOp        x         0         1         1         x          x
ALUop<N:0>   R-type    Or        Add       Add       Subtract   xxx

[Figure: Main Control takes op (6 bits) and produces ALUop (N bits); ALU Control (Local) takes ALUop and func (6 bits) and produces ALUctr (3 bits)]
ALU
op 6
Main Control
ALUctr 3
In this exercise, ALUop has to be 2 bits wide to represent:
  (1) R-type instructions
  I-type instructions that require the ALU to perform: (2) Or, (3) Add, and (4) Subtract
To implement the full MIPS ISA, ALUop has to be 3 bits to represent:
  (1) R-type instructions
  I-type instructions that require the ALU to perform: (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)
                   R-type   ori    lw     sw     beq
ALUop (Symbolic)   R-type   Or     Add    Add    Subtract
ALUop<2:0>         1 00     0 10   0 00   0 00   0 01
op 6
ALUctr
ALU
ALUop 0 1 1 x x x
ALUctr<2>
= !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUctr<1>
ALUctr<0>
= !ALUop<2> & ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
ALUctr 3
ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUctr<1> = !ALUop<2> & !ALUop<0> + ALUop<2> & !func<2> & !func<0>
ALUctr<0> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
nPC_sel  <= if (OP == BEQ) then Br else +4
ALUsrc   <= if (OP == Rtype) then regB else immed
ALUctr   <= if (OP == Rtype) then funct elseif (OP == ORi) then OR elseif (OP == BEQ) then sub else add
ExtOp    <= _____________
MemWr    <= _____________
MemtoReg <= _____________
RegWr    <= _____________
RegDst   <= _____________
nPC_sel  <= if (OP == BEQ) then Br else +4
ALUsrc   <= if (OP == Rtype) then regB else immed
ALUctr   <= if (OP == Rtype) then funct elseif (OP == ORi) then OR elseif (OP == BEQ) then sub else add
ExtOp    <= if (OP == ORi) then zero else sign
MemWr    <= (OP == Store)
MemtoReg <= (OP == Load)
RegWr    <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
RegDst   <= if ((OP == Load) || (OP == ORi)) then 0 else 1
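These control equations can be transcribed one for one into a function of the opcode class. The Python sketch below follows the equations exactly as written on the slide; the function name control, the string opcode classes and the dictionary of results are illustrative choices. One caveat worth noting: as written, the ALUsrc equation yields immed for BEQ, whereas the control truth table in these slides lists ALUSrc = 0 for beq, so the sketch should be read as a transcription and not as a verified controller.

```python
def control(op):
    """Control signals for op in {"Rtype", "ORi", "Load", "Store", "BEQ"}."""
    return {
        "nPC_sel":  "Br" if op == "BEQ" else "+4",
        "ALUsrc":   "regB" if op == "Rtype" else "immed",
        "ALUctr":   ("funct" if op == "Rtype" else
                     "or" if op == "ORi" else
                     "sub" if op == "BEQ" else
                     "add"),
        "ExtOp":    "zero" if op == "ORi" else "sign",
        "MemWr":    op == "Store",
        "MemtoReg": op == "Load",
        "RegWr":    op not in ("Store", "BEQ"),
        "RegDst":   0 if op in ("Load", "ORi") else 1,
    }

for op in ("Rtype", "ORi", "Load", "Store", "BEQ"):
    print(op, control(op))
```

Signals that the truth table marks as don't-care (x) simply take whatever value the equation happens to give, which is harmless because they are unused for that instruction.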
op 6
The Truth Table for the RegDst func Main ALUSrc Control ALU
Main Control
Control (Local)
ALUctr 3
op                 00 1101   10 0011   10 1011   00 0100    00 0010
                   ori       lw        sw        beq        jump
RegDst             0         0         x         x          x
ALUSrc             1         1         1         0          x
MemtoReg           0         1         x         x          x
RegWrite           1         1         0         0          0
MemWrite           0         0         1         0          0
nPC_sel            0         0         0         1          0
Jump               0         0         0         0          1
ExtOp              0         1         1         x          x
ALUop (Symbolic)   Or        Add       Add       Subtract   xxx
ALUop <2>          0         0         0         0          x
ALUop <1>          1         0         0         0          x
ALUop <0>          0         0         0         1          x
op 6 Instr<31:26>
Main Control
<21:25>
<16:20>
<11:15>
<0:15>
RegDst
1 Mux 0 RegWr 5 5
ALUctr
Rt Zero
Rs
Rd
Imm16 MemtoReg 0
busW 32 Clk
MemWr
ALU
Mux
Mux
Extender
1 32 ALUSrc
imm16 Instr<15:0>
Data In 32 Clk
16
ExtOp
Recap: An Abstract View of the Critical Path (Load)
Register file and ideal memory:
  The CLK input is a factor ONLY during write operation
  During read operation, behave as combinational logic
Instruction Rs 5 Rt 5 Imm 16 A
Next Address
32
Rw Ra
Rb
32 32-bit Registers
32 B
ALU
PC
Clk
Clk
32
ALUct r ExtOp ALUSrc MemtoReg RegWr busA busB Addres s busW 182
Old Delay Value Extender & Mux through Old Value Old Value Old Value
Register File Access Time New Value New Value ALU Delay New Value
Cycle time must be long enough for the load instruction:
  PC's Clock-to-Q +
  Instruction Memory Access Time +
  Register File Access Time +
  ALU Delay (address calculation) +
  Data Memory Access Time +
  Register File Setup Time +
  Clock Skew
Cycle time for load is much longer than needed for all other instructions
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer
5. Assemble the control logic
Processor Control Memory Output Input
Datapath Instructions same size Source registers always in same place Immediates same size, location Operations always on registers/immediates