Computer Organization (EENG 3710) Spring 2008

Computer Organization (EENG 3710)
Instructor: Partha Guturu EE Department
Quick Recap on our respective roles
Who is responsible for your learning of Computer Organization? Some aphorisms on teaching philosophy:
"I do not teach my pupils. I provide conditions in which they can learn." (Albert Einstein)
"I hear and I forget. I see and I remember. I do and I understand." (Chinese proverb)
"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime." (Chinese proverb)
What does the data say?

[Figure: percent of students paying attention (0 to 100) vs. time from start of lecture (0 to 60 minutes). Caption: "Even if you are fascinating... people only remember the first 15 minutes of what you say."]
What's so good about our approach?

- Learner-centric approach
- Life-long learning
- Proactive versus reactive
Course Objectives: What do you need to learn?

- High-level view of a computer
- Different types: desktops/laptops, servers, embedded systems
- Anatomy of a computer and our focus here
- Computer organization versus architecture
- Instruction sets
- Different components of a computer and their interworking
- Computer performance issues
Different Applications & Requirements
Desktop Applications
- Emphasis on performance of integer and floating-point (FP) data types
- Little regard for program (code) size and power consumption
Server Applications
- Database, file system, web applications, time-sharing
- FP performance is much less important than performance on integers and character strings
- Little regard for program (code) size and power consumption
Embedded Applications
- Digital Signal Processors (DSPs), media processors, control
- High value placed on program size and power consumption: less memory is cheaper and uses less power
- Reduce chip costs: FP instructions may be optional
Embedded Computers in Your Car
Relative levels of demand for different computer types
Anatomy of Computer & Our Focus

[Figure: layers of abstraction]
Software: Application (e.g., browser); Compiler; Assembler; Operating System
Instruction Set Architecture
Hardware: Processor, Memory, I/O system; Datapath and Control; Digital Design; Circuit Design; Transistors
Coordination of many levels (layers) of abstraction
Why a Compiler?
Why a High Level Language?
- Ease of thinking and coding in an English/math-like language
- Enhanced productivity because of the ease of debugging and validation
- Maintainability
- Target-independent development
- Availability of optimizing compilers
"In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their own language." (Mark Twain, The Innocents Abroad, 1869)
A Dissection to Reveal Finer Details

High Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  | Compiler
Assembly Language Program (e.g., MIPS):
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  | Assembler
Machine Language Program (MIPS): 32-bit binary words (e.g., 0000 1010 1100 0101 1001 1111 0110 1000 ...)
  | Machine Interpretation
Hardware Architecture Description (Logisim, VHDL, Verilog, etc.)
  | Architecture Implementation
Logic Circuit Description (Logisim, etc.)
What is in a Computer?
Components:
- processor (datapath, control)
- input (mouse, keyboard)
- output (display, printer)
- memory (cache (SRAM), main memory (DRAM))

Our primary focus: the processor (datapath and control)
- Implemented using millions of transistors
- Impossible to understand by looking at each transistor
- We need abstraction!
5 Major Components of a Computer

Personal Computer:
- Processor: Control ("brain") and Datapath ("brawn")
- Memory (where programs and data live when running)
- Devices:
  - Input: keyboard, mouse
  - Output: display, printer
  - Disk (where programs and data live when not running)
Processor Chip (CPU) Components
Motherboard Layout
Dramatic Changes in Technology
Processor
- Logic capacity: about 30% to 35% per year
- Clock rate: about 30% per year
Memory (DRAM: Dynamic Random Access Memory)
- Capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
Disk
- Capacity: about 60% to 100% per year
- Speed: about 10% per year
Network Bandwidth
- 10 Mb/s to 100 Mb/s took about 10 years; 100 Mb/s to 1 Gb/s took about 5 years
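As a quick check on how these annual rates compound, the minimal C sketch below (the function name `compound_growth` is our own, for illustration only) multiplies by (1 + rate) once per year. At 60% per year, three years give 1.6 x 1.6 x 1.6 = 4.096, which matches the "4x every 3 years" figure, while memory speed at 10% per year grows only about 1.33x over the same three years.

```c
/* Growth factor after `years` whole years at a fixed annual `rate`
   (0.60 means 60% per year), computed by repeated multiplication. */
double compound_growth(double rate, int years)
{
    double factor = 1.0;
    for (int y = 0; y < years; y++) {
        factor *= 1.0 + rate;   /* one year of growth */
    }
    return factor;
}
```

For example, compound_growth(0.60, 3) is about 4.1 and compound_growth(0.10, 3) is about 1.33.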
Growth Capacity of DRAM Chips
K = 1024 (2^10)
In recent years, the growth rate has slowed to 2x every 2 years.
Dramatic Changes in Technology

[Figure: number of transistors on an IC vs. year]
Gordon Moore, Intel co-founder: 2x transistors per chip every 1.5 years, called "Moore's Law"
The Underlying Technologies

Year | Technology | Relative Performance/Unit Cost
1951 | Vacuum Tube | 1
1965 | Transistor | 35
1975 | Integrated Circuit (IC) | 900
1995 | Very Large Scale IC (VLSI) | 2,400,000
2005 | Ultra VLSI | 6,200,000,000
What if technology in the automobile industry advanced at the same rate?

If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside. Robert X. Cringely, InfoWorld magazine
Complex Chip Manufacturing Process Enabled by Technological Breakthroughs
Computer Architecture versus Computer Organization

"Computer architecture is the abstract image of a computing system that is seen by a machine language (or assembly language) programmer, including the instruction set, memory address modes, processor registers, and address and data formats; whereas the computer organization is a lower level, more concrete, description of the system that involves how the constituent parts of the system are interconnected and how they interoperate in order to implement the architectural specification." (Phillip A. Laplante, Dictionary of Computer Science, Engineering, and Technology, 2001)
-> The organization can change without changing the architecture (e.g., a 64-bit architecture implemented on a 16-bit machine that takes 4 clock cycles per 64-bit operation).
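To make the 64-bit/16-bit example concrete, here is a minimal C sketch assuming a hypothetical machine whose architecture defines 64-bit addition but whose datapath is only 16 bits wide. The same 64-bit result is produced in four 16-bit steps (think of one step per clock cycle), with the carry passed from each step to the next. The function name `add64_on_16bit_datapath` is illustrative only.

```c
#include <stdint.h>

/* Adds two 64-bit values using only 16-bit additions, one 16-bit
   chunk per step, as a 16-bit datapath would over 4 clock cycles. */
uint64_t add64_on_16bit_datapath(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    uint32_t carry = 0;                        /* carry between 16-bit steps */
    for (int step = 0; step < 4; step++) {     /* 4 steps = 4 clock cycles */
        int shift = 16 * step;
        uint32_t a16 = (uint32_t)((a >> shift) & 0xFFFFu);
        uint32_t b16 = (uint32_t)((b >> shift) & 0xFFFFu);
        uint32_t sum = a16 + b16 + carry;      /* at most 17 bits */
        result |= (uint64_t)(sum & 0xFFFFu) << shift;
        carry = sum >> 16;                     /* carry into the next step */
    }
    return result;  /* final carry-out is dropped, as in 64-bit wraparound */
}
```

The programmer-visible result (the architecture) is identical to a single-step 64-bit add; only the organization and the number of cycles differ.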
Course Outline
Topic | # weeks
Introduction to Computer Organization | 1
Computer Instructions | 2
Arithmetic and Logic Unit | 1
Performance Analysis | 1
Data Path and Control | 2
Performance Enhancement with Pipelining | 2
Memory Hierarchy and Virtual Memory Concepts | 2
Storage, Networks, and other Peripherals | 1
Engineering Design with Microcomputers | 2
Course Objectives
- Know about the different software and hardware components of a digital computer.
- Comprehend how different components of the digital computer collaborate to produce the end result in an application development process.
- Apply principles of logic design to digital computer design.
- Analyze a digital computer and decompose it into modules and lower-level logical blocks involving both combinational and sequential circuit elements.
- Synthesize various components of a computer's Arithmetic Logic Unit, Control Units, and Data Paths.
- Understand and assess (evaluate) computer CPU performance, and learn methods to enhance computer performance.
Language of the Computer

We will take a quick look at the MIPS language.
- MIPS here is not to be confused with "million instructions per second".
- MIPS: Microprocessor without Interlocked Pipelined Stages, a RISC (Reduced Instruction Set Computer) processor developed by MIPS Technologies; the architecture is also called MIPS.
- By 1990, 1 out of 3 RISC processors was using MIPS.
- Cisco routers, Nintendo 64, Sony PlayStation, PlayStation 2, etc. use MIPS designs.
Why bother to learn assembly language?
The difference between mediocre and star programmers is that star programmers understand assembly language, whether or not they use it on a daily basis. Assembly language is the language of the computer itself. To be a programmer without ever learning assembly language is like being a professional race car driver without understanding how your carburetor works. To be a truly successful programmer, you have to understand exactly what the computer sees when it is running a program. Nothing short of learning assembly language will do that for you. Assembly language is often seen as a black art among today's programmers - with those knowing this art being more productive, more knowledgeable, and better paid, even if
Basic Instruction Format

Three Instruction Formats (bit positions in parentheses):
R: Opcode (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
I: Opcode (31-26) | rs (25-21) | rt (20-16) | Immediate (15-0)
J: Opcode (31-26) | Memory Address (25-0)
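The R-format field positions can be exercised with a small C sketch that packs the six fields into one 32-bit word. The helper name `encode_r_type` is ours. The register numbers ($s1 = 17, $s2 = 18, $t0 = 8) follow the MIPS register table later in these notes, and funct 0x20 is the standard MIPS function code for add.

```c
#include <stdint.h>

/* Packs MIPS R-format fields into a 32-bit instruction word:
   op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0] */
uint32_t encode_r_type(uint32_t op, uint32_t rs, uint32_t rt,
                       uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op    & 0x3Fu) << 26) |
           ((rs    & 0x1Fu) << 21) |
           ((rt    & 0x1Fu) << 16) |
           ((rd    & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6)  |
            (funct & 0x3Fu);
}
```

For example, add $t0, $s1, $s2 is encode_r_type(0, 17, 18, 8, 0, 0x20), which gives 0x02324020. The masks keep an out-of-range argument from spilling into a neighboring field.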
Now Guess MIPS Architecture

- How many registers?
- How big a memory could be supported?
- What is the memory word size?
- How is data in RAM handled?
Non-architectural design/implementation issues that vary from design to design: roles of registers
Instructions
Instruction Set Architecture (ISA)
Instructions: the words of a computer's language are called instructions.
Instruction set: the vocabulary of a computer's language is called its instruction set.
Instruction Set Architecture (ISA): the set of instructions a particular CPU implements is an Instruction Set Architecture.
The Instruction Set Architecture (ISA)

[Figure: software | instruction set architecture | hardware]
The ISA is the interface description separating the software and hardware.
ISA Sales
ISA: CISC vs. RISC
The early trend was to add more and more instructions to new CPUs to do elaborate operations.
CISC (Complex Instruction Set Computer)
- The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible.
- The VAX architecture even had an instruction to evaluate polynomials (POLY)!
RISC philosophy (Cocke at IBM, Patterson, Hennessy, 1980s): Reduced Instruction Set Computer
- Keep the instruction set small and simple; this makes it easier to build fast hardware.
- Let software do complicated operations by composing simpler ones.
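The RISC idea of letting software compose complicated operations from simpler ones can be illustrated with polynomial evaluation. Instead of one complex instruction, a RISC program can use Horner's rule, which needs only one multiply and one add per coefficient. This is a minimal C sketch; the function name `horner` and the coefficient layout (highest power first) are our own choices.

```c
/* Evaluates c[0]*x^n + c[1]*x^(n-1) + ... + c[n] with Horner's rule,
   using only multiply and add. `n` is the degree, so the coefficient
   array holds n + 1 entries, highest power first. */
long horner(const long *c, int n, long x)
{
    long result = c[0];
    for (int i = 1; i <= n; i++) {
        result = result * x + c[i];   /* one multiply + one add per step */
    }
    return result;
}
```

For example, with c = {2, -3, 0, 5} (that is, 2x^3 - 3x^2 + 5), horner(c, 3, 2) computes 2, then 1, then 2, then 9.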
The MIPS ISA
3 instruction formats, all 32 bits wide:
- OP | rs | rt | rd | sa | funct
- OP | rs | rt | immediate
- OP | jump target
Instruction categories: Load/Store; Computational; Jump and Branch; Floating Point (coprocessor); Memory Management; Special
Registers: R0 - R31, PC, HI, LO
MIPS Registers and their Roles

Name | Number | Use | Preserved across a call?
$zero | 0 | The constant value 0 | N.A.
$at | 1 | Assembler temporary | No
$v0-$v1 | 2-3 | Values for function results and expression evaluation | No
$a0-$a3 | 4-7 | Arguments | No
$t0-$t7 | 8-15 | Temporaries | No
$s0-$s7 | 16-23 | Saved temporaries | Yes
$t8-$t9 | 24-25 | Temporaries | No
$k0-$k1 | 26-27 | Reserved for OS kernel | No
$gp | 28 | Global pointer | Yes
$sp | 29 | Stack pointer | Yes
$fp | 30 | Frame pointer | Yes
$ra | 31 | Return address | Yes
Simple operations
Compute f = (a+b)-(c-d), assuming these variables are in some $s registers.
Memory operations: the base register concept.
Why is a multiplication factor of 4 required for the nth array element? Answer: memory addresses in MIPS are byte addresses, and each word occupies 4 bytes.
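The factor of 4 can be checked in C with a 32-bit integer array: consecutive elements are 4 bytes apart, so the byte address of A[n] is the base address plus 4 * n, which is what a MIPS program computes (for instance with `sll ..., 2`) before a lw or sw. A minimal sketch; both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element n from the start of an int32_t array,
   measured with char pointers (char pointers count bytes). */
ptrdiff_t byte_offset_of(const int32_t *base, size_t n)
{
    return (const char *)&base[n] - (const char *)base;
}

/* The same offset computed the way MIPS code does it: shift left by 2. */
uint32_t mips_offset(uint32_t n)
{
    return n << 2;   /* n * 4, like: sll $t1, $t0, 2 */
}
```

Both give 12 for n = 3, the offset used in a load such as lw $t0, 12($s3).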
Quick Recap- Compilers

[Figure: compiling, assembling, and running a C++ sort program on Machine X]
Input: C++ program to sort 10 numbers -> C++ Compiler (Machine X code to translate any C++ program into an assembly program for Machine X), running on Machine X -> Output: Machine X assembly program to sort 10 numbers
Input: Machine X assembly program to sort 10 numbers -> Assembler (Machine X code to translate any Machine X assembly program into Machine X code), running on Machine X -> Output: Machine X code to sort 10 numbers
Input: 10 numbers -> Machine X code to sort 10 numbers, running on Machine X -> Output: sorted list of 10 numbers
The first two steps (the dotted area in the figure) can be merged into a single step.
Quick Recap- Shortcut Compilers

Input: C++ program to sort 10 numbers -> C++ Compiler (Machine X code to translate any C++ program directly into Machine X code), running on Machine X -> Output: Machine X code to sort 10 numbers
Input: 10 numbers -> Machine X code to sort 10 numbers, running on Machine X -> Output: sorted list of 10 numbers
Quick Recap- Bootstrapping

Input: C++ program to translate any C++ program into Machine Y code -> C++ Compiler (Machine X code to translate any C++ program directly into Machine X code), running on Machine X -> Output: Machine X code to translate any C++ program into Machine Y code
Input: the same C++ program -> Machine X code to translate any C++ program into Machine Y code, running on Machine X -> Output: Machine Y code for the C++ compiler (i.e., to translate any C++ program into Machine Y code)
This can be installed and run on Machine Y; thus you have a compiler for Machine Y.
Chapter 2- MIPS Programming
Quick Recap- MIPS

MIPS language:
- Expansion of the acronym
- Number of registers, and the architecture in general
- The 3 instruction formats and their fields (e.g., rs, rt, rd, shamt, etc.)

Now, we proceed with:
- MIPS assembly instruction formats
- Coding simple problems and translating them into MIPS machine code
Simple Statements
C code: d = (a + b) - (c + d);
Machine code assuming a, b, c, and d are in MIPS registers
Machine code assuming that a, b, c, and d are in consecutive memory locations from a given starting address (use lw, sw)
Loops and Branches

Develop assembly code for a typical C code that adds 100 numbers, as follows:
    // Read 100 numbers into an array A
    sum = 0;
    for (i = 0; i < 100; i++) {
        sum = sum + A[i];
    }
    // Print sum
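Before writing the MIPS code, it can help to rewrite the for loop in C using only a conditional branch and an unconditional jump, the control-flow forms MIPS provides (slt/beq/bne and j). A minimal sketch, assuming the 100 numbers are already in the array A; the function name, labels, and the suggested register assignments in the comments are ours.

```c
#include <stdint.h>

/* Sums n array elements using only if/goto, mirroring the branch and
   jump structure the MIPS translation would use. */
int32_t sum_array_branch_form(const int32_t *A, int32_t n)
{
    int32_t sum = 0;            /* sum = 0;  e.g. kept in $s0 */
    int32_t i = 0;              /* i = 0;    e.g. kept in $t0 */
loop:
    if (!(i < n)) goto done;    /* slt + beq: leave the loop when i >= n */
    sum = sum + A[i];           /* sll (i*4), add (address), lw, add */
    i = i + 1;                  /* addi */
    goto loop;                  /* j loop */
done:
    return sum;
}
```

Each C statement in the loop body then maps onto one or a few MIPS instructions, as the comments suggest.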
Procedure Calls
Caller and callee: who should preserve which registers?
Leaf and recursive procedure examples explain the conventions and the jal and jr instructions.
SPIM
Courtesy: Prof. Jerry Breecher, Clark University (Appendix A)
MIPS Simulation
SPIM is a simulator. It:
- Reads a MIPS assembly language program
- Simulates each instruction
- Displays values of registers and memory
- Supports breakpoints and single stepping
- Provides simple I/O for interacting with the user
SPIM Versions
SPIM is the command line version. XSPIM is the X Windows version (Unix workstations). There is also a Windows version. You can use it at home; it can be downloaded from: http://www.cs.wisc.edu/~larus/spim.html.
Resources On the Web
There's a very good SPIM tutorial at http://chortle.ccsu.edu/AssemblyTutorial/Chapter-09/ass09_1.html
In fact, there's a tutorial for a good chunk of the ISA portion of this course at: http://chortle.ccsu.edu/AssemblyTutorial/tutorialContents.html
Here are a couple of other good references you can look at: Patterson_Hennessy_AppendixA.pdf
and http://babbage.clarku.edu/~jbreecher/comp_org/labs/Introduction_To_SPIM.pdf
SPIM Program
MIPS assembly language. The program must include a label main; this will be called by the SPIM startup code (which allows you to have command line arguments). It can include named memory locations, constants, and string literals in a data segment.
General Layout
Data definitions start with the .data directive. Code definitions start with the .text directive. "Text" is the traditional name for the memory that holds a program. Usually you have a bunch of subroutine definitions and a main.
Simple Example
        .data           # data memory
foo:    .word 0         # 32 bit variable
        .text           # program memory
        .align 2        # word alignment
        .globl main     # main is global
main:   lw $a0, foo
Data Definitions
You can define variables/constants with:
- .word: defines 32-bit quantities
- .byte: defines 8-bit quantities
- .asciiz: zero-terminated ASCII strings
- .space: allocates some bytes
Data Examples
        .data
prompt: .asciiz "Hello World\n"
msg:    .asciiz "The answer is "
x:      .space 4
y:      .word 4
str:    .space 100
MIPS: Software Conventions For Registers
Simple I/O
SPIM provides some simple I/O using the syscall instruction. The specific I/O done depends on some registers:
- You set $v0 to indicate the operation.
- Parameters go in $a0, $a1.
I/O Functions
The system call is used to communicate with the system and do simple I/O:
- Load the system call code into register $v0.
- Load arguments (if any) into registers $a0, $a1, or $f12 (for floating point).
- Do: syscall
- Results are returned in registers $v0 or $f0.
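The calling protocol above (service code in $v0, argument in $a0, result back in $v0) can be modeled with a small C sketch of a hypothetical simulator. It handles only the four services used in these notes: 1 (print integer), 4 (print string), 5 (read integer), and 10 (exit). The struct, field, and function names are our own, and input and output go through arrays instead of a console so that the behavior is easy to check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the registers a SPIM-style syscall reads/writes. */
typedef struct {
    int v0;   /* service code in, result out */
    int a0;   /* argument: an integer, or the address of a string */
} Regs;

/* Hypothetical simulator state: string memory, queued input, console text. */
typedef struct {
    const char *memory;   /* strings live here; $a0 holds an offset into it */
    const int *input;     /* integers "typed" by the user */
    int next_input;
    char out[256];        /* console output accumulates here */
    int exited;
} Machine;

/* Performs one syscall according to the service code in r->v0. */
void do_syscall(Machine *m, Regs *r)
{
    size_t len = strlen(m->out);
    switch (r->v0) {
    case 1:   /* print_int: integer in $a0 */
        snprintf(m->out + len, sizeof m->out - len, "%d", r->a0);
        break;
    case 4:   /* print_string: address of a zero-terminated string in $a0 */
        snprintf(m->out + len, sizeof m->out - len, "%s", m->memory + r->a0);
        break;
    case 5:   /* read_int: the value read is returned in $v0 */
        r->v0 = m->input[m->next_input++];
        break;
    case 10:  /* exit */
        m->exited = 1;
        break;
    default:  /* other services are not modeled in this sketch */
        break;
    }
}
```

Reading 7 with service 5, printing the string at offset 0 with service 4, and then printing the integer with service 1 leaves "The answer is 7" in the output buffer when memory holds "The answer is ".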
Example: Reading an int

        li   $v0, 5      # Indicate we want function 5 (read integer)
        syscall
        # Upon return from the syscall, $v0 has the integer typed by
        # a human in the SPIM console
        # Now print that same integer
        move $a0, $v0    # Get the number to be printed into register $a0
        li   $v0, 1      # Indicate we're doing a write-integer
        syscall
Printing A String
        .data
msg:    .asciiz "SPIM IS FUN"
        .text
        .globl main
main:   li  $v0, 4       # pseudoinstruction: load immediate
        la  $a0, msg     # pseudoinstruction: load address
        syscall
        jr  $ra
A Typical MIPS READ and WRITE Program

        .data 0x10000000
A:      .word 0, 0
        .text
main:   la   $t0, A
        li   $v0, 5          # select the read_int syscall
        syscall              # the integer read is returned in $v0
        sw   $v0, ($t0)
        li   $v0, 5          # select the read_int syscall
        syscall
        sw   $v0, 4($t0)
        lw   $t1, 0($t0)
        lw   $t2, 4($t0)
        add  $t3, $t1, $t2
        li   $v0, 1          # select the print_int syscall
        move $a0, $t3
        syscall
        li   $v0, 10         # exit
        syscall
A C-Program with Read and Sum Loops

#include <stdio.h>

int main(int argc, char **argv)   // Some older compilers also accept: void main()
{
    int A[5], i, sum;
    for (i = 0; i <= 4; i++) {
        scanf("%d", &A[i]);
    }
    sum = 0;
    for (i = 0; i <= 4; i++) {
        sum = sum + A[i];
    }
    printf("The sum of 5 numbers is: %d\n", sum);
    return 0;
}
The MIPS Equivalent of the C Program with Read and Sum Loops
        .data
A:      .word 0              # Create space for the first word A[0] and initialize it to 0
        .space 16            # Create space for 4 more words, A[1] .. A[4]
msg:    .asciiz "The sum of 5 numbers is: "
        .text
main:   la   $t0, A          # Store in $t0 the address of A[0], the first of five words
        li   $t1, 0          # Store in $t1 the initial value of the loop variable
        li   $t2, 4          # Store in $t2 the final value of the loop variable
        li   $t3, 0          # Byte offset of the current word; increments by 4 per word read
loop:   add  $t4, $t0, $t3   # Put in $t4 the address of the next word
        li   $v0, 5          # Select the read_int syscall
        syscall              # The integer read is returned in $v0
        sw   $v0, ($t4)      # Store the new integer in the word pointed to by $t4
        addi $t3, $t3, 4     # Increment $t3 by 4 for the next word address
        addi $t1, $t1, 1     # i++
        ble  $t1, $t2, loop  # Repeat while i <= 4
The MIPS Equivalent of the C Program with Read and Sum Loops (continued)
        li   $t1, 0          # Same initialization for the identical loop at addloop
        li   $t2, 4
        li   $t3, 0
        li   $s0, 0          # sum = 0
addloop: add $t4, $t0, $t3
        lw   $t5, ($t4)      # Read the integer at the address in $t4 into $t5
        add  $s0, $s0, $t5   # Update the partial sum in $s0 by adding the new integer
        addi $t3, $t3, 4
        addi $t1, $t1, 1
        ble  $t1, $t2, addloop
        li   $v0, 4          # Make the system ready to print a string
        la   $a0, msg        # Load the starting address (msg) of the string into argument register $a0
        syscall
        li   $v0, 1          # Make the system ready to print the integer (sum)
        move $a0, $s0
        syscall
        li   $v0, 10         # Exit
        syscall
SPIM Subroutines
The stack is set up for you; just use $sp. You can view the stack in the data window. main is called as a subroutine (have it return using jr $ra). For now, don't worry about the details. The next few pages give some excellent examples of how stacks work.
Why Are Stacks So Great?

- Some machines provide a memory stack as part of the architecture (e.g., VAX).
- Sometimes stacks are implemented via software convention (e.g., MIPS).
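On MIPS the stack exists only by convention: $sp is an ordinary register that software decrements to allocate space and increments to free it. Below is a minimal C sketch of that convention, assuming a small hypothetical memory region and a stack that grows toward lower addresses as on MIPS; the type and function names are ours, and overflow checks are omitted for brevity.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical memory region for the stack; sp is a byte offset into it,
   starting at the top (highest address) because the stack grows downward. */
typedef struct {
    int32_t mem[STACK_WORDS];
    uint32_t sp;                /* byte offset, playing the role of $sp */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS * 4; }

/* push: addi $sp, $sp, -4 ; sw value, 0($sp) */
void push(Stack *s, int32_t value)
{
    s->sp -= 4;
    s->mem[s->sp / 4] = value;
}

/* pop: lw value, 0($sp) ; addi $sp, $sp, 4 */
int32_t pop(Stack *s)
{
    int32_t value = s->mem[s->sp / 4];
    s->sp += 4;
    return value;
}
```

Nothing in the hardware enforces this discipline; every procedure simply agrees to adjust $sp the same way.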
MIPS Function Calling Conventions

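fact:   addiu $sp, $sp, -32    # push a 32-byte stack frame
        sw    $ra, 20($sp)     # save the return address
        ...
        sw    $s0, 4($sp)      # save callee-saved register $s0
        ...
        lw    $ra, 20($sp)     # restore the return address
        addiu $sp, $sp, 32     # pop the stack frame
        jr    $ra              # return to the caller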
C-Program for a leaf-procedure

#include <stdio.h>

int leaf_procedure(int e, int f, int g, int h);

int main(void)
{
    int e, f, g, h, result;
    scanf("%d", &e);
    scanf("%d", &f);
    scanf("%d", &g);
    scanf("%d", &h);
    result = leaf_procedure(e, f, g, h);
    printf("Result = %d\n", result);
    return 0;
}

int leaf_procedure(int e, int f, int g, int h)
{
    int res;
    int temp1, temp2;   // Not required (only for making it close to the MIPS code)
    temp1 = e + f;
    temp2 = g + h;
    res = temp1 - temp2;
    return res;
}
Page 1: MIPS Code for the main (Calling Program) of leaf_procedure

        .data
e:      .word 0
f:      .word 0
g:      .word 0
h:      .word 0
        .text
main:   la   $t0, e          # Load the address of e into $t0
        li   $t1, 0          # Set the loop iteration variable to 0
readLoop:
        sll  $t2, $t1, 2     # Each word is 4 bytes long, so multiply the loop variable by 4
        add  $t3, $t0, $t2   # The first time through the loop, $t3 has the address of e
        li   $v0, 5          # Prepare for the read_int syscall
        syscall              # The value read is returned in $v0
        sw   $v0, ($t3)      # The newly read value goes to e, f, g, or h, depending on
                             # whether the loop variable $t1 contains 0, 1, 2, or 3,
                             # that is, whether $t2 is 0, 4, 8, or 12
        addi $t1, $t1, 1
        xori $t2, $t1, 4     # $t2 may be overwritten because it is recomputed from $t1
                             # at the beginning of the loop; $t2 == 0 only when $t1 == 4
        bne  $t2, $zero, readLoop  # Not all 4 integers read yet; go back to readLoop
Page 2: MIPS Code Continuation for the main of leaf_procedure

        # Reading complete. Prepare for leaf_procedure, which computes (e+f)-(g+h),
        # by placing the arguments in the argument registers.
        lw   $a0, 0($t0)     # load e into $a0
        lw   $a1, 4($t0)     # load f into $a1
        lw   $a2, 8($t0)     # load g into $a2
        lw   $a3, 12($t0)    # load h into $a3
        jal  leaf_procedure  # Store the address of the next instruction (the return address,
                             # that is, the address of the instruction at label print) in $ra
                             # and jump to the label leaf_procedure
print:  move $t0, $v0
        li   $v0, 1          # Prepare for print
        move $a0, $t0
        syscall
        j    last
Page 3: MIPS Code for the leaf_procedure itself

leaf_procedure:
        addi $sp, $sp, -12   # Make space on the stack for 3 integers
        sw   $t0, 0($sp)     # Save the registers this procedure will use, so that the
        sw   $t1, 4($sp)     # original values can be restored before returning
        sw   $s0, 8($sp)     # to the calling program
        add  $t0, $a0, $a1   # Add e and f (in $a0 and $a1) and put the sum in $t0
        add  $t1, $a2, $a3   # Add g and h (in $a2 and $a3) and put the sum in $t1
        sub  $s0, $t0, $t1   # Subtract g+h (in $t1) from e+f (in $t0) and put the result in $s0
        # Prepare to return to the calling procedure (main in this case)
        move $v0, $s0        # Put the computed value into the return-value register
        lw   $t0, 0($sp)     # Restore the original register values from the stack
        lw   $t1, 4($sp)
        lw   $s0, 8($sp)
        addi $sp, $sp, 12    # Pop the stack
        jr   $ra             # Jump to the address in $ra (print, in our case)
last:   li   $v0, 10         # exit syscall: the main program stops here
        syscall

#include <stdio.h>

int fact(int n);

int main(void)
{
    printf("The factorial of 10 is %d\n", fact(10));
    return 0;
}

int fact(int n)
{
    if (n <= 1)
        return 1;
    return n * fact(n - 1);
}

        .text
        .globl main
main:   subu  $sp, $sp, 32    # stack frame size is 32 bytes
        sw    $ra, 20($sp)    # save return address
        li    $a0, 10         # load argument (10) in $a0
        jal   fact            # call fact
        la    $a0, LC         # load string address in $a0
        move  $a1, $v0        # load fact result in $a1
        jal   printf          # call printf
        lw    $ra, 20($sp)    # restore $ra
        addu  $sp, $sp, 32    # pop the stack
        jr    $ra             # exit()
        .data
LC:     .asciiz "The factorial of 10 is %d\n"

        .text
fact:   subu  $sp, $sp, 8     # stack frame is 8 bytes
        sw    $ra, 4($sp)     # save return address
        sw    $a0, 0($sp)     # save argument (n)
        subu  $a0, $a0, 1     # compute n-1
        bgtz  $a0, L2         # if n-1 > 0 (i.e., n > 1) go to L2
        li    $v0, 1          # return(1)
        j     L1
L2:                           # new argument (n-1) is already in $a0
        jal   fact            # call fact
        lw    $a0, 0($sp)     # load n
        mul   $v0, $v0, $a0   # fact(n-1)*n
L1:     lw    $ra, 4($sp)     # restore $ra
        addu  $sp, $sp, 8     # pop the stack
        jr    $ra             # return, result in $v0
Sample SPIM Programs (on the web)

- multiply.s: multiplication subroutine based on repeated addition, and a test program that calls it. http://babbage.clarku.edu/~jbreecher/comp_org/labs/multiply.s
- fact.s: computes factorials using the multiply subroutine. http://babbage.clarku.edu/~jbreecher/comp_org/labs/fact.s
- sort.s: the sorting program from the text. http://babbage.clarku.edu/~jbreecher/comp_org/labs/sort.s
- strcpy.s: the strcpy subroutine and test code. http://babbage.clarku.edu/~jbreecher/comp_org/labs/strcpy.s
Processor Design - 1
Adapted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley
Outline of Slides
- Overview
- Design a processor: step-by-step
- Requirements of the instruction set
- Components and clocking
- Assembling an adequate datapath
- Controlling the datapath
Chapter 5.1 - Processor Design 1
The Big Picture: Where Are We Now? The five classic components of a computer:
- Processor (Control and Datapath)
- Memory
- Input
- Output
Today's topic: design a single-cycle processor
[Figure labels: inst. set design, machine design, arithmetic, technology]
The CPU
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
Datapath: the portion of the processor which contains the hardware necessary to perform the operations required by the processor (the brawn)
Control: the portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)
Big Picture: The Performance Perspective

Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction (CPI)
Processor design (datapath and control) will determine:
- Clock cycle time
- Clock cycles per instruction
What we will do today: a single-cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time
[Figure: the three performance factors: Inst. Count, CPI, Cycle Time]
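These factors combine multiplicatively: CPU time = instruction count x CPI x clock cycle time. The minimal C sketch below (the function name is ours) evaluates that product; the numbers in the usage note are invented for illustration and are not measurements.

```c
/* CPU time in nanoseconds = instruction count x CPI x cycle time (ns). */
double cpu_time_ns(double instruction_count, double cpi, double cycle_time_ns)
{
    return instruction_count * cpi * cycle_time_ns;
}
```

With 1,000,000 instructions, cpu_time_ns(1e6, 1.0, 8.0) gives 8,000,000 ns (8 ms) for a hypothetical single-cycle design with an 8 ns cycle. A hypothetical multicycle design with CPI 4 and a 2 ns cycle gives the same 8 ms, so a CPI of 1 alone does not guarantee higher performance.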
How to Design a Processor: Step-by-step

1. Analyze the instruction set => datapath requirements
   - The meaning of each instruction is given by the register transfers
   - The datapath must include storage elements for the ISA registers (possibly more)
   - The datapath must support each register transfer
2. Select a set of datapath components and establish a clocking methodology
3. Assemble a datapath meeting the requirements
4. Analyze the implementation of each instruction to determine the setting of the control points that effect the register transfer
5. Assemble the control logic
The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats (bit positions in brackets):
R-type: op [31-26] (6 bits) | rs [25-21] (5 bits) | rt [20-16] (5 bits) | rd [15-11] (5 bits) | shamt [10-6] (5 bits) | funct [5-0] (6 bits)
I-type: op [31-26] (6 bits) | rs [25-21] (5 bits) | rt [20-16] (5 bits) | address/immediate [15-0] (16 bits)
J-type: op [31-26] (6 bits) | target address [25-0] (26 bits)
The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
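Going the other way, a decoder extracts each field with shifts and masks. A minimal C sketch (the struct and function names are ours), including sign extension of the 16-bit immediate, which lw, sw, and beq need:

```c
#include <stdint.h>

/* Fields of a MIPS instruction word; which ones are meaningful
   depends on the format (R, I, or J) selected by op. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
    uint32_t imm16;    /* I-type immediate, raw 16 bits */
    uint32_t target;   /* J-type 26-bit target address */
} Fields;

Fields decode(uint32_t inst)
{
    Fields f;
    f.op     = (inst >> 26) & 0x3Fu;
    f.rs     = (inst >> 21) & 0x1Fu;
    f.rt     = (inst >> 16) & 0x1Fu;
    f.rd     = (inst >> 11) & 0x1Fu;
    f.shamt  = (inst >> 6)  & 0x1Fu;
    f.funct  =  inst        & 0x3Fu;
    f.imm16  =  inst        & 0xFFFFu;
    f.target =  inst        & 0x03FFFFFFu;
    return f;
}

/* Sign-extends a 16-bit immediate to 32 bits: if bit 15 is set,
   the value is negative (two's complement). */
int32_t sign_ext16(uint32_t imm16)
{
    imm16 &= 0xFFFFu;
    return (imm16 & 0x8000u) ? (int32_t)imm16 - 0x10000 : (int32_t)imm16;
}
```

Decoding 0x02324020 gives op 0, rs 17, rt 18, rd 8, shamt 0, funct 0x20 (add $t0, $s1, $s2), and sign_ext16(0xFFFC) gives -4.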
Step 1a: The MIPS-lite Subset for Today
ADD and SUB (R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]):
- addU rd, rs, rt
- subU rd, rs, rt
OR Immediate (I-type: op [31-26] | rs [25-21] | rt [20-16] | immediate [15-0]):
- ori rt, rs, imm16
LOAD / STORE Word (I-type, same layout):
- lw rt, rs, imm16
- sw rt, rs, imm16
BRANCH (I-type, same layout):
- beq rs, rt, imm16
Logical Register Transfers

Register Transfer Logic (RTL) gives the meaning of the instructions.
All start by fetching the instruction:
  op | rs | rt | rd | shamt | funct = MEM[ PC ]
  op | rs | rt | Imm16 = MEM[ PC ]
inst     Register Transfers
ADDU     R[rd] <- R[rs] + R[rt];  PC <- PC + 4
SUBU     R[rd] <- R[rs] - R[rt];  PC <- PC + 4
ORi      R[rt] <- R[rs] | zero_ext(Imm16);  PC <- PC + 4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ];  PC <- PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt];  PC <- PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
(Here "|| 00" appends two zero bits, i.e., multiplies the word offset by 4.)
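The register transfers above translate almost line for line into C. Below is a minimal sketch of one instruction step for the MIPS-lite subset, assuming a hypothetical machine with a small word-array memory where byte address A refers to mem[A / 4]. The opcode and funct values are the standard MIPS encodings (addu: op 0, funct 0x21; subu: op 0, funct 0x23; ori: 0x0D; lw: 0x23; sw: 0x2B; beq: 0x04); the struct and function names are ours.

```c
#include <stdint.h>

#define MEM_WORDS 256

/* Hypothetical MIPS-lite machine state. Memory holds both instructions
   and data; a byte address A refers to word mem[A / 4]. */
typedef struct {
    uint32_t R[32];
    uint32_t PC;
    uint32_t mem[MEM_WORDS];
} Cpu;

static uint32_t sign_ext(uint32_t imm16)
{
    return (imm16 & 0x8000u) ? (imm16 | 0xFFFF0000u) : imm16;
}

/* Executes one instruction using the register transfers from the table. */
void step(Cpu *c)
{
    uint32_t inst = c->mem[c->PC / 4];                 /* inst = MEM[PC] */
    uint32_t op = inst >> 26, rs = (inst >> 21) & 31, rt = (inst >> 16) & 31;
    uint32_t rd = (inst >> 11) & 31, funct = inst & 0x3F, imm16 = inst & 0xFFFF;
    uint32_t next_pc = c->PC + 4;                      /* PC <- PC + 4 by default */

    switch (op) {
    case 0x00:
        if (funct == 0x21) c->R[rd] = c->R[rs] + c->R[rt];   /* ADDU */
        if (funct == 0x23) c->R[rd] = c->R[rs] - c->R[rt];   /* SUBU */
        break;
    case 0x0D: c->R[rt] = c->R[rs] | imm16; break;            /* ORi: zero_ext */
    case 0x23: c->R[rt] = c->mem[(c->R[rs] + sign_ext(imm16)) / 4]; break;  /* LOAD */
    case 0x2B: c->mem[(c->R[rs] + sign_ext(imm16)) / 4] = c->R[rt]; break;  /* STORE */
    case 0x04:                                                 /* BEQ */
        if (c->R[rs] == c->R[rt]) next_pc = c->PC + 4 + (sign_ext(imm16) << 2);
        break;
    }
    c->R[0] = 0;      /* $zero always reads as 0 */
    c->PC = next_pc;
}
```

Unsigned 32-bit arithmetic wraps around modulo 2^32, which matches the "unsigned" (no overflow trap) semantics of addu and subu.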
Step 1: Requirements of the Instruction Set

Memory
- instruction & data
Registers (32 x 32)
- read RS
- read RT
- write RT or RD
PC
Extender
Add and Sub register or extended immediate
Add 4 or extended immediate to PC
Step 2: Components of the Datapath

- Combinational elements
- Storage elements
- Clocking methodology
Combinational Logic Elements (Basic Building Blocks)

[Figure: 32-bit combinational building blocks: Adder (inputs A, B, CarryIn; outputs Sum, Carry), MUX (inputs A, B; Select; output Y), ALU (inputs A, B; OP; output Result)]
Storage Element: Register File
Register File consists of 32 registers:
- Two 32-bit output buses: busA and busB
- One 32-bit input bus: busW
- Other inputs: RW, RA, RB (5 bits each), Write Enable, Clk
Register is selected by:
- RA (number) selects the register to put on busA (data)
- RB (number) selects the register to put on busB (data)
- RW (number) selects the register to be written via busW (data) when Write Enable is 1
Clock input (CLK):
- The CLK input is a factor ONLY during the write operation
- During a read operation, the register file behaves as a combinational logic block: RA or RB valid => busA or busB valid after "access time"
Storage Element: Idealized Memory
Memory (idealized):
- One input bus: Data In (32 bits)
- One output bus: Data Out (32 bits)
- Other inputs: Address, Write Enable, Clk
Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: Address selects the memory word to be written via the Data In bus
Clock input (CLK):
- The CLK input is a factor ONLY during the write operation
- During a read operation, memory behaves as a combinational logic block: Address valid => Data Out valid after "access time"
Memory Hierarchy (Ch. 7)

Want: a single main memory, both large and fast
Problem 1: large memories are slow, while fast memories are small
Example: MIPS registers (fast, but few)
Solution: a mix of memories provides the illusion of a single large, fast memory
Cache: a small, fast memory that holds a copy of part of a larger, slower memory
- Imem and Dmem are really separate cache memories
Digression: Sequential Logic, Clocking

Combinational circuits: no memory
- Output depends only on the inputs
Sequential circuits: have memory
How do we ensure a memory element is updated neither too soon nor too late? Recall the hardware multiplier:
- The product/multiplier register is the writable memory element
- Gate propagation delay means the ALU result takes time to stabilize; the delay varies with the inputs
- We must wait until the result is stable before writing to the product/multiplier register, or else we get garbage
How can we be certain the ALU output is stable?
Clock: a free-running signal with a fixed cycle time (clock period)
Adding a Clock to a Circuit
[Figure: clock waveform alternating high (1) and low (0), with its period, rising edge, and falling edge marked]
The clock determines when to write a memory element:
- Level-triggered: store while the clock is high (or low)
- Edge-triggered: store only on a clock edge
We will use the negative (falling) edge-triggered methodology.
Role of Clock in MIPS Processors

Single-cycle machine: does everything in one clock cycle
- Instruction execution = up to 5 steps
- Must complete the 5th step before the cycle ends
[Figure: one clock cycle with rising and falling clock edges marked; within it the datapath performs instruction execution step 1/step 2/step 3/step 4/step 5, becomes stable, and the register(s) are written]
SR-Latches
SR-latch with NOR gates: S = 1 and R = 1 is not allowed
Symbol for SR-Latch with NOR gates
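A NOR-gate SR latch can be simulated in C by re-evaluating the two cross-coupled gates until their outputs stop changing. This minimal sketch (type and function names are ours) uses a simplified gate model in which the two gates are evaluated one after the other; it is not a timing-accurate simulation. It uses Q = NOR(R, Q') and Q' = NOR(S, Q): S = 1 sets the latch, R = 1 resets it, S = R = 0 holds the state, and S = R = 1 forces both outputs to 0, which is why that input combination is not allowed.

```c
#include <stdbool.h>

/* State of a NOR-gate SR latch: q and its complement qbar. */
typedef struct { bool q, qbar; } SRLatch;

static bool nor(bool a, bool b) { return !(a || b); }

/* Applies inputs S and R and lets the two cross-coupled NOR gates
   settle (re-evaluate until nothing changes, with an iteration cap). */
void sr_latch_nor(SRLatch *l, bool s, bool r)
{
    for (int i = 0; i < 10; i++) {
        bool q    = nor(r, l->qbar);
        bool qbar = nor(s, q);
        if (q == l->q && qbar == l->qbar) break;   /* outputs are stable */
        l->q = q;
        l->qbar = qbar;
    }
}
```

Starting from the reset state (q = false, qbar = true), applying S = 1, R = 0 settles at q = true, qbar = false, and a following S = R = 0 keeps that value.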

SR-Latches
SR-latch with NAND gates, also known as the S'R' latch (active-low inputs): S = 0 and R = 0 is not allowed
Symbol for SR-Latch with NAND gates
SR-Latches with Control Input

SR-latch with NAND gates and control input C:
- C = 0: no change of state
- C = 1: change is allowed
- If S = 1 and R = 1, Q and Q' are indeterminate
D-Latches
D-latch based on SR-Latch with NAND Gates and control input C
C = 0: no change of state; Q(t + Δt) = Q(t)
C = 1: change is allowed; Q(t + Δt) = D(t)
No indeterminate output

Negative Edge-Triggered Master-Slave D Flip-Flop
Symbol for D flip-flop:
- The arrowhead (>) indicates an edge-triggered sequential circuit.
- The bubble means that triggering is effective during the high-to-low transition of C.
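The D latch and the negative edge-triggered master-slave flip-flop built from two of them can be modeled in C at the level of their behavior rather than their gates. In this minimal sketch (names are ours), the master latch is enabled by C and the slave by NOT C, so the output Q can change only when C goes from 1 to 0.

```c
#include <stdbool.h>

/* D latch: transparent when c = 1 (Q follows D), holds when c = 0. */
typedef struct { bool q; } DLatch;

void d_latch(DLatch *l, bool c, bool d)
{
    if (c) l->q = d;   /* C = 1: Q(t + dt) = D(t) */
                       /* C = 0: Q(t + dt) = Q(t), no change */
}

/* Negative edge-triggered master-slave D flip-flop: the master is
   enabled by C, the slave by NOT C, so Q updates on the 1 -> 0 edge. */
typedef struct { DLatch master, slave; } DFlipFlop;

void dff(DFlipFlop *ff, bool c, bool d)
{
    d_latch(&ff->master, c, d);              /* master samples D while C = 1 */
    d_latch(&ff->slave, !c, ff->master.q);   /* slave copies master while C = 0 */
}
```

Each call applies one clock level, so calling dff with c = 1 and then with c = 0 models one falling edge; changes to D while c = 0 do not reach Q.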
Clocking Methodology for the Entire Datapath

[Figure: clock (Clk) waveform with a setup window before and a hold window after each clock edge; inputs are "don't care" elsewhere]
Design/synthesis based on pulsed-sequential circuits:
- All combinational inputs remain at constant levels and only the clock signal appears as a pulse with a fixed period T_cc
- All storage elements are clocked by the same clock edge
- Cycle time: T_cc = CLK-to-Q + longest delay path + setup time + clock skew
- Hold constraint: (CLK-to-Q + shortest delay path - clock skew) > hold time
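The two timing constraints can be checked numerically. This minimal C sketch uses made-up delays in picoseconds, not measurements of any real circuit; the function names are ours.

```c
#include <stdbool.h>

/* Minimum clock period for the single-clock-edge methodology:
   T_cc = CLK-to-Q + longest delay path + setup time + clock skew. */
int min_cycle_time_ps(int clk_to_q, int longest_path, int setup, int skew)
{
    return clk_to_q + longest_path + setup + skew;
}

/* Hold constraint: (CLK-to-Q + shortest delay path - clock skew) > hold time. */
bool hold_ok(int clk_to_q, int shortest_path, int skew, int hold)
{
    return clk_to_q + shortest_path - skew > hold;
}
```

For example, with CLK-to-Q = 50 ps, a longest path of 700 ps, a 30 ps setup time, and 20 ps of skew, the clock period must be at least 800 ps (a clock of at most 1.25 GHz).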
Step 3: Assemble Data Path Meeting Requirements
Register Transfer Requirements => Datapath Assembly
- Instruction Fetch
- Read Operands and Execute Operation
Stages of the Datapath (1/6)

Problem: a single, atomic block that executes an instruction (performs all necessary operations, beginning with fetching the instruction) would be too bulky and inefficient.
Solution: break up the process of executing an instruction into stages, and then connect the stages to create the whole datapath.
- Smaller stages are easier to design
- It is easy to optimize (change) one stage without touching the others

There is a wide variety of MIPS instructions, so what general steps do they have in common?
Stage 1: Instruction Fetch
- No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
- Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction; byte addressing, so + 4)

Stage 2: Instruction Decode
- Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
- First, read the opcode to determine the instruction type and field lengths
- Second, read in data from all necessary registers:
  - for add, read two registers
  - for addi, read one register
  - for jal, no reads are necessary

Stage 3: ALU (Arithmetic-Logic Unit)
- The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
- What about loads and stores?
  - lw $t0, 40($t1)
  - the address we are accessing in memory = the value in $t1 + the value 40
  - so we do this addition in this stage

Stage 4: Memory Access
Only the load and store instructions do anything during this stage; the others remain idle. Since these instructions have a unique step, we need this extra stage to account for them. Thanks to the cache system, this stage is expected to be just as fast (on average) as the others.

Stage 5: Register Write
Most instructions write the result of some computation into a register; examples are arithmetic, logical, shifts, loads, and slt.
What about stores, branches, and jumps? They don't write anything into a register at the end, so they remain idle during this fifth stage.
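The stage descriptions above can be summarized as a lookup of which stages each instruction class uses. This Python sketch is illustrative only: the stage names and the instruction subset (add, slti, lw, sw, beq) are chosen for the example, and counting beq as using the ALU follows the branch datapath later in these notes, where the ALU subtracts to produce Zero.

```python
# Which of the five generic stages each instruction class uses (illustrative).
STAGES = ["fetch", "decode/read", "execute (ALU)", "memory", "register write"]

USES = {
    "add":  {"fetch", "decode/read", "execute (ALU)", "register write"},
    "slti": {"fetch", "decode/read", "execute (ALU)", "register write"},
    "lw":   set(STAGES),                                   # the only one using all five
    "sw":   {"fetch", "decode/read", "execute (ALU)", "memory"},
    "beq":  {"fetch", "decode/read", "execute (ALU)"},
}


def idle_stages(instr):
    """Stages during which the given instruction does no useful work."""
    return [s for s in STAGES if s not in USES[instr]]


for name in USES:
    print(f"{name:5s} idle in: {idle_stages(name)}")
```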
Generic Steps: Datapath

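[Figure: generic datapath. PC → instruction memory with a +4 adder (1. Instruction Fetch); rd, rs, rt fields → registers (2. Decode/Register Read); register values and imm → ALU (3. Execute); ALU result → data memory (4. Memory); result back to registers (5. Register Write)]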
Datapath Walkthroughs (1/3)

add $r3, $r1, $r2   # r3 = r1 + r2
Stage 1: fetch this instruction, increment PC.
Stage 2: decode to find it's an add, then read registers $r1 and $r2.
Stage 3: add the two values retrieved in Stage 2.
Stage 4: idle (nothing to write to memory).
Stage 5: write the result of Stage 3 into register $r3.
Example: add Instruction

[Figure: add r3, r1, r2 on the generic datapath. The register file reads reg[1] and reg[2], the ALU computes reg[1]+reg[2], and the result is written to register 3; data memory is unused]

slti $r3, $r1, 17
Stage 1: fetch this instruction, increment PC.
Stage 2: decode to find it's an slti, then read register $r1.
Stage 3: compare the value retrieved in Stage 2 with the integer 17.
Stage 4: go idle.
Stage 5: write the result of Stage 3 into register $r3.
Example: slti Instruction

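[Figure: slti r3, r1, 17 on the generic datapath. The register file reads reg[1], the immediate 17 feeds the ALU, which compares reg[1] with 17 (reg[1] - 17), and the result is written to register 3; data memory is unused]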

sw $r3, 17($r1)
Stage 1: fetch this instruction, increment PC.
Stage 2: decode to find it's a sw, then read registers $r1 and $r3.
Stage 3: add 17 to the value in register $r1 (retrieved in Stage 2).
Stage 4: write the value in register $r3 (retrieved in Stage 2) into the memory address computed in Stage 3.
Stage 5: go idle (nothing to write into a register).
Example: sw Instruction
[Figure: sw r3, 17(r1) on the generic datapath. The register file reads reg[1] and reg[3], the ALU computes reg[1]+17, and data memory performs MEM[r1+17] <= r3; no register is written]
Why Five Stages? (1/2)

Could we have a different number of stages? Yes, and other architectures do.
So why does MIPS have five if instructions tend to go idle for at least one stage? There is one instruction that uses all five stages: the load.
Why Five Stages? (2/2)

lw $r3, 17($r1)
Stage 1: fetch this instruction, increment PC.
Stage 2: decode to find it's a lw, then read register $r1.
Stage 3: add 17 to the value in register $r1 (retrieved in Stage 2).
Stage 4: read the value from the memory address computed in Stage 3.
Stage 5: write the value found in Stage 4 into register $r3.
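To tie the walkthroughs together, here is a toy Python interpreter that steps add, slti, sw and lw through the same five stages. It is a sketch under simplifying assumptions: instructions are hypothetical tuples rather than 32-bit words, memory is a dict keyed by byte address (unwritten locations read as 0), and 32-bit wraparound and the hardwired $zero register are ignored.

```python
# Toy single-cycle interpreter for add, slti, sw and lw.
# Hypothetical tuple encoding: ("add", rd, rs, rt), ("slti", rt, rs, imm),
# ("lw"/"sw", rt, imm, rs) for "lw rt, imm(rs)".

def step(instr, regs, mem, pc):
    """Execute one instruction in five stages and return the next PC."""
    op = instr[0]

    # Stage 1: instruction fetch (the tuple is already fetched); PC + 4.
    next_pc = pc + 4

    # Stage 2: decode and read the source registers.
    if op == "add":
        _, rd, rs, rt = instr
        a, b = regs[rs], regs[rt]
    elif op == "slti":
        _, rt, rs, imm = instr
        a, b = regs[rs], imm
    elif op in ("lw", "sw"):
        _, rt, imm, rs = instr
        a, b = regs[rs], imm
    else:
        raise ValueError(f"unsupported instruction: {op}")

    # Stage 3: ALU (comparison for slti, addition otherwise).
    alu = int(a < b) if op == "slti" else a + b

    # Stage 4: memory access (lw reads, sw writes, others idle).
    loaded = None
    if op == "lw":
        loaded = mem.get(alu, 0)
    elif op == "sw":
        mem[alu] = regs[rt]

    # Stage 5: register write (sw is idle here).
    if op == "add":
        regs[rd] = alu
    elif op == "slti":
        regs[rt] = alu
    elif op == "lw":
        regs[rt] = loaded

    return next_pc


regs = [0] * 32
regs[1], regs[2] = 100, 5
mem = {}
pc = 0
for instr in [("add", 3, 1, 2),      # add  $r3, $r1, $r2
              ("slti", 4, 1, 17),    # slti $r4, $r1, 17
              ("sw", 3, 17, 1),      # sw   $r3, 17($r1)
              ("lw", 5, 17, 1)]:     # lw   $r5, 17($r1)
    pc = step(instr, regs, mem, pc)

print(regs[3], regs[4], regs[5], mem, pc)   # 105 0 105 {117: 105} 16
```

Only lw touches all five stages in this sketch, matching the argument for having five.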
Example: lw Instruction
[Figure: lw r3, 17(r1) on the generic datapath. The register file reads reg[1], the ALU computes reg[1]+17, data memory returns MEM[r1+17], and that value is written to register 3]
Datapath Summary
The datapath is based on the data transfers required to perform instructions.
A controller causes the right transfers to happen.
[Figure: generic datapath (PC, +4, instruction memory, registers with rd/rs/rt, imm, ALU, data memory) with a Controller that receives opcode and funct and drives the datapath]
Overview of the Instruction Fetch Unit
The common operations:
Fetch the instruction: mem[PC]
Update the program counter:
Sequential code: PC ← PC + 4
Branch and jump: PC ← something else
[Figure: instruction fetch unit. The PC, clocked by Clk, addresses the instruction memory, which returns a 32-bit instruction word; next-address logic computes the new PC]
Add & Subtract

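R[rd] ← R[rs] op R[rt]; Example: addu rd, rs, rt
Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields.
ALUctr and RegWr: control logic after decoding the instruction.
R-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
[Figure: register file with read ports Ra = Rs and Rb = Rt driving the 32-bit busA and busB into the ALU (3-bit ALUctr); the 32-bit result returns on busW to write port Rw = Rd when RegWr is asserted at the clock edge]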
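The field boundaries above map directly onto shifts and masks. A minimal Python sketch; the helper names are hypothetical, and the funct value 0x21 used for addu is the standard MIPS encoding.

```python
# Field extraction for a 32-bit MIPS R-type instruction word.

def decode_rtype(word):
    """Split an R-type word into op, rs, rt, rd, shamt and funct."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31..26
        "rs":    (word >> 21) & 0x1F,   # bits 25..21
        "rt":    (word >> 16) & 0x1F,   # bits 20..16
        "rd":    (word >> 11) & 0x1F,   # bits 15..11
        "shamt": (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
    }


def encode_rtype(rs, rt, rd, funct, shamt=0, op=0):
    """Pack the six fields back into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_rtype(rs=1, rt=2, rd=3, funct=0x21)   # addu $3, $1, $2
print(hex(word))            # 0x221821
print(decode_rtype(word))
```

In hardware no shifting happens: the rs, rt and rd wires are simply routed to Ra, Rb and Rw.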
Register-Register Timing: One complete cycle

[Timing diagram: after the clock edge, the PC changes after its Clk-to-Q delay; Rs, Rt, Rd, Op, and Func become valid after the instruction memory access time; ALUctr and RegWr become valid after the delay through the control logic; busA and busB become valid after the register file access time; busW becomes valid after the ALU delay; the register write occurs at the next clock edge]
Logical Operations With Immediate

Immediate format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits); there is no rd field, so rt is the destination.
R[rt] ← R[rs] op ZeroExt[imm16], where ZeroExt fills the upper 16 bits with 0s.
[Figure: the RegDst mux selects Rt (instead of Rd) as the write register Rw; the ALUSrc mux selects between busB and ZeroExt(imm16) as the second ALU input; ALUctr and RegWr as before]
Load Operations
R[rt] ← Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16
Format: op (31-26) | rs (25-21) | rt (20-16) | immediate (15-0, 16 bits)
[Figure: the ALU adds busA and the extended imm16 (ExtOp selects the extender mode, ALUSrc selects the immediate) to form the data memory address; the W_Src mux selects the data memory output for busW; RegDst selects Rt as the write register; MemWr stays deasserted]
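The immediate path differs between ori and load/store only in the extender mode (ExtOp). A minimal Python sketch of both modes, holding 32-bit values in Python ints; the function names are hypothetical.

```python
# Zero- and sign-extension of a 16-bit immediate to 32 bits.
MASK32 = 0xFFFFFFFF


def zero_ext(imm16):
    """ZeroExt: upper 16 bits are 0 (used by ori)."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """SignExt: bit 15 is copied into the upper 16 bits (used by lw, sw, beq)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def effective_address(rs_value, imm16):
    """R[rs] + SignExt(imm16), wrapped to 32 bits as the ALU would."""
    return (rs_value + sign_ext(imm16)) & MASK32


print(hex(zero_ext(0xFFFC)))                     # 0xfffc
print(hex(sign_ext(0xFFFC)))                     # 0xfffffffc
print(hex(effective_address(0x1000, 0xFFFC)))    # 0xffc  (0x1000 - 4)
```

Sign extension is what lets a load or store reach addresses below R[rs].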
Store Operations
Mem[R[rs] + SignExt[imm16]] ← R[rt]; Example: sw rt, rs, imm16
Format: op (31-26) | rs (25-21) | rt (20-16) | immediate (15-0, 16 bits)
[Figure: the ALU computes R[rs] + SignExt(imm16) (ExtOp, ALUSrc) as the data memory address; busB (R[rt]) drives Data In; MemWr is asserted and no register is written]
The Branch Instruction

beq rs, rt, imm16
Format: op (31-26) | rs (25-21) | rt (20-16) | immediate (15-0, 16 bits)
mem[PC]: fetch the instruction from memory
Equal ← R[rs] == R[rt]: calculate the branch condition
if (Equal) calculate the next instruction's address: PC ← PC + 4 + (SignExt(imm16) × 4)
else PC ← PC + 4
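The branch-target rule above can be written out as a small function. This is a sketch assuming 32-bit wraparound of the PC; the word offset is multiplied by 4, which is the same as the two 0 bits appended to the offset in the datapath.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed (two's complement) integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Next PC after beq rs, rt, imm16."""
    equal = rs_value == rt_value                              # branch condition
    if equal:
        return (pc + 4 + sign_ext16(imm16) * 4) & MASK32     # taken
    return (pc + 4) & MASK32                                  # not taken


print(hex(next_pc_beq(0x100, 7, 7, 3)))        # 0x110 (taken, 3 words past PC + 4)
print(hex(next_pc_beq(0x100, 7, 8, 3)))        # 0x104 (not taken)
print(hex(next_pc_beq(0x100, 7, 7, 0xFFFF)))   # 0x100 (taken, offset -1 word)
```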
Datapath for Branch Operations

beq rs, rt, imm16: the datapath generates the condition (Equal)
[Figure: registers Rs and Rt are read onto busA and busB and compared (Equal?) to produce Cond; in the next-PC logic one adder computes PC + 4 and another adds the sign-extended imm16 with 00 appended (PC Ext); the nPC_sel mux chooses the new PC]
Summary: A Single Cycle Datapath

[Figure: complete single-cycle datapath. The instruction fetch unit (PC, two adders, nPC_sel mux, instruction memory) produces Instruction<31:0>; Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> feed the register file (RegDst mux selects Rd or Rt, RegWr), the extender (ExtOp), the ALUSrc mux, the ALU (ALUctr, Equal output), data memory (MemWr), and the MemtoReg mux back to busW]
An Abstract View of the Critical Path
Register file and ideal memory:
The CLK input is a factor ONLY during write operations.
During read operations, they behave as combinational logic: address valid ⇒ output valid after the access time.
Critical Path (Load Operation) = PC's Clk-to-Q + Instruction Memory's Access Time + Register File's Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew
[Figure: abstract datapath. PC → ideal instruction memory → Rd, Rs, Rt, Imm16 → 32-bit register file (A, B) → ALU → ideal data memory (data address, data in) → register file, with next-address logic feeding the PC; all state elements share Clk]
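Summing the terms of the critical-path expression shows why the load sets the clock. In this Python sketch the component delays are hypothetical values chosen only to illustrate the arithmetic, and the R-type path is simplified to the load path without the data memory access.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAY = {
    "pc_clk_to_q": 30,
    "imem_access": 200,
    "regfile_access": 100,
    "alu_add": 150,
    "dmem_access": 200,
    "regfile_setup": 20,
    "clock_skew": 10,
}

# Terms of the load critical path, in the order given above.
LOAD_PATH = ["pc_clk_to_q", "imem_access", "regfile_access", "alu_add",
             "dmem_access", "regfile_setup", "clock_skew"]

# Simplification: an R-type instruction follows the same path minus the data memory.
RTYPE_PATH = [term for term in LOAD_PATH if term != "dmem_access"]


def path_delay(terms):
    return sum(DELAY[t] for t in terms)


load = path_delay(LOAD_PATH)
rtype = path_delay(RTYPE_PATH)
cycle = max(load, rtype)   # a single-cycle clock must cover the slowest path
print(load, rtype, cycle)  # 710 510 710
```

With these numbers an R-type instruction finishes about 200 ps before the clock edge, time the single-cycle design simply wastes.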
An Abstract View of the Implementation

[Figure: abstract implementation. The Control block reads the instruction and the datapath's conditions and issues control signals to the datapath (PC, next-address logic, ideal instruction memory, 32-bit register file, ALU, ideal data memory), all clocked by Clk]
Steps 4 & 5: Implement the control
(covered in the next section)
Summary: MIPS-lite Implementations

Single-cycle: uses a single l-o-n-g clock cycle for each instruction executed.
Easy to understand, but not practical: slower than an implementation that allows instructions to take different numbers of clock cycles (fast instructions such as beq: fewer clock cycles; slow instructions such as mult: more cycles).
Multicycle and pipelined implementations come later.
Next time: finish the single-cycle implementation.

Summary
5 steps to design a processor:
1. Analyze the instruction set ⇒ datapath requirements
2. Select a set of datapath components and establish a clock methodology
3. Assemble the datapath meeting the requirements
4. Analyze the implementation of each instruction to determine the setting of the control points that effect the register transfer
5. Assemble the control logic
MIPS makes it easier:
Instructions are the same size
Source registers are always in the same place
Immediates are the same size and location
Operations are always on registers/immediates
Single-cycle datapath: CPI = 1, Tcc long
Next time: implementing control
Processor Design - 2
Adapted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley
Summary: A Single Cycle Datapath

[Figure: complete single-cycle datapath. The instruction fetch unit (PC, two adders, nPC_sel mux, instruction memory) produces Instruction<31:0>; Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> feed the register file (RegDst mux, RegWr), the extender (ExtOp), the ALUSrc mux, the ALU (ALUctr, Equal output), data memory (MemWr), and the MemtoReg mux back to busW]
An Abstract View of the Critical Path
Register file and ideal memory:
The CLK input is a factor ONLY during write operations.
During read operations, they behave as combinational logic: address valid ⇒ output valid after the access time.
Critical Path (Load Operation) = PC's Clk-to-Q + Instruction Memory's Access Time + Register File's Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew
[Figure: abstract datapath with PC, next-address logic, ideal instruction memory, 32-bit register file, ALU, and ideal data memory, all clocked by Clk]
The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control and Datapath), Memory, Input, Output
Next Topic: Designing the Control for the Single Cycle Datapath
An Abstract View of the Implementation

[Figure: abstract implementation. The Control block reads the instruction and the datapath's conditions and issues control signals to the datapath (PC, next-address logic, ideal instruction memory, 32-bit register file, ALU, ideal data memory), all clocked by Clk]
Recap: A Single Cycle Datapath

Rs, Rt, Rd, and Imm16 are hardwired into the datapath from the Fetch Unit.
We have everything except the control signals (underlined in the figure).
Today's lecture will show you how to generate the control signals.
[Figure: single-cycle datapath with the Instruction Fetch Unit supplying Instruction<31:0>; fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>; control inputs nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the ALU's Zero output returns to control]
Recap: Meaning of the Control Signals

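nPC_sel: 0 ⇒ PC ← PC + 4; 1 ⇒ PC ← PC + 4 + (SignExt(Imm16) || 00)
Later in lecture: higher-level connection between the mux and the branch condition
[Figure: fetch unit. nPC_sel selects between the PC + 4 adder and the branch-target adder fed by PC Ext (sign-extended imm16 with 00 appended); the PC, clocked by Clk, addresses the instruction memory]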
Recap: Meaning of the Control Signals
ExtOp: "zero", "sign"
ALUsrc: 0 ⇒ regB; 1 ⇒ immed
ALUctr: "add", "sub", "or"
MemWr: 1 ⇒ write memory
MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem
RegDst: 0 ⇒ "rt"; 1 ⇒ "rd"
RegWr: 1 ⇒ write register
[Figure: datapath showing where each signal acts: the RegDst mux (Rd/Rt), RegWr on the register file, ExtOp on the extender, the ALUSrc mux, ALUctr and the Equal output on the ALU, MemWr on data memory, and the MemtoReg mux]
The add Instruction
add rd, rs, rt
Format: op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
mem[PC]: fetch the instruction from memory
R[rd] ← R[rs] + R[rt]: the actual operation
PC ← PC + 4: calculate the next instruction's address
Fetch Unit at the Beginning of add

Fetch the instruction from instruction memory: Instruction ← mem[PC] (this is the same for all instructions)
[Figure: fetch unit with the PC addressing instruction memory; nPC_sel and the two adders prepare the next PC]
The Single Cycle Datapath during add
Format: op | rs | rt | rd | shamt | funct (bits 31-26, 25-21, 20-16, 15-11, 10-6, 5-0)
R[rd] ← R[rs] + R[rt]
Control: nPC_sel = +4, RegDst = 1, RegWr = 1, ALUSrc = 0, ALUctr = Add, ExtOp = x, MemWr = 0, MemtoReg = 0
[Figure: single-cycle datapath with rs and rt read onto busA and busB, the ALU adding them, and the result routed through the MemtoReg mux to register rd]
Instruction Fetch Unit at the End of add
PC ← PC + 4
This is the same for all instructions except branch and jump.
[Figure: fetch unit with nPC_sel selecting the PC + 4 path]
The Single Cycle Datapath during Or Immediate
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
R[rt] ← R[rs] or ZeroExt(Imm16)
Control (exercise): nPC_sel = ___, RegDst = ___, RegWr = ___, ALUSrc = ___, ALUctr = ___, ExtOp = ___, MemWr = ___, MemtoReg = ___
[Figure: single-cycle datapath with the control inputs left unset]
The Single Cycle Datapath during Or Immediate
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
R[rt] ← R[rs] or ZeroExt(Imm16)
Control: nPC_sel = +4, RegDst = 0, RegWr = 1, ALUSrc = 1, ALUctr = Or, ExtOp = 0, MemWr = 0, MemtoReg = 0
[Figure: rs is read onto busA; imm16 is zero-extended and selected by ALUSrc; the ALU ORs them and the result is written to rt]
The Single Cycle Datapath during Load
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
R[rt] ← Data Memory{R[rs] + SignExt[imm16]}
Control: nPC_sel = +4, RegDst = 0, RegWr = 1, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemWr = 0, MemtoReg = 1
[Figure: the ALU adds busA and the sign-extended imm16 to address data memory; the MemtoReg mux routes the memory output to register rt]
The Single Cycle Datapath during Store
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
Data Memory{R[rs] + SignExt[imm16]} ← R[rt]
Control (exercise): nPC_sel = ___, RegDst = ___, RegWr = ___, ALUSrc = ___, ALUctr = ___, ExtOp = ___, MemWr = ___, MemtoReg = ___
[Figure: single-cycle datapath with the control inputs left unset]
The Single Cycle Datapath during Store
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
Data Memory{R[rs] + SignExt[imm16]} ← R[rt]
Control: nPC_sel = +4, RegDst = x, RegWr = 0, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemWr = 1, MemtoReg = x
[Figure: the ALU adds busA and the sign-extended imm16 to address data memory; busB (R[rt]) drives Data In and MemWr writes it]
The Single Cycle Datapath during Branch
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
if (R[rs] - R[rt] == 0) then Zero ← 1; else Zero ← 0
Control: nPC_sel = Br, RegDst = x, RegWr = 0, ALUSrc = 0, ALUctr = Sub, ExtOp = x, MemWr = 0, MemtoReg = x
[Figure: the ALU subtracts busB from busA and its Zero output goes to the Instruction Fetch Unit]
Instruction Fetch Unit at the End of Branch
Format: op | rs | rt | immediate (bits 31-26, 25-21, 20-16, 15-0)
if (Zero == 1) then PC ← PC + 4 + SignExt(imm16) × 4; else PC ← PC + 4
What is the encoding of nPC_sel?
Direct MUX select?
Branch / not branch?
Let's choose the second option:
nPC_sel  Zero  MUX
0        x     0
1        0     0
1        1     1
[Figure: fetch unit in which nPC_sel and Zero together drive the next-PC mux select]
Step 4: Given Datapath: RTL → Control
[Figure: Instruction<31:0> from instruction memory; Op and Fun feed the Control block, while Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> go to the datapath; Control drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg; the datapath returns Zero]
A Summary of Register Transfer Control Signals
inst   Register Transfer
ADD    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
       ALUsrc = RegB, ALUctr = add, RegDst = rd, RegWr, nPC_sel = +4
SUB    R[rd] ← R[rs] - R[rt]; PC ← PC + 4
       ALUsrc = RegB, ALUctr = sub, RegDst = rd, RegWr, nPC_sel = +4
ORi    R[rt] ← R[rs] or zero_ext(Imm16); PC ← PC + 4
       ALUsrc = Im, Extop = Z, ALUctr = or, RegDst = rt, RegWr, nPC_sel = +4
LOAD   R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
       ALUsrc = Im, Extop = Sn, ALUctr = add, MemtoReg, RegDst = rt, RegWr, nPC_sel = +4
STORE  MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
       ALUsrc = Im, Extop = Sn, ALUctr = add, MemWr, nPC_sel = +4
BEQ    if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || 00) else PC ← PC + 4
       nPC_sel = Br, ALUctr = sub
See Appendix A
A Summary of Control Signals
func      10 0000  10 0010  (We Don't Care :-) for the others)
op        00 0000  00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
          add      sub      ori      lw       sw       beq      jump
RegDst    1        1        0        0        x        x        x
ALUSrc    0        0        1        1        1        0        x
MemtoReg  0        0        0        1        x        x        x
RegWrite  1        1        1        1        0        0        0
MemWrite  0        0        0        0        1        0        0
nPCsel    0        0        0        0        0        1        0
Jump      0        0        0        0        0        0        1
ExtOp     x        x        0        1        1        x        x
ALUctr    Add      Subtract Or       Add      Add      Subtract xxx
Instruction formats (field boundaries at bits 31, 26, 21, 16, 11, 6, 0):
R-type: op | rs | rt | rd | shamt | funct (add, sub)
I-type: op | rs | rt | immediate (ori, lw, sw, beq)
J-type: op | target address (jump)
The Concept of Local Decoding
op          00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
            R-type   ori      lw       sw       beq      jump
RegDst      1        0        0        x        x        x
ALUSrc      0        1        1        1        0        x
MemtoReg    0        0        1        x        x        x
RegWrite    1        1        1        0        0        0
MemWrite    0        0        0        1        0        0
Branch      0        0        0        0        1        0
Jump        0        0        0        0        0        1
ExtOp       x        0        1        1        x        x
ALUop<N:0>  R-type   Or       Add      Add      Subtract xxx
The 6-bit op field drives the Main Control, which produces the N-bit ALUop; the 6-bit func field and ALUop drive the local ALU Control, which produces the 3-bit ALUctr for the ALU.
The Encoding of ALUop
[Diagram: op (6) → Main Control → ALUop (N); func (6) and ALUop → ALU Control (Local) → ALUctr (3) → ALU]
In this exercise, ALUop has to be 2 bits wide to represent: (1) R-type instructions, and I-type instructions that require the ALU to perform (2) Or, (3) Add, and (4) Subtract.
To implement the full MIPS ISA, ALUop has to be 3 bits to represent: (1) R-type instructions, and I-type instructions that require the ALU to perform (2) Or, (3) Add, (4) Subtract, and (5) And (example: andi).
                  R-type  ori   lw    sw    beq       jump
ALUop (Symbolic)  R-type  Or    Add   Add   Subtract  xxx
ALUop<2:0>        1 00    0 10  0 00  0 00  0 01      xxx
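The main-control table with local decoding can be written as a lookup from the 6-bit op field to control settings plus a 3-bit ALUop, using the ALUop<2:0> encoding in the table above. A Python sketch: None stands for a don't-care (x), and the dictionary layout and helper name are illustrative choices.

```python
# Main control with local decoding (None = don't care, "x").
R_TYPE, ORI, LW, SW, BEQ, JUMP = 0b000000, 0b001101, 0b100011, 0b101011, 0b000100, 0b000010

FIELDS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemWrite",
          "Branch", "Jump", "ExtOp", "ALUop")

X = None
MAIN_CONTROL = {
    #        RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUop
    R_TYPE: (1,     0,     0,       1,       0,       0,     0,   X,    0b100),
    ORI:    (0,     1,     0,       1,       0,       0,     0,   0,    0b010),
    LW:     (0,     1,     1,       1,       0,       0,     0,   1,    0b000),
    SW:     (X,     1,     X,       0,       1,       0,     0,   1,    0b000),
    BEQ:    (X,     0,     X,       0,       0,       1,     0,   X,    0b001),
    JUMP:   (X,     X,     X,       0,       0,       0,     1,   X,    X),
}


def main_control(op):
    """Decode a 6-bit op field into named control signals."""
    return dict(zip(FIELDS, MAIN_CONTROL[op]))


print(main_control(LW))
```

Only 3 ALUop bits leave the main control; the funct field never reaches it, which is the point of local decoding.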
The Decoding of the func Field
[Diagram: op (6) → Main Control → ALUop (N); func (6) and ALUop → ALU Control (Local) → ALUctr (3) → ALU]
                  R-type  ori   lw    sw    beq       jump
ALUop (Symbolic)  R-type  Or    Add   Add   Subtract  xxx
ALUop<2:0>        1 00    0 10  0 00  0 00  0 01      xxx
R-type format: op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
funct<5:0>  Instruction Operation (P. 286 text)
10 0000     add
10 0010     subtract
10 0100     and
10 0101     or
10 1010     set-on-less-than
ALUctr<2:0>  ALU Operation
000          And
001          Or
010          Add
110          Subtract
111          Set-on-less-than
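Before deriving gate equations, the local ALU control can be sketched as a direct table lookup. In this Python sketch ALUop follows the 3-bit encoding above (R-type 100, ori 010, lw/sw 000, beq 001), only the five listed funct codes are supported, and the names are hypothetical.

```python
# ALU control (local decoding): ALUop + funct -> ALUctr, by table lookup.
ALUCTR = {"and": 0b000, "or": 0b001, "add": 0b010, "sub": 0b110, "slt": 0b111}
FUNCT = {0b100000: "add", 0b100010: "sub", 0b100100: "and",
         0b100101: "or", 0b101010: "slt"}


def alu_control(aluop, funct=None):
    """Return the 3-bit ALUctr for a 3-bit ALUop and (for R-type) a 6-bit funct."""
    if aluop & 0b100:                # R-type: funct selects the operation
        return ALUCTR[FUNCT[funct]]
    if aluop & 0b010:                # ori
        return ALUCTR["or"]
    if aluop & 0b001:                # beq
        return ALUCTR["sub"]
    return ALUCTR["add"]             # lw, sw


print(format(alu_control(0b100, 0b101010), "03b"))   # 111 (slt)
print(format(alu_control(0b001), "03b"))             # 110 (beq: subtract)
```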
The Truth Table for ALUctr

ALUop encodings: R-type 1 00, ori (Or) 0 10, lw (Add) 0 00, sw (Add) 0 00, beq (Subtract) 0 01
ALUop<2:0>  func<3:0>  Instruction Op.    ALU Operation  ALUctr<2:0>
0 0 0       x x x x    (lw, sw)           Add            0 1 0
0 x 1       x x x x    (beq)              Subtract       1 1 0
0 1 x       x x x x    (ori)              Or             0 0 1
1 x x       0 0 0 0    add                Add            0 1 0
1 x x       0 0 1 0    subtract           Subtract       1 1 0
1 x x       0 1 0 0    and                And            0 0 0
1 x x       0 1 0 1    or                 Or             0 0 1
1 x x       1 0 1 0    set-on-less-than   Set on <       1 1 1
The Logic Equation for ALUctr<2>
ALUop<2:0>  func<3:0>  ALUctr<2>
0 x 1       x x x x    1
1 x x       0 0 1 0    1
1 x x       1 0 1 0    1
This makes func<3> a don't care.
ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
The Logic Equation for ALUctr<1>
ALUop<2:0>  func<3:0>  ALUctr<1>
0 0 0       x x x x    1
0 x 1       x x x x    1
1 x x       0 0 0 0    1
1 x x       0 0 1 0    1
1 x x       1 0 1 0    1
ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>
The Logic Equation for ALUctr<0>
ALUop<2:0>  func<3:0>  ALUctr<0>
0 1 x       x x x x    1
1 x x       0 1 0 1    1
1 x x       1 0 1 0    1
ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
The ALU Control Block
[Diagram: func (6) and ALUop (3) enter the local ALU Control, which outputs the 3-bit ALUctr]
ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>
ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
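One way to check the gate equations is to evaluate them against every row of the ALUctr truth table. This Python sketch does that exhaustively, expanding each func don't-care over all 16 values of func<3:0>. Its first terms use ALUop<1> for ALUctr<1> (!ALUop<2> & !ALUop<1>) and ALUctr<0> (!ALUop<2> & ALUop<1>), which is what the table rows 0 0 0 / 0 x 1 and 0 1 x require given the encodings in use (lw/sw 000, beq 001, ori 010).

```python
# Gate-level ALUctr equations checked against the ALUctr truth table.

def bit(x, i):
    return (x >> i) & 1


def inv(v):
    return 1 - v


def aluctr_gates(aluop, func):
    """ALUctr<2:0> from ALUop<2:0> and func<3:0> using sum-of-products equations."""
    a2, a1, a0 = bit(aluop, 2), bit(aluop, 1), bit(aluop, 0)
    f3, f2, f1, f0 = bit(func, 3), bit(func, 2), bit(func, 1), bit(func, 0)
    c2 = (inv(a2) & a0) | (a2 & inv(f2) & f1 & inv(f0))
    c1 = (inv(a2) & inv(a1)) | (a2 & inv(f2) & inv(f0))
    c0 = ((inv(a2) & a1)
          | (a2 & inv(f3) & f2 & inv(f1) & f0)
          | (a2 & f3 & inv(f2) & f1 & inv(f0)))
    return (c2 << 2) | (c1 << 1) | c0


# (ALUop<2:0>, func<3:0> or None for "x x x x", expected ALUctr<2:0>)
TRUTH_TABLE = [
    (0b000, None, 0b010),    # lw/sw: add
    (0b001, None, 0b110),    # beq: subtract
    (0b010, None, 0b001),    # ori: or
    (0b100, 0b0000, 0b010),  # add
    (0b100, 0b0010, 0b110),  # subtract
    (0b100, 0b0100, 0b000),  # and
    (0b100, 0b0101, 0b001),  # or
    (0b100, 0b1010, 0b111),  # set on <
]


def mismatches():
    """List every (ALUop, func) where the equations disagree with the table."""
    bad = []
    for aluop, func, expected in TRUTH_TABLE:
        for f in (range(16) if func is None else [func]):
            if aluctr_gates(aluop, f) != expected:
                bad.append((aluop, f))
    return bad


print(mismatches())   # []
```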
Step 5: Logic for Each Control Signal

nPC_sel <= if (OP == BEQ) then Br else +4
ALUsrc <= if (OP == Rtype) then regB else immed
ALUctr <= if (OP == Rtype) then funct elseif (OP == ORi) then OR elseif (OP == BEQ) then sub else add
ExtOp <= _____________
MemWr <= _____________
MemtoReg <= _____________
RegWr <= _____________
RegDst <= _____________

Step 5: Logic for each control signal
nPC_sel <= if (OP == BEQ) then Br else +4
ALUsrc <= if (OP == Rtype) then regB else immed
ALUctr <= if (OP == Rtype) then funct elseif (OP == ORi) then OR elseif (OP == BEQ) then sub else add
ExtOp <= if (OP == ORi) then zero else sign
MemWr <= (OP == Store)
MemtoReg <= (OP == Load)
RegWr <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
RegDst <= if ((OP == Load) || (OP == ORi)) then 0 else 1
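The Step 5 equations translate almost line for line into a function. This Python sketch uses the slide's symbolic opcode names (Rtype, ORi, Load, Store, BEQ) as hypothetical string labels. The equations pick concrete values for don't-care entries (for example RegDst = 1 for Store and BEQ), which is harmless because those instructions do not write a register.

```python
# Step 5 control equations, using the slide's symbolic opcode names.

def control(op):
    """Control settings for op in {"Rtype", "ORi", "Load", "Store", "BEQ"}."""
    if op == "Rtype":
        aluctr = "funct"
    elif op == "ORi":
        aluctr = "or"
    elif op == "BEQ":
        aluctr = "sub"
    else:
        aluctr = "add"
    return {
        "nPC_sel": "Br" if op == "BEQ" else "+4",
        "ALUsrc": "regB" if op == "Rtype" else "immed",
        "ALUctr": aluctr,
        "ExtOp": "zero" if op == "ORi" else "sign",
        "MemWr": int(op == "Store"),
        "MemtoReg": int(op == "Load"),
        "RegWr": 0 if op in ("Store", "BEQ") else 1,
        "RegDst": 0 if op in ("Load", "ORi") else 1,
    }


print(control("Load"))
```

The Load settings it produces match the "datapath during Load" slide above.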
The Truth Table for the Main Control
[Diagram: op (6) → Main Control → RegDst, ALUSrc, ..., ALUop (3); func (6) and ALUop → ALU Control (Local) → ALUctr (3) → ALU]
op                00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
                  R-type   ori      lw       sw       beq      jump
RegDst            1        0        0        x        x        x
ALUSrc            0        1        1        1        0        x
MemtoReg          0        0        1        x        x        x
RegWrite          1        1        1        0        0        0
MemWrite          0        0        0        1        0        0
nPC_sel           0        0        0        0        1        0
Jump              0        0        0        0        0        1
ExtOp             x        0        1        1        x        x
ALUop (Symbolic)  R-type   Or       Add      Add      Subtract xxx
ALUop<2>          1        0        0        0        0        x
ALUop<1>          0        1        0        0        0        x
ALUop<0>          0        0        0        0        1        x
A Real MIPS Datapath (CNS T0)
Summary: A Single Cycle Processor
[Figure: Instr<31:26> (op) feeds the Main Control, which produces RegDst, ALUSrc, nPC_sel, RegWr, ExtOp, MemWr, MemtoReg, and the 3-bit ALUop; ALUop and Instr<5:0> (func) feed the local ALU Control, which produces ALUctr; these signals drive the single-cycle datapath (Instruction Fetch Unit, register file, extender on Instr<15:0>, ALU with Zero output, data memory, and muxes)]
Recap: An Abstract View of the Critical Path (Load)
Register file and ideal memory:
The CLK input is a factor ONLY during write operations.
During read operations, they behave as combinational logic: address valid ⇒ output valid after the access time.
Critical Path (Load Operation) = PC's Clk-to-Q + Instruction Memory's Access Time + Register File's Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew
[Figure: abstract datapath with PC, next-address logic, ideal instruction memory, 32-bit register file, ALU, and ideal data memory, all clocked by Clk]
Worst Case Timing (Load)
[Timing diagram: after the clock edge, the PC updates after Clk-to-Q; Rs, Rt, Rd, Op, and Func are valid after the instruction memory access time; ALUctr, ExtOp, ALUSrc, MemtoReg, and RegWr are valid after the delay through the control logic; busA is valid after the register file access time; busB (the extended immediate) after the delay through the extender and mux; the address after the ALU delay; busW after the data memory access time; the register write occurs at the next clock edge]
Drawback of this Single Cycle Processor
Long cycle time: the cycle time must be long enough for the load instruction:
PC's Clock-to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew
The cycle time for load is much longer than needed for all other instructions.
Summary
Single cycle datapath: CPI = 1, CCT long
5 steps to design a processor:
1. Analyze the instruction set ⇒ datapath requirements
2. Select a set of datapath components and establish a clock methodology
3. Assemble the datapath meeting the requirements
4. Analyze the implementation of each instruction to determine the setting of the control points that effect the register transfer
5. Assemble the control logic
Processor components: Control, Datapath, Memory, Input, Output
Control is the hard part.
MIPS makes control easier:
Instructions are the same size
Source registers are always in the same place
Immediates are the same size and location
Operations are always on registers/immediates

