
Computer Organization (EENG 3710)

Instructor: Partha Guturu EE Department

Quick Recap on our respective roles

Who is responsible for your learning of Computer Organization?
Some aphorisms on teaching philosophy:
"I do not teach my pupils. I provide conditions in which they can learn." - Albert Einstein
"I hear and I forget. I see and I remember. I do and I understand." - Chinese proverb
"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime." - Chinese proverb

What does the data say?


Even if you are fascinating, people only remember the first 15 minutes of what you say.
[Figure: percent of students paying attention (0 to 100) versus time from start of lecture (0 to 60 minutes)]

What's so good about our approach?

Learner-centric approach
Life-long learning
Proactive versus reactive

Course Objectives: What you need to learn?


High level view of a computer
Different types:
- Desk/lap tops
- Servers
- Embedded systems
Anatomy of a computer and our focus here
Computer Organization versus Architecture
Instruction sets
Different components of a computer and their interworking
Computer Performance Issues

Different Applications & Requirements

Desktop Applications
- Emphasis on performance of integer and Floating Point (FP) data types
- Little regard for program (code) size and power consumption
Server Applications
- Database, file system, web applications, time-sharing
- FP performance is much less important than performance on integers and character strings
- Little regard for program (code) size and power consumption
Embedded Applications
- Digital Signal Processors (DSPs), media processors, control
- High value placed on program size and power consumption
- Less memory is cheaper and lower power
- Reduce chip costs: FP instructions may be optional

Embedded Computers in Your Car

Relative levels of demand for different computer types

Anatomy of Computer & Our Focus


Software:
- Application (ex: browser)
- Compiler
- Operating System
- Assembler
Instruction Set Architecture
Hardware:
- Processor, Memory, I/O system
- Datapath, Control
- Digital Design
- Circuit Design
- transistors

Coordination of many levels (layers) of abstraction

Why a Compiler?
Why High Level Language?
- Ease of thinking and coding in an English/Math like language
- Enhanced productivity because of the ease to debug and validate
- Maintainability
- Target independent development
- Availability of optimizing compilers

"In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their own language." - Mark Twain, The Innocents Abroad, 1869

A Dissection to Reveal Finer Details


High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  -> Compiler
Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  -> Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  -> Machine Interpretation
Hardware Architecture Description (Logisim, VHDL, Verilog, etc.)
  -> Architecture Implementation
Logic Circuit Description (Logisim, etc.)

What is in a Computer?
Components:
- processor (datapath, control)
- input (mouse, keyboard)
- output (display, printer)
- memory (cache (SRAM), main memory (DRAM))

Our primary focus: the processor (datapath and control)
- Implemented using millions of transistors
- Impossible to understand by looking at each transistor
- We need abstraction!

5 Major Components of a Computer


Personal Computer
Computer:
- Processor
  - Control (brain)
  - Datapath (brawn)
- Memory (where programs, data live when running)
- Devices
  - Input: Keyboard, Mouse
  - Disk (where programs, data live when not running)
  - Output: Display, Printer

5 Major Components of a Computer

Processor Chip (CPU) Components

Motherboard LayOut

Dramatic Changes in Technology

Processor
- Logic capacity: about 30% ~ 35% per year
- Clock rate: about 30% per year
Memory (DRAM: Dynamic Random Access Memory)
- Capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
Disk
- Capacity: about 60% ~ 100% per year
- Speed: about 10% per year
Network Bandwidth
- 10 Mb --(10 years)--> 100 Mb --(5 years)--> 1 Gb
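The "60% per year (4x every 3 years)" figure for DRAM capacity is compound growth. A minimal sketch to check the arithmetic; the function name is made up for illustration and is not from the slides:

```python
def growth_factor(annual_rate, years):
    """Compound an annual growth rate (0.60 means 60% per year) over several years."""
    return (1.0 + annual_rate) ** years


# DRAM capacity: about 60% per year, compounded over 3 years.
dram_3yr = growth_factor(0.60, 3)
print(f"DRAM capacity factor over 3 years: {dram_3yr:.2f}")

# Memory speed: about 10% per year over the same 3 years, for comparison.
speed_3yr = growth_factor(0.10, 3)
print(f"Memory speed factor over 3 years: {speed_3yr:.2f}")
```

The first factor comes out close to 4, matching the slide. The second is only about 1.33, which is why the gap between memory capacity and memory speed keeps widening.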

Growth Capacity of DRAM Chips

K = 1024 (2^10)

In recent years the growth rate has slowed to 2x every 2 years

Dramatic Changes in Technology


[Figure: # of transistors on an IC versus year]
Gordon Moore, Intel cofounder
2X transistors / chip every 1.5 years; called Moore's Law

The Underlying Technologies


Year   Technology                   Relative Performance/Unit Cost
1951   Vacuum Tube                  1
1965   Transistor                   35
1975   Integrated Circuit (IC)      900
1995   Very Large Scale IC (VLSI)   2,400,000
2005   Ultra VLSI                   6,200,000,000

What if technology in the automobile industry advanced at the same rate?

"If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside." - Robert X. Cringely, InfoWorld magazine

Complex Chip Manufacturing Process Enabled by Technological Breakthroughs

Computer Architecture versus Computer Organization


"Computer architecture is the abstract image of a computing system that is seen by a machine language (or assembly language) programmer, including the instruction set, memory address modes, processor registers, and address and data formats; whereas the computer organization is a lower level, more concrete, description of the system that involves how the constituent parts of the system are interconnected and how they interoperate in order to implement the architectural specification" - Phillip A. Laplante (2001), Dictionary of Computer Science, Engineering, and Technology
-> Can change organization without changing architecture (e.g. 64 bit architecture with 16 bit machine using 4 clock cycles)

Course Outline
Topic                                             # weeks
Introduction to Computer Organization             1
Computer Instructions                             2
Arithmetic and Logic Unit                         1
Performance Analysis                              1
Data Path and Control                             2
Performance Enhancement with Pipelining           2
Memory Hierarchy and Virtual Memory Concepts      2
Storage, Networks, and other Peripherals          1
Engineering Design with Microcomputers            2

Course Objectives
- Know about the different software and hardware components of a digital computer.
- Comprehend how different components of the digital computer collaborate to produce the end result in an application development process.
- Apply principles of logic design to digital computer design.
- Analyze a digital computer and decompose it into modules and lower level logical blocks involving both combinational and sequential circuit elements.
- Synthesize various components of a computer's Arithmetic Logic Unit, Control Units, and Data Paths.
- Understand and assess (evaluate) computer CPU performance, and learn methods to enhance computer performance.

Language of the Computer


We will have a quick look at the MIPS language
MIPS: not to be confused with million instructions per second
MIPS: Microprocessor without Interlocked Pipeline Stages, a RISC (Reduced Instruction Set Computer) processor developed by MIPS Technologies
By the late 1990s, 1 out of 3 RISC processors was using MIPS; the architecture is also called MIPS
Cisco routers, Nintendo 64, Sony PlayStation, PlayStation 2, etc. use MIPS designs

Why bother to learn assembly language?

The difference between mediocre and star programmers is that star programmers understand assembly language, whether or not they use it on a daily basis. Assembly language is the language of the computer itself. To be a programmer without ever learning assembly language is like being a professional race car driver without understanding how your carburetor works. To be a truly successful programmer, you have to understand exactly what the computer sees when it is running a program. Nothing short of learning assembly language will do that for you. Assembly language is often seen as a black art among today's programmers - with those knowing this art being more productive, more knowledgeable, and better paid, even if

Basic Instruction Format


Three Instruction Formats:
Format  Fields (bit positions)
R       Opcode [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
I       Opcode [31:26] | rs [25:21] | rt [20:16] | Immediate [15:0]
J       Opcode [31:26] | Memory Address [25:0]
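The three formats can be sketched as bit-packing helpers. This is an illustrative sketch, not part of the slides: the helper names are invented, and the opcode and funct values used in the example (funct 0x20 for add, opcode 35 for lw) are the standard MIPS32 encodings rather than values given in this deck.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """R format: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I format: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, address):
    """J format: opcode[31:26] memory address[25:0]."""
    return (opcode << 26) | (address & 0x3FFFFFF)


# add $t0, $s1, $s2  ($t0 = 8, $s1 = 17, $s2 = 18; funct 0x20 assumed for add)
add_word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
# lw $t0, 32($s3)    ($s3 = 19; opcode 35 assumed for lw)
lw_word = encode_i(opcode=35, rs=19, rt=8, imm=32)

print(f"add: 0x{add_word:08X}")
print(f"lw:  0x{lw_word:08X}")
```

The field widths also answer the "guess the architecture" questions: a 5-bit register field can name 2^5 = 32 registers, and every instruction fits in one 32-bit word.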

Now Guess MIPS Architecture


How many registers?
How big a memory could be supported?
What is the memory word size?
How to handle data in RAM?
Non-architectural design/implementation issues that vary from design to design: roles of registers

Instruction Set Architecture (ISA)

Instructions: the words of a computer's language are called instructions
Instruction set: the vocabulary of a computer's language is called the instruction set
Instruction Set Architecture (ISA): the set of instructions a particular CPU implements is an Instruction Set Architecture

The Instruction Set Architecture (ISA)


software
instruction set architecture

hardware

The interface description separating the software and hardware

ISA Sales

ISA: CISC vs. RISC

Early trend was to add more and more instructions to new CPUs to do elaborate operations
CISC (Complex Instruction Set Computer)
- The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible
- VAX architecture had an instruction to multiply polynomials!
RISC philosophy (Cocke at IBM, Patterson, Hennessy, 1980s): Reduced Instruction Set Computer
- Keep the instruction set small and simple; this makes it easier to build fast hardware
- Let software do complicated operations by composing simpler ones

The MIPS ISA

Instruction Categories
- Load/Store
- Computational
- Jump and Branch
- Floating Point (coprocessor)
- Memory Management
- Special

Registers: R0 - R31, PC, HI, LO

3 Instruction Formats: all 32 bits wide
OP | rs | rt | rd | sa | funct
OP | rs | rt | immediate
OP | jump target

MIPS Registers and their Roles


Name       Number   Use                                                     Preserved across a call?
$zero      0        The constant value 0                                    N.A.
$at        1        Assembler temporary                                     No
$v0-$v1    2-3      Values for function results and expression evaluation   No
$a0-$a3    4-7      Arguments                                               No
$t0-$t7    8-15     Temporaries                                             No
$s0-$s7    16-23    Saved temporaries                                       Yes
$t8-$t9    24-25    Temporaries                                             No
$k0-$k1    26-27    Reserved for OS kernel                                  No
$gp (28) global pointer, $sp (29) stack pointer, $fp (30) frame pointer, and $ra (31) return address are all preserved across a call

Simple operations
Compute f = (a+b)-(c-d), assuming these variables are in some $s registers
Memory operation: base register concept
Why is a multiplication factor of 4 required to reach the nth array element? Answer: memory addresses in MIPS are byte addresses, and each word occupies 4 bytes.
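The factor of 4 can be illustrated with a small address calculation. This is only a sketch: the function name and the base address are hypothetical values chosen for the example.

```python
WORD_SIZE = 4  # bytes per MIPS word (32 bits)


def element_address(base, index):
    """Byte address of element `index` of a word array that starts at `base`."""
    return base + WORD_SIZE * index


base = 0x10000000  # hypothetical starting address of the array
for i in range(4):
    print(f"A[{i}] is at 0x{element_address(base, i):08X}")
```

In lw and sw the base register holds `base`, and the constant offset (or a computed index times 4) selects the element.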

Quick Recap- Compilers


Step 1 (compile), run on Machine X:
  Input: C++ program to sort 10 numbers
  Program: C++ Compiler (Machine-X code to translate any C++ program into an assembly program for Machine X)
  Output: Machine X assembly program to sort 10 numbers
Step 2 (assemble), run on Machine X:
  Input: Machine X assembly program to sort 10 numbers
  Program: Assembler (Machine-X code to translate any Machine X assembly program into Machine X code)
  Output: Machine X code to sort 10 numbers
Step 3 (execute), run on Machine X:
  Input: 10 numbers
  Program: Machine X code to sort 10 numbers
  Output: Sorted list of 10 numbers
The first two steps (the dotted area in the figure) can be merged together into a single step.

Quick Recap- Shortcut Compilers


Step 1 (compile), run on Machine X:
  Input: C++ program to sort 10 numbers
  Program: C++ Compiler (Machine-X code to translate any C++ program directly into Machine X code)
  Output: Machine X code to sort 10 numbers
Step 2 (execute), run on Machine X:
  Input: 10 numbers
  Program: Machine X code to sort 10 numbers
  Output: Sorted list of 10 numbers

Quick Recap- Bootstrapping


Step 1, run on Machine X:
  Input: C++ program to translate any C++ program into Machine Y code
  Program: C++ Compiler (Machine-X code to translate any C++ program directly into Machine X code)
  Output: Machine X code to translate any C++ program into Machine Y code
Step 2, run on Machine X:
  Input: the same C++ program (to translate any C++ program into Machine Y code)
  Program: the Machine X code produced in Step 1
  Output: Machine Y code for the C++ compiler (i.e., to translate any C++ program into Machine Y code)
This can be installed and run on Machine Y; thus you have a compiler for Machine Y

Chapter 2- MIPS Programming

Quick Recap- MIPS


MIPS language: expansion of the acronym
Number of registers and architecture in general
The 3 instruction formats and the various fields (e.g. rs, rt, rd, shamt, etc.)
Now, we proceed along with:
- MIPS assembly instruction formats
- Coding simple problems and translating into MIPS machine code

Simple Statements
C code: d = (a + b) - (c + d)
Machine code assuming a, b, c, and d are in MIPS registers
Machine code assuming that a, b, c, and d are in consecutive memory locations from a given starting address (use lw, sw)

Loops and Branches


Develop assembly code for a typical C-code to add 100 numbers as follows:
    // Read 100 numbers into an array A
    sum = 0;
    for (i = 0; i < 100; i++) {
        sum = sum + A[i];
    }
    // Print sum

Procedure Calls
Caller and Callee: who should preserve which registers?
Leaf and recursive procedure examples for explaining the conventions, and the jal and jr instructions.

SPIM
Courtesy: Prof. Jerry Breecher Clark University Appendix A

MIPS Simulation

SPIM is a simulator. Reads a MIPS assembly language program. Simulates each instruction. Displays values of registers and memory. Supports breakpoints and single stepping. Provides simple I/O for interacting with user.

SPIM Versions

SPIM is the command line version. XSPIM is x-windows version (Unix workstations). There is also a windows version. You can use this at home and it can be downloaded from: http://www.cs.wisc.edu/~larus/spim.html.

Resources On the Web

There's a very good SPIM tutorial at http://chortle.ccsu.edu/AssemblyTutorial/Chapter-09/ass09_1.html

In fact, there's a tutorial for a good chunk of the ISA portion of this course at: http://chortle.ccsu.edu/AssemblyTutorial/tutorialContents.html

Here are a couple of other good references you can look at: Patterson_Hennessy_AppendixA.pdf

And http://babbage.clarku.edu/~jbreecher/comp_org/labs/Introduction_To_SPIM.pdf

SPIM Program

MIPS assembly language.
Must include a label main; this will be called by the SPIM startup code (allows you to have command line arguments).
Can include named memory locations, constants and string literals in a data segment.

General Layout

Data definitions start with the .data directive.
Code definition starts with the .text directive. "Text" is the traditional name for the memory that holds a program.
Usually have a bunch of subroutine definitions and a main.

Simple Example
        .data               # data memory
foo:    .word 0             # 32 bit variable
        .text               # program memory
        .align 2            # word alignment
        .globl main         # main is global
main:   lw $a0, foo

Data Definitions

You can define variables/constants with:
- .word: defines 32 bit quantities.
- .byte: defines 8 bit quantities.
- .asciiz: zero-delimited ASCII strings.
- .space: allocates some bytes.

Data Examples

        .data
prompt: .asciiz "Hello World\n"
msg:    .asciiz "The answer is "
x:      .space 4
y:      .word 4
str:    .space 100

MIPS: Software Conventions For Registers

Simple I/O
SPIM provides some simple I/O using the syscall instruction. The specific I/O done depends on some registers.

You set $v0 to indicate the operation. Parameters in $a0, $a1.

I/O Functions
System call is used to communicate with the system and do simple I/O.
- Load the system call code into register $v0.
- Load arguments (if any) into registers $a0, $a1 or $f12 (for floating point).
- do: syscall
- Results returned in registers $v0 or $f0.

Example: Reading an int


        li $v0,5          # Indicate we want function 5
        syscall
        # Upon return from the syscall, $v0 has the integer typed by
        # a human in the SPIM console
        # Now print that same integer
        move $a0,$v0      # Get the number to be printed into register
        li $v0,1          # Indicate we're doing a write-integer
        syscall

Printing A String

        .data
msg:    .asciiz "SPIM IS FUN"
        .text
        .globl main
main:   li $v0,4          # pseudoinstruction: load immediate
        la $a0,msg        # pseudoinstruction: load address
        syscall
        jr $ra

A Typical MIPS READ and WRITE Program


        .data 0x10000000
A:      .word 0, 0
        .text
main:   la $t0, A
        li $v0, 5         #setting up return reg for read
        syscall
        sw $v0, ($t0)
        li $v0, 5         #setting up return reg for read
        syscall
        sw $v0, 4($t0)
        lw $t1, 0($t0)
        lw $t2, 4($t0)
        add $t3, $t1, $t2
        li $v0, 1         #setting up return reg for print
        move $a0,$t3
        syscall

A C-Program with Read and Sum Loops


#include <stdio.h>

int main (int argc, char **argv) // Older versions of C accept: void main()
{
    int A[5], i, sum;
    for (i = 0; i <= 4; i++) {
        scanf("%d", &A[i]);   // pass the address of A[i]
    }
    sum = 0;
    for (i = 0; i <= 4; i++) {
        sum = sum + A[i];
    }
    printf("The sum of 5 numbers is: %d\n", sum);
    return 0;
}

The MIPS equivalent of the C Program with Read and Sum Loops
        .data
A:      .word 0               #Create space for the first word A[0] and initialize it to 0
        .space 16             #Create space for 4 more words A[1] .. A[4]
msg:    .asciiz "The sum of 5 numbers is: "
        .text
main:   la $t0, A             #Store in $t0 the address of A[0], the first of five words
        li $t1, 0             #Store in $t1 the initial value of loop variable
        li $t2, 4             #Store in $t2 the final value of loop variable
        li $t3, 0             #Initialize $t3 that increments by 4 with each word read
loop:   add $t4, $t0, $t3     #Put in $t4 the address of next word
        li $v0, 5             #Initialize $v0 for Read
        syscall
        sw $v0, ($t4)         #put the new integer read into the word location pointed by $t4
        addi $t3, $t3, 4      #increment $t3 by 4 for calculation of next word address
        addi $t1, $t1, 1      #increment the loop variable
        ble $t1, $t2, loop
(continued on next slide)

The MIPS equivalent of the C Program with Read and Sum Loops (continued)
        li $t1, 0             #Do the same initialization for identical loop at addloop
        li $t2, 4
        li $t3, 0
        li $s0, 0
addloop:
        add $t4, $t0, $t3
        lw $t5, ($t4)         #Read the integer at address in $t4 into $t5
        add $s0, $s0, $t5     #Update the partial sum in $s0 by adding the new integer
        addi $t3, $t3, 4
        addi $t1, $t1, 1
        ble $t1, $t2, addloop
        li $v0, 4             #Make System Ready to print String
        la $a0, msg           #Load starting address (msg) of the string into $a0, the argument register
        syscall
        li $v0, 1             #Make System Ready to print the integer (sum)
        move $a0, $s0
        syscall

SPIM Subroutines

The stack is set up for you; just use $sp. You can view the stack in the data window. main is called as a subroutine (have it return using jr $ra). For now, don't worry about details. The next few pages give some excellent examples of how stacks work.

Why Are Stacks So Great?


Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS)

Why Are Stacks So Great?

MIPS Function Calling Conventions


fact:
        addiu $sp, $sp, -32
        sw $ra, 20($sp)
        ...
        sw $s0, 4($sp)
        ...
        lw $ra, 20($sp)
        addiu $sp, $sp, 32
        jr $ra

C-Program for a leaf-procedure


#include <stdio.h>

int leaf_procedure(int e, int f, int g, int h);

int main()
{
    int e, f, g, h, result;
    scanf("%d", &e);
    scanf("%d", &f);
    scanf("%d", &g);
    scanf("%d", &h);
    result = leaf_procedure(e, f, g, h);
    printf("Result = %d\n", result);
    return 0;
}

int leaf_procedure(int e, int f, int g, int h)
{
    int res;
    int temp1, temp2;   // Not required (only for making it close to MIPS code)
    temp1 = e + f;
    temp2 = g + h;
    res = temp1 - temp2;
    return res;
}

Page 1: MIPS code for the main (calling program ) of leaf_procedure


        .data
e:      .word 0
f:      .word 0
g:      .word 0
h:      .word 0
        .text
main:   la $t0, e                 #Load address of e into $t0
        li $t1, 0                 #set the loop iteration variable to 0
readLoop:
        sll $t2, $t1, 2           #Since each word is 4 bytes long, multiply loop variable by 4
        add $t3, $t0, $t2         #First time in the loop, $t3 will have address of e
        li $v0, 5                 #Prepare for read
        syscall
        sw $v0, ($t3)             #Newly read value will go to e, f, g, or h depending upon
                                  # whether the loop variable $t1 contains 0, 1, 2, or 3,
                                  # that is, whether $t2 is 0, 4, 8, or 12.
        addi $t1, $t1, 1
        xori $t2, $t1, 4          #You can destroy the original $t2 value because you are recomputing
                                  # it from $t1 at the beginning of the loop!
        bne $t2, $zero, readLoop  #You haven't read all the 4 integers; go back to readLoop.

Page 2: MIPS Code Continuation for the main of leaf_procedure


        #reading complete. Make preparations for the leaf_procedure that computes (e+f)-(g+h)
        # by saving arguments in argument registers.
        lw $a0, 0($t0)            #load e into $a0
        lw $a1, 4($t0)            #load f into $a1
        lw $a2, 8($t0)            #load g into $a2
        lw $a3, 12($t0)           #load h into $a3
        jal leaf_procedure        #this instruction stores the address of next instruction (the return
                                  #address, that is, the address of the instruction at the print label)
                                  #in $ra and jumps to the label leaf_procedure
print:  move $t0, $v0
        li $v0, 1                 #Prepare for print
        move $a0, $t0
        syscall
        j last

Page 3: MIPS Code for the leaf_procedure itself


leaf_procedure:
        addi $sp, $sp, -12        #Make space on the stack for 3 integers
        sw $t0, 0($sp)            #save the contents of the registers you plan to temporarily use
                                  # in this procedure on stack so that original values can be restored
                                  # before returning to the calling program
        sw $t1, 4($sp)
        sw $s0, 8($sp)
        add $t0, $a0, $a1         #Add e and f in $a0 and $a1, respectively, and put in $t0
        add $t1, $a2, $a3         #Add g and h in $a2 and $a3, respectively, and put in $t1
        sub $s0, $t0, $t1         #subtract g+h in $t1 from e+f in $t0, and put it in $s0
        #Make preparations for returning back to the calling procedure (main in this case)
        move $v0, $s0             #Put the computed value into return value register
        lw $t0, 0($sp)            #Restore values on stack to the original registers
        lw $t1, 4($sp)
        lw $s0, 8($sp)
        addi $sp, $sp, 12         #Update stack
        jr $ra                    #Jump to location pointed to by $ra (print, in our case)
last:                             # the main program will stop here as there is no valid instruction here.

MIPS Function Calling Conventions


main()
{
    printf("The factorial of 10 is %d\n", fact(10));
}

int fact (int n)
{
    if (n <= 1) return(1);
    return (n * fact (n-1));
}

MIPS Function Calling Conventions


        .text
        .globl main
main:
        subu $sp, $sp, 32         # stack frame size is 32 bytes
        sw $ra, 20($sp)           # save return address
        li $a0, 10                # load argument (10) in $a0
        jal fact                  # call fact
        la $a0, LC                # load string address in $a0
        move $a1, $v0             # load fact result in $a1
        jal printf                # call printf
        lw $ra, 20($sp)           # restore $ra
        addu $sp, $sp, 32         # pop the stack
        jr $ra                    # exit()
        .data
LC:     .asciiz "The factorial of 10 is %d\n"

MIPS Function Calling Conventions


        .text
fact:   subu $sp, $sp, 8          # stack frame is 8 bytes
        sw $ra, 8($sp)            # save return address
        sw $a0, 4($sp)            # save argument (n)
        subu $a0, $a0, 1          # compute n-1
        bgtz $a0, L2              # if n-1>0 (ie n>1) go to L2
        li $v0, 1                 #
        j L1                      # return(1)
L2:                               # new argument (n-1) is already in $a0
        jal fact                  # call fact
        lw $a0, 4($sp)            # load n
        mul $v0, $v0, $a0         # fact(n-1)*n
L1:     lw $ra, 8($sp)            # restore $ra
        addu $sp, $sp, 8          # pop the stack
        jr $ra                    # return, result in $v0


Sample SPIM Programs (on the web)


multiply.s: multiplication subroutine based on repeated addition and a test program that calls it.
    http://babbage.clarku.edu/~jbreecher/comp_org/labs/multiply.s
fact.s: computes factorials using the multiply subroutine.
    http://babbage.clarku.edu/~jbreecher/comp_org/labs/fact.s
sort.s: the sorting program from the text.
    http://babbage.clarku.edu/~jbreecher/comp_org/labs/sort.s
strcpy.s: the strcpy subroutine and test code.
    http://babbage.clarku.edu/~jbreecher/comp_org/labs/strcpy.s

Processor Design - 1

Adapted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley

Outline of Slides

Overview
Design a processor: step-by-step
- Requirements of the instruction set
- Components and clocking
- Assembling an adequate data path
- Controlling the data path

Chapter 5.1 - Processor Design 1

The Big Picture: Where Are We Now?

The five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output
Today's topic: design a single cycle processor
[Figure: machine design shown alongside inst. set design, arithmetic, and technology]

The CPU
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
Datapath: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn)
Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)

Big Picture: The Performance Perspective


Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction (CPI)
Processor design (datapath and control) will determine:
- Clock cycle time
- Clock cycles per instruction
What we will do today: single cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time
[Figure: triangle relating CPI, Inst. Count, and Cycle Time]
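The three factors above multiply together to give execution time. A minimal sketch of that relation; the instruction count, CPI values, and cycle times are made-up numbers chosen only to show the single-cycle trade-off:

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time = instruction count x clock cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical single cycle design: CPI of 1, but a long 10 ns cycle.
single_cycle = cpu_time(1_000_000, 1.0, 10e-9)
# Hypothetical alternative design: CPI of 4, but a short 2 ns cycle.
alternative = cpu_time(1_000_000, 4.0, 2e-9)

print(f"single cycle: {single_cycle * 1e3:.1f} ms")
print(f"alternative:  {alternative * 1e3:.1f} ms")
```

With these assumed numbers the design with the higher CPI still finishes first, because its cycle time is short enough to compensate.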

How to Design a Processor: Step-by-step


1. Analyze instruction set => datapath requirements
   - the meaning of each instruction is given by the register transfers
   - datapath must include storage element for ISA registers (possibly more)
   - datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic

The MIPS Instruction Formats

All MIPS instructions are 32 bits long. The three instruction formats:

R-type: op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | rd [15:11], 5 bits | shamt [10:6], 5 bits | funct [5:0], 6 bits
I-type: op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits
J-type: op [31:26], 6 bits | target address [25:0], 26 bits

The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
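The field boundaries can be sketched as shifts and masks on a 32-bit instruction word. This is an illustration only; the function name and the sample word (the standard encoding of add $t0, $s1, $s2, which is not given in the slides) are assumptions.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26, 6 bits
        "rs": (word >> 21) & 0x1F,     # bits 25..21, 5 bits
        "rt": (word >> 16) & 0x1F,     # bits 20..16, 5 bits
        "rd": (word >> 11) & 0x1F,     # bits 15..11, 5 bits (R-type only)
        "shamt": (word >> 6) & 0x1F,   # bits 10..6, 5 bits (R-type only)
        "funct": word & 0x3F,          # bits 5..0, 6 bits (R-type only)
        "imm16": word & 0xFFFF,        # bits 15..0 (I-type only)
        "target": word & 0x3FFFFFF,    # bits 25..0 (J-type only)
    }


fields = decode_fields(0x02324020)  # assumed sample: add $t0, $s1, $s2
for name in ("op", "rs", "rt", "rd", "shamt", "funct"):
    print(name, fields[name])
```

The hardware does the same thing with wires: every field is always extracted, and the control logic decides from op (and funct) which fields are meaningful.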

Step 1a: The MIPS-lite Subset for Today

ADD and SUB
- addU rd, rs, rt
- subU rd, rs, rt
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | rd [15:11], 5 bits | shamt [10:6], 5 bits | funct [5:0], 6 bits

OR Immediate:
- ori rt, rs, imm16
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits

LOAD / STORE Word
- lw rt, rs, imm16
- sw rt, rs, imm16
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits

BRANCH:
- beq rs, rt, imm16
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits

Logical Register Transfers


Register Transfer Logic gives the meaning of the instructions

All start by fetching the instruction:
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <- R[rs] + R[rt];  PC <- PC + 4
SUBU     R[rd] <- R[rs] - R[rt];  PC <- PC + 4
ORi      R[rt] <- R[rs] | zero_ext(Imm16);  PC <- PC + 4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ];  PC <- PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt];  PC <- PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + (sign_ext(Imm16) || 00) else PC <- PC + 4
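The register transfers can be tried out with a small behavioral model. This is a sketch under stated assumptions, not a description of the hardware: memory is modeled as a Python dict from byte address to word, unwritten locations read as 0, and the special behavior of register 0 is ignored. The function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned value."""
    return imm16 & 0xFFFF


def step(inst, rs, rt, rd, imm16, R, MEM, pc):
    """Apply the register transfer for one instruction and return the next PC."""
    if inst == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif inst == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif inst == "ORI":
        R[rt] = R[rs] | zero_ext(imm16)
    elif inst == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif inst == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif inst == "BEQ":
        if R[rs] == R[rt]:
            # sign_ext(Imm16) || 00 is the offset shifted left by 2 bits
            return (pc + 4 + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {inst}")
    return (pc + 4) & MASK32


R = [0] * 32
MEM = {}
R[1], R[2] = 5, 7
pc = step("ADDU", 1, 2, 3, 0, R, MEM, 0)       # R[3] <- 5 + 7
pc = step("STORE", 0, 3, 0, 0x0010, R, MEM, pc)  # MEM[16] <- R[3]
pc = step("LOAD", 0, 4, 0, 0x0010, R, MEM, pc)   # R[4] <- MEM[16]
print(R[3], R[4], pc)
```

Each branch of `step` corresponds to one row of the table, which is the sense in which the register transfers "give the meaning" of the instructions.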

Step 1: Requirements of the Instruction Set


Memory: instruction & data
Registers (32 x 32):
- read RS
- read RT
- write RT or RD
PC
Extender
Add and Sub register or extended immediate
Add 4 or extended immediate to PC

Step 2: Components of the Datapath


Combinational Elements
Storage Elements
Clocking methodology

Combinational Logic Elements (Basic Building Blocks)


[Figure: three building blocks. Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry. MUX: 32-bit inputs A and B plus Select; output 32-bit Y. ALU: 32-bit inputs A and B plus OP; output 32-bit Result.]

Storage Element: Register File

Register File consists of 32 registers:
- Two 32-bit output busses: busA and busB
- One 32-bit input bus: busW
[Figure: block of 32 32-bit registers with 5-bit select inputs RW, RA, RB; Write Enable; Clk; 32-bit input busW; 32-bit outputs busA and busB]

Register is selected by:
- RA (number) selects the register to put on busA (data)
- RB (number) selects the register to put on busB (data)
- RW (number) selects the register to be written via busW (data) when Write Enable is 1

Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, behaves as a combinational logic block: RA or RB valid => busA or busB valid after "access time."
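A behavioral sketch of the register file may help: reads are combinational (no clock involved), while a write takes effect only at a clock edge with Write Enable set. The class and method names are hypothetical, and access time is not modeled.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports (busA, busB), one write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for register numbers RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """At the clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, write_enable=1)   # written
rf.clock_edge(rw=9, bus_w=99, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(ra=8, rb=9)
print(bus_a, bus_b)
```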

Storage Element: Idealized Memory

Memory (idealized)
- One input bus: Data In
- One output bus: Data Out
[Figure: memory block with Write Enable, Address, 32-bit Data In, Clk, and 32-bit DataOut]

Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: address selects the memory word to be written via the Data In bus

Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, behaves as a combinational logic block: Address valid => Data Out valid after "access time."

Memory Hierarchy (Ch. 7)


Want a single main memory, both large and fast
Problem 1: large memories are slow while fast memories are small
- Example: MIPS registers (fast, but few)
Solution: a mix of memories provides the illusion of a single large, fast memory
Cache: a small, fast memory; holds a copy of part of a larger, slower memory
Imem, Dmem are really separate cache memories

Digression: Sequential Logic, Clocking


Combinational circuits: no memory
- Output depends only on the inputs
Sequential circuits: have memory
- How to ensure memory element is updated neither too soon, nor too late?
Recall hardware multiplier
- Product/multiplier register is the writable memory element
- Gate propagation delay means ALU result takes time to stabilize; delay varies with inputs
- Must wait until result is stable before write to product/multiplier register, else get garbage
How to be certain ALU output is stable?

Clock: free running signal with fixed cycle time (clock period)

Adding a Clock to a Circuit

[Figure: clock waveform showing the high (1) and low (0) levels, the period, and the rising and falling edges]

Clock determines when to write memory element
- level-triggered: store while clock is high (low)
- edge-triggered: store only on clock edge
We will use negative (falling) edge-triggered methodology

Role of Clock in MIPS Processors


single-cycle machine: does everything in one clock cycle
instruction execution = up to 5 steps
must complete 5th step before cycle ends
[Figure: one clock cycle from rising clock edge to falling clock edge; instruction execution passes through step 1 to step 5 until the datapath is stable, then register(s) are written]

SR-Latches
SR-latch with NOR Gates
S = 1 and R = 1 not allowed
[Figure: symbol for SR-Latch with NOR gates]

SR-Latches
SR-latch with NAND Gates, also known as S'R'-latch (active-low inputs)
S = 0 and R = 0 not allowed
[Figure: symbol for SR-Latch with NAND gates]
SR-Latches with Control Input


SR-latch with NAND Gates and control input C
C = 0, no change of state;
C = 1, change is allowed;
If S = 1 and R = 1, Q and Q' are indeterminate

D-Latches

D-latch based on SR-Latch with NAND Gates and control input C
C = 0, no change of state; Q(t + Δt) = Q(t)
C = 1, change is allowed; Q(t + Δt) = D(t)
No indeterminate output
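The two D-latch equations can be written as a next-state function. A minimal behavioral sketch; gate delays and the internal NAND structure are not modeled, and the input sequence is made up for illustration.

```python
def d_latch(q, d, c):
    """Next state of a D-latch: follow D while C = 1, hold Q while C = 0."""
    return d if c == 1 else q


# Hypothetical input sequence of (D, C) pairs, starting from Q = 0.
inputs = [(1, 0), (1, 1), (0, 0), (0, 1)]
q = 0
trace = []
for d, c in inputs:
    q = d_latch(q, d, c)
    trace.append(q)

print(trace)
```

Because there is a single data input, the forbidden input combination of the SR-latch cannot occur, which is why the output is never indeterminate.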

Negative Edge-Triggered Master-Slave D-Flip-Flop

Symbol for D-Flip-Flop:
- Arrowhead (>) indicates an edge-triggered sequential circuit.
- Bubble means that triggering is effective during the High-to-Low C transition.

Clocking Methodology for the Entire Datapath


[Figure: Clk waveform with Setup and Hold windows around each clock edge; data is "don't care" outside those windows]

Design/synthesis based on pulsed-sequential circuits
- All combinational inputs remain at constant levels and only the clock signal appears as a pulse with a fixed period Tcc
- All storage elements are clocked by the same clock edge
- Cycle time Tcc = CLK-to-q + longest delay path + setup time + clock skew
- (CLK-to-q + shortest delay path - clock skew) > hold time
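The two timing constraints can be checked numerically. A rough sketch with made-up delays in picoseconds; the function names and all the numbers are assumptions for illustration only.

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle time = CLK-to-q + longest delay path + setup time + clock skew."""
    return clk_to_q + longest_path + setup + skew


def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold constraint: (CLK-to-q + shortest delay path - clock skew) > hold time."""
    return (clk_to_q + shortest_path - skew) > hold


# Hypothetical delays, all in picoseconds.
t_cc = min_cycle_time(clk_to_q=50, longest_path=800, setup=40, skew=10)
safe = hold_ok(clk_to_q=50, shortest_path=30, skew=10, hold=20)
unsafe = hold_ok(clk_to_q=50, shortest_path=0, skew=40, hold=20)

print(f"minimum cycle time: {t_cc} ps")
print(f"hold constraint met: {safe}, with large skew and no logic delay: {unsafe}")
```

The longest path limits how fast the clock can run, while the shortest path decides whether a new value can race through and corrupt a register before its hold time has passed.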

Step 3: Assemble Data Path Meeting Requirements

Register Transfer Requirements => Datapath Assembly
Instruction Fetch
Read Operands and Execute Operation

Stages of the Datapath (1/6)


Problem: a single, atomic block that executes an instruction (performing all necessary operations, beginning with fetching the instruction) would be too bulky and inefficient
Solution: break up the process of executing an instruction into stages, and then connect the stages to create the whole datapath
- Smaller stages are easier to design
- Easy to optimize (change) one stage without touching the others

Stages of the Datapath (2/6)


There is a wide variety of MIPS instructions, so what general steps do they have in common?
Stage 1: Instruction Fetch
- No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
- This is also where we increment the PC (that is, PC = PC + 4, to point to the next instruction; memory is byte-addressed and each instruction is 4 bytes, so + 4)

Stages of the Datapath (3/6)


Stage 2: Instruction Decode
- Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
- First, read the opcode to determine instruction type and field lengths
- Second, read in data from all necessary registers
  - for add, read two registers
  - for addi, read one register
  - for jal, no reads necessary
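Field extraction in the decode stage can be sketched with shifts and masks. This is an illustrative sketch only: the function name `decode` is hypothetical, and the two instruction words are hand-encoded from the standard MIPS R-type and I-type layouts.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the opcode (rd/shamt/funct for R-type, imm16 for I-type).
    """
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm16": word & 0xFFFF,
    }


add_fields = decode(0x00221820)  # add $3, $1, $2
lw_fields = decode(0x8C230011)   # lw $3, 17($1)
print(add_fields["op"], add_fields["rs"], add_fields["rt"], add_fields["rd"])
print(lw_fields["op"], lw_fields["rs"], lw_fields["rt"], lw_fields["imm16"])
```

Because the source registers sit in the same bit positions for every format, the register file can be read in parallel with decoding the opcode.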


Stages of the Datapath (4/6)


Stage 3: ALU (Arithmetic-Logic Unit)
- The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
- What about loads and stores?
  - lw $t0, 40($t1)
  - the address we are accessing in memory = the value in $t1 + the value 40
  - so we do this addition in this stage

Stages of the Datapath (5/6)


Stage 4: Memory Access
- Only the load and store instructions do anything during this stage; the others remain idle
- Since these instructions have a unique step, we need this extra stage to account for them
- As a result of the cache system, this stage is expected to be just as fast (on average) as the others

Stages of the Datapath (6/6)


Stage 5: Register Write
- Most instructions write the result of some computation into a register
- Examples: arithmetic, logical, shifts, loads, slt
- What about stores, branches, jumps?
  - they don't write anything into a register at the end
  - these remain idle during this fifth stage

Generic Steps: Datapath


[Figure: generic five-step datapath. PC (with +4 adder) -> instruction memory -> registers (rs, rt, rd) and imm -> ALU -> data memory. Steps: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write]

Datapath Walkthroughs (1/3)


add $r3, $r1, $r2    # r3 = r1 + r2

Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's an add, then read registers $r1 and $r2
Stage 3: add the two values retrieved in Stage 2
Stage 4: idle (nothing to write to memory)
Stage 5: write result of Stage 3 into register $r3

Example: add Instruction


[Figure: datapath for add r3, r1, r2. The register file reads reg[1] and reg[2], the ALU computes reg[1] + reg[2], and the result is written to register 3; data memory is unused.]

Datapath Walkthroughs (2/3)


slti $r3, $r1, 17

Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's an slti, then read register $r1
Stage 3: compare value retrieved in Stage 2 with the integer 17
Stage 4: go idle
Stage 5: write the result of Stage 3 in register $r3

Example: slti Instruction


[Figure: datapath for slti r3, r1, 17. The register file reads reg[1], the ALU computes reg[1] - 17 using the immediate 17 to perform the comparison, and the result is written to register 3; data memory is unused.]

Datapath Walkthroughs (3/3)


sw $r3, 17($r1)

Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's a sw, then read registers $r1 and $r3
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3
Stage 5: go idle (nothing to write into a register)

Example: sw Instruction
[Figure: datapath for sw r3, 17(r1). The register file reads reg[1] and reg[3], the ALU computes reg[1] + 17 from the immediate 17, and data memory performs MEM[r1+17] <= r3; no register is written.]

Why Five Stages? (1/2)


Could we have a different number of stages?
- Yes, and other architectures do
So why does MIPS have five if instructions tend to go idle for at least one stage?
- There is one instruction that uses all five stages: the load

Why Five Stages? (2/2)


lw $r3, 17($r1)

Stage 1: fetch this instruction, increment PC
Stage 2: decode to find it's a lw, then read register $r1
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: read value from memory address computed in Stage 3
Stage 5: write value found in Stage 4 into register $r3
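The four walkthroughs can be replayed with a small interpreter in which each line is tagged with its stage. This is a rough sketch under simplifying assumptions: instructions are tuples rather than encoded words, memory is a dictionary keyed by byte address, and alignment and 32-bit overflow are ignored. The function name `run` is hypothetical.

```python
def run(instr, regs, mem, pc):
    """Execute one instruction through the five stages and return the new PC."""
    op, a, b, c = instr
    pc += 4                              # Stage 1: fetch, increment PC
    if op == "add":                      # add rd=a, rs=b, rt=c
        x, y = regs[b], regs[c]          # Stage 2: read two registers
        result = x + y                   # Stage 3: ALU add
        regs[a] = result                 # Stage 5: register write (Stage 4 idle)
    elif op == "slti":                   # slti rt=a, rs=b, imm=c
        x = regs[b]                      # Stage 2: read one register
        result = 1 if x < c else 0       # Stage 3: compare with the immediate
        regs[a] = result                 # Stage 5: register write (Stage 4 idle)
    elif op == "sw":                     # sw rt=a, imm=c(rs=b)
        base, value = regs[b], regs[a]   # Stage 2: read base and data registers
        addr = base + c                  # Stage 3: address calculation
        mem[addr] = value                # Stage 4: memory write (Stage 5 idle)
    elif op == "lw":                     # lw rt=a, imm=c(rs=b)
        base = regs[b]                   # Stage 2: read base register
        addr = base + c                  # Stage 3: address calculation
        value = mem[addr]                # Stage 4: memory read
        regs[a] = value                  # Stage 5: register write
    else:
        raise ValueError("unsupported instruction: " + op)
    return pc


regs = [0] * 32
regs[1], regs[2] = 103, 5
mem = {}
pc = 0
for instr in [("add", 3, 1, 2), ("sw", 3, 1, 17), ("slti", 3, 1, 17), ("lw", 3, 1, 17)]:
    pc = run(instr, regs, mem, pc)
print(regs[3], mem[120], pc)  # 108 108 16
```

Only the lw branch has a line for every stage, which matches the point that the load is the one instruction using all five.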


Example: lw Instruction
[Figure: datapath for lw r3, 17(r1). The register file reads reg[1], the ALU computes reg[1] + 17 from the immediate 17, data memory returns MEM[r1+17], and that value is written to register 3.]

Datapath Summary
The datapath is based on the data transfers required to perform instructions
A controller causes the right transfers to happen

[Figure: PC (+4) -> instruction memory -> registers (rs, rt, rd) and imm -> ALU -> data memory, with a Controller driven by opcode and funct]

Overview of the Instruction Fetch Unit

The common operations:
- Fetch the instruction: mem[PC]
- Update the program counter:
  - Sequential code: PC ← PC + 4
  - Branch and jump: PC ← "something else"

[Figure: PC, clocked by Clk, supplies the Address to Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC]

Add & Subtract


R[rd] ← R[rs] op R[rt]; Example: addu rd, rs, rt
Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
ALUctr and RegWr: control logic after decoding the instruction

R-type format:
  bits 31-26  op     6 bits
  bits 25-21  rs     5 bits
  bits 20-16  rt     5 bits
  bits 15-11  rd     5 bits
  bits 10-6   shamt  5 bits
  bits 5-0    funct  6 bits

[Figure: 32 32-bit register file with read addresses Ra = Rs and Rb = Rt, write address Rw = Rd, write enable RegWr, and clock Clk; busA and busB feed the ALU (controlled by ALUctr), and the 32-bit Result returns on busW]

Register-Register Timing: One complete cycle


[Figure: timing of one register-register cycle. Each signal holds its old value until the listed delay has elapsed: PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the instruction memory access time; ALUctr and RegWr after the delay through control logic; busA and busB after the register file access time; busW after the ALU delay. The register write occurs at the clock edge that ends the cycle.]

Logical Operations With Immediate


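I-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), immediate (bits 15-0, 16 bits)
ZeroExt[imm16]: 16 zero bits (0000000000000000) in bits 31-16, followed by the 16-bit immediate in bits 15-0
R[rt] ← R[rs] op ZeroExt[imm16]

[Figure: register file datapath extended with a RegDst mux that selects Rd or Rt as the write address Rw, a ZeroExt unit that widens imm16 from 16 to 32 bits, and an ALUSrc mux that selects busB or the extended immediate as the second ALU input]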

Load Operations
R[rt] ← Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16
I-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), immediate (bits 15-0, 16 bits)

[Figure: datapath for loads. The Extender (controlled by ExtOp) widens imm16 to 32 bits, the ALUSrc mux feeds it to the ALU, the ALU result drives the Adr input of Data Memory (WrEn = MemWr, Data In, Clk), and the W_Src mux selects either the ALU result or the memory data for busW. The RegDst mux selects Rt as the write register.]

Store Operations
Mem[R[rs] + SignExt[imm16]] ← R[rt]; Example: sw rt, rs, imm16
I-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), immediate (bits 15-0, 16 bits)

[Figure: datapath for stores. The Extender (controlled by ExtOp) widens imm16 to 32 bits, the ALUSrc mux feeds it to the ALU, the ALU result drives the Adr input of Data Memory, busB drives Data In, and MemWr enables the write (WrEn). Other control inputs: RegDst, RegWr, ALUctr, W_Src.]

The Branch Instruction


beq rs, rt, imm16
I-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), immediate (bits 15-0, 16 bits)

mem[PC]                       Fetch the instruction from memory
Equal ← (R[rs] == R[rt])      Calculate the branch condition
if (Equal)                    Calculate the next instruction's address
    PC ← PC + 4 + (SignExt(imm16) × 4)
else
    PC ← PC + 4
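The next-address calculation for beq can be sketched as follows. The helper names `sign_extend16` and `next_pc` are hypothetical; the arithmetic follows the register-transfer description above, with the result masked to 32 bits.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: taken branches add the word offset times 4 to PC + 4."""
    equal = rs_val == rt_val
    if equal:
        return (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


print(hex(next_pc(0x1000, 7, 7, 3)))       # 0x1010 (taken, forward 3 instructions)
print(hex(next_pc(0x1000, 7, 8, 3)))       # 0x1004 (not taken)
print(hex(next_pc(0x1000, 1, 1, 0xFFFF)))  # 0x1000 (taken, offset -1 branches to itself)
```

The offset counts instructions, not bytes, which is why it is multiplied by 4 (equivalently, two zero bits are appended) before being added.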

Datapath for Branch Operations


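beq rs, rt, imm16
I-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), immediate (bits 15-0, 16 bits)
The datapath generates the condition (Equal)

[Figure: branch datapath. The register file reads Rs and Rt onto busA and busB, and a comparator (Equal?) produces Cond. In the fetch unit, one adder computes PC + 4, PC Ext extends imm16 with 00 appended, a second adder forms the branch target, and a mux controlled by nPC_sel selects the next PC, which is clocked by Clk and addresses the instruction memory.]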

Summary: A Single Cycle Datapath


[Figure: complete single-cycle datapath. The instruction fetch unit (PC, two adders, PC Ext with 00 appended, nPC_sel mux, Inst Memory) supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The RegDst mux selects Rd (1) or Rt (0) as Rw for the 32 32-bit register file (RegWr, busW, Clk). busA and the ALUSrc mux output (busB or the Extender output, controlled by ExtOp) feed the ALU (ALUctr), which produces Equal and the result. Data Memory (WrEn = MemWr, Adr, Data In, Clk) and the MemtoReg mux drive busW.]

An Abstract View of the Critical Path

Register file and ideal memory:
- The CLK input is a factor ONLY during write operations
- During read operations, they behave as combinational logic: Address valid => Output valid after "access time"

Critical Path (Load Operation) =
  PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: PC (Clk) -> Ideal Instruction Memory (Instruction Address in, Instruction out) -> Rd, Rs, Rt, Imm16 -> 32 32-bit register file (Rw, Ra, Rb, Clk) -> A and B buses -> ALU -> Ideal Data Memory (Data Address, Data In, Clk); Next Address logic feeds the PC]

An Abstract View of the Implementation


[Figure: abstract view of the implementation. The Datapath contains the PC, Next Address logic, Ideal Instruction Memory, the 32 32-bit register file (Rw, Ra, Rb), the ALU (inputs A and B), and Ideal Data Memory (Data Address, Data In, Data Out), all clocked by Clk. The Control block receives the Instruction and the Conditions from the datapath and sends Control Signals back to it.]

Steps 4 & 5: Implement the control

In The Next Section


Summary: MIPS-lite Implementations


Single-cycle: uses a single l-o-n-g clock cycle for each instruction executed
- Easy to understand, but not practical
- Slower than an implementation that allows instructions to take different numbers of clock cycles
  - fast instructions (beq): fewer clock cycles
  - slow instructions (mult?): more cycles
Multicycle, pipelined implementations later
Next time, finish the single-cycle implementation

Summary

5 steps to design a processor
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic
MIPS makes it easier
- Instructions same size
- Source registers always in same place
- Immediates same size, location
- Operations always on registers/immediates
Single cycle datapath: CPI = 1, TCC long
Next time: implementing control

Processor Design - 2

Adopted from notes by David A. Patterson, John Kubiatowicz, and others. Copyright 2001 University of California at Berkeley


Summary: A Single Cycle Datapath


[Figure: complete single-cycle datapath, as in Processor Design 1. Instruction<31:0> from the fetch unit (PC, adders, PC Ext, nPC_sel mux) is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Control inputs: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. The ALU outputs Equal.]

Chapter 5.2 - Processor Design 2

An Abstract View of the Critical Path

Register file and ideal memory:
- The CLK input is a factor ONLY during write operations
- During read operations, they behave as combinational logic: Address valid => Output valid after "access time"

Critical Path (Load Operation) =
  PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: PC -> Ideal Instruction Memory -> Rd, Rs, Rt, Imm16 -> 32 32-bit register file -> A and B buses -> ALU -> Ideal Data Memory, with Next Address logic feeding the PC; all storage elements clocked by Clk]

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
Next Topic: Designing the Control for the Single Cycle Datapath

An Abstract View of the Implementation


[Figure: abstract view of the implementation. The Datapath contains the PC, Next Address logic, Ideal Instruction Memory, the 32 32-bit register file (Rw, Ra, Rb), the ALU (inputs A and B), and Ideal Data Memory (Data Address, Data In, Data Out), all clocked by Clk. The Control block receives the Instruction and the Conditions from the datapath and sends Control Signals back to it.]

Recap: A Single Cycle Datapath


Rs, Rt, Rd and Imed16 are hardwired into the datapath from the Fetch Unit
We have everything except the control signals (underlined in the figure)
Today's lecture will show you how to generate the control signals

[Figure: single-cycle datapath with the Instruction Fetch Unit (input nPC_sel, output Instruction<31:0> split into Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>), register file, Extender, ALU (output Zero), and Data Memory. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]

Recap: Meaning of the Control Signals


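nPC_sel:
  0 => PC ← PC + 4
  1 => PC ← PC + 4 + (SignExt(Im16) || 00)
Later in lecture: higher-level connection between mux and branch condition

[Figure: instruction fetch unit. PC (Clk) addresses Inst Memory; one adder computes PC + 4, PC Ext extends imm16 with 00 appended, a second adder forms the branch target, and the nPC_sel mux selects the next PC]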

Recap: Meaning of the Control Signals

ExtOp: "zero", "sign"
ALUsrc: 0 => regB; 1 => immed
ALUctr: "add", "sub", "or"
MemWr: 1 => write memory
MemtoReg: 0 => ALU; 1 => Mem
RegDst: 0 => "rt"; 1 => "rd"
RegWr: 1 => write register

[Figure: execution part of the datapath showing where each signal acts: the RegDst mux (Rd/Rt) on Rw, RegWr on the register file, ExtOp on the Extender, the ALUSrc mux on the second ALU input, ALUctr on the ALU (output Equal), MemWr on Data Memory WrEn, and the MemtoReg mux on busW]

The add Instruction

add rd, rs, rt
R-type format: op (bits 31-26, 6 bits), rs (bits 25-21, 5 bits), rt (bits 20-16, 5 bits), rd (bits 15-11, 5 bits), shamt (bits 10-6, 5 bits), funct (bits 5-0, 6 bits)

mem[PC]                   Fetch the instruction from memory
R[rd] ← R[rs] + R[rt]     The actual operation
PC ← PC + 4               Calculate the next instruction's address

Fetch Unit at the Beginning of add


Fetch the instruction from instruction memory: Instruction ← mem[PC]
(This is the same for all instructions)

[Figure: instruction fetch unit. PC (Clk) addresses Inst Memory, which outputs Instruction<31:0>; two adders, imm16 through PC Ext with 00 appended, and the nPC_sel mux compute the next PC]

The Single Cycle Datapath during add

R-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), rd (bits 15-11), shamt (bits 10-6), funct (bits 5-0)
R[rd] ← R[rs] + R[rt]

Control settings: nPC_sel = +4, RegDst = 1, RegWr = 1, ALUSrc = 0, ALUctr = Add, ExtOp = x, MemWr = 0, MemtoReg = 0

[Figure: single-cycle datapath with these control settings applied]

Instruction Fetch Unit at the End of add

PC ← PC + 4
This is the same for all instructions except Branch and Jump

[Figure: instruction fetch unit with the nPC_sel mux selecting the PC + 4 adder output]

The Single Cycle Datapath during Or Immediate

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
R[rt] ← R[rs] or ZeroExt(Imm16)

Control settings (to fill in): nPC_sel = _____, RegDst = _____, RegWr = _____, ALUSrc = _____, ALUctr = _____, ExtOp = _____, MemWr = _____, MemtoReg = _____

[Figure: single-cycle datapath with the control settings left blank]

The Single Cycle Datapath during Or Immediate

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
R[rt] ← R[rs] or ZeroExt(Imm16)

Control settings: nPC_sel = +4, RegDst = 0, RegWr = 1, ALUSrc = 1, ALUctr = Or, ExtOp = 0, MemWr = 0, MemtoReg = 0

[Figure: single-cycle datapath with these control settings applied]

The Single Cycle Datapath during Load

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
R[rt] ← Data Memory {R[rs] + SignExt[imm16]}

Control settings: nPC_sel = +4, RegDst = 0, RegWr = 1, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemWr = 0, MemtoReg = 1

[Figure: single-cycle datapath with these control settings applied]

The Single Cycle Datapath during Store

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
Data Memory {R[rs] + SignExt[imm16]} ← R[rt]

Control settings (to fill in): nPC_sel = _____, RegDst = _____, RegWr = _____, ALUSrc = _____, ALUctr = _____, ExtOp = _____, MemWr = _____, MemtoReg = _____

[Figure: single-cycle datapath with the control settings left blank]

The Single Cycle Datapath during Store

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
Data Memory {R[rs] + SignExt[imm16]} ← R[rt]

Control settings: nPC_sel = +4, RegDst = x, RegWr = 0, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemWr = 1, MemtoReg = x

[Figure: single-cycle datapath with these control settings applied]

The Single Cycle Datapath during Branch

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
if (R[rs] - R[rt] == 0) then Zero ← 1; else Zero ← 0

Control settings: nPC_sel = "Br", RegDst = x, RegWr = 0, ALUSrc = 0, ALUctr = Sub, ExtOp = x, MemWr = 0, MemtoReg = x

[Figure: single-cycle datapath with these control settings applied]

Instruction Fetch Unit at the End of Branch

I-type format: op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)
if (Zero == 1) then PC = PC + 4 + SignExt(imm16) × 4; else PC = PC + 4

What is the encoding of nPC_sel?
- Direct MUX select?
- Branch / not branch
Let's choose the second option:

nPC_sel  Zero  MUX
   0      x     0
   1      0     0
   1      1     1

[Figure: instruction fetch unit. nPC_sel and Zero together drive the select input of the mux that chooses between PC + 4 and the branch target computed from imm16]

Step 4: Given Datapath: RTL => Control

[Figure: Inst Memory outputs Instruction<31:0>. The Control block takes Op and Fun (plus Zero from the datapath) and drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the DATA PATH, which also receives the Rs, Rt, Rd, and Imm16 fields]

A Summary of Control Signals

inst    Register Transfer
ADD     R[rd] ← R[rs] + R[rt]; PC ← PC + 4
        ALUsrc = RegB, ALUctr = "add", RegDst = rd, RegWr, nPC_sel = "+4"
SUB     R[rd] ← R[rs] - R[rt]; PC ← PC + 4
        ALUsrc = RegB, ALUctr = "sub", RegDst = rd, RegWr, nPC_sel = "+4"
ORi     R[rt] ← R[rs] or zero_ext(Imm16); PC ← PC + 4
        ALUsrc = Im, Extop = "Z", ALUctr = "or", RegDst = rt, RegWr, nPC_sel = "+4"
LOAD    R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
        ALUsrc = Im, Extop = "Sn", ALUctr = "add", MemtoReg, RegDst = rt, RegWr, nPC_sel = "+4"
STORE   MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
        ALUsrc = Im, Extop = "Sn", ALUctr = "add", MemWr, nPC_sel = "+4"
BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || 00) else PC ← PC + 4
        nPC_sel = "Br", ALUctr = "sub"

A Summary of Control Signals

See Appendix A for the op and func encodings. For instructions other than add and sub, func is "We Don't Care :-)".

                 add      sub      ori      lw       sw       beq      jump
func             10 0000  10 0010  (don't care)
op               00 0000  00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
RegDst           1        1        0        0        x        x        x
ALUSrc           0        0        1        1        1        0        x
MemtoReg         0        0        0        1        x        x        x
RegWrite         1        1        1        1        0        0        0
MemWrite         0        0        0        0        1        0        0
nPCsel           0        0        0        0        0        1        0
Jump             0        0        0        0        0        0        1
ExtOp            x        x        0        1        1        x        x
ALUctr<2:0>      Add      Subtract Or       Add      Add      Subtract xxx

Instruction formats:
R-type: op (bits 31-26), rs (25-21), rt (20-16), rd (15-11), shamt (10-6), funct (5-0)   add, sub
I-type: op (bits 31-26), rs (25-21), rt (20-16), immediate (15-0)                        ori, lw, sw, beq
J-type: op (bits 31-26), target address (25-0)                                           jump

The Concept of Local Decoding

                 R-type   ori      lw       sw       beq      jump
op               00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
RegDst           1        0        0        x        x        x
ALUSrc           0        1        1        1        0        x
MemtoReg         0        0        1        x        x        x
RegWrite         1        1        1        0        0        0
MemWrite         0        0        0        1        0        0
Branch           0        0        0        0        1        0
Jump             0        0        0        0        0        1
ExtOp            x        0        1        1        x        x
ALUop<N:0>       "R-type" Or       Add      Add      Subtract xxx

[Figure: op (6 bits) feeds Main Control, which outputs ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which outputs the 3-bit ALUctr to the ALU]

The Encoding of ALUop

[Figure: op (6 bits) feeds Main Control, which outputs ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which outputs the 3-bit ALUctr]

In this exercise, ALUop has to be 2 bits wide to represent:
- (1) "R-type" instructions
- "I-type" instructions that require the ALU to perform: (2) Or, (3) Add, and (4) Subtract
To implement the full MIPS ISA, ALUop has to be 3 bits to represent:
- (1) "R-type" instructions
- "I-type" instructions that require the ALU to perform: (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)

                   R-type    ori    lw     sw     beq       jump
ALUop (Symbolic)   "R-type"  Or     Add    Add    Subtract  xxx
ALUop<2:0>         1 00      0 10   0 00   0 00   0 01      xxx

The Decoding of the "func" Field

[Figure: op (6 bits) feeds Main Control, which outputs ALUop (N bits); ALUop and func (6 bits) feed the local ALU Control, which outputs the 3-bit ALUctr to the ALU]

                   R-type    ori    lw     sw     beq       jump
ALUop (Symbolic)   "R-type"  Or     Add    Add    Subtract  xxx
ALUop<2:0>         1 00      0 10   0 00   0 00   0 01      xxx

R-type format: op (bits 31-26), rs (25-21), rt (20-16), rd (15-11), shamt (10-6), funct (5-0)

P. 286 text:

funct<5:0>   Instruction Operation
10 0000      add
10 0010      subtract
10 0100      and
10 0101      or
10 1010      set-on-less-than

ALUctr<2:0>  ALU Operation
000          And
001          Or
010          Add
110          Subtract
111          Set-on-less-than

The Truth Table for ALUctr


                   R-type    ori    lw     sw     beq
ALUop (Symbolic)   "R-type"  Or     Add    Add    Subtract
ALUop<2:0>         1 00      0 10   0 00   0 00   0 01

funct<3:0>   Instruction Op.
0000         add
0010         subtract
0100         and
0101         or
1010         set-on-less-than

ALUop           func                 ALU          ALUctr
<2> <1> <0>     <3> <2> <1> <0>      Operation    <2> <1> <0>
 0   0   0       x   x   x   x       Add           0   1   0
 0   x   1       x   x   x   x       Subtract      1   1   0
 0   1   x       x   x   x   x       Or            0   0   1
 1   x   x       0   0   0   0       Add           0   1   0
 1   x   x       0   0   1   0       Subtract      1   1   0
 1   x   x       0   1   0   0       And           0   0   0
 1   x   x       0   1   0   1       Or            0   0   1
 1   x   x       1   0   1   0       Set on <      1   1   1

The Logic Equation for ALUctr<2>

ALUop           func
<2> <1> <0>     <3> <2> <1> <0>      ALUctr<2>
 0   x   1       x   x   x   x          1
 1   x   x       0   0   1   0          1
 1   x   x       1   0   1   0          1

Combining the last two rows makes func<3> a don't care

ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>

The Logic Equation for ALUctr<1>

ALUop           func
<2> <1> <0>     <3> <2> <1> <0>      ALUctr<1>
 0   0   0       x   x   x   x          1
 0   x   1       x   x   x   x          1
 1   x   x       0   0   0   0          1
 1   x   x       0   0   1   0          1
 1   x   x       1   0   1   0          1

ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>

The Logic Equation for ALUctr<0>

ALUop           func
<2> <1> <0>     <3> <2> <1> <0>      ALUctr<0>
 0   1   x       x   x   x   x          1
 1   x   x       0   1   0   1          1
 1   x   x       1   0   1   0          1

ALUctr<0> = !ALUop<2> & ALUop<1>
          + ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
          + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>

The ALU Control Block

[Figure: ALUop (3 bits) and func (6 bits) feed the local ALU Control block, which outputs the 3-bit ALUctr]

ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>
ALUctr<0> = !ALUop<2> & ALUop<1>
          + ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
          + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
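The ALU control equations can be checked against the ALUctr truth table with a short script. This is a sketch, not part of the original notes: the function name `alu_control` is hypothetical, and the first terms of ALUctr<1> and ALUctr<0> test ALUop<1>, which is what the truth table requires so that ori (ALUop = 010) yields Or and beq (ALUop = 001) yields Subtract.

```python
def alu_control(aluop, func):
    """Return ALUctr<2:0> from ALUop<2:0> and the low four bits of func."""
    a2, a1, a0 = (aluop >> 2) & 1, (aluop >> 1) & 1, aluop & 1
    f3, f2, f1, f0 = (func >> 3) & 1, (func >> 2) & 1, (func >> 1) & 1, func & 1

    def n(bit):
        return 1 - bit  # logical NOT on a single bit

    c2 = (n(a2) & a0) | (a2 & n(f2) & f1 & n(f0))
    c1 = (n(a2) & n(a1)) | (a2 & n(f2) & n(f0))
    c0 = ((n(a2) & a1)
          | (a2 & n(f3) & f2 & n(f1) & f0)
          | (a2 & f3 & n(f2) & f1 & n(f0)))
    return (c2 << 2) | (c1 << 1) | c0


# (name, ALUop, func<3:0>, expected ALUctr) rows taken from the truth table.
ROWS = [
    ("lw/sw", 0b000, 0b0000, 0b010),
    ("beq",   0b001, 0b0000, 0b110),
    ("ori",   0b010, 0b0000, 0b001),
    ("add",   0b100, 0b0000, 0b010),
    ("sub",   0b100, 0b0010, 0b110),
    ("and",   0b100, 0b0100, 0b000),
    ("or",    0b100, 0b0101, 0b001),
    ("slt",   0b100, 0b1010, 0b111),
]
for name, aluop, func, expected in ROWS:
    print(name, format(alu_control(aluop, func), "03b"), format(expected, "03b"))
```

Each printed row shows the computed and expected codes side by side, so a mismatch in any equation is visible immediately.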


Step 5: Logic for Each Control Signal


nPC_sel  <= if (OP == BEQ) then "Br" else "+4"
ALUsrc   <= if (OP == "Rtype" or OP == BEQ) then "regB" else "immed"
ALUctr   <= if (OP == "Rtype") then funct
            elseif (OP == ORi) then "OR"
            elseif (OP == BEQ) then "sub"
            else "add"
ExtOp    <= _____________
MemWr    <= _____________
MemtoReg <= _____________
RegWr    <= _____________
RegDst   <= _____________

Step 5: Logic for Each Control Signal

nPC_sel  <= if (OP == BEQ) then "Br" else "+4"
ALUsrc   <= if (OP == "Rtype" or OP == BEQ) then "regB" else "immed"
ALUctr   <= if (OP == "Rtype") then funct
            elseif (OP == ORi) then "OR"
            elseif (OP == BEQ) then "sub"
            else "add"
ExtOp    <= if (OP == ORi) then "zero" else "sign"
MemWr    <= (OP == Store)
MemtoReg <= (OP == Load)
RegWr    <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
RegDst   <= if ((OP == Load) || (OP == ORi)) then 0 else 1
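The per-signal logic can be written as one function and compared with the control signal table. This is a sketch under assumptions: the function name `main_control` and the string opcodes are hypothetical, jump is left out, don't-care entries receive whatever concrete value the logic happens to produce, and ALUSrc is 0 (regB) for beq as in the table.

```python
def main_control(op):
    """Control signals for op in {'rtype', 'ori', 'lw', 'sw', 'beq'}."""
    if op not in ("rtype", "ori", "lw", "sw", "beq"):
        raise ValueError("unsupported op: " + op)
    return {
        "nPC_sel": 1 if op == "beq" else 0,              # 1 = "Br", 0 = "+4"
        "ALUSrc": 0 if op in ("rtype", "beq") else 1,    # 0 = regB, 1 = immed
        "ExtOp": 0 if op == "ori" else 1,                # 0 = zero, 1 = sign
        "MemWr": 1 if op == "sw" else 0,
        "MemtoReg": 1 if op == "lw" else 0,
        "RegWr": 0 if op in ("sw", "beq") else 1,
        "RegDst": 0 if op in ("lw", "ori") else 1,       # 0 = rt, 1 = rd
    }


for op in ("rtype", "ori", "lw", "sw", "beq"):
    print(op, main_control(op))
```

Only RegWr and MemWr change machine state, so those two must never take a wrong value; the remaining signals can safely be arbitrary wherever the table shows x.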


The Truth Table for the Main Control

[Figure: op (6 bits) feeds Main Control, which outputs RegDst, ALUSrc, and the other control signals plus the 3-bit ALUop; ALUop and func feed the local ALU Control, which outputs the 3-bit ALUctr]

                   R-type    ori      lw       sw       beq       jump
op                 00 0000   00 1101  10 0011  10 1011  00 0100   00 0010
RegDst             1         0        0        x        x         x
ALUSrc             0         1        1        1        0         x
MemtoReg           0         0        1        x        x         x
RegWrite           1         1        1        0        0         0
MemWrite           0         0        0        1        0         0
nPC_sel            0         0        0        0        1         0
Jump               0         0        0        0        0         1
ExtOp              x         0        1        1        x         x
ALUop (Symbolic)   "R-type"  Or       Add      Add      Subtract  xxx
ALUop <2>          1         0        0        0        0         x
ALUop <1>          0         1        0        0        0         x
ALUop <0>          0         0        0        0        1         x

A Real MIPS Datapath (CNS T0)


Summary: A Single Cycle Processor

[Figure: complete single-cycle processor. Main Control decodes op = Instr<31:26> (6 bits) and outputs RegDst, ALUSrc, the other control signals, and the 3-bit ALUop; the ALU Control block combines ALUop with func = Instr<5:0> (6 bits) to produce the 3-bit ALUctr. The Instruction Fetch Unit (inputs nPC_sel and Zero) supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 = Instr<15:0>. The datapath contains the RegDst mux, the 32 32-bit register file (RegWr, busW, Clk), the Extender (ExtOp), the ALUSrc mux, the ALU (output Zero), Data Memory (WrEn = MemWr, Adr, Data In, Clk), and the MemtoReg mux.]

Recap: An Abstract View of the Critical Path (Load)

Register file and ideal memory:
- The CLK input is a factor ONLY during write operations
- During read operations, they behave as combinational logic: Address valid => Output valid after "access time"

Critical Path (Load Operation) =
  PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: PC -> Ideal Instruction Memory -> Rd, Rs, Rt, Imm16 -> 32 32-bit register file -> A and B buses -> ALU -> Ideal Data Memory, with Next Address logic feeding the PC; all storage elements clocked by Clk]

Worst Case Timing (Load)

[Figure: timing diagram for the load instruction over one clock cycle. Each signal holds its old value until the listed delay has elapsed, then takes its new value: PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the instruction memory access time; ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr after the delay through control logic; busA after the register file access time; busB after the delay through the Extender & Mux; Address after the ALU delay; busW after the data memory access time. The register write occurs at the end of the cycle.]

Drawback of this Single Cycle Processor

Long cycle time:
- Cycle time must be long enough for the load instruction:
    PC's Clock-to-Q
    + Instruction Memory Access Time
    + Register File Access Time
    + ALU Delay (address calculation)
    + Data Memory Access Time
    + Register File Setup Time
    + Clock Skew
- Cycle time for load is much longer than needed for all other instructions
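A small calculation makes the drawback concrete. All delay values below are hypothetical numbers in nanoseconds, not measurements; the load path follows the list above, while the R-type path (the same path without the data memory access) is an assumption added for comparison.

```python
# Hypothetical component delays in nanoseconds.
DELAY = {"clk_to_q": 1, "imem": 10, "regfile": 5, "alu": 8,
         "dmem": 10, "setup": 1, "skew": 1}

# Components on each instruction's path through the datapath.
PATHS = {
    "lw": ["clk_to_q", "imem", "regfile", "alu", "dmem", "setup", "skew"],
    "add": ["clk_to_q", "imem", "regfile", "alu", "setup", "skew"],  # assumed path
}


def path_delay(instr):
    """Sum the delays of every component the instruction passes through."""
    return sum(DELAY[part] for part in PATHS[instr])


# A single-cycle machine must use the slowest instruction's delay for every cycle.
cycle_time = max(path_delay(instr) for instr in PATHS)
print(path_delay("lw"), path_delay("add"), cycle_time)  # 36 26 36
```

With these made-up numbers every add spends 10 ns of each 36 ns cycle waiting, which is the waste that multicycle and pipelined designs aim to recover.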

Summary

Single cycle datapath: CPI = 1, CCT long

5 steps to design a processor
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic

Control is the hard part
MIPS makes control easier
- Instructions same size
- Source registers always in same place
- Immediates same size, location
- Operations always on registers/immediates

[Figure: the five components of a computer: Processor (Control, Datapath), Memory, Input, Output]
