
14:332:331 Computer Architecture

Quiz #3 Solutions

Problem 1 (30 points) Consider the single-cycle datapath shown in Figure 1 below. The latencies of the
components are specified in Table 1.
TABLE 1
Component                           Latency (ps)
ALU                                 100 ps
Adder                               10 ps
ALU Control Unit                    20 ps
Shifter                             5 ps
Control Unit/ROM                    40 ps
Sign/zero extender                  2 ps
2:1 Multiplexor                     4 ps
Instruction Memory                  200 ps
Data Memory                         260 ps
PC Register (read action)           2 ps
Register file                       20 ps
Logic (1 or more levels of gates)   1 ps


Figure 1
In Table 2, indicate the components that determine the critical path for the respective instruction, in the order
in which the critical path occurs. If a component is used but is not part of the critical path of the instruction
(i.e. it happens in parallel with another component), it should not be in the table. The register file is used for
reading and for writing; it will appear twice for some instructions. All instructions begin by reading the PC
register with a latency of 2 ps.
Neglecting the delays in the ALU Control and the Control, as well as those of the sign extension unit, the delays
for the critical path are:
Table 2
Instr    Datapath components used (component latency in ps)       Total latency
R-type   PC  IMemory (Read)  MUX      Reg  MUX  ALU  MUX  Reg     354 ps
         2   200             4        20   4    100  4    20
j        PC  IMemory (Read)  Shifter  MUX  PC                     213 ps
         2   200             5        4    2
lw       PC  IMemory (Read)  MUX      Reg  ALU  DM   MUX  Reg     610 ps
         2   200             4        20   100  260  4    20
are Elements Used By Instruction
a) Calculate the total latency for each type of instruction in Table 2. (13 points)
b) Given the datapath latencies above, which instruction determines the overall machine critical path (latency)?
(4 points)
The machine critical path is determined by lw instructions because they take the longest time (610 ps).
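
The totals in Table 2 and the answer to part b) can be checked with a short script. This is only a sketch: the dictionary and function names are made up for illustration, while the latencies and component sequences are copied from Table 1 and Table 2.

```python
# Component latencies in picoseconds, copied from Table 1.
LATENCY_PS = {
    "PC": 2,
    "IMem": 200,
    "MUX": 4,
    "Reg": 20,
    "ALU": 100,
    "Shifter": 5,
    "DMem": 260,
}

# Critical-path component sequences, copied from Table 2.
CRITICAL_PATHS = {
    "R-type": ["PC", "IMem", "MUX", "Reg", "MUX", "ALU", "MUX", "Reg"],
    "j": ["PC", "IMem", "Shifter", "MUX", "PC"],
    "lw": ["PC", "IMem", "MUX", "Reg", "ALU", "DMem", "MUX", "Reg"],
}


def path_latency(path):
    """Sum the latencies of the components on one critical path."""
    return sum(LATENCY_PS[component] for component in path)


totals = {instr: path_latency(path) for instr, path in CRITICAL_PATHS.items()}
# The instruction with the largest total sets the single-cycle clock period.
slowest = max(totals, key=totals.get)

print(totals)
print("machine critical path set by:", slowest)
```

In a single-cycle design every instruction takes one clock, so the clock period has to cover the slowest instruction, which is why the maximum is the quantity of interest.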

c) Modify the datapath in Figure 1 (you can draw directly on it) to accommodate the jump and link instruction (jal).
The address of the reserved register which will store PC+4 has to be made available to the Register File.
Similarly, PC+4 has to be made available at the write data port of the Register File. (Hint: use the MemtoReg
Mux) (13 points)
PC+4 is tapped into the MemtoReg multiplexer. The Jump control line controls the last multiplexer before the
address of the subroutine is fed into the program counter. The number 31 (the address of the register in which
PC+4 is stored in the register file) is added as an input to the mux at the write address port of the register file.



Problem 2 (35 points)

a) Draw the graphical representation of the datapath for the following MIPS code, indicating how the pipeline
deals with the branch hazard if it is detected in the MEM phase (brute-force approach). (13 points)
70  beq $2, $4, 200
74  and $10, $5, $6
78  or $12, $7, $8
82  sub $4, $2, $3
..
200 j 300

                       CC1     CC2     CC3     CC4     CC5     CC6     CC7
70  beq $2, $4, 100    IF      ID      EX      MEM     WB
74  and $10, $5, $6            IF      ID      EX      Bubble  Bubble
78  or $12, $7, $8                     IF      ID      Bubble  Bubble  Bubble
82  sub $4, $2, $3                             IF      Bubble  Bubble  Bubble
100 j 300                                              IF      ID      EX
The branch decision happens in the MEM phase. Bubbles are inserted once the control hazard is detected, and we
flush 3 instructions.

b) Now consider the modified datapath in Figure 2. In which stage of the pipeline does the beq detection take place?
How does the architecture deal with the control hazard in this case (what control line values are involved)? Add
the registers involved and the address calculation. Draw the graphical representation again. (13 points)

In the modified datapath, the beq decision takes place in the ID stage of the pipeline. Making the decision
earlier is an improvement, since only one instruction needs to be flushed. The control lines asserted are:
Branch (the control block decoded the beq instruction), IF.Flush (the control block flushes the instruction
from the IF/ID pipeline register), and Zero (if the registers $2 and $4 are equal). Zero together with Branch
being asserted makes the output of the AND gate high, so that the branch address is fed to the PC. The values of
$2 and $4 read from the register file are compared to detect whether the branch is taken or not.
The address calculation is: 100 -> sign extension -> shift left by 2 -> add to PC+4.
The new pipeline is:
                   CC1     CC2     CC3     CC4     CC5     CC6     CC7
beq $2, $4, 100    IF      ID      EX      MEM     WB
and $10, $5, $6            IF      Bubble  Bubble  Bubble  Bubble
j 200                              IF      ID      EX      MEM     WB
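
The address calculation chain above (sign extension, shift left by 2, add to PC+4) can be sketched in a few lines. The function names and the example values in the test are illustrative assumptions, not taken from the quiz; the sketch follows the standard MIPS encoding, in which the 16-bit immediate is a word offset relative to PC+4.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Branch target = (PC + 4) + (sign-extended immediate shifted left by 2)."""
    offset_bytes = sign_extend_16(imm16) << 2
    # Mask to 32 bits, as the hardware adder would.
    return (pc + 4 + offset_bytes) & 0xFFFFFFFF


# Hypothetical values: a branch at 0x40 with immediate 3 lands three
# instructions past PC+4; an immediate of -1 branches back to itself.
print(hex(branch_target(0x40, 3)))
print(hex(branch_target(0x40, 0xFFFF)))
```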


Figure 2

c) If branches occur 20% of the time in an application which runs on both architectures, by how much is the
second architecture faster than the first, assuming the clock is the same and so is the instruction count?
(9 points)
We know that performance is the inverse of CPU execution time. Thus
Performance_ID / Performance_MEM = ExecTime_MEM / ExecTime_ID.
Since the instruction count and the clock are the same,
Performance_ID / Performance_MEM = CPI_MEM / CPI_ID.
CPI_MEM = 1 * 80% + 4 * 20% = 1.6 cycles/instruction
CPI_ID  = 1 * 80% + 2 * 20% = 1.2 cycles/instruction
Performance improvement = 1.6 / 1.2 = 1.33. Thus, by moving the branch decision to the ID stage, we make the
architecture 33% faster.
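
The CPI arithmetic above can be reproduced with a small sketch. The function name is hypothetical, and the sketch makes the same simplifying assumption as the solution: every branch pays the full flush penalty (3 flushed instructions when resolved in MEM, 1 when resolved in ID), and every other instruction takes 1 cycle.

```python
def cpi_with_branches(branch_fraction, flushed_per_branch):
    """Average CPI when each branch costs 1 cycle plus one per flushed instruction."""
    branch_cycles = 1 + flushed_per_branch
    return (1 - branch_fraction) * 1 + branch_fraction * branch_cycles


cpi_mem = cpi_with_branches(0.20, 3)  # branch resolved in MEM: 3 instructions flushed
cpi_id = cpi_with_branches(0.20, 1)   # branch resolved in ID: 1 instruction flushed

# Same clock and instruction count, so the speedup is the CPI ratio.
speedup = cpi_mem / cpi_id

print(round(cpi_mem, 2), round(cpi_id, 2), round(speedup, 2))
```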

Problem 3 (40 points)
a) The architecture in Figure 3 handles the following code example. Are there dependencies? Draw the graphical
representation and explain what happens. (13 points)

add $t7, $t6, $t8
or $s5, $t4, $t5
addi $t1, $t1, 7
sub $s1, $t2, $t3
beq $s1, $s5, Label
lw $s6, 100($t8)

A dependency exists on register $s1 between the sub and beq instructions. The value of $s1 computed by sub (in
EXE) has to be compared with $s5 by beq (in the ID phase). Hence, a stall is required to make sure that the
correct value of $s1 is compared with $s5.

                     Cc1  Cc2  Cc3  Cc4  Cc5  Cc6  Cc7  Cc8  Cc9  Cc10 Cc11
add $t7,$t6,$t8      If   Id   Exe  Mem  Wb
or $s5,$t4,$t5            If   Id   Exe  Mem  Wb
addi $t1,$t1,7                 If   Id   Exe  Mem  Wb
sub $s1,$t2,$t3                     If   Id   Exe  Mem  Wb
Stall
beq $s1,$s5,Label                             If   Id   Exe  Mem  Wb
lw $s6,100($t8)                                    If   Id   Exe  Mem  Wb
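
The dependency check in part a) can be sketched as a scan for read-after-write (RAW) pairs. The tuple encoding of each instruction and the function name are assumptions made for illustration; the register lists are transcribed by hand from the code above rather than parsed.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("add $t7,$t6,$t8", "$t7", ["$t6", "$t8"]),
    ("or $s5,$t4,$t5", "$s5", ["$t4", "$t5"]),
    ("addi $t1,$t1,7", "$t1", ["$t1"]),
    ("sub $s1,$t2,$t3", "$s1", ["$t2", "$t3"]),
    ("beq $s1,$s5,Label", None, ["$s1", "$s5"]),
    ("lw $s6,100($t8)", "$s6", ["$t8"]),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every RAW pair."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for i, (_, dest, sources) in enumerate(program):
        # Check sources before recording this instruction's own write.
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


deps = raw_dependencies(PROGRAM)
# Back-to-back producer/consumer pairs are the ones that can force a stall.
adjacent = [d for d in deps if d[1] - d[0] == 1]

for producer, consumer, reg in deps:
    print(f"{PROGRAM[consumer][0]} reads {reg} written by {PROGRAM[producer][0]}")
```

The scan also reports that beq reads $s5 written by or, but those two instructions are three slots apart, so only the sub/beq pair on $s1 is close enough to need the stall described above.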


b) Now consider that the addi instruction generates an overflow. Redraw the graphical representation and
explain how the architecture deals with this situation. What registers and control lines are involved, how and
why? (13 points)
CC1 CC2 CC3 CC4 CC5 CC6
addi t1,t1,7 IF ID EX Bubble Bubble
sub s1,t2,t3 IF ID Bubble Bubble Bubble
80000180hex IF ID EX MEM

The overflow happens when addi is in the EX phase. The ALU raises the Overflow line, which is connected to the
Control block. The Control block in turn tells the MUX at the input of the PC to write in a hard-coded address
(80000180 hex), which is the address of the OS remedial software. This exception causes the addi instruction to be
flushed along with the instructions after it, and the address 80000180 hex to be loaded to deal with the exception.
Furthermore, the address of the instruction causing the exception is loaded in the Exception
Program Counter (EPC), so that execution of the application can resume when the OS has finished the remedial
action. The cause of the exception is loaded in the Cause register.
An ID.Flush signal is raised so as to feed zeroed control signals during the ID phase. This creates a bubble for
the sub instruction following the one that caused the overflow.
An EX.Flush signal is used to cause new multiplexors to zero the control lines so that the erroneous results
produced by addi are not stored in the register file.
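
The register updates described above can be modeled in software as a rough sketch. The function names and the dictionary standing in for processor state are hypothetical, and the Cause code value 12 is an assumption (it matches the arithmetic-overflow code used by MIPS32, but the quiz does not state a value). The handler address and the EPC behavior follow the text.

```python
EXCEPTION_VECTOR = 0x80000180  # handler address given in the text
CAUSE_OVERFLOW = 12            # assumed encoding for "arithmetic overflow"


def add_overflows(a, b):
    """True if a + b does not fit in a signed 32-bit two's-complement word."""
    total = a + b
    return not (-2**31 <= total <= 2**31 - 1)


def execute_addi(pc, rs_value, imm, state):
    """Model the EX stage of addi; returns the result, or None if it is flushed."""
    if add_overflows(rs_value, imm):
        state["EPC"] = pc                 # address of the offending instruction
        state["Cause"] = CAUSE_OVERFLOW   # why the exception happened
        state["PC"] = EXCEPTION_VECTOR    # next fetch comes from the OS handler
        state["flush"] = ["ID.Flush", "EX.Flush"]  # signals named in the text
        return None                       # erroneous result is never written back
    state["PC"] = pc + 4
    return rs_value + imm


state = {}
result = execute_addi(0x48, 2**31 - 1, 7, state)
print(result, hex(state["PC"]), hex(state["EPC"]))
```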


Figure 3


c) Now assume that the above code runs on a static dual-issue pipeline. Draw the pipeline again (graphic
representation). What modifications are needed for:
The Instruction Memory;
The Register File;
ALU(s);
Sign extender unit(s);
Size of pipeline registers (14 points)

The static dual-issue pipeline issues instructions in pairs and inserts a noop if one of the instructions of the
pair has a dependency.

Cc1 Cc2 Cc3 Cc4 Cc5 Cc6 Cc7

add $t7,$t6,$t8 If Id Exe Mem Wb
or $s5,$t4,$t5 If Id Exe Mem Wb
addi $t1,$t1,7 If Id Exe Mem Wb
sub $s1,$t2,$t3 If Id Exe Mem Wb
noop
beq $s1,$s5,Label If Id Exe Mem Wb
lw $s6,100($t8) If Id Exe Mem Wb


Pipeline datapath modifications
The Instruction Memory will need two output ports so that two instructions may be fetched at each clock cycle.
The Register File will have four output ports so that the registers for two instructions may be read
simultaneously during the ID phase. Similarly, there will be two write ports to the register file.
ALU(s): There will be two ALUs, one doing memory-related operations (addresses).
Sign extender unit(s): There will be two sign extender units, one for each ALU.
Size of pipeline registers will increase to accommodate the extra data being pipelined.
