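[Figure: a transistor with Vin, Base and Emitter labelled, and transistor-level NAND gate and NOR gate circuits, each with inputs V1 and V2, supply Vcc and output Vout]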
159.233 Lecture 1 - 1
NOT
A  X
0  1
1  0

NAND
A  B  X
0  0  1
0  1  1
1  0  1
1  1  0

NOR
A  B  X
0  0  1
0  1  0
1  0  0
1  1  0
AND
A  B  X
0  0  0
0  1  0
1  0  0
1  1  1

OR
A  B  X
0  0  0
0  1  1
1  0  1
1  1  1
Boolean Algebra
Another way to describe circuits is by using Boolean Algebra.
Variables (normally capital letters) can be either 0 or 1 and can be
combined by:
AB              ->  A and B
A+B             ->  A or B
$\overline{A}$  ->  not A
A  B  C
0  0  1
0  1  0
1  0  1
1  1  1
Look at each row that produces a 1 in the C column. Form the expression
using AND and NOT that generates a 1. OR all the rows that produce a 1
together:
$C = \overline{A}\,\overline{B} + A\,\overline{B} + AB$
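The row-by-row procedure can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_products` is made up for this example, and a complemented variable is written with a trailing apostrophe (A') because plain text cannot show an overbar.

```python
from itertools import product

def sum_of_products(names, outputs):
    """Build a sum-of-products expression from a truth table.

    names   -- input variable names, e.g. ["A", "B"]
    outputs -- the output column, in standard binary row order
    """
    terms = []
    rows = product([0, 1], repeat=len(names))
    for row, out in zip(rows, outputs):
        if out == 1:
            # AND the variables together, complementing those that are 0
            term = "".join(n if bit else n + "'" for n, bit in zip(names, row))
            terms.append(term)
    # OR together every row that produces a 1
    return " + ".join(terms)

expr = sum_of_products(["A", "B"], [1, 0, 1, 1])
print(expr)
```

For the C column 1, 0, 1, 1 this yields one term for each of the three rows that produce a 1.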
Use standard ANDs and ORs to start with. Use more than two inputs if
necessary.
Convert to NANDs and NORs when finished.
Inverting circles can be added to/removed from either end of a line.
3 input devices can be formed from 2 input devices with the output
combined with the third input
Cache memories
~100ns
~10ns
[Figure: cache contents before (X4, X1, X3, X2) and after (the same entries plus X5)]
Direct mapping
The position or index of a data item in the cache is determined directly
from its memory address.
Consider a cache of 8 entries each entry 32 bits wide.
Direct mapping will place data from memory into locations in the cache at
positions corresponding to the low-order 3 bits of their memory address:
location  data  cache index
000000    23    000
000001    44    001
000010    13    010
000011    48    011
000100    17    100
000101    56    101
000110    44    110
000111    22    111
001000    39    000
..
010101    29    101
..
111111    33    111
cache index  cache data  corresponding memory
000          23          000000
001          28          100001
010          34          011010
011          48          000011
100          17          000100
101          29          010101
110          92          100110
111          33          111111
2. match the cache tag with the cache tag part of the address

100110
cache index is 110 -> cache tag is 100
cache tag part of 100110 is 100
100 == 100 -> hit

010011
cache index is 011 -> cache tag is 000
cache tag part of 010011 is 010
000 != 010 -> miss
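The two worked lookups can be checked with a small Python sketch. It assumes the eight-entry cache listed above, with the low-order 3 bits of the 6-bit address as the cache index and the high-order 3 bits as the cache tag; the names `cache` and `lookup` are invented for this example.

```python
# index -> (cache tag, data), taken from the eight-entry cache table
cache = {
    0b000: (0b000, 23), 0b001: (0b100, 28), 0b010: (0b011, 34),
    0b011: (0b000, 48), 0b100: (0b000, 17), 0b101: (0b010, 29),
    0b110: (0b100, 92), 0b111: (0b111, 33),
}

def lookup(address):
    """Return the cached data on a hit, or None on a miss."""
    index = address & 0b111      # low-order 3 bits select the entry
    tag = address >> 3           # remaining high-order bits are the tag
    stored_tag, data = cache[index]
    return data if stored_tag == tag else None

print(lookup(0b100110))  # tag 100 matches the stored tag: hit
print(lookup(0b010011))  # tag 010 differs from stored tag 000: miss
```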
To increase the size of the cache we can either increase the number of
positions, or increase the size of the data stored at those positions
Example of a cache with 32 entries each 32 bytes of data:
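[Figure: the 32-bit address is split into a cache tag (bits 31 to 10, example value 10237), a cache index (bits 9 to 5, example value 2) and a byte select (bits 4 to 0); each of the 32 entries holds a Valid bit, a cache tag and 32 bytes of cache data; the stored cache tag is compared (==?) with the cache tag part of the address]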
Increasing the number of bytes for each cache entry involves a penalty
when there is a miss. More bytes need to be copied from memory into
the cache. Spatial locality means that this is not necessarily a bad thing!
Using the cache index to find the correct data line before comparing the
cache tag involves two levels of computation.
A fully associative cache combines the cache index with the cache tag.
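[Figure: fully associative cache; the address holds a cache tag (example value 482933) and a byte select (bits 4 to 0); every entry holds a Valid bit, a cache tag (11335, 33561, 482933, 7562) and cache data, and each stored cache tag is compared (==?) with the cache tag of the address]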
Cache replacement:
In a directly mapped cache:
If there is a cache miss then the cache is updated and the new entry
replaces the old.
In a fully associative cache:
If there is a cache miss then the new entry can be placed anywhere.
Where do we put it?
Using the Least Recently Used algorithm - hardware keeps track of which
cache entry has not been accessed for the longest time.
This can be tricky to implement!
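In software the Least Recently Used policy is easy to sketch, even though the hardware version is tricky. The class below is a minimal illustration, not a description of any real cache controller; `LRUCache`, `access` and the `load` callback are names made up for this example.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache model with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used entry first

    def access(self, address, load):
        """Return (data, hit). `load` fetches the data from memory on a miss."""
        if address in self.entries:
            self.entries.move_to_end(address)   # now the most recently used
            return self.entries[address], True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used
        self.entries[address] = load(address)
        return self.entries[address], False

cache = LRUCache(capacity=2)
for addr in (1, 2, 1, 3):
    cache.access(addr, load=lambda a: a * 10)
print(list(cache.entries))  # address 2 was the least recently used and is gone
```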
Full Adder
A full adder adds two binary numbers (A,B) together and includes
provision for a carry in bit (Cin) and a carry out bit (Cout).
The truth table for a full adder is:
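A  B  Cin  Cout  Sum
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1
->
$Sum = \overline{A}\,\overline{B}\,Cin + \overline{A}\,B\,\overline{Cin} + A\,\overline{B}\,\overline{Cin} + A\,B\,Cin$
$Cout = \overline{A}\,B\,Cin + A\,\overline{B}\,Cin + A\,B\,\overline{Cin} + A\,B\,Cin$
[Figure: gate circuit generating Sum and Cout from A, B and Cin and their complements]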
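As a quick check, the sum-of-products form of a full adder (one AND term for every truth-table row that produces a 1) can be written directly in Python and compared with ordinary addition. The function name `full_adder` is chosen for this sketch only.

```python
def full_adder(a, b, cin):
    """Full adder built from AND/OR/NOT terms; returns (sum, cout)."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin   # complemented inputs
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    cout = (na & b & cin) | (a & nb & cin) | (a & b & nc) | (a & b & cin)
    return s, cout

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```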
Decoder
A decoder accepts a binary encoded number as input and puts a logic 1
on the corresponding output line.
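For 2 inputs -> 4 output lines
For 3 inputs -> 8 output lines

Inputs    Decoder outputs
I1  I0    A  B  C  D
0   0     1  0  0  0
0   1     0  1  0  0
1   0     0  0  1  0
1   1     0  0  0  1

$A = \overline{I1}\,\overline{I0}$
$B = \overline{I1}\,I0$
$C = I1\,\overline{I0}$
$D = I1\,I0$
[Figure: decoder circuit with inputs I1 and I0, their complements, and outputs A to D]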
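Decoder
Inputs        Outputs
OE  I1  I0    A  B  C  D
0   0   0     1  0  0  0
0   0   1     0  1  0  0
0   1   0     0  0  1  0
0   1   1     0  0  0  1
1   0   0     0  0  0  0
1   0   1     0  0  0  0
1   1   0     0  0  0  0
1   1   1     0  0  0  0

$A = \overline{I1}\,\overline{I0}\,\overline{OE}$
$B = \overline{I1}\,I0\,\overline{OE}$
$C = I1\,\overline{I0}\,\overline{OE}$
$D = I1\,I0\,\overline{OE}$
[Figure: decoder circuit with inputs I1, I0 and OE]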
Multiplexor
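[Figure: multiplexor with several Inputs, Control inputs (address) and one Output]
2 address lines
3 address lines

OE  C1  C0  Q
0   0   0   I0
0   0   1   I1
0   1   0   I2
0   1   1   I3

$Q = \overline{OE}\,(\overline{C1}\,\overline{C0}\,I0 + \overline{C1}\,C0\,I1 + C1\,\overline{C0}\,I2 + C1\,C0\,I3)$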
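The multiplexor equation can be tried out in Python. The sketch assumes, as the table suggests, that the output is enabled when OE is 0; `mux4` is a name invented for this example.

```python
def mux4(oe, c1, c0, i0, i1, i2, i3):
    """4-input multiplexor written as the AND/OR expression for Q."""
    n1, n0 = 1 - c1, 1 - c0              # complemented address lines
    q = (n1 & n0 & i0) | (n1 & c0 & i1) | (c1 & n0 & i2) | (c1 & c0 & i3)
    return (1 - oe) & q                  # output forced to 0 when OE is 1

inputs = (0, 1, 1, 0)                    # I0, I1, I2, I3
for c1 in (0, 1):
    for c0 in (0, 1):
        print(c1, c0, mux4(0, c1, c0, *inputs))
```

The address C1 C0 simply picks which input is copied to Q.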
[Figure: multiplexor circuit built from gates, with inputs I0 to I3, address lines C1 and C0 (and their complements) and OE]
Demultiplexor
1 input is copied to one of several outputs
Common configurations are:
1 input, 2 address lines -> 4 outputs
1 input, 3 address lines -> 8 outputs
[Figure: demultiplexor with one Input, Control inputs (address) and several Outputs]
Glitches
So far we have assumed that all logic devices are infinitely fast.
Outputs immediately reflect inputs.
In practice this is not the case. There is a short delay between an input
logic level changing and the corresponding output change.
Gate delays for TTL are typically 5 nanoseconds.
20 cm of wire will also delay a signal by 1 nanosecond.
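[Figure: timing diagram of A, $\overline{A}$ and $A + \overline{A}$. Logically $A + \overline{A} = 1$ (TRUE), but $\overline{A}$ changes slightly later than A, so $A + \overline{A}$ briefly drops to 0]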
This spurious 0 is called a glitch.
These glitches may or may not have disastrous effects.
It is possible to design circuits that have no glitches. This requires extra
hardware and considerable effort.
Another approach is to design circuits that are insensitive to glitches.
Wait a fixed time after a change - during which the glitches will go away.
This idea is the basis for clocked circuits that we will come to later.
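A glitch of this kind can be reproduced with a very simple discrete-time model. The sketch below assumes time advances in fixed steps and that only the inverter has a delay; `simulate` and `inverter_delay` are names made up for the illustration.

```python
def simulate(a_wave, inverter_delay=1):
    """Output of (A OR NOT A) when the inverter lags by `inverter_delay` steps."""
    out = []
    for t, a in enumerate(a_wave):
        # the inverter still reflects the value A had a little earlier
        a_old = a_wave[max(t - inverter_delay, 0)]
        not_a = 1 - a_old
        out.append(a | not_a)
    return out

wave = [1, 1, 1, 0, 0, 0]                  # A falls from 1 to 0 at step 3
print(simulate(wave))                      # one spurious 0 appears at step 3
print(simulate(wave, inverter_delay=0))    # ideal gates: always 1
```

Sampling the output only after the delay has passed, as a clocked circuit does, would never see the spurious 0.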
Memory
Feedback
If we loop the output of a circuit back into its input, we have a condition
called feedback, and some interesting results are obtained.
The circuit will eventually settle with one input at 0 and the other at 1 (either way round).
RS Flip-flop
Circuits can use a clock to provide timing pulses. Logic levels can be
read on the clock pulse, which is allowed to go high only when glitches
are unlikely.
The RS Flip-flop can be clocked.
Clock
Q
The output of the AND gates will only reflect S and R when the Clock is 1
When the Clock is 0 the outputs of the AND gates are always 0
One problem with the RS flip-flop is what happens when R and S are both 1.
In this case the only stable state has both Q and $\overline{Q}$ at 0!
Clocked D Flip-flop
The D flip-flop removes this ambiguity
[Figure: clocked D flip-flop with inputs D and Clock and outputs Q and $\overline{Q}$]
The Clock can now be considered a Strobe. A single clock pulse.
When the strobe is high, the value of D will be transferred to Q
Q will remain at that value until the strobe returns to a high state.
The D Flip-flop acts as a simple 1 bit memory.
As described, this type of flip-flop is level triggered. It depends on the
strobe or clock being in a high state.
In practice this is not so desirable. It is better to transfer the value of D to
the flip-flop when the strobe changes from a 0 to a 1
This is called edge triggered.
Edge triggered flip-flops are far more common.
They can trigger on either the rising edge (0 -> 1) or the falling edge (1 -> 0).
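The difference between level and edge triggering is easy to see in a small model. The class below is a sketch of a rising-edge-triggered D flip-flop; `DFlipFlop` and `tick` are names chosen for this example.

```python
class DFlipFlop:
    """Rising-edge-triggered D flip-flop: Q takes D only on a 0 -> 1 clock change."""

    def __init__(self):
        self.q = 0
        self._last_clock = 0

    def tick(self, d, clock):
        if self._last_clock == 0 and clock == 1:   # rising edge detected
            self.q = d
        self._last_clock = clock
        return self.q

ff = DFlipFlop()
trace = [ff.tick(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
print(trace)
```

In the third step D changes while the clock is still high; a level-triggered device would follow it, but the edge-triggered one holds its value until the next rising edge.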
J is equivalent to S.
K is equivalent to R.
Register
We construct a register by connecting together a group of D flip-flops
[Figure: 4-bit register built from four D flip-flops; Data in feeds the D inputs and the outputs are Q3, Q2, Q1 and Q0]
Bus
A bus is a common set of wires connecting multiple devices together.
Buses are used to connect the data and address lines between computers, memory
and i/o devices.
One problem with normal logic components is that their outputs are always active: either a 1 or a 0 (5V or Gnd).
If we connect the outputs of two gates together, where one output is a 0 and the other
is a 1, then the result will be somewhere in between.
If we want to use a bus architecture we must make sure that only one device at a
time outputs a logic signal of 0 or 1.
All other devices must go into a high impedance state,
i.e. they look like they are not connected at all.
Tri-state devices have this capability.
A tri-state device is like a normal gate but with the added capability of going into this
off-line mode.
In tri-state devices the OE output-enable line drives the outputs into this third state
rather than into the 0 logic state as previously.
Not all devices are tri-state. To connect these devices to a bus we need to put a tri-state driver chip in between the device and the bus.
A tri-state driver just sends its input to its output, but also has an OE input that allows the
output to be disconnected.
[Figure: several chips (1st chip, 2nd chip, 3rd chip, etc.), each with an OE input]
(Some memory chips are several bits wide - normally a power of 2)
The desired bit is selected by the use of address lines.
R/W
Data
Address
select 1 flip-flop
perform the write on a clock pulse
the pulse must arrive after the chip is selected and a write operation
specified.
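Use a decoder and the address lines to select a particular flip-flop.
AND this signal with the CS, R/W and Data in lines.
[Figure: a decoder driven by address lines A0 and A1 selects one D flip-flop; Data in, R/W and CS gate the write]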
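[Figure: four D flip-flops selected by a decoder on address lines A0 and A1; a mux, also addressed by A0 and A1, drives Data out; R/W and CS control the operation]
Place the correct bit on the output line by using a multiplexor.
Use the CS and R/W lines to enable a tri-state buffer.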
1Mb Memories
We can make the above memory larger by making it Byte wide rather
than bit wide.
This just means we have to copy 8 times the flip-flop column and its
associated data in and data out lines.
The lines from the decoder and CS/RW logic go to each of the flip-flop
columns.
For large Bit wide memories, we soon run into large numbers of pins to
connect to the chip.
1Mbit needs:
20 address lines
1 data line
1 CS line
1 R/W line
2 power/ground
The decoder within the chip will need 1 million output lines!
To get round these problems, we arrange the memory as a 2 dimensional
grid, 1024 x 1024 elements.
We use 2 decoders and send the address in 2 chunks - the row and
column addresses.
1Mbit now needs:
10 address lines
1 data line
1 CAS/RAS strobe (column/row strobe)
1 CS line
1 R/W line
2 power/ground
The 2 decoders now have only 1024 lines each as outputs.
Shift Registers
Contain several flip-flops in a row. One bit is input at one end on each
clock pulse. Each other bit moves one place to the right (or left). The
outputs of each flip-flop are available simultaneously.
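[Figure: shift register; Data enters the first D flip-flop, each flip-flop feeds the next, all share the Clock, and the outputs are Q3, Q2, Q1 and Q0]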
We can use shift registers for serial to parallel conversion. Input 8 bits on
8 pulses, then read data simultaneously.
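Serial to parallel conversion can be sketched as follows. The model assumes the new bit enters at the left and every other bit moves one place to the right on each clock pulse; `shift_in` is a name invented for this example.

```python
def shift_in(register, bit):
    """One clock pulse: the new bit enters at the left, the rightmost bit is lost."""
    return [bit] + register[:-1]

register = [0] * 8
serial_data = [1, 0, 1, 1, 0, 0, 1, 0]    # bits arriving one per clock pulse
for bit in serial_data:
    register = shift_in(register, bit)

# after 8 pulses all 8 bits can be read simultaneously
print(register)
```

The first bit received ends up in the rightmost flip-flop, so the parallel word is the serial stream in reverse order.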
Division by 2
We can use the above principle to implement a divide by 2 circuit.
In the parallel load register (lecture 4) we can use an extra control line, shift/load,
to determine whether the input to a flip-flop comes from the data lines or
from the previous flip-flop (the left-most flip-flop gets a 0 in shift mode)
(use the control line as the address of a 2 input multiplexor, the output is
connected to the input of the flip-flop)
Try this as an exercise.
(design your own 2 input mux using AND/OR gates)
Multiplication by 2
Multiplication can be done by shifting to the left.
Higher powers of two can be done by shifting several times.
An extra flip-flop can be added to the ends of the chain.
In the case of division by 2, this flip-flop will contain the remainder.
In the case of multiplication the flip-flop will flag whether an overflow has
occurred.
The 74LS95B is a parallel load/shift right register 4 bits wide, which (with
external assistance) can do shift left as well.
Counters
Add one to the current number.
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
CLK
Q
Use the Qn-1 output as a clock to the Qn flip-flop
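[Figure: ripple counter built from flip-flops with their inputs tied to 5v; CLK drives the first flip-flop and the outputs are Q0 to Q3, followed by a timing diagram of CLK, Q0, Q1 and Q2]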
This is an ideal case - unfortunately the output of a flip-flop, like any gate, is slightly
delayed with respect to a changing input.
So the toggling of Q1 will be delayed by a few nanoseconds.
Q2 will only toggle on a change in Q1 so it will be delayed by twice the amount.
This effect will keep on compounding.
These glitches can have quite a disturbing effect.
Consider the case when the counter is at 0111
At the next pulse we would expect the counter to show 1000
However because of the delays the following patterns will all occur
0110
0100
0000
1000
Circuits can be designed so that this ripple effect can be safely ignored.
However, it is possible to design a counter that changes all its outputs
simultaneously.
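The intermediate patterns of a ripple counter can be listed with a short simulation. The sketch assumes each flip-flop toggles when the previous output falls from 1 to 0, and treats each toggle as one step in time; `ripple_increment` is a name made up for this example.

```python
def ripple_increment(bits):
    """Patterns shown by a ripple counter during one clock pulse.

    bits[0] is Q0 (the lowest bit). Patterns are returned as strings
    with the highest bit on the left.
    """
    bits = list(bits)
    patterns = []
    for i in range(len(bits)):
        bits[i] ^= 1                              # this flip-flop toggles
        patterns.append("".join(str(b) for b in reversed(bits)))
        if bits[i] == 1:
            break                                 # 0 -> 1: no falling edge, ripple stops
    return patterns

print(ripple_increment([1, 1, 1, 0]))             # counter at 0111
```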
Synchronous Counters
We cannot use the preceding bit as a pulse.
The same pulse must be used for each flip-flop.
How do we know when to toggle a flip-flop?
The synchronous counter uses the following fact:
a bit will toggle if all the low-order bits in the previous state are 1
[Figure: synchronous counter of J-K flip-flops sharing one Clock; the first flip-flop has J and K tied to 5v]
Flip-flops toggle at the same time - but only if J-Ks are high.
J-Ks are formed by ANDing the output of the previous bits.
Because the new output appears a few nanoseconds after the toggle,
the new state is not involved in any of the inputs.
The AND gates are sampling the previous state.
We are deliberately using the gate delay time to our advantage.
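The toggle rule for a synchronous counter can be modelled by computing every new bit from the previous state only. This is a sketch; `sync_next` is a name chosen for the example and bits[0] is taken to be the lowest bit.

```python
def sync_next(bits):
    """Next state of a synchronous counter; bits[0] is the lowest bit."""
    nxt = []
    for i, b in enumerate(bits):
        # J = K = AND of all lower-order bits of the PREVIOUS state
        # (all() of an empty list is True, so the lowest bit always toggles)
        toggle = all(bits[:i])
        nxt.append(b ^ 1 if toggle else b)
    return nxt

state = [1, 1, 1, 0]          # 0111
state = sync_next(state)
print(state)                  # every bit changes in the same step: 1000
```

Unlike the ripple counter, no intermediate patterns appear.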
A controller -
eg a CPU
Mechanical
eg a lift
Electrical
Hydraulic etc
For a computer processor:
commands
Controller
processor
status
memory
Z
F
Q
Each state starts in a square box and lasts until you reach the next
square box.
State0
State1
T
Z
F
Q
State2
2.
3. On the tick of the clock transfer that new state into the register
4. repeat indefinitely

1 bit:  0, 1
2 bits: 00, 01, 10, 11
3 bits: 000, 001, etc
If we have seven states we will have to use 3 bits - one of the states will
never occur.
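[Figure: ASM hardware; the status inputs and the current state feed the combinatorial logic for the next state; on the clock the next state is loaded into the Register, which holds the current state; the outputs are generated from the current state]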
State0
State1
T
Z
F
Q
State2
Current state
AB
0
00
1
2
In state 0
Next state
AB
1
01
2
10
0
00
0
00
01
10
Condition
Z
Z
the B flip-flop
the A flip-flop
->
->
In state 1
the B flip-flop
the A flip-flop
->
->
0
0
In state 2
the B flip-flop
the A flip-flop
->
->
0
0
We could use a combination of logic gates to generate our new state but this gets very messy for large numbers of states / inputs.
A very neat way of solving the problem is to use table look-up
Given our present state look it up in a table and read off the new state.
A multiplexor acts as a table look up.
T F
T F
1
2
1
2
1 0
1 0
Current state
ZZT F
ZZT F
0 or 1
1
2
1 or 0
1
2
0 0
0 0
Current state
ZZT F
ZZT F
New A
1
2
New B
1
2
AB
AB
Current state
Output signals
There are two standard ways of generating output signals from an ASM.
One way or the other will normally involve fewer chips.
Use one mux chip for each output.
Feed the current state into the mux. For each of the possible states use
the correct inputs to give the desired output signal.
ZZT F
1
2
3
AB
Use a decoder chip on the current state.
Z
0
Decoder
B
2
3
00
01
F
X
T
Y
F
11
10
T
W
F
Z
T
Current state
AB
0
00
1
2
3
01
10
11
Next state
AB
1
01
2
10
3
11
0
00
2
10
0
00
0
00
2
10
3
11
Condition
XY
XY
W
T
ZX
ZX
Z
In state 0
the B flip-flop
the A flip-flop
->
->
X Y+X
In state 1
the B flip-flop
the A flip-flop
->
->
F
W
In state 2
the B flip-flop
the A flip-flop
->
->
F
F
In state 3
the B flip-flop
the A flip-flop
->
->
X Y +X
ZX+Z
>
>
X+Y
X+ Y
>
X+ Z
XYZT F
0
1
New B
2
3
WXY ZT F
AB
Current state
0
1
New A
2
3
AB
Current state
A Multiplication Circuit
Phase 1:
Analyse the problem in terms of its inputs
Generate an architecture
Phase 2:
Design a controller (ASM)
    0000
   0111      partial products
  0000
 0111
01000110     product
This way is not satisfactory:
We have to remember all the partial products before we can sum them to
get the final product -> we would need lots of registers
Instead we can keep a running total in a single register.
If the multiplicand and multiplier are each n bits wide, the running total
needs to be 2n bits wide.
Multiplier
1. Start the running total with the multiplier in the low order bits.
2. Increment a loop counter.
3. Check the state of the lowest bit.
4. If it is a 1 add the multiplicand to the high order bits of the total.
5. Shift the running total 1 bit to the right.
6. Repeat from 2 until the loop counter gets to n.
The multiplier has now disappeared from the total.
The total itself has the first terms in the addition shifted down to the low
order end
ie
Counter  Running total  Low bit  Add                              Shift
1        00001010       0                                         00000101
2        00000101       1        00000101 + 01110000 = 01110101   00111010
3        00111010       0                                         00011101
4        00011101       1        00011101 + 01110000 = 10001101   01000110 (answer)
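The running-total algorithm is short enough to try out in Python. This sketch uses ordinary integers rather than registers, so the carry out of the adder is kept automatically; `multiply` is a name invented for the example.

```python
def multiply(multiplicand, multiplier, n=4):
    """Shift-and-add multiplication of two n-bit unsigned numbers."""
    total = multiplier                      # multiplier starts in the low-order bits
    for _ in range(n):                      # loop counter runs up to n
        if total & 1:                       # check the lowest bit
            total += multiplicand << n      # add multiplicand to the high-order bits
        total >>= 1                         # shift the running total right
    return total

print(format(multiply(0b0111, 0b1010), "08b"))
```

With multiplicand 0111 and multiplier 1010 the running total passes through the same values as the trace above.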
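[Figure: multiplier architecture; shift register SRA holds the high-order result and SRB the low-order result (initially the Multiplier); the Adder adds the Multiplicand to SRA, with carry; a 2-bit Counter counts the steps; control signals LDPROD, SHIFT, LDMPLR, RSTA, INCR and CLRCNT; status signals LOBIT and EQZ]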
00
CLRCNT
F
START
T
LDMPLR
RSTA
01
INCR
LOBIT
10
LDPROD
F 11
SHIFT
EQZ
T
DONE
Current state (AB)  Next state (AB)  Condition
0 (00)              0 (00)           $\overline{START}$
0 (00)              1 (01)           START
1 (01)              2 (10)           LOBIT
1 (01)              3 (11)           $\overline{LOBIT}$
2 (10)              3 (11)
3 (11)              0 (00)           EQZ
3 (11)              1 (01)           $\overline{EQZ}$
LDMPLR
RSTA
START
0
Decoder
B
2
3
EQZ
CLRCNT
INCR
LDPROD
SHIFT
DONE
NewA
1
2
3
A B
0
NewB
1
2
3
B
A B
2bit
register
Designing a Computer
von Neumann model
general purpose architecture
programmed by stored instructions
Bus
OEPC
Tristate
buffer
ProgramINCPC
counter LDPC
OECONST
Tristate
buffer
OEACC
Tristate
buffer
LDACC Accumulator
CO/C1
const
mux
OEPORT
Tristate
buffer
Memory
InputPort
zero EQZ
detect
Arithmetic
LogicUnit
LDABR ABuffer
Register
xor
invertor
sub
LDMAR Memory
Address
Register
Program Counter          PC
Arithmetic/logic unit    ALU
Accumulator              Acc
Input port               IO
Const mux
Buffer register          ABR
Zero detect              EQZ
Memory address register  MAR
Memory
Bus                      8 bits wide
Acc
LDACC
Adder
LDABR
ABR
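Subtraction:
a - b = a + (2s complement of b)
2s complement of b = (complement of b) + 1

sub  b  EOR
0    0  0
0    1  1
1    0  1
1    1  0
The EOR output is b when sub is 0, or $\overline{b}$ when sub is 1.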
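The same trick works on whole bytes. The sketch below assumes an 8-bit datapath in which the sub line both feeds the EOR array and supplies the carry in of the adder; `add_sub` is a name made up for this example.

```python
def add_sub(a, b, sub, bits=8):
    """Add when sub is 0, subtract (a - b) when sub is 1, using one adder."""
    mask = (1 << bits) - 1
    b_eor = b ^ (mask if sub else 0)     # EOR array: b or its complement
    return (a + b_eor + sub) & mask      # sub doubles as the carry in (+1)

print(add_sub(9, 5, sub=0))   # 9 + 5
print(add_sub(9, 5, sub=1))   # 9 - 5
print(add_sub(5, 9, sub=1))   # 5 - 9, shown as an 8-bit 2s complement value
```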
LDACC
Acc
Adder
LDABR
ABR
EOR
array
sub
We now need to add a zero detect circuit to the Accumulator, and a tristate gate to attach the accumulator to the Bus.
zero detect:
EQZ
Accumulator
OEACC
Tristate
buffer
zerodetect
LDACC
Acc
Adder
LDABR
ABR
EQZ
EOR
array
sub
The Controller
The controller is an ASM that implements the instruction set of the
computer
When the power is first turned on, and after each instruction has
completed the controller:
uses the value in the PC to address memory
the value at that memory location is the next instruction
issues a sequence of signals to the architecture corresponding
to the instruction
The input signals to the controller are:
Instruction set
The pico-computer only has 8 instructions:
0  LDA operand
1  ADD operand
2  SUB operand
3  STA operand
4  JPZ operand
5  IN
6  CLR
7  HLT
So a set of instructions to input three numbers and add them together is:
IN         5
STA 100    3 100
IN         5
STA 101    3 101
IN         5
ADD 101    1 101
ADD 100    1 100
HLT        7
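A tiny interpreter makes it easy to experiment with programs like this one. The sketch below is an assumption-laden model, not the real hardware: it takes the opcode numbers 0 to 7 in the order LDA, ADD, SUB, STA, JPZ, IN, CLR, HLT, stores each operand in the byte after its opcode, and works on 8-bit values. The function `run` and the three-number program are written for this example.

```python
LDA, ADD, SUB, STA, JPZ, IN, CLR, HLT = range(8)
HAS_OPERAND = {LDA, ADD, SUB, STA, JPZ}

def run(program, inputs):
    """Execute a pico-computer program and return the accumulator at HLT."""
    mem = list(program) + [0] * (256 - len(program))
    pc, acc = 0, 0
    port = iter(inputs)
    while True:
        opcode = mem[pc]
        pc += 1
        operand = None
        if opcode in HAS_OPERAND:
            operand = mem[pc]
            pc += 1
        if opcode == LDA:
            acc = mem[operand]
        elif opcode == ADD:
            acc = (acc + mem[operand]) & 0xFF
        elif opcode == SUB:
            acc = (acc - mem[operand]) & 0xFF
        elif opcode == STA:
            mem[operand] = acc
        elif opcode == JPZ:
            if acc == 0:
                pc = operand
        elif opcode == IN:
            acc = next(port) & 0xFF
        elif opcode == CLR:
            acc = 0
        elif opcode == HLT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode}")

# input three numbers and add them together
program = [IN, STA, 100, IN, STA, 101, IN, ADD, 101, ADD, 100, HLT]
print(run(program, [3, 4, 5]))
```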
Instructions in detail
For all these instructions we may assume that the PC has already been
incremented
0 LDA operand
Algorithm:
OECONST, LDABR
OEPC, LDMAR, INCPC
OEMEM, LDMAR
OEMEM, LDACC

1 ADD operand
Algorithm:
OEACC, LDABR
OEPC, LDMAR, INCPC
OEMEM, LDMAR
OEMEM, LDACC

2 SUB operand
Algorithm:
OEACC, LDABR
OEPC, LDMAR, INCPC
OEMEM, LDMAR
OEMEM, LDACC, SUB

3 STA operand
Algorithm:
OEPC, LDMAR, INCPC
OEMEM, LDMAR
OEACC, WEMEM

4 JPZ operand
Algorithm:
OEPC, LDMAR, INCPC
If EQZ is true: OEMEM, LDPC

5 IN
Algorithm:
OECONST, LDABR
OEPORT, LDACC, DISABLE

6 CLR
Algorithm:
OECONST, LDABR
OECONST, LDACC

7 HALT
Algorithm:
1 Disable the clock
DISABLE
[Figure: ASM chart for the controller, states 0 to 23. States 0 and 1 fetch the instruction (OEPC, LDMAR, INCPC; then OEMEM, LDIR); the INSTR decision then branches to the state sequence for LDA, ADD, SUB, STA, JPZ, IN, CLR or HALT, using the signals listed for each instruction]
[Figure: ASM chart for the controller reduced to 8 states (0 to 7), including RESET, the PLOAD program-load actions (load addr, inc addr, load memory) and shared states for instructions with common steps (lda+add+sub+in+clr, sta+jpz, add+sub, in+clr)]
Nextstate
ABC
Condition
000
001
001
2
7
010
111
pload
pload
010
3
6
4
011
110
100
lda+add+sub+in+clr
hlt
sta+jpz
011
4
6
100
110
lda+add+sub
in+clr
100
101
101
6
1
110
001
jpz
jpz
110
001
111
0
7
000
111
pload
pload
State0
A
B
C
=
=
=
false
false
true
A
B
C
=
=
=
pload
true
pload
A
B
C
=
=
=
hlt+sta+jpz
hlt+lda+add+sub+in+clr
lda+add+sub+in+clr
A
B
C
=
=
=
true
in+clr
false
A
B
C
=
=
=
true
false
true
A
B
C
=
=
=
jpz
jpz
A
B
C
=
=
=
false
false
true
A
B
C
=
=
=
pload
pload
pload
State1
State2
State3
State4
State5
jpz
State6
State7
[Figure: controller hardware; the instr register is loaded from the data bus and feeds a decoder; three 8-input multiplexors generate New A, New B and New C, addressed by the current state ABC held in a 3-bit register]
Microprogramming
The Pico-computer uses hardware to implement its controller.
Combinations of status lines are used to create the next state of the
ASM, and multiplexors choose between them given the current state.
The muxes are performing a table lookup,
a job that could just as easily be performed by a ROM.
This is the microprogrammed approach.
Benefits:
If the instruction set needs to be altered, or corrections made, it is a lot
easier to change a ROM, than to change or add to the combinatorial logic
of the hardwired approach.
A processor can be made to look like (emulate) another processor: by
changing the micro-code, the instruction set of another processor can be
implemented (as long as the architecture is compatible).
Extra instructions can be implemented depending on the type of program
the processor is executing.
If the program is heavily maths based, we could implement one
instruction that performs a Fast Fourier Transform on a set of numbers.
If it's text based, we could implement a strcpy instruction that performs a
block move of data.
In both these instances we are only invoking 1 instruction instead of
many. So the total cycle time will be reduced because of the smaller
number of instruction fetches. (Data fetches and ALU operations will be
the same)
Micro-program controller
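[Figure: micro-program controller; a µ-program counter (with ld/inc control, driven by the Clock) supplies the µ-addr to the ROM; the ROM outputs the processor control lines, the condition select, the opcode/next-addr select and the next µ-addr; the µ-program counter loads either the opcode from the Bus or the next µ-addr; the condition select chooses between F, T and EQZ]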
There is no longer an instr register. The µ-controller loads the opcode
from the Bus when necessary into its own program counter.
The µ-program counter is reset to zero on power up.
µ-addr initially points to address 0 of µ-ROM.
µ-ROM contains a sequence of data bytes.
These data bytes appear as output and control lines:
processor control lines
condition select
next µ-addr
opcode/(next µ-addr)
Control lines: OEPC, LDPC, INCPC, OECONST, OEACC, LDACC, LDABR, SUB, LDMAR, OEMEM, WEMEM, DISABLE

µ-addr     cond sel  opcode/addr  next addr  control lines set to 1   action
0          1         .            8          .                        goto fetch
1          1         .            10         .                        goto LDA
2          1         .            14         .                        goto ADD
3          1         .            18         .                        goto SUB
4          1         .            22         .                        goto STA
5          1         .            25         .                        goto JPZ
6          1         .            29         .                        goto CLR
7          1         .            31         .                        goto HLT
fetch 8    0         .            .          OEPC, LDMAR              mar=pc
9          1         1            .          INCPC, OEMEM             pc++; goto Bus
lda 10     0         .            .          OECONST, LDABR           abr=0
11         0         .            .          OEPC, INCPC, LDMAR       mar=pc; pc++
12         0         .            .          LDMAR, OEMEM             mar=m[mar]
13         1         .            8          LDACC, OEMEM             a=abr+m[mar]; goto fetch
µ-addr     cond sel  opcode/addr  next addr  control lines set to 1   action
add 14     0         .            .          OEACC, LDABR             abr=a
15         0         .            .          OEPC, INCPC, LDMAR       mar=pc; pc++
16         0         .            .          LDMAR, OEMEM             mar=m[mar]
17         1         .            8          LDACC, OEMEM             a=abr+m[mar]; goto fetch
sub 18     0         .            .          OEACC, LDABR             abr=a
19         0         .            .          OEPC, INCPC, LDMAR       mar=pc; pc++
20         0         .            .          LDMAR, OEMEM             mar=m[mar]
21         1         .            8          LDACC, SUB, OEMEM        a=abr-m[mar]; goto fetch
sta 22     0         .            .          OEPC, INCPC, LDMAR       mar=pc; pc++
23         0         .            .          LDMAR, OEMEM             mar=m[mar]
24         1         .            8          OEACC, WEMEM             m[mar]=a; goto fetch
jpz 25     0         .            .          OEPC, INCPC, LDMAR       mar=pc; pc++
26         2         .            28         .                        if EQZ goto 28
27         1         .            8          .                        goto fetch
28         1         .            8          LDPC, OEMEM              pc=m[mar]; goto fetch
clr 29     0         .            .          OECONST, LDABR           abr=0
30         1         .            8          OECONST, LDACC           a=abr+0; goto fetch
hlt 31     1         .            8          DISABLE                  disable; goto fetch
µ-addr 0: Condition Select = 1; opcode/(next addr) = 0; Next address = 8
=> LD is True; µ-PC will load data
=> Data will come from Next addr
=> The µ-PC will be 8 next clock tick

µ-addr 8: LDMAR;OEPC; CS = 0
=> mar = pc (=0)
=> µ-PC will inc to 9 on next clock tick

µ-addr 9: INCPC;OEMEM; CS = 1; op/n = 1; Bus = 1
=> pc = 1; Bus = mem[0]; (Bus = LDA = 1)
=> LD is True; µ-PC will load data
=> Data will come from Bus/opcode
=> µ-PC will be 1 on next clock tick

µ-addr 1: CS = 1; op/n = 0; Next addr = 10
=> LD is True; µ-PC will load data
=> Data will come from Next addr
=> The µ-PC will be 10 next clock tick

µ-addr 10: LDABR;OECONST; CS = 0
=> Bus = 0; abr = Bus; (abr = 0)
=> µ-PC will inc to 11 on next clock tick

µ-addr 11: OEPC;INCPC;LDMAR; CS = 0
=> mar = pc (mar = 1); pc++; (pc = 2)
=> µ-PC will inc to 12 on next clock tick

µ-addr 12: OEMEM;LDMAR; CS = 0
=> mar = m[mar] (mar = m[1] = 100)
=> µ-PC will inc to 13 on next clock tick

µ-addr 13: OEMEM;LDACC; CS = 1; op/n = 0; Next addr = 8
=> acc = abr + m[mar] (acc = 0 + m[100])
=> LD is True; µ-PC will load data
=> Data will come from Next addr
=> The µ-PC will be 8 next clock tick
[Figure: a 12-bit address selects one of 4096 ROM words, each 6 bits wide; a decoding circuit expands the 6 bits into 200 control lines]
Vertical Microprogramming
This considerably reduces the amount of ROM needed, but at a cost: the hardware for the decoding circuit is fixed and so we lose some of the
benefits of being able to program at this level.
One solution to this is to replace the decoding circuit by yet another ROM
- nano-ROM.
Use table lookup to generate the control lines
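[Figure: the 12-bit address selects one of 4096 ROM words, each 6 bits wide; the 6 bits address a nano-ROM of 64 words x 200 bits, which generates the 200 control lines]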
Construction of an ALU.
The pico-computer has a very primitive ALU. It consists of just an Adder
with the ability to subtract using 2s complement arithmetic. There is no
Logic in it at all!
[Figure: 4-bit ALU with two inputs, a Function input, an output and a Carry out]
Where the Function input selects the desired combination of the two
inputs.
If the Function input consists of 2 control lines, 4 functions are
possible:
eg:
00  Add       C = A + B
01  Subtract  C = A - B
10  And       C = A and B
11  Or        C = A or B
If we increase the number to 3 control lines then further functions (8 in
total) are possible - eg Xor
[Figure: the add, sub, and and or results feed a mux whose output is C, selected by the function lines]
This ALU can be constructed using a 74153 4-input mux, a 7483 4-bit
adder, and AND and OR gates. The subtract function needs a little work. It has
to generate the inverse of B, and the carry-in signal to the adder, as well
as being used by the address lines of the mux.
[Figure: the function lines f0 and f1 generate the sub signal, which supplies the carry in and inverts B0]
Adders
There is one small problem with designing an n-bit adder. It is normal to
design a 1-bit adder and then use n of them to handle n-bits - a ripple-carry adder
carry in
sum
carry out
=>
c1 = g0 + (p0.c0)
c2 = g1 + (p1.g0) + (p1.p0.c0)
c3 = g2 + (p2.g1) + (p2.p1.g0) + (p2.p1.p0.c0)
c4 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0) + (p3.p2.p1.p0.c0)
A carry is generated if carry in for some earlier bit is a one, and all
intermediate bits propagate a carry. This is called a carry-lookahead
adder
Now we have groups of 4-bits with the carry being handled as quickly as
the addition.
To extend this scheme to more than 4-bits we can either:
block them together as before and let the carry ripple through
a partial carry-lookahead adder
or
use a higher level carry lookahead on each block of 4-bits
P0 = p3.p2.p1.p0
G0 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0)
C1 = G0 + (P0.c0)
and similarly for C2, C3 and C4
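The lookahead equations can be checked against ordinary addition with a short Python sketch. It assumes the usual definitions, which are not spelled out in the surviving text: generate g = a.b and propagate p = a + b for each bit position. `cla4` and `to_bits` are names invented for this example.

```python
def to_bits(value, n=4):
    """Split an integer into n bits, index 0 = least significant."""
    return [(value >> i) & 1 for i in range(n)]

def cla4(a, b, c0=0):
    """4-bit carry-lookahead adder; a and b are bit lists. Returns (sum bits, c4)."""
    g = [x & y for x, y in zip(a, b)]    # generate (assumed: g = a.b)
    p = [x | y for x, y in zip(a, b)]    # propagate (assumed: p = a + b)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    # every carry is computed directly from the inputs, none waits for a ripple
    s = [x ^ y ^ c for x, y, c in zip(a, b, carries)]
    return s, c4

s, c4 = cla4(to_bits(0b0111), to_bits(0b1010))
print(s, c4)
```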
0111 => 0011
1100 => 1110
Example of 2 * 6
Step  Multiplicand  Action          Shift Regs
      0010          Initial values  0000 0110 0
1     0010          00: do nothing  0000 0110 0
      0010          ASR             0000 0011 0
2     0010          10: sub mcand   1110 0011 0
      0010          ASR             1111 0001 1
3     0010          11: do nothing  1111 0001 1
      0010          ASR             1111 1000 1
4     0010          01: add mcand   0001 1000 1
      0010          ASR             0000 1100 0
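The steps of the 2 * 6 example can be reproduced with a small Python model of the shift registers. This is a sketch under stated assumptions: the registers are n bits wide, the values are 2s complement, and the most negative multiplicand is not handled (it would overflow the n-bit high register). `booth_multiply` is a name made up for the example.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Multiply two n-bit 2s complement numbers by add/sub and arithmetic shifts."""
    mask = (1 << n) - 1
    high = 0                      # high-order half of the shift registers
    low = multiplier & mask       # low-order half starts as the multiplier
    extra = 0                     # the extra bit to the right of the low half
    for _ in range(n):
        pair = ((low & 1) << 1) | extra
        if pair == 0b10:
            high = (high - multiplicand) & mask   # 10: sub mcand
        elif pair == 0b01:
            high = (high + multiplicand) & mask   # 01: add mcand
        # ASR: shift high:low:extra right by one, keeping the sign bit
        extra = low & 1
        low = (low >> 1) | ((high & 1) << (n - 1))
        sign = high >> (n - 1)
        high = (high >> 1) | (sign << (n - 1))
    result = (high << n) | low
    if result >> (2 * n - 1):     # interpret the 2n-bit result as signed
        result -= 1 << (2 * n)
    return result

print(booth_multiply(2, 6))
```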
Write:
e.g. ADD AX,[102]
The OpCode is ADD, there are two operands AX and [102], and one
address [102]
Assembler language syntax is specific to the processor, and varies
widely.
Types of instructions
Five address machines
Machines with only one instruction type:
If we need to put all the information necessary into a single instruction,
then it must contain the following:
OpCode Dest Addr, Src1 Addr, Src2 Addr, Cond Code, True Addr,
False Addr
5 addresses
e.g.
SUB [0],[1],[2],NZ,4,5    -> 5 address machine
Which subtracts the value stored at location 2 from that at 1 and stores
the result in 0. If the result is not zero, then the next instruction is at
location 4 otherwise the next instruction is at 5.
Some very early computers used this format, but it is not used today. It
has the disadvantage that all instructions are very big. Addresses are
normally 32 bits long, so the instruction will be ~160 bits long.
:Arithmetic/Logic instrs
:Control instructions
e.g.
SUB [0],[1],[2]
JNZ 4
:Arithmetic/Logic instrs
:Memory/Memory move
:Control instructions
e.g.
MOV [0],[1]
SUB [0],[2]
JNZ 4
-> 2 address
Although this particular program is still the same length as the other two,
in general programs will be shorter as several ALU instructions performed
in sequence will only be ~64 bits. Shorter programs tend to be slower, but
this is not always the case.
:Control instructions
:Register/Memory moves
:Register/Memory ALU
e.g.
MOV A,[1]
SUB A,[2]
MOV [0],A
JNZ 4
:Memory/Register moves
:Control instructions
:Register/Memory moves
:Register/Register ALU
e.g.
MOV A,[1]
MOV B,[2]
SUB A,B
MOV [0],A
JNZ 4
Stack machine:
One, final, zero address instruction set is for a stack based machine
OpCode Cond Code, True Addr    :Control instructions
PUSH Src Addr                  :Stack operation
POP Src Addr                   :Stack operation
e.g.
PUSH [1]
PUSH [2]
SUB
POP [0]
JNZ 4
There are a few stack machines around today e.g. the Inmos
Transputer.
Addressing modes
The Pico-computer has one addressing mode: Direct.
lda 100
jnz 4
Each instruction contains an address.
Either the address is used directly for the jnz instruction, or the contents
of the address are transferred between memory and the accumulator.
There are several different assembler syntaxes for addresses
The pico-computer uses the address (a number) on its own. Another way
is to enclose the number in square brackets.
Direct Addressing:
add a,[100]
Immediate Addressing:
add a,100
Register Addressing:
add a,b
add a,[b]
add a,[b+10]
Indexed Addressing:
add a,[b+c]
add a,[b+c+10]
add a,@[100]
add a,[b]+
add a,-[b]
Instruction Size.
Instructions can be
either
or
MIPS.
RISC machines trade simplicity of processor v Registers.
The MIPS processor has fixed size instructions - 32 bits.
There are 32 registers, but register $0 is reserved for the value 0 and
register $31 is used for the return address (the PC value saved by jal).
R type instructions:
ALU operations have three operands, all of them registers.
Opcode  rs  rt  rd  shamt  sub-fct
6       5   5   5   5      6        bits
add $1,$2,$3    ;$1 = $2 + $3
I type instructions:
Opcode  rs  rt  displ
6       5   5   16     bits
lw $1,100($2)   ;$1 = mem[$2+100]
J type instructions:
Jump and procedure call.
Opcode  displ
6       26     bits
j 10000         ;goto 10000
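The three formats can be checked by packing the fields into a 32 bit word. This is a sketch: the function names are made up for illustration, and the numeric opcode and sub-function values (0 and 20h for add, 23h for lw, 2 for j) are the standard MIPS encodings, which these notes do not list. Standard MIPS also stores the jump target as a word address (byte address divided by 4), which is assumed here.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R type: 6 + 5 + 5 + 5 + 5 + 6 = 32 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, displ):
    """I type: 6 + 5 + 5 + 16 = 32 bits; the displacement is kept to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (displ & 0xFFFF)


def encode_j(opcode, target):
    """J type: 6 + 26 = 32 bits; target is a byte address, stored as a word address."""
    return (opcode << 26) | ((target >> 2) & 0x3FFFFFF)


# add $1,$2,$3 : rd = 1, rs = 2, rt = 3
print(hex(encode_r(0, 2, 3, 1, 0, 0x20)))
# lw $1,100($2) : rt = 1, rs = 2
print(hex(encode_i(0x23, 2, 1, 100)))
# j 10000
print(hex(encode_j(2, 10000)))
```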
add $1,$2,$3    ;$1 = $2 + $3;
sub $1,$2,$3    ;$1 = $2 - $3;
and $1,$2,$3
or $1,$2,$3     ;$1 = $2 | $3;
slt $1,$2,$3
jr $31          ;goto $31;(return;)
lw $1,100($2)
sw $1,100($2)
beq $1,$2,100
j 10000         ;goto 10000
jal 10000       ;$31=PC+4;goto 10000;
                ;(function call;)
[Figure: instruction fetch - the PC supplies the addr to the Instrmem, which outputs the instr; an ADD unit increments the PC]
R-type instructions:
These instructions only involve Registers and the ALU - remember that
the MIPS chip has a load/store architecture. Data stored in memory must
first of all be transferred to a register before performing any operation on
it.
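[Figure: R-type datapath - the instr selects read reg1, read reg2 and write reg in the Registers; read data1 and read data2 (32 bits each) feed the ALU, whose 32 bit result returns to the Registers as write data]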
add $4,$10,$14
will add register 14 to register 10 and put the result into register 4.
The 32 bits making up this instruction have the form
Opcode  Rs  Rt  Rd  Shamt  Subfct
I-type instructions
These instructions load data from memory into a register, or store data
from a register into memory.
lw $1,100($2)
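[Figure: I-type datapath - the Registers supply read data1 to the ALU; the 16 bit displacement from the instr is sign extended to 32 bits and added by the ALU to form the addr for the Datamem; read data from the Datamem returns to the Registers as write data, and read data2 supplies the Datamem write data]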
Branch instructions
These also have a 16 bit immediate value in the instruction, which needs
to be added to the program counter if the branch is taken.
As the PC is always a multiple of 4, the immediate value also has to be
shifted left by 2 (multiplied by 4).
[Figure: branch datapath - the 16 bit immediate is sign extended to 32 bits, shifted left by 2 and added (ADD) to PC+4 to give the branch address; the ALU compares read data1 and read data2 and drives the branch control logic]
The ALU produces an output that will control whether the conditional
branch is taken or not.
The branch address is formed from (PC + 4) + the displacement
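As a worked example of the address arithmetic, the sketch below sign extends the 16 bit displacement, shifts it left by 2 and adds it to PC + 4. The function names are illustrative assumptions.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, displ16):
    """Branch address = (PC + 4) + (sign extended displacement << 2)."""
    return (pc + 4 + (sign_extend_16(displ16) << 2)) & 0xFFFFFFFF


# A displacement of 100 instructions forward from a branch at 1000h
print(hex(branch_target(0x1000, 100)))
# A displacement of -1 (FFFFh) branches back to the branch instruction itself
print(hex(branch_target(0x1000, 0xFFFF)))
```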
These separate parts can now be combined into one complete picture for
the MIPS processor architecture.
[Figure: complete datapath - PC, Instrmem, Registers, sign extend (16 to 32 bits), shift left 2, two ADD units, ALU and Datamem, with three multiplexors (mux): one feeding the PC, one before the ALU and one after the Datamem]
The multiplexors have been included so that we can control which data is
to be forwarded.
The MUX sending data to the PC will be controlled by the branch control
logic.
The MUX before the adder will be controlled by whether the instruction is
a load/store or not.
The MUX after the Data mem will control whether the output from
the ALU is written back to the register bank (an R-type instruction), or
whether data from memory is written to a register (a load instruction).
Pipelining
Pipelining is an implementation technique where multiple instructions
have overlapped execution.
In a car assembly plant, the work done to produce a new car is broken
down into several small jobs. As a car progresses along the assembly
line each of these jobs is performed in sequence.
The total time taken to produce a car is still the same, but several cars
can be assembled simultaneously.
A new car appears at a rate determined by how long each of the small
jobs takes.
The MIPS processor can be thought of as having several steps:
lw $1,100($0)
Step     Unit          Time
1. IF:   Instr fetch   10ns
2. ID:   Reg           5ns
3. EX:   ALU           10ns
4. MEM:  Data fetch    10ns
5. WB:   Reg           5ns
[Figure: without pipelining, three instructions run one after another, each taking 40ns (IF 10ns, ID 5ns, EX 10ns, MEM 10ns, WB 5ns)]
[Figure: with pipelining, each instruction starts 10ns after the previous one and every stage (IF, ID, EX, MEM, WB) takes 10ns]
After we perform the ID and WB steps we must wait 5ns, so that each step now
takes the same amount of time - 10ns.
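The effect on total run time can be worked out from the stage times given above. This is a simple sketch that assumes no stalls and that every pipelined stage is padded to the slowest stage (10ns); the function names are illustrative.

```python
STAGE_NS = {"IF": 10, "ID": 5, "EX": 10, "MEM": 10, "WB": 5}


def unpipelined_time(n_instructions):
    """Each instruction runs all five steps before the next one starts."""
    return n_instructions * sum(STAGE_NS.values())


def pipelined_time(n_instructions):
    """One instruction finishes per cycle once the pipeline is full."""
    cycle = max(STAGE_NS.values())          # every stage is padded to 10ns
    return (len(STAGE_NS) + n_instructions - 1) * cycle


for n in (1, 3, 1000):
    print(n, unpipelined_time(n), pipelined_time(n))
```

Under these assumptions a single instruction is slightly slower in the pipeline (50ns rather than 40ns) because of the padding, but for long instruction sequences the pipelined time approaches 10ns per instruction.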
[Figure: pipelined execution - successive instructions overlap, each stage (IF, ID, EX, MEM, WB) taking 10ns]
In cycle 3, the ALU is adding reg 0 to 100 for the first instruction.
At the same time the immediate value 100 is changing to 200 as the
second instruction is being decoded.
Each pipeline stage needs to know the data for the instruction that it is
currently working on.
Before each of the stages, we need to include a register to remember
details of the current instruction for that stage.
We pass the contents of this register to the next stage at the beginning of the
next cycle.
[Figure: pipelined datapath - PC, Inst mem, Regs file, sign ext, shift left 2, ADD units, ALU, Data mem and multiplexors, with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages]
Hazards
Data hazards:
Consider the following sequence of instructions:
sub $2,$1,$3
and $12,$2,$5
[Figure: pipeline diagram (IM, Reg, DM, Reg) - sub reads $1,$3 and later writes $2; and reads $2,$5 one cycle after sub and later writes $12]
sub reads regs 1 and 3 in cycle 2 and passes them to the ALU.
It is not until cycle 5 that it writes the answer to the Reg File.
and reads regs 2 and 5 in cycle 3 and passes them to the ALU.
Reg 2 has not yet been updated!
When and reads the Reg File it gets the wrong value for Reg 2.
This is called a data hazard.
Data hazards occur if an instruction reads a Register that a previous
instruction overwrites in a future cycle.
We must eliminate data hazards or pipelining produces incorrect results.
There are three ways to remove data hazards:
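The first way is for the compiler to insert nops between the two instructions:
sub $2,$1,$3
nop
nop
nop
and $12,$2,$5
[Figure: pipeline diagram (IM, Reg, DM, Reg) - three nops follow sub, so that sub has written $2 before and reads $2 and $5]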
This puts the responsibility of removing hazards onto the compiler writer,
and involves no extra hardware.
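A compiler pass that pads hazards with nops might look like the sketch below. It is an illustration only: instructions are modelled as (opcode, destination, sources) tuples, and the gap of three instructions between a writer and its reader is taken from the sub/and example above.

```python
NOP = ("nop", None, ())
GAP = 3   # instructions needed between a register write and a later read


def insert_nops(program):
    """Return a copy of program with nops added wherever a data hazard occurs."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look back over the last GAP emitted instructions for a writer
        # of one of our source registers.
        for distance, (_, prev_dest, _) in enumerate(reversed(out[-GAP:]), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, GAP - distance + 1)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [("sub", "$2", ("$1", "$3")),
           ("and", "$12", ("$2", "$5"))]
for op, dest, srcs in insert_nops(program):
    print(op, dest or "", ",".join(srcs))
```

If an unrelated instruction already sits between the writer and the reader, fewer nops are needed, which is why a compiler that can find useful work for those slots loses less time.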
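The second way is for the hardware to stall the and instruction until $2 has
been written:
[Figure: pipeline diagram - sub passes through the ID/EX, EX/MEM and MEM/WB registers and writes $2, while and is held in IF/ID for three stall cycles before it reads $2 and $5]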
The third way is to notice that although reg 2 does not contain valid
information at the time that the next instruction wants to read it, the
correct information is available!
It just isn't in the right place.
The EX/MEM register contains the output from the ALU.
The result that is destined for reg 2 is part of the EX/MEM register.
[Figure: pipeline diagram for sub followed by and, showing the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB - the value destined for $2 is in the EX/MEM register when and needs it]
We can insert the output of the EX/MEM register into the input of the ALU
- data forwarding.
This again involves extra hardware.
It removes all of the stalls and nops of the two previous solutions.
lw $2,100($1)
and $12,$2,$5
In this case the load instruction doesn't get the data from memory until
the DM cycle finishes.
[Figure: pipeline diagram for lw followed by and, showing the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB - the value for $2 is not available until the DM stage of lw has finished]
The original MIPS chip relied on the compiler writer to insert nops, saving
the hardware space needed to control stalls.
These days processors must run as fast as possible. The extra hardware
for forwarding/stalling is included on the chip.
Clever Compilers!
If the compiler re-orders instructions, hardware stalls for load instructions
can be eliminated:
lw  $2,100($10)
lw  $3,200($11)
add $4,$2,$3      ;one stall added here
lw  $5,300($12)
Two values are copied from memory to regs 2 and 3, and are then added
together.
A further value is then copied from memory to reg 5.
The second lw, into reg 3, involves a stall.
If the compiler reorders the instructions:
lw  $2,100($10)
lw  $3,200($11)
lw  $5,300($12)
add $4,$2,$3      ;$3 is available!
The result is the same - but now the add instruction is able to get the
value of reg 3 from the forwarding unit. The load of reg 5 has inserted one
extra stage into the pipeline.
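The two orderings can be compared by counting load-use stalls. The sketch assumes that forwarding is available, so the only stall left is when an instruction uses a register loaded by the lw immediately before it; the tuple format and the function name are illustrative.

```python
def load_use_stalls(program):
    """Count stalls caused by using a loaded register in the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original = [("lw", "$2", ("$10",)),
            ("lw", "$3", ("$11",)),
            ("add", "$4", ("$2", "$3")),
            ("lw", "$5", ("$12",))]

reordered = [("lw", "$2", ("$10",)),
             ("lw", "$3", ("$11",)),
             ("lw", "$5", ("$12",)),
             ("add", "$4", ("$2", "$3"))]

print(load_use_stalls(original), load_use_stalls(reordered))
```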
[Figure: pipeline diagram for the original order - the add instruction stalls for one cycle before reading $2 and $3]
[Figure: pipeline diagram for the reordered instructions - no stalls!]
Branch Hazards
As soon as we branch to a new instruction, all the instructions that are in
the pipeline behind the branch become invalid!
      lw  $5,400($14)
      beq $3,$2,100
      add $7,$8,$9
      ..
+100  sub $7,$8,$9
Either the add or sub instruction after the beq will be executed
depending on the contents of regs 2 and 3.
We can include extra hardware to calculate the branch offset in the
decode cycle. Data forwarding then makes it possible to do the branch
just one cycle later - insert a nop.
[Figure: pipeline diagram for lw, beq, nop and then either add or sub - the nop fills the cycle after the beq]
lw  $5,400($14)
beq $3,$2,100
can be re-ordered:
beq $3,$2,100
lw  $5,400($14)
[Figure: pipeline diagram for beq, lw and then either add or sub - the lw fills the cycle after the beq]
Power Control
Power-down uses considerably less power than idle mode, which in turn
uses less power than normal operations.
Interrupts
Ext Int 0
Timer 0 interrupt
Ext Int 1
Timer 1 interrupt
Serial Port
Either the service routine can fit in 8 bytes (possible) or it will JMP to a
longer routine.
Interrupts can be controlled by using the IE register
EA
ES
EA is a global interrupt enable bit. Interrupts will not happen unless this is
set.
ET1/0 are the timer interrupts
ES is the serial interrupt.
The serial interrupt can happen either on RI or TI being set. The ISR will
have to read these bits to decide which requires servicing. These bits
should be cleared by software.
PS
ASM51
SIM51
0002h
0001h
0000h
reset
for PC
02h
01h
00h
R0 - R7.
Bank 0 is at 00h - 07h
Bank 1 is at 08h - 0fh
Bank 2 is at 10h - 17h
Bank 3 is at 18h - 1fh
Register banks are used when we have more than one activity being
performed at a time.
It is a simple matter to switch to the relevant set of registers, when we
need to swap from one task to the other.
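Because each bank is 8 bytes long, the RAM address of a register follows directly from the bank number. A minimal sketch (the function name is an assumption for illustration):

```python
def register_address(bank, n):
    """Internal RAM address of register Rn in the given bank (0-3)."""
    if not 0 <= bank <= 3:
        raise ValueError("bank must be 0-3")
    if not 0 <= n <= 7:
        raise ValueError("register must be R0-R7")
    return bank * 8 + n


for bank in range(4):
    first = register_address(bank, 0)
    last = register_address(bank, 7)
    print(f"Bank {bank} is at {first:02x}h - {last:02x}h")
```

Switching banks therefore changes which eight RAM bytes the names R0 - R7 refer to; nothing is copied.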
or
0d0h
Most SFRs are concerned with specific i/o functions; we will deal with
them later.
SP   the stack pointer
PSW  the program status word
ACC  the accumulator
B    the B register
Assembler code
test.asm
$MOD2051
cseg
org 00h
mov a,#34   ;put 34 in acc
clr a       ;clear the acc
end
asm51 test
Notes:
2.
7422    mov a,#34
e4      clr a
The numbers in the 1st column are the address where the machine
code (2nd column) is stored.
and test.hex, which contains machine code in hexadecimal format. It
is read by the simulator.
3.
4.
5.
6.
7.
The program does not stop cleanly. Location 3 can contain anything
and will be executed after the clr instruction
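The run-on behaviour can be seen by decoding the bytes by hand. The sketch below knows only the three opcodes that appear in these notes (74h, e4h and ffh); the table and function are illustrative and not a full 8051 disassembler. The trailing ffh byte is an assumed value for the unprogrammed location 3.

```python
# opcode -> (text, instruction length in bytes); only opcodes seen in the notes
OPCODES = {
    0x74: ("MOV A,#{:02X}", 2),
    0xE4: ("CLR A", 1),
    0xFF: ("MOV R7,A", 1),
}


def disassemble(code):
    """Return (address, text) pairs for a byte sequence starting at address 0."""
    listing = []
    pc = 0
    while pc < len(code):
        text, size = OPCODES[code[pc]]
        if size == 2:
            text = text.format(code[pc + 1])   # second byte is the immediate data
        listing.append((pc, text))
        pc += size
    return listing


# 74 22 e4 is the assembled program; ff is whatever happens to follow it
for address, text in disassemble(bytes([0x74, 0x22, 0xE4, 0xFF])):
    print(f"{address:04X}  {text}")
```

The processor has no notion of where the program ends, so whatever byte sits at location 3 is decoded and executed like any other instruction.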
8.
9.
Assembler
Use .asm as an extension for assembler source
In a command prompt
C:> cd \159233
C:159233> asm51 filename
Errors will be reported in the file filename.lst
Simulator
C:159233> sim51 filename
[Screen: the sim51 display. A Data window shows internal RAM from 00h to 7fh, eight bytes per row. A Code window shows the disassembled program (000: 7422 MOV A,#22, 002: E4 CLR A, then FF MOV R7,A in every unused location). An SFR window shows ACC, B, SP (07), PSW, IP, IE, P1 and P3 (in/out), SCON, SBUF (in/out), TCON, TMOD and PCON. The screen also shows the Flags (C, AC, OV), RegBank, T0, T1, Cycles, PC, DPTR and the LCD Display. Tab moves to the next window and F10 exits. v 1.2]
Constants
Numbers:
Can be in
decimal      34
hexadecimal  22h
binary       00100010b
A
ABCD
41h
Hello,0dh,0ah
Expressions:
constants can be formed using expressions:
2+2
1 SHR 8
operators are:
+ - * / mod
shr shl
not and or xor
Assembler directives:
Segment directives:
cseg
dseg
bseg
selects the current segment.
cseg for code and constants
dseg for data.
cseg is active by default
Location counter:
org number
sets the value of the current segment's location counter. If cseg is active, then
code will be placed at address number after the org directive.
The location counter for each segment defaults to 0.
End directive
end
signals the end of the source file.
Equate directive
TEN equ 10
Whenever the symbol TEN occurs in the text it will be replaced by 10
Data directive
X    data  34h
PSW  data  0d0h
mov a,#34
Symbols must
DB directive
copyright_msg: db '(c) Copyright, 1996'
allocates space for initialised data in the cseg a byte at a time. Can only
be used when cseg is active
DS directive
buffer: ds 16
reserves so many bytes (16) of data in RAM for variable data. The dseg
must be active.
DBIT directive
io_map: dbit 12
reserves so many bits (12) of data in bit addressable RAM for variable
data. The bseg must be active.
The file mod2051 contains many of these directives.
Addressing Modes
There are a variety of ways to address data.
Register Direct
mov a,R3
There are 8 registers R0 - R7. Register direct will access the contents
of the specified register. The contents of R3 are placed into the
accumulator
Register Indirect
mov a,@R0
Only R0 and R1 may be used.
The register now contains an address.
The contents of that address are placed in the accumulator.
Relative addressing
sjmp start
The instruction contains a signed 8 bit offset that is relative to the first
byte of the following instruction.
Absolute addressing
ajmp start
The instruction contains an 11 bit destination address to anywhere in the
2KByte program space.
Implied addressing
Certain instructions imply a particular address - such as the accumulator.
The PSW contains status bits that reflect the current state of the CPU.
7   6   5   4    3    2   1   0
CY  AC  F0  RS1  RS0  OV  F1  P
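Given the bit positions above, a PSW value can be split into its named flags. This is a small illustrative sketch; the function names are assumptions, and the bank number is taken as RS1 and RS0 read together as a two bit value.

```python
# Flag names in order from bit 0 up to bit 7
PSW_BITS = ["P", "F1", "OV", "RS0", "RS1", "F0", "AC", "CY"]


def decode_psw(psw):
    """Return a dict mapping each PSW flag name to 0 or 1."""
    return {name: (psw >> bit) & 1 for bit, name in enumerate(PSW_BITS)}


def selected_bank(psw):
    """Register bank chosen by RS1 (bit 4) and RS0 (bit 3)."""
    flags = decode_psw(psw)
    return flags["RS1"] * 2 + flags["RS0"]


print(decode_psw(0x80))      # only the carry flag set
print(selected_bank(0x18))   # RS1 = 1, RS0 = 1
```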
Arithmetic instructions:
                  Addressing modes
                  Dir  Ind  Reg  Imm
add   a,<byte>    X    X    X    X
addc  a,<byte>    X    X    X    X
subb  a,<byte>    X    X    X    X
inc   a           A only
inc   <byte>      X    X    X
dec   a           A only
dec   <byte>      X    X    X
mul   ab          A and B only
div   ab          A and B only
da    a           A only
Logical instructions:
                      Addressing modes
                      Dir  Ind  Reg  Imm
anl   a,<byte>        X    X    X    X
anl   <byte>,a        X
anl   <byte>,#data    X
same for orl and xrl
clr   a               A only
cpl   a               A only
rl    a               A only
rlc   a               A only
same for rr
swap  a               A only
anl   AND
orl   OR
xrl   XOR
clr   clear
cpl   NOT
rl    rotate left
rlc   rotate left through carry
rr    rotate right
swap  swap nibbles
Data transfer:
                     Addressing modes
                     Dir  Ind  Reg  Imm
mov   a,<src>        X    X    X    X
mov   <dst>,a        X    X    X
mov   <dst>,<src>    X    X    X    X
push  <src>          X
pop   <dst>          X
xch   a,<byte>       X    X    X
xchd  a,@Ri               X
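anl   c,bit      ;c = c and bit
anl   c,/bit     ;c = c and not bit
mov   c,bit      ;c = bit
mov   bit,c      ;bit = c
clr   c          ;c = 0
clr   bit        ;bit = 0
setb  c          ;c = 1
setb  bit        ;bit = 1
cpl   c          ;c = not c
cpl   bit        ;bit = not bit
jc    rel        ;jump if c == 1
jnc   rel        ;jump if c != 1
jb    bit,rel    ;jump if bit == 1
jnb   bit,rel    ;jump if bit != 1
jbc   bit,rel    ;jump if bit == 1
                 ;then clear bit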
Program Branching:
jmp   addr               ;or sjmp/ajmp/ljmp
call  addr               ;or acall/lcall
ret                      ;return from subroutine
nop                      ; no operation
jz    rel                ;jump if A == 0
jnz   rel                ;jump if A != 0
djnz  <byte>,rel
cjne  a,<byte>,rel       ;jump if A != <byte>
cjne  <byte>,#data,rel   ;jump if <byte> != #data
sjmp  $
is an infinite loop.
The relative offset is not 0 though!
It is 0feh
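The value 0feh follows from the rule that the offset is measured from the first byte of the following instruction. A short sketch of the calculation, assuming the usual 2 byte sjmp (opcode plus offset); the function name is illustrative:

```python
def sjmp_offset(instr_addr, target_addr):
    """8 bit relative offset stored in an sjmp at instr_addr jumping to target_addr."""
    next_addr = instr_addr + 2            # sjmp occupies 2 bytes
    offset = target_addr - next_addr
    if not -128 <= offset <= 127:
        raise ValueError("target out of range for a relative jump")
    return offset & 0xFF                  # two's complement in one byte


# A jump to itself must go back over its own 2 bytes: -2 = 0feh
print(f"{sjmp_offset(0x10, 0x10):02x}h")
```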
Assembler Programs
$mod2051
n1   data  14h
n2   data  15h
n3   data  16h
     org 0
     mov   a,n1
     add   a,n2
     mov   n3,a
     end
Write an assembler program to subtract the two numbers stored at
locations 14h and 15h in RAM and place the answer at location 16h
n3 = n1 - n2;
$mod2051
n1   data  14h
n2   data  15h
n3   data  16h
     org 0
     mov   a,n1
     clr   c
     subb  a,n2
     mov   n3,a
     end
c1   data  65h
     org 0
     mov   a,c1
     orl   a,#20h
     mov   c1,a
     end
Write an assembler program to convert a string at location 45h to lower
case.
i = 0;
while (s[i]) {
s[i] |= 0x20;
i++;
}
$mod2051
     dseg
     org 45h
s:   ds    32
     cseg
     org 0
     mov   r0,#s
     mov   a,@r0
l1:  jz    l2
     orl   a,#20h
     mov   @r0,a
     inc   r0
     mov   a,@r0
     jmp   l1
l2:  nop
     end
a = (*r0);
}
$mod2051
       dseg
       org 40h
buffer: ds 64
;
       cseg
;use R0 to point to first character
;and R1 to point to next one along
;R2 is where we store the answer
;
       org 0
;
       mov   R0,#buffer   ;init R0 and R1
       mov   R1,#buffer
       inc   R1
       mov   R2,#00       ;set counter to 0
       mov   a,@R0        ;look at a char
l0:    jz    stop         ;stop when its NUL
       clr   c
       subb  a,@R1        ;sub next char
       jnz   l1
       inc   R2           ;add to counter if zero
l1:    inc   R0           ;move to the next pair
       inc   R1
       mov   a,@R0        ;look at next char
       sjmp  l0           ;loop around the string
stop:  sjmp  stop
       end
Subroutine/Function calls
jnz
ret
fct
Another scheme is to have the calling program save registers before the
acall and recover them afterwards.
push  5
acall sub
pop   5
This suffers from the drawback that the main routine may save registers
that the subroutine doesn't use.
A final scheme is for the subroutine to document which registers it uses
and for the calling program to save only the ones that it wants.
Status flags are typically always altered by the actions of the subroutine.
Parameter passing:
1. In Registers.
2. In a parameter block.
3.
Parameters can be retrieved by finding out the return address (on the
stack) and reading the parameters from that area of memory.
Can't do this if the program is in PROM!
The return address on the stack has to be fiddled before the ret will
work. Otherwise the program will return and try and execute the first
parameter.
4. On the stack
SP
4a
49  return
48  address
47  ARGUMENT 1
46  ARGUMENT 2
45  ARGUMENT 3
44  ARGUMENT 4
Input / Output
The 2051 has:
2 parallel ports - port 1 and port 3
2 timers - timer 0 and timer 1
1 serial port.
(Output pins are shared between uses,
not all of the I/O capability will be available).
Timers:
The timers can be used as timers or counters.
(use either the system clock, or an external pulse).
Timer 1 can be used as the bit rate clock for the serial port.
The timers have several modes. We will look at two:
Mode 1:
16 bit up counter - each cycle will cause the 16 bit pair TH0/TL0 to
increment. When the count gets to 0ffffh it wraps to 0000h and sets
an overflow flag.
Mode 2:
8 bit up counter with automatic reload.
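The two modes can be compared with a one-tick simulation. This is a sketch only: the function names are illustrative, and for mode 2 it assumes the usual 8051 arrangement in which the low byte (TL) counts and is reloaded from the high byte (TH) on overflow, a detail these notes do not spell out.

```python
def tick_mode1(th, tl):
    """16 bit up counter: returns (th, tl, overflow) after one increment."""
    count = ((th << 8) | tl) + 1
    overflow = count > 0xFFFF        # 0ffffh wraps to 0000h and sets the flag
    count &= 0xFFFF
    return count >> 8, count & 0xFF, overflow


def tick_mode2(th, tl):
    """8 bit up counter with automatic reload of TL from TH on overflow."""
    tl += 1
    if tl > 0xFF:
        return th, th, True          # reload value is kept in TH
    return th, tl, False


print(tick_mode1(0xFF, 0xFF))
print(tick_mode2(0xFD, 0xFF))
```

With automatic reload the period between overflows is fixed by the reload value, without software having to rewrite the counter each time.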
The TMOD register controls the mode setting for the two timers:
0
M1
M0
timer 1
M1
M0
timer 0
Serial I/O
Has 4 modes - look only at mode 1 (8bit UART).
SBUF is really two registers. One for input and one for output. Serial I/O
works in full-duplex, so the two registers will have different values.
Serial I/O is controlled by the SCON register
SM0 SM1 0
TI
RI
The TI bit indicates that the previous output character has been
transmitted. It should be cleared by software.
The RI bit indicates that a character has been received. It should be
cleared by software.
In the simulator (if the timer and serial i/o systems have been configured)
- the serial output is displayed at the bottom of the screen. Keys typed on
the keyboard appear as received characters in SBUF
Parallel I/O
Parallel ports are for general I/O. They are all bi-directional - but if we
want to input data we must ensure that we write 1s to the pins. (they are
not full-duplex!)
The simulator is configured so that the parallel ports drive an LCD
display.
P1 holds the character to be displayed.
P3.5 acts as a strobe signal. The LCD display will read the value from
P1 when P3.5 changes from a 1 to a 0.
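The strobe behaviour of the simulated LCD can be modelled as a falling edge detector on bit 5 of P3. The class below is a hypothetical model written for illustration, not the simulator's actual code.

```python
class LcdModel:
    """Latches the character on P1 when P3.5 changes from 1 to 0."""

    def __init__(self):
        self.text = ""
        self._last_strobe = 1

    def update(self, p1, p3):
        strobe = (p3 >> 5) & 1                      # P3.5 is bit 5 of port 3
        if self._last_strobe == 1 and strobe == 0:  # falling edge
            self.text += chr(p1)
        self._last_strobe = strobe


lcd = LcdModel()
for ch in "Hi":
    lcd.update(ord(ch), 0xFF)   # put the character on P1 with the strobe high
    lcd.update(ord(ch), 0xDF)   # drop P3.5 to 0: the LCD reads P1
print(lcd.text)
```

Holding P3.5 low does not repeat the character; the strobe has to return to 1 before the next character can be latched.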
The real LCD is a little more complicated.
It must first be initialised.
After sending a character, either the CPU must delay for a small time
before sending another, or must wait for a signal from the LCD.
The LCD display also has a command mode (clearing the screen,
moving the cursor etc) and a data mode (ASCII characters).