
Dr. M. Arif Wahla
EE Dept
arif@mcs.edu.pk

Military College of Signals
National University of Sciences & Technology (NUST), Pakistan
Class webpage: http://learn.mcs.edu.pk/courses/

Lecture #3
Arithmetic Coding
Lempel-Ziv Coding
For large n, the implementation of Huffman coding ("the retired champion") can easily become unwieldy or unduly restrictive. The problems include:

The size of the Huffman code table is q^n, representing an exponential increase in memory and computational requirements.

The code table needs to be transmitted to the receiver.

The source statistics are assumed stationary. If they change, an adaptive scheme is required which re-estimates the probabilities and recalculates the Huffman code table.

The solution to these problems is arithmetic coding.
Fall 2011
Arithmetic Coding
Consider the N-length source message s_{i_1} s_{i_2} ... s_{i_N}, where {s_i : i = 1, 2, ..., q} are the source symbols and s_{i_j} indicates that the j-th character in the message is the source symbol s_i.
Arithmetic coding assumes that the following probabilities are available: the individual symbol probabilities and, from these, the probability of the given source message.

The goal of arithmetic coding is to assign a unique interval along the unit number line (the probability line [0, 1]) of length equal to the probability of the given source message, with its position on the number line given by the cumulative probability of the given source message.

Arithmetic Coding
Arithmetic coding completely bypasses the idea of
replacing an input symbol with a specific code.
Instead, it takes a stream of input symbols and replaces it
with a single floating point output number.
The longer (and more complex) the message, the more bits
are needed in the output number.
It was not until recently that practical methods were found to implement this on computers with fixed-size registers.
Arithmetic Coding
Example

Let A = [a0, a1, a2, a3] with symbol probabilities P(A) = {0.5, 0.3, 0.15, 0.05}.

A Huffman code for this source:

    p0 = 0.5    C(a0) = 0
    p1 = 0.3    C(a1) = 10
    p2 = 0.15   C(a2) = 110
    p3 = 0.05   C(a3) = 111

For arithmetic coding, each symbol is instead assigned an interval on the unit line:

    p0 = 0.5    I0 = [0, 0.5)
    p1 = 0.3    I1 = [0.5, 0.8)
    p2 = 0.15   I2 = [0.8, 0.95)
    p3 = 0.05   I3 = [0.95, 1)

Suppose each a_i is encoded into a real number β_i lying in the interval 0 ≤ β_i < 1, and suppose we encode the sequence s_{i_0} s_{i_1} s_{i_2} ... by adding scaled versions of the β's:

    b = β_0 + Δ_1 β_1 + Δ_2 β_2 + ...

where β_j is the code number corresponding to s_{i_j} and Δ_j is a monotonically decreasing scale factor, also lying in the interval between 0 and 1. If we pick the Δ_j in such a way that it is possible to later decompose b back into the original sequence, the code can be decoded.
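The symbol intervals above are just running sums of the probabilities. A minimal sketch in Python (function name is my own) that builds them from a probability list:

```python
def make_intervals(probs):
    """Assign each symbol the interval [cum, cum + p) on the unit line,
    where cum is the cumulative probability of all earlier symbols."""
    intervals = []
    cum = 0.0
    for p in probs:
        intervals.append((cum, cum + p))
        cum += p
    return intervals

# Probabilities from the example: a0, a1, a2, a3
print(make_intervals([0.5, 0.3, 0.15, 0.05]))
```

This reproduces I0 = [0, 0.5), I1 = [0.5, 0.8), I2 = [0.8, 0.95), I3 = [0.95, 1) up to floating-point rounding.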
Arithmetic Coding Process

Construct the code interval to represent a block of symbols as

    I_b = [L, H),    L ≤ b < H

Any convenient b within this range is a suitable codeword representing the entire block of symbols. The algorithm is on the next slide.
Arithmetic Coding Algorithm

Assume each a_i ∈ A has been assigned an interval I_i = [S_{l_i}, S_{h_i}).

    initialize j = 0, L_0 = 0 and H_0 = 1
    REPEAT
        read next a_i
        Δ_j = H_j − L_j
        L_{j+1} = L_j + Δ_j · S_{l_i}
        H_{j+1} = L_j + Δ_j · S_{h_i}
        j = j + 1
    UNTIL all a_i have been encoded

Encode the sequence a1 a0 a0 a3 a2.
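The REPEAT loop can be sketched directly in Python. This is a toy floating-point version for the example alphabet, not the fixed-register implementation a practical coder would use; the names are my own:

```python
# Symbol intervals [S_l, S_h) from the example
INTERVALS = {
    'a0': (0.0, 0.5),
    'a1': (0.5, 0.8),
    'a2': (0.8, 0.95),
    'a3': (0.95, 1.0),
}

def arith_encode(symbols, intervals):
    """Shrink [L, H) once per symbol; any b in the final interval
    is a valid codeword for the whole block."""
    L, H = 0.0, 1.0
    for s in symbols:
        delta = H - L
        s_l, s_h = intervals[s]
        L, H = L + delta * s_l, L + delta * s_h   # both use the old L
    return L, H

L, H = arith_encode(['a1', 'a0', 'a0', 'a3', 'a2'], INTERVALS)
print(L, H)  # final interval; the slides pick b = 0.5748125
```

Running it on the lecture's sequence a1 a0 a0 a3 a2 gives the final interval [0.57425, 0.5748125) up to rounding.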
Example 1.6.2:

Using the intervals I0 = [0, 0.5), I1 = [0.5, 0.8), I2 = [0.8, 0.95), I3 = [0.95, 1), encode the sequence a1 a0 a0 a3 a2:

    j   a_i   L_j       H_j     Δ_j       L_{j+1}   H_{j+1}
    0   a1    0         1       1         0.5       0.8
    1   a0    0.5       0.8     0.3       0.5       0.65
    2   a0    0.5       0.65    0.15      0.5       0.575
    3   a3    0.5       0.575   0.075     0.57125   0.575
    4   a2    0.57125   0.575   0.00375   0.57425   0.5748125

Any b within the final interval will suffice for a codeword; one choice is b = 0.5748125.
Decoding Arithmetic Codes - Algorithm

Given the code value b, the decoding procedure is as follows:

    initialize L = 0, H = 1 and Δ = H − L
    REPEAT
        find i such that (b − L)/Δ ∈ I_i
        OUTPUT symbol a_i
        H = L + Δ · S_{h_i}
        L = L + Δ · S_{l_i}
        Δ = H − L
    UNTIL the last symbol has been decoded

Tricks:
Use a special stop symbol for sequences of variable length.
Pay attention to precision in calculations.
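A matching toy decoder in Python (again plain floating point; names are my own). It is told how many symbols to recover; in practice a stop symbol, as the trick above suggests, would terminate the loop instead:

```python
INTERVALS = {
    'a0': (0.0, 0.5),
    'a1': (0.5, 0.8),
    'a2': (0.8, 0.95),
    'a3': (0.95, 1.0),
}

def arith_decode(b, intervals, n_symbols):
    """Invert the encoder: each step finds the symbol interval
    containing (b - L)/delta, then narrows [L, H) the same way."""
    L, H = 0.0, 1.0
    out = []
    for _ in range(n_symbols):
        delta = H - L
        t = (b - L) / delta
        for sym, (s_l, s_h) in intervals.items():
            if s_l <= t < s_h:
                out.append(sym)
                H = L + delta * s_h   # uses the old L
                L = L + delta * s_l
                break
    return out

# b = 0.5745 lies strictly inside the final interval [0.57425, 0.5748125)
print(arith_decode(0.5745, INTERVALS, 5))
```

Choosing a b strictly inside the final interval sidesteps the boundary case of b = 0.5748125, whose last ratio lands exactly on the interval edge 0.95.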

Example 1.6.3

Decode b = 0.5748125.

Solution:
    L         H       Δ         (b−L)/Δ     I_i   a_i   next L    next H      next Δ
    0         1       1         0.5748125   I1    a1    0.5       0.8         0.3
    0.5       0.8     0.3       0.249375    I0    a0    0.5       0.65        0.15
    0.5       0.65    0.15      0.49875     I0    a0    0.5       0.575       0.075
    0.5       0.575   0.075     0.9975      I3    a3    0.57125   0.575       0.00375
    0.57125   0.575   0.00375   0.95        I2    a2    0.57425   0.5748125   0.0005625

The decoded sequence is a1 a0 a0 a3 a2, as expected. (Note: because b = 0.5748125 is the upper endpoint of the final interval, the last ratio lands exactly on 0.95; choosing b strictly inside the interval avoids this boundary case.)
Dictionary Codes and Lempel-Ziv Coding

Huffman codes require knowledge of the probability distribution of the source symbols, which may not always be available.

Dictionary codes dynamically construct their own coding/decoding table on the fly by looking at the present data stream; the probability distribution need not be known.

Strings are coded instead of individual symbols.

These codes are only efficient for long files.

LZ codes belong to a practical class of dictionary codes.
Lempel-Ziv Coding

Lempel-Ziv codes suffer no significant decoding delay at the receiver.

Prior knowledge of the decoding table is not required; the required information is transmitted within the message.

Huffman codes assign variable-length codes to fixed-size symbols, whereas LZ codes encode variable-length strings with fixed-size codes. In this sense, LZ coding is a mirror image of Huffman coding.

The LZ algorithm we will consider in this course is a slight modification of the original LZW algorithm.
Initializing the LZ Algorithm

Dictionary structure:

    Address m   Entry (n, a_i)
    0           0, null
    1           0, a_0
    2           0, a_1
    ...
    M           0, a_{M-1}

Each entry (n, a_i) in the dictionary is given an address m. Here a_i is a symbol drawn from the source alphabet A, and n is a pointer to another dictionary location. n is represented by a fixed-length word of b bits, so the dictionary contains at most 2^b entries.

The algorithm is initialized by constructing the first M+1 entries:
The address-0 entry is a null symbol; it is used to let the decoder know the end of a string.
Pointer n = 0 for the first M+1 entries; it points to the null entry at address 0.
m = m + 1 points to the next blank location in the dictionary.

Initialize pointer n = 0 and m = M + 1.

1. Fetch the next source symbol a_i, where i = 0, 1, 2, ..., M−1.
2. If the ordered pair <n, a_i> is already in the dictionary, then
       n = dictionary address of entry <n, a_i>
   else
       transmit n
       create new dictionary entry <n, a_i> at dictionary address m
       m = m + 1
       n = dictionary address of entry <0, a_i>
3. Return to step 1.
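The steps above can be sketched in Python. This is a sketch under the assumption that the encoder flushes one final codeword for the pending string at end of message (the lecture's table stops before that flush); names are my own:

```python
def lz_encode(symbols, alphabet):
    """LZ encoder following the steps above: the dictionary maps
    an ordered pair (n, a) to its address m."""
    # Initialization: address 0 is the null entry; addresses 1..M hold (0, a_i).
    dictionary = {(0, a): addr for addr, a in enumerate(alphabet, start=1)}
    m = len(alphabet) + 1          # next blank dictionary address
    n = 0                          # pointer to the longest matched string
    transmitted = []
    for a in symbols:
        if (n, a) in dictionary:
            n = dictionary[(n, a)]      # extend the current string
        else:
            transmitted.append(n)       # transmit pointer to longest match
            dictionary[(n, a)] = m      # new entry <n, a_i> at address m
            m += 1
            n = dictionary[(0, a)]      # restart string at root symbol a
    if n != 0:
        transmitted.append(n)           # flush the final pending string
    return transmitted, dictionary

codes, d = lz_encode("11000101100101", "01")
print(codes)
```

On the lecture's example sequence this transmits 2, 2, 1, 5, 4, 3, 6, 1 as in the table on the next slide, followed by a final flushed 2 for the trailing symbol.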
LZ Algorithm

Encoding example for a binary source. Initial dictionary:

    Address m   Entry (n, a_i)
    0           0, null
    1           0, 0
    2           0, 1

Encoding the source sequence 1 1 0 0 0 1 0 1 1 0 0 1 0 1:

    Present n   Source a_i   Present m   Transmit n   Next n   New entry (n, a_i)
    0           1            3           -            2        -
    2           1            3           2            2        2, 1
    2           0            4           2            1        2, 0
    1           0            5           1            1        1, 0
    1           0            6           -            5        -
    5           1            6           5            2        5, 1
    2           0            7           -            4        -
    4           1            7           4            2        4, 1
    2           1            8           -            3        -
    3           0            8           3            1        3, 0
    1           0            9           -            5        -
    5           1            9           -            6        -
    6           0            9           6            1        6, 0
    1           1            10          1            2        1, 1
Final dictionary after encoding:

    Address m   Entry (n, a_i)
    0           0, null
    1           0, 0
    2           0, 1
    3           2, 1
    4           2, 0
    5           1, 0
    6           5, 1
    7           4, 1
    8           3, 0
    9           6, 0
    10          1, 1
Lempel-Ziv Decoding

The decoder must construct a dictionary identical to the encoder's. Note that the encoder does not transmit as many codewords as it has source symbols.
The decoding operation goes as follows:

Reception of any codeword means that a new dictionary entry must be constructed.
Pointer n for this new entry is the same as the received codeword.
Source symbol a_i for this entry is not yet known, because it is the root symbol of the next string (not yet transmitted). Such an entry is called a partial entry <n, ?> at address m.
This entry can fill in the missing symbol a_i of the previous entry at address m − 1.
The received codeword n can also be used to decode the source string associated with it:
The root symbol is the first symbol of the string, i.e. the symbol of the entry whose pointer is 0.
The last symbol of the string is the symbol stored in the entry at the address given by the codeword itself.
m = m + 1 should be updated immediately after completing the entry at address m.
If pointer n points to an entry whose pointer is 0, we can decode the string directly.
If pointer n points to an entry whose pointer is nonzero, that pointer connects us to another address; this continues until we reach a zero pointer.
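The decoding rules above can be sketched as follows (naming is my own; the sketch assumes the encoder flushed a final codeword, and it handles the case where a codeword points at the still-partial entry by filling it with its own root symbol):

```python
def lz_decode(codes, alphabet):
    """Rebuild the encoder's dictionary from received pointers and
    emit the source string for each codeword."""
    entries = {0: (0, None)}                 # address 0: null entry
    for addr, a in enumerate(alphabet, start=1):
        entries[addr] = (0, a)               # addresses 1..M: (0, a_i)
    m = len(alphabet) + 1
    out = []
    prev = None                              # address of pending partial entry
    for n in codes:
        if n == prev:
            # Codeword refers to the still-partial entry <ptr, ?>: its missing
            # symbol must equal the root symbol of its own string.
            ptr = entries[n][0]
            p = ptr
            while entries[p][0] != 0:
                p = entries[p][0]
            entries[n] = (ptr, entries[p][1])
        # Walk pointers back to the root (pointer 0) to recover the string.
        chain, p = [], n
        while p != 0:
            ptr, a = entries[p]
            chain.append(a)
            p = ptr
        string = ''.join(reversed(chain))
        # The root symbol completes the previous partial entry at m - 1.
        if prev is not None and entries[prev][1] is None:
            entries[prev] = (entries[prev][0], string[0])
        out.append(string)
        entries[m] = (n, None)               # new partial entry <n, ?>
        prev = m
        m += 1
    return ''.join(out)

print(lz_decode([2, 2, 1, 5, 4, 3, 6, 1, 2], "01"))
```

Feeding it the example's codewords reproduces the original source sequence.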
Lempel-Ziv Decoding
Example
Received codewords: 2, 2, 1, 5, 4, 3, 6, 1.

    Received n   New partial entry   Completed entry   Decoded string
    2            3: (2, ?)           -                 1
    2            4: (2, ?)           3: (2, 1)         1
    1            5: (1, ?)           4: (2, 0)         0
    5            6: (5, ?)           5: (1, 0)         00
    4            7: (4, ?)           6: (5, 1)         10
    3            8: (3, ?)           7: (4, 1)         11
    6            9: (6, ?)           8: (3, 0)         001
    1            10: (1, ?)          9: (6, 0)         0

Note: when codeword 5 arrives, the entry at address 5 is itself the partial entry 5: (1, ?); its missing symbol is the root symbol of its own string, i.e. 0.

Decoded message: 1 1 0 00 10 11 001 0

The reconstructed dictionary matches the encoder's:

    0: (0, null)   1: (0, 0)   2: (0, 1)   3: (2, 1)   4: (2, 0)   5: (1, 0)
    6: (5, 1)   7: (4, 1)   8: (3, 0)   9: (6, 0)   10: (1, 1)
1 0,0
2 0,1
3,?