
Introduction to Information Theory
channel capacity and models

A.J. Han Vinck
University of Essen
May 2009
This lecture

- Some models
- Channel capacity
- Shannon channel coding theorem
- converse
some channel models

Input X → P(y|x) → Output Y   (transition probabilities)

memoryless:
- the output at time i depends only on the input at time i
- the input and output alphabets are finite
Example: binary symmetric channel (BSC)

Diagram: the error source is added to the input, Y = X ⊕ E; input 0 and input 1 are received correctly with probability 1-p and flipped with probability p.

E is the binary error sequence s.t. P(1) = 1 - P(0) = p
X is the binary information sequence
Y is the binary output sequence
from AWGN to BSC

Homework: calculate the capacity as a function of A and σ²
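A minimal numerical sketch for this homework, assuming (as the missing figure presumably shows) antipodal signalling ±A over the AWGN channel with noise variance σ² and hard-decision detection at threshold 0; under that assumption the resulting BSC has crossover probability p = Q(A/σ) and capacity 1 - h(p), with h the binary entropy derived later in this lecture:

from math import erfc, log2, sqrt

def Q(x):                       # Gaussian tail probability P(N(0,1) > x)
    return 0.5 * erfc(x / sqrt(2))

def h(p):                       # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity_from_awgn(A, sigma2):
    p = Q(A / sqrt(sigma2))     # crossover probability after hard decisions
    return 1 - h(p)

print(bsc_capacity_from_awgn(A=1.0, sigma2=0.25))   # illustrative values of A and sigma^2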


Other models

Z-channel (optical):
   input 0 ("light on") → output 0 with probability 1
   input 1 ("light off") → output 0 with probability p, output 1 with probability 1-p
   P(X=0) = P0

Erasure channel:
   input 0 → output 0 with probability 1-e, erasure with probability e
   input 1 → output 1 with probability 1-e, erasure with probability e
   P(X=0) = P0
Erasure with errors

input 0 → output 0 with probability 1-p-e, output 1 with probability p, erasure with probability e
input 1 → output 1 with probability 1-p-e, output 0 with probability p, erasure with probability e
burst error model (Gilbert-Elliott)

Random error channel; outputs independent
   Error source: P(0) = 1 - P(1)

Burst error channel; outputs dependent
   Error source:
   P(0 | state = bad)  = P(1 | state = bad)  = 1/2
   P(0 | state = good) = 1 - P(1 | state = good) = 0.999

State info: good or bad, with transition probabilities
   Pgg (good → good),  Pgb (good → bad),  Pbg (bad → good),  Pbb (bad → bad)
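A minimal simulation sketch of the two-state model above; the state-transition probabilities p_gb and p_bg below are illustrative assumptions, while the per-state error probabilities follow the slide (0.001 in the good state, 1/2 in the bad state):

import random

def gilbert_elliott(n, p_gb=0.01, p_bg=0.1, p_err_good=0.001, p_err_bad=0.5):
    # generate an error sequence E of length n from the two-state model
    state, errors = 'good', []
    for _ in range(n):
        p_err = p_err_good if state == 'good' else p_err_bad
        errors.append(1 if random.random() < p_err else 0)
        if state == 'good':
            state = 'bad' if random.random() < p_gb else 'good'
        else:
            state = 'good' if random.random() < p_bg else 'bad'
    return errors

e = gilbert_elliott(100000)
print(sum(e) / len(e))           # average error rate; the 1's appear in bursts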
channel capacity:

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)    (Shannon 1948)

   X → channel → Y
   (H(X|Y) is the uncertainty about X that remains after observing Y)

notes:
- I(X;Y) depends on the input probabilities, because the transition probabilities are fixed
- capacity = max over P(x) of I(X;Y)
Practical communication system design

message (one of 2^k) → code book → code word of length n → channel (with errors) → received word → decoder (using the code book) → estimate

There are 2^k code words of length n.
k is the number of information bits transmitted in n channel uses.
Channel capacity

Definition:
The rate R of a code is the ratio k/n, where
k is the number of information bits transmitted in n channel uses.

Shannon showed that:

for R ≤ C encoding methods exist
with decoding error probability → 0 (as n → ∞)
Encoding and decoding according to Shannon

Code: 2^k binary codewords, where P(0) = P(1) = ½

Channel errors: P(0 → 1) = P(1 → 0) = p
i.e. the number of (typical) error sequences ≈ 2^(nh(p))

Decoder: search around the received sequence, in the space of 2^n binary sequences, for a codeword with ≈ np differences


decoding error probability

1. The number of errors t deviates from pn:  |t/n - p| > ε
   → probability → 0 for n → ∞   (law of large numbers)

2. More than one codeword in the decoding region
   (codewords random):

   P(> 1) ≈ (2^k - 1) · 2^(nh(p)) / 2^n → 2^(-n(1 - h(p) - R)) = 2^(-n(C_BSC - R)) → 0

   for R = k/n < 1 - h(p) and n → ∞
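A quick numerical look at the second term, with illustrative values of p, R and n (any rate below C_BSC = 1 - h(p) works):

from math import log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, R = 0.1, 0.4                              # C_BSC = 1 - h(0.1) ≈ 0.531 > R
for n in (100, 500, 1000):
    print(n, 2.0 ** (-n * (1 - h(p) - R)))   # P(> 1 codeword in the region) decays exponentially in n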
channel capacity: the BSC

For the BSC with crossover probability p:

I(X;Y) = H(Y) - H(Y|X)

The maximum of H(Y) is 1, since Y is binary.
H(Y|X) = P(X=0) h(p) + P(X=1) h(p) = h(p)

Conclusion: the capacity for the BSC is C_BSC = 1 - h(p)

Homework: draw C_BSC ; what happens for p > ½ ?
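A short sketch for the homework, plotting C_BSC = 1 - h(p) over 0 < p < 1 (matplotlib assumed available):

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 999)
h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
plt.plot(p, 1 - h)
plt.xlabel('bit error probability p')
plt.ylabel('C_BSC = 1 - h(p)')
plt.show()       # the curve is symmetric around p = 1/2 and rises again for p > 1/2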
channel capacity: the BSC

Explain the behaviour!

[Plot: channel capacity C_BSC (from 0 to 1.0) versus bit error probability p (from 0 to 1.0).]
channel capacity: the Z-channel

Application in optical communications

Z-channel (as before): input 0 ("light on") is always received as 0; input 1 ("light off") is received as 0 with probability p and as 1 with probability 1-p; P(X=0) = P0.

H(Y) = h(P0 + p(1 - P0))
H(Y|X) = (1 - P0) h(p)

For the capacity, maximize I(X;Y) over P0  (a numerical sketch follows below).
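A simple numerical sketch of this maximization, evaluating I(X;Y) = h(P0 + p(1-P0)) - (1-P0) h(p) on a fine grid of P0; the grid search is just a stand-in for the exact optimization:

import numpy as np

def h(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def z_channel_capacity(p, grid=100001):
    P0 = np.linspace(0.0, 1.0, grid)
    I = h(P0 + p * (1 - P0)) - (1 - P0) * h(p)   # I(X;Y) from the slide
    k = int(np.argmax(I))
    return I[k], P0[k]

C, P0_opt = z_channel_capacity(p=0.1)
print(C, P0_opt)      # note that the maximizing P(X=0) is not 1/2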
channel capacity: the erasure channel

Application: CDMA detection

Erasure channel (as before): each input is received correctly with probability 1-e and erased with probability e; P(X=0) = P0.

I(X;Y) = H(X) - H(X|Y)
H(X) = h(P0)
H(X|Y) = e h(P0)

so I(X;Y) = (1 - e) h(P0), maximized by P0 = ½.

Thus C_erasure = 1 - e

(Check! Draw and compare with the BSC and the Z-channel.)
Erasure with errors: calculate the capacity!

input 0 → output 0 with probability 1-p-e, output 1 with probability p, erasure with probability e
input 1 → output 1 with probability 1-p-e, output 0 with probability p, erasure with probability e
example

- Consider the following ternary example:
  input 0 → output 0 with probability 1
  input 1 → output 0, 1 or 2, each with probability 1/3
  input 2 → output 2 with probability 1

- For P(0) = P(2) = p, P(1) = 1-2p:

  H(Y) = h(1/3 - 2p/3) + (2/3 + 2p/3);   H(Y|X) = (1-2p) log2 3

Q: maximize H(Y) - H(Y|X) as a function of p  (a numerical sketch follows below)

Q: is this the capacity?

Hint: use log2 x = ln x / ln 2 and d ln x / dx = 1/x
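A numerical sketch answering the first question by grid search over 0 ≤ p ≤ 1/2, using exactly the H(Y) and H(Y|X) expressions above (the second question then asks whether restricting to P(0) = P(2) = p loses anything):

import numpy as np

def h(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

p = np.linspace(0.0, 0.5, 50001)
HY  = h(1/3 - 2*p/3) + (2/3 + 2*p/3)     # H(Y) from the slide
HYX = (1 - 2*p) * np.log2(3)             # H(Y|X) from the slide
I = HY - HYX
k = int(np.argmax(I))
print(p[k], I[k])                         # maximizing p and the maximal I(X;Y)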


channel models: general diagram

Input alphabet X = {x1, x2, …, xn}
Output alphabet Y = {y1, y2, …, ym}
Transition probabilities Pj|i = PY|X (yj|xi) for every input xi and output yj

The statistical behavior of the channel is completely defined by
the channel transition probabilities Pj|i = PY|X (yj|xi).

In general, calculating the capacity needs more theory.
* clue:

I(X;Y) is concave (convex ∩) in the input probabilities,

i.e. finding the maximum is simple  (a numerical sketch follows below)
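As one concrete way to exploit this concavity, here is a sketch of the standard Blahut-Arimoto iteration for a general discrete memoryless channel (not part of the slides; the 1e-300 floors are only numerical safeguards):

import numpy as np

def blahut_arimoto(P, iters=1000, tol=1e-12):
    # capacity (bits per channel use) of a DMC with P[i, j] = P(Y=j | X=i)
    n = P.shape[0]
    r = np.full(n, 1.0 / n)                          # input distribution, start uniform
    for _ in range(iters):
        q = r[:, None] * P
        q = q / q.sum(axis=0, keepdims=True)         # posterior q(x | y)
        logq = np.where(P > 0, np.log(np.maximum(q, 1e-300)), 0.0)
        w = np.exp((P * logq).sum(axis=1))           # r(x) proportional to exp( sum_y P(y|x) ln q(x|y) )
        r_new = w / w.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    py = r @ P                                       # output distribution
    terms = np.where(P > 0, P * np.log2(np.where(P > 0, P, 1.0) / np.maximum(py, 1e-300)), 0.0)
    return float((r[:, None] * terms).sum()), r

# sanity check against the BSC with p = 0.1: should give 1 - h(0.1) ≈ 0.531
print(blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]]))[0])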


Channel capacity: converse

For R > C the decoding error probability > 0

[Plot: Pe versus the rate k/n, with C marked on the rate axis.]
Converse: For a discrete memoryless channel

   Xi → channel → Yi

I(X^n; Y^n) = H(Y^n) - Σ_{i=1..n} H(Yi | Xi)
            ≤ Σ_{i=1..n} H(Yi) - Σ_{i=1..n} H(Yi | Xi)
            = Σ_{i=1..n} I(Xi ; Yi)
            ≤ nC

(The first step uses memorylessness: H(Y^n | X^n) = Σ_i H(Yi | Xi).)

The source generates one out of 2^k equiprobable messages:

   m → source encoder → X^n → channel → Y^n → decoder → m'

Let Pe = probability that m' ≠ m.

converse (R := k/n):

k = H(M) = I(M; Y^n) + H(M | Y^n)
         ≤ I(X^n; Y^n) + 1 + k Pe
         ≤ nC + 1 + k Pe

Hence  1 - C n/k - 1/k ≤ Pe,  i.e.  Pe ≥ 1 - C/R - 1/(nR)

Hence: for large n, and R > C,
the probability of error Pe > 0

We used the data processing theorem (I(M;Y^n) ≤ I(X^n;Y^n)) and Fano's inequality (H(M|Y^n) ≤ 1 + k Pe).
Cascading of Channels

   X → channel 1 → Y → channel 2 → Z

   I(X;Y) between X and Y, I(Y;Z) between Y and Z, I(X;Z) for the cascade

The overall transmission rate I(X;Z) for the cascade cannot be larger than I(Y;Z), that is:

   I(X;Z) ≤ I(Y;Z)
Appendix:

Assume:
a binary sequence with P(0) = 1 - P(1) = 1 - p;
t is the # of 1's in the sequence.

Then for n → ∞ and ε > 0, the weak law of large numbers gives

Probability( |t/n - p| > ε ) → 0

i.e. we expect, with high probability, about pn 1's.
Appendix:

Consequence:

1. n(p - ε) < t < n(p + ε) with high probability

2. Σ_{t = n(p-ε)}^{n(p+ε)} (n choose t) ≈ 2nε (n choose pn) ≈ 2nε · 2^(nh(p))

3. lim_{n→∞} (1/n) log2 [ 2nε (n choose pn) ] = h(p),  where h(p) = - p log2 p - (1-p) log2 (1-p)

Homework: prove the approximation using ln N! ≈ N ln N for N large,
or use the Stirling approximation N! ≈ √(2πN) · N^N · e^(-N).
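A small numerical check of point 3 to accompany the homework (illustrative n and p):

from math import comb, log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.2
for n in (100, 1000, 10000):
    t = int(p * n)
    print(n, log2(comb(n, t)) / n, h(p))   # (1/n) log2 (n choose pn) approaches h(p)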
Binary Entropy: h(p) = -p log2 p - (1-p) log2 (1-p)

[Plot: h(p) versus p for 0 ≤ p ≤ 1; h rises from 0 at p = 0 to 1 at p = ½ and falls back to 0 at p = 1.]

Note: h(p) = h(1-p)
Capacity for Additive White Gaussian Noise

   Input X → (+ Noise) → Output Y

Cap := sup over p(x) with E[x²] ≤ S/2W of [ H(Y) - H(Noise) ],   where W is the (single-sided) bandwidth

Input X is Gaussian with power spectral density (psd) ≤ S/2W;
Noise is Gaussian with psd σ²_noise;
Output Y is Gaussian with psd σ²_y = S/2W + σ²_noise.

For Gaussian channels: σ²_y = σ²_x + σ²_noise
   X → (+ Noise) → Y

Cap = ½ log2( 2πe(σ²_x + σ²_noise) ) - ½ log2( 2πe σ²_noise )   bits / transmission

    = ½ log2( (σ²_noise + σ²_x) / σ²_noise )   bits / transmission

Cap = W log2( (σ²_noise + S/2W) / σ²_noise )   bits / sec   (with 2W transmissions per second)

For a Gaussian Z with variance σ²_z:
p(z) = (1/√(2π σ²_z)) e^(-z²/2σ²_z);   H(Z) = ½ log2( 2πe σ²_z ) bits
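A tiny numerical sketch of the two formulas above; S, W and σ²_noise are illustrative values, and the factor 2W is the number of transmissions per second:

from math import log2

S, W, sigma2_noise = 1.0, 3000.0, 1e-5     # illustrative signal power, bandwidth (Hz), noise psd

sigma2_x = S / (2 * W)                      # signal psd
cap_per_trans = 0.5 * log2((sigma2_noise + sigma2_x) / sigma2_noise)    # bits / transmission
cap_per_sec = W * log2((sigma2_noise + S / (2 * W)) / sigma2_noise)     # bits / second
print(cap_per_trans, cap_per_sec, 2 * W * cap_per_trans)                # the last two agree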
Middleton type of burst channel model

A binary channel (0 → 0, 1 → 1 with some transition probability) whose quality changes over time:
select channel k with probability Q(k); channel k has transition probability p(k).
Fritzman model:

multiple states G and only one state B

Closer to an actual real-world channel

Diagram: a chain of states G1 … Gn and B, with error probability 0 in the good states G1 … Gn and error probability h in the bad state B (the label 1-p in the figure is a state-transition probability).
Interleaving: from bursty to random

   message → encoder → interleaver → (bursty) channel → interleaver⁻¹ → decoder → message

After de-interleaving, the channel errors look like "random errors".

Note: interleaving brings encoding and decoding delay.

Homework: compare block and convolutional interleaving w.r.t. delay.


Interleaving: block

Channel models are difficult to derive:
- burst definition ?
- random and burst errors ?
For practical reasons: convert bursts into random errors.

Read in row-wise, transmit column-wise:

   1 0 1 0 1
   0 1 0 0 0
   0 0 0 1 0
   1 0 0 1 1
   1 1 0 0 1
De-Interleaving: block

Read in column-wise, read out row-wise ('e' marks symbols hit by a burst on the channel):

   1 0 1 e 1
   0 1 e e 0
   0 0 e 1 0
   1 0 e 1 1   ← this row contains 1 error
   1 1 e 0 1
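A minimal sketch of the 5 x 5 block interleaver / de-interleaver above, reproducing how a burst of six channel erasures ('e') gets spread over the rows:

def interleave(bits, rows=5, cols=5):
    # write row-wise, transmit column-wise
    m = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    return [m[r][c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows=5, cols=5):
    # write column-wise, read out row-wise
    m = [[None] * cols for _ in range(rows)]
    i = 0
    for c in range(cols):
        for r in range(rows):
            m[r][c] = symbols[i]
            i += 1
    return m

data = [1,0,1,0,1, 0,1,0,0,0, 0,0,0,1,0, 1,0,0,1,1, 1,1,0,0,1]   # the array from the slide
tx = interleave(data)
rx = ['e' if 11 <= i <= 16 else b for i, b in enumerate(tx)]      # burst of 6 erasures on the channel
for row in deinterleave(rx):
    print(row)        # after de-interleaving every row holds at most two of the erasures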
Interleaving: convolutional

input sequence 0: no delay
input sequence 1: delay of b elements
•••
input sequence m-1: delay of (m-1)b elements

The 'in' and 'out' commutators in the figure distribute successive symbols over the branches.

Example: b = 5, m = 3  (a sketch follows below)
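A minimal sketch of this structure as a shift-register model (the initial register contents, shown as None, are an arbitrary choice for illustration):

def convolutional_interleaver(symbols, b=5, m=3, fill=None):
    # branch i (i = 0 .. m-1) delays its symbols by i*b positions within that branch
    lines = [[fill] * (i * b) for i in range(m)]      # delay registers per branch
    out = []
    for k, s in enumerate(symbols):
        i = k % m                                     # commutator selects the branches in turn
        if lines[i]:
            lines[i].append(s)
            out.append(lines[i].pop(0))
        else:
            out.append(s)                             # branch 0 has no delay
    return out

print(convolutional_interleaver(list(range(12)), b=5, m=3))

With a matching de-interleaver (branch i delayed by (m-1-i)·b), every symbol experiences the same end-to-end delay of b·m·(m-1) symbol periods, which is the quantity to compare against the block interleaver in the homework on the earlier slide.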
