You are on page 1of 608

Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Telecommunications
Engineering II
Jorma Kekalainen

Contents

Topics include
Signals and noise
Fourier analysis
Digital transmission
Statistical and linear algebra tools for advanced
telecommunications
Multi-carrier modulation
Coding and information theory
Diversity techniques
Wireless overview

1
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Information Sources
1. Usually, the most precise sources are the original
sources, i.e. standards, recommendations or other
specifications.
You can pull them from the Internet e.g.
- ITU-T www.itu.int/ITU-T/
- IETF www.ietf.org
- 3GPP www.3gpp.org
or from elsewhere.

2. You can look for material from corresponding


courses in the Internet
3. Some may find that the books are easier to read.
4. Many slides are adapted from the following books or
lecture notes based on those books

Books
Carlson et al.: Communication Systems: An Introduction to Signals and
Noise in Electrical Communication
Haykin: Communication Systems
Haykin & van Veen: Signals & Systems
Freeman: Radio System Design for Telecommunications
Goldsmith: Wireless Communications
Murthy et al.: Ad Hoc Wireless Networks: Architectures and Protocols
Pahlavan et al.: Principles of Wireless Networks: A Unified Approach
Rappaport: Wireless Communications: Principles and Practice
Roddy: Satellite Communications
Skolnik: Introduction to Radar Systems
Stallings: Wireless Communication and Networks
Tse: Fundamentals of Wireless Communication

2
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Introduction

Communication, message and signal

3
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Basic model for telecommunication

Measure of information

4
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Transmission channels

Cables
wire pairs (e.g., ordinary telephone line)
coaxial cable
waveguide (metallic waveguide and optical fiber)
More or less free space radio transmission
broadcasting
point-to-point microwave transmission
satellite position transmission
cell networks
(Portable magnetic/electronic/optical memory equipment)

Mode of communication

5
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Telecommunication and EM spectrum

Analogue vs. digital communication

6
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Constraints

Note: Latency (one-way) is the time from


start of packet transmission to the start
of packet reception.

Reliability of communication

7
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Performance of coding

Digital communication system: Structure

Noise
Transmitted Received Received
Info. signal signal info.
SOURCE
Source Transmitter Channel Receiver User

Transmitter

Source Channel
Formatter Modulator
encoder encoder

Receiver

Source Channel
Formatter Demodulator
decoder decoder

8
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Formatting and transmission of baseband


signal
Digital info.

Textual Format
source info.
Pulse
Analog Transmit
Sample Quantize Encode modulate
info.

Pulse
Bit stream waveforms Channel
Format
Analog
info. Low-pass
Decode Demodulate/
filter Receive
Textual Detect
sink
info.

Digital info.

Digital transmission system

9
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Signals of communication system

Signals and Systems

Deterministic signals and


LTI systems

10
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time functions

21

Continuous-time vs. discrete-time

22

11
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Continuous-valued vs. discrete-valued


signal

23

Deterministic vs. random (stochastic)

24

12
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Causal vs. anticausal vs. non-causal

25

Even and odd signals

26

13
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Continuous-time signals

27

Sine signal

28

14
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sinusoids

29

Cosine signal

30

15
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equivalence of sinusoidals

31

Some trigonometric identities

32

16
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Exponentials

33

Other classification

34

17
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Period

35

Complex exponentials

36

18
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex exponential

37

Complex exponential

38

19
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex exponential vs. real sinusoids

39

Complex exponential vs. real sinusoids

40

20
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Signal power and energy

41

Energy signals

42

21
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Power signals

43

Impulse function

Note: For analogue impulse


Lim {x(t)|t0 } = 44

22
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Impulse

45

Dirac-delta function

46

23
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Application

47

Advanced definition

48

24
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Step function

49

Another definition

50

25
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Step function

51

Application

52

26
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Step response and masking

53

Synthesis

54

27
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Signum function

55

Rectangular pulse

56

28
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Rectangular pulse

Also notation:
57

Basic operations on signals

58

29
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Amplitude scaling

59

Examples

60

30
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time scaling

61
Note: Downsampling is sometimes called decimation

Examples

62

31
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time shifting

63

Examples

64

32
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reflection

65

LTI systems

66

33
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

System

System (e.g. electric network) is specified by:


the functional description of the system blocks,
the interconnection rules between system blocks,
and
the topology.

67

Example: Basic blocks

68

34
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Basic blocks

69

System: Definition

70

35
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Basic system concepts

71

Memoryless components and systems


The output of the system vo(t0) at time t0 depends only on the input at the
same time vi(t0).

Voltage divider: Discrete gain block:

72

36
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Systems and components with


memory
The system output at t0 depends on past values of the input t t0.

Capacitor: Inductor: Unit delay:

73

Basic system concepts

74

37
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Linear, time-invariant systems

75

Linearity and superposition


Analog signals and systems

a1 aN are any arbitrary constants, and x1(t) xN(t) are any arbitrary continuous-
time signals.

Discrete signals and systems

a1 aN are any arbitrary constants, and x1(n) xN(n) are any arbitrary discrete-
time signals

76

38
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

LTI system

77

Linearity condition

78

39
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Linearity condition

79

Note

80

40
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Principle of superposition

81

Time-invariance

82

41
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Impulse response

83

84

42
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Impulse response

85

Example

86

43
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Response of LTI system

87

Convolution

88

44
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Convolution

89

Convolution

90

45
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Impulse response

91

Causal system

92

46
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Causal LTI system

93

Causal LTI system

94

47
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time-invariance and causality

95

Cascade LTI systems

96

48
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Parallel LTI systems

97

Examples: Interconnections of
systems/components
Series: e.g. transmitter-channel-receiver

Parallel: e.g. multiple antennas Feedback: e.g. control systems (phase-


locked loop)

98

49
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Properties of LTI systems

99

Step response

100

50
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

System models: Impulse response

101

System models: Frequency response

102

51
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

System models: Differential equation


Example:

103

System models: Difference equation


Example:

104

52
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fourier Analysis

Eigenfunctions of LTI system

106

53
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex exponential input

107

Complex exponential as an
eigenfunction

108
Note: Only one frequency, namely f0.

54
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency function = eigenvalue

Note: Of course, actually H(f0) is the value of the corresponding function


109
H(f) at some fixed frequency f0.

Sinusoids are eigenfunctions of LTI


systems

110

55
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency content

111

Fourier transform

112

56
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fourier transform pair

Note:

113

Frequency content of sinusoid

114

57
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency content of sinusoid

115

Frequency content of sinusoid

116

58
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Spectrum of sinusoid

117

Positive and negative frequency

118

59
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency content of the rectangular


pulse

119

Frequency content of the rectangular


pulse

120

60
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency content of the rectangular


pulse

121

Frequency content of the rectangular


pulse

122

61
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Spectrum function value at DC

Note: 123

Discrete-Time Fourier Series (DTFS =


DFT)

124

62
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fourier Series (FS)

125

Discrete-Time Fourier Transform


(DTFT)

126

63
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fourier Transform (FT)

127

F-representations

128

64
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Existence of F-representations

129

Gibbs effect

130

65
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gibbs effect

131

Properties of F-representations

132

66
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Properties of F-representations

133

Properties of F-representations

134

67
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Note

135

Major properties of the F-transform

136

68
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Major properties of the F-transform

137

Major properties of the F-transform

138

69
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Major properties of the F-transform

139

F{convolution}

140

70
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

y=xh XH=Y

141

Power and energy

142

71
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Energy

143

Parsevals formula

144

72
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Power spectrum

145

Some useful F-transform pairs

146

73
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Some useful F-transform pairs

147

Some useful F-transform pairs

148

74
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Some useful F-transform pairs

149

Some useful F-transform pairs

150

75
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Modulation

151

Modulation

152

76
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Periodic extension

153

Transform of the periodic extension

154

77
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Properties of the discrete-time F-


series

Note: The DTFS is also called the Discrete Fourier Transform (DFT)
155

Properties of the discrete-time F-


transform

156

78
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Properties of the F-series

157

Properties of the F-transform

158

79
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency response and filtering

159

Impulse response Frequency response

(y=xh XH=Y)

160

80
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency response

161

Frequency function impulse function

Note: Previously presented A(f0)=|H(f0)| and (f0)=arg{H(f0)} are the162


values of the corresponding functions at the frequency f0.

81
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Distortionless system

163

Distortionless system

164

82
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Distortion and dispersive system

165

Ideal filters

166

83
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Ideal filters
Ideal filters are physically unrealizable, in the sense that their
characteristics cannot be achieved with a finite number of
elements.

For example. the impulse response of an ideal filter is a sinc-


function, which is infinite long and noncausal.

The filters used in practical real time applications must be causal, that is

h(t)=0 for t<0


167

Realizable filters

168

84
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Real filters and their ideal


counterparts

169

Equalizers
If we know the frequency response of a system then, given the
output, we can discover the input using the equation

H(j)0

To build an inverse system, we must design it so that its frequency


response is H(j)-1
Inverse systems are required in communications, where they are
known as equalizers.
The system to be inverted is the channel.
Ideal equalizers, like ideal filters, cannot be realized.
A communications channel introduces a delay that cannot be
undone!
Even distortionless equalizers are generally unrealizable.

Example: Loading coils on telephone lines. Inductors are placed in shunt across the line
every km or so to improve frequency flatness over voice frequencies.

85
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Equalization
If the transfer function of the channel that causes linear
distortion is known, it can be compensated by the inverse
transfer function.

If the transfer functions of the channel and equalizer are


Hc(f) and Heq(f), respectively, then the transfer function of
the overall transmission system is

H(f) = Hc(f)Heq(f)

Channel Equalizer

171

Quadrature filter and Hilbert transform

172

86
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Quadrature filter and Hilbert transform

173

Simple example of Hilbert transform


The simplest and most obvious Hilbert transform
pair follows directly from the phase shift property
of the quadrature filter.
Specifically, if

then a phase shift of -90 produces

This calculation can be generalized to any signal that


consists of a sum of sinusoids.
Most other Hilbert transforms involve performing the
174
convolution operation.

87
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Correlation functions and spectral


densities

175

Introduction to correlation and spectral


density
Here we study signals using the time average and signal power or
energy.

Taking the Fourier transform of a correlation function leads to


frequency-domain representations in terms of spectral density
functions.

Spectral densities allow us to deal with a broader range of signal


models, not necessarily Fourier transformable (e.g., random
signals).

So far and here we consider deterministic signals whose


behaviour is known for all possible times.

Note: Random signals occur in communication systems both as


unwanted noise and as desired information-bearing signals
176

88
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Correlation of energy signals

177

Energy cross-spectral density and


autocorrelation

178

89
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: PRN (Pseudo Random Number)


code correlation
Correlation between PRN1 and PRN1

Correlation between PRN1 and PRN2 PRN1 PRN2

Energy spectral density

Note:

180

90
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Correlation of power signals

Note:

181

Power spectral density

182
Spectral density function G(f) represents the distribution of the power or energy in the
frequency domain. The area under G(f) equals the average power or total energy.

91
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

W-K theorem and power cross-spectral


density

183

Properties of correlation and spectral


density

Note: Spectral Function of frequency 184


Power Density Integration gives power

92
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Correlation and spectral density


properties of I/O-signals

185

Periodic and discrete-time signals

186

93
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Inner product
The inner product for two m 1 complex vectors

is given by

Similarly, we define the inner product of two (possibly


complex-valued) signals s(t) and r(t) as follows:

The inner product obeys the following linearity property

where a1, a2 are complex-valued constants, and s1, s2, r are


signals (or vectors).

The complex conjugate of a vector or row vector x, denoted as x*, is obtained by taking the
complex conjugate of each element of x. The Hermitian of a vector x, denoted as xH, is187
its
conjugate transpose: xH = (x)T.

Energy and norm

The energy Es of a signal s is defined as its


inner product with itself:

where s denotes the norm of s.

If the energy of s is zero, then s must be


zero.

188

94
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Matched filter


For a complex-valued signal s(t), the matched filter is defined as a
filter with impulse response sMF(t) = s(t).
Note that SMF(f) = S(f).
If the input to the matched filter is x(t), then the output is given by

The matched filter, therefore, computes the inner product between


the input x and all possible time translates of the waveform s.
In particular, the inner product <x, s> equals the output of the matched
filter at time 0.
For example, if x(t)=s(tt0) (i.e., the input is a time translate of s), then
the magnitude of the matched filter output is maximum at t=t0.
We can, then, intuitively see how the matched filter would be useful,
for example, in delay estimation using peak picking.

189

Matched filter for a complex-valued


signal

190

95
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Autocorrelation function
The inverse Fourier transform of the energy spectral density Es(f) is
termed the autocorrelation function Rs(), since it measures how closely
the signal s matches delayed versions of itself.
Since |S(f)|2 =S(f)S(f)=S(f)SMF(f), where sMF(t)=s(t) is the matched
filter for s introduced earlier.
We therefore have that

Thus, Rs() is the outcome of passing the signal s through its matched
filter, and sampling the output at time , or equivalently, correlating the
signal s with a complex conjugated version of itself, delayed by .
While the preceding definitions are for finite energy deterministic
signals, we revisit these concepts in the context of finite power random
processes later.

191

Bandpass signals and complex


baseband representation

192

96
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Real signals

Many signals in communication systems are


real bandpass signals with a frequency
response that occupies a narrow bandwidth
2B centered around a carrier frequency fc
with 2B << fc, as illustrated in figure.

Bandpass signal S(f)

193

Real signals

The bandwidth 2B of a bandpass signal is roughly


equal to the range of frequencies around fc where the
signal has non-negligible amplitude.
Bandpass signals are commonly used to model
transmitted and received signals in communication
systems.
These are real signals since the transmitter circuitry
can only generate real sinusoids (not complex
exponentials) and the channel just introduces an
amplitude and phase change at each frequency of the
real transmitted signal.

194

97
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Conjugate symmetry
Since bandpass signals are real, their frequency response has
conjugate symmetry, i.e. a bandpass signal s(t) has

|S(f)| = |S(f)| and S(f) = S(f).

However, bandpass signals are not necessarily conjugate


symmetric within the signal bandwidth about the carrier
frequency fc, i.e. we may have

|S(fc+f)| |S(fcf)| or S(fc+f) S(fcf)

for some f B.
This asymmetry in |S(f)| is illustrated in the previous figure.
Bandpass signals result from modulation of a baseband signal by
a carrier, or from filtering a deterministic or random signal with
a bandpass filter.
195

Bandpass signal

196

98
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Bandpass signal

197

Canonical representation

198

99
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Direct-conversion modem

199

Polar decomposition

200

100
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Bandpass and baseband equivalent system

201
A baseband communication refers to a system that does not include modulation.

Bandpass and lowpass (baseband)


equivalent

202

101
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Baseband vs. passband signals


A signal s(t) is said to be baseband if

for some W>0.


That is, the signal energy is concentrated in a band around DC.
Similarly, a channel modeled as a linear time-invariant system is
said to be baseband if its transfer function H(f) satisfies the
previous equation.
A signal s(t) is said to be passband if

where fc>W>0.
A channel modeled as a linear time-invariant system is said to be
passband if its transfer function H(f) satisfies the previous
equation
203

Baseband vs. passband signals

The spectrum S(f) for a real- The spectrum S(f) for a real-valued passband
valued baseband signal. The signal. The bandwidth of the signal is B.
bandwidth of the signal is B. The figure shows a frequency fc within the
band in which S(f) is nonzero. Typically, fc is
much larger than the signal bandwidth B.204

102
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex baseband representation


We often employ passband channels, which means that we must
be able to transmit and receive passband signals.

However, all the information carried in a real-valued passband


signal is contained in a corresponding complex-valued baseband
signal.

This baseband signal is called the complex baseband


representation or complex envelope of the passband signal.

This equivalence between passband and complex baseband has


profound practical significance, since the complex envelope can
be represented accurately in discrete time using a much smaller
sampling rate than the corresponding passband signal sp(t),
205

Complex baseband representation


Modern communication transceivers can implement complicated signal
processing algorithms digitally on complex baseband signals, keeping the
analog processing of passband signals to a minimum.

Thus, the transmitter encodes information into the complex baseband


waveform using encoding, modulation and filtering performed using
digital signal processing (DSP).

The complex baseband waveform is then upconverted to the


corresponding passband signal to be sent on the channel.

Similarly, the passband received waveform is downconverted to complex


baseband by the receiver, followed by DSP operations for
synchronization, demodulation, and decoding.

This leads to a modular framework for transceiver design, in which


sophisticated algorithms can be developed in complex baseband,
independent of the physical frequency band that is ultimately employed
for communication.
206

103
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time domain representation of a passband


signal
Any passband signal sp(t) can be written as
(1)

where sc(t) (c for cosine) and ss(t) (s for sine)


are real-valued signals, and fc is a frequency
reference typically chosen in the band occupied by
Sp(f).

The factor of 2 is included only for convenience in


normalization, and is often omitted as previously in
the canonical or standard representation of the
passband signal in terms of baseband signals.
207

I- and Q-components
The waveforms sc(t) and ss(t) are also referred to as the in-phase (or I)
component and the quadrature (or Q) component of the passband signal
sp(t), respectively.

The complex envelope, or complex baseband representation of sp(t) is


now defined as

(2)

We can rewrite (1) as

(3)

Note: To check (3), plug in (2) and Eulers identity on the right-hand side (3) 208
to obtain the expression (1).

104
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Envelope and phase of a passband


signal
The complex envelope s(t) can also be represented in polar form,
defining the envelope e(t) and phase (t) as

(4)

Plugging

into (3), we obtain yet another formula for the passband signal s:

(5)

The equations (1), (3) and (5) are three different ways of expressing
the same relationship between passband and complex baseband in the
time domain.
209

Orthogonality of I and Q channels

The passband waveform

corresponding to the I component, and the passband


waveform

corresponding to the Q component, are orthogonal.

That is,

210

105
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Upconversion and downconversion


Equation (1) immediately tells us how to upconvert from baseband
to passband.

To downconvert from passband to baseband, consider

The first term on the right-hand side is the I component, a baseband


signal.
The second and third terms are passband signals at 2fc, which we can
get rid of by lowpass filtering.
Similarly, we can obtain the Q component by lowpass filtering

211

Upconversion and downconversion

212

106
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling and PCM

Digital communication system: Structure

Noise
Transmitted Received Received
Info. signal signal info.
SOURCE
Source Transmitter Channel Receiver User

Transmitter

Source Channel
Formatter Modulator
encoder encoder

Receiver

Source Channel
Formatter Demodulator
decoder decoder

107
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Formatting and transmission of baseband


signal
Digital info.

Textual Format
source info.
Pulse
Analog Transmit
Sample Quantize Encode modulate
info.

Pulse
Bit stream waveforms Channel
Format
Analog
info. Low-pass
Decode Demodulate/
filter Receive
Textual Detect
sink
info.

Digital info.

Introduction

216

108
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling and pulse modulation


Mathematical functions and electric signals are frequently displayed as
continuous curves.
A smooth curve drawn can be displayed by using samples that have
sufficiently close spacing.
When the samples are represented as, e.g., voltage pulses, we obtain a
discrete-time signal.
In information transmission, we can use samples instead of a continuous
time signal.
Instead of CW-modulation methods, we can use pulse modulation methods.
A digital signal is obtained by representing the discrete samples as discrete
number of amplitude values (quantization), usually by using the binary
system.
Number of bits determines the number of discrete values.
In digital transmission systems are used digital modulation methods.

217

Sampled signals

218

109
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Analog and digital amplitude modulations

219

Why digital communications?

220

110
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Why digital communications?

221

Regenerative repeater in digital


communications

222

111
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Digital vs. analog

223

Digital transmission system

224

112
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Schematic diagram of a PCM coder


decoder

225
A paralleltoserial (P/S) converter

Sampling concepts

Continuous-time signal

Discrete-time signal obtained


by uniform sampling

Digital signal represented


as discrete sample values
226

113
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Periodic signals and F-transform

227

Impulse train

228

114
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Periodic sampling

229

Reconstruction of signals from samples

230

115
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Perfect reconstruction

231

Example: Aliased reconstruction


(undersampled)

232

116
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling theorem

233

Reconstruction as interpolation

234

117
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reconstruction as interpolation

235

Bandlimited interpolation

236

118
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Nyquist sampling theorem

237

Time domain interpretation

238

119
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency domain interpretation

239

Example: Sampling of sinusoid

240

120
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Determination of sampling frequency


from signal waveform

241

Sampling with pre-filtering

242

121
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reconstruction of continuous signal from


samples

243

Bandwidth of signal
Baseband versus bandpass:

Baseband Bandpass
signal signal
Local oscillator

Bandwidth dilemma:
Bandlimited signals are not realizable!
Realizable signals have infinite bandwidth!
244

122
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Bandwidth of signal: Approximations


Different definition of bandwidth:
a) Half-power bandwidth d) Fractional power bandwidth
b) Noise equivalent bandwidth e) Bounded power spectral density
c) Null-to-null bandwidth f) Absolute bandwidth

(a)
(b)
(c)
(d)
245
(e)50dB

Sampling of analog signals


Time domain Frequency domain
xs (t ) = x (t ) x(t ) X s ( f ) = X ( f ) X ( f )
x(t )
| X( f )|

x (t ) | X ( f ) |

xs (t )
| Xs( f )|

246

123
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Aliasing effect & Nyquist rate

LP filter

Nyquist rate

aliasing

247

Undersampling & aliasing in time domain

248

124
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling theorem
Analog Sampling Pulse amplitude
signal process modulated (PAM) signal

Sampling theorem: A bandlimited signal with no spectral components


beyond , can be uniquely determined by values sampled at uniform
intervals of

The sampling rate (Nyquist rate) is

In practice, it is need to sample faster than this because the receiving filter
will not be sharp.

249

Sampling theorem
Statement: Any signal with a bandwidth of W can be completely reconstructed if it is
sampled at a rate of 2W.

Original
waveform
samples
Capacitor
discharges

Capacitor
charges

Thus by sampling first at the transmitter and then passing the samples through a ideal
LPF the original waveform can be completely reconstructed 250

125
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Effect of under sampling

Original Incorrectly
signal at 3 Hz Samples at less reconstructed
sampling rate at 2 Hz signal at 1 Hz

Thus when any wave is sampled at a frequency that is less than double
the maximum signal frequency, the recovered wave will not be of the same
frequency as the input waveform. This distortion is called aliasing .

The sampling frequency has to be adjusted such that fs > 2fm 251

Pulse modulation

252

126
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Pulse modulation

Signal

PAM

PWM

PPM
253

Encoding (PCM)

Pulse code modulation (PCM): Encoding the


quantized signals into a digital word (PCM word or
codeword).
Each quantized sample is digitally encoded into an l bits
codeword where L in the number of quantization levels and

254

127
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Digitizing analog speech


PAM & PCM
PAM (Pulse Amplitude Modulation).
The amplitude of a train of pulse is varied
according to the amplitude of the analog signal
(modulating signal)

PCM (Pulse Code Modulation).


The analog signal modulates a train of pulse (PAM).
In effect the analog signal is sampled and the
samples are coded to a binary value which is a
function of the amplitude of the sampled analog
signal

PCM = PAM + QUANTIZATION + COMPANDING 255

Schematic diagram of a PCM coder


decoder

256
A paralleltoserial (P/S) converter

128
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Pulse code modulation


PCM was developed by Reeves in 1937
PCM (ADPCM) is the preferred method of
communication within the PSTN

PCM is a type of coding that is called waveform coding


because it creates a coded form of the original voice
waveform.

PCM is a waveform coding method defined in the ITU-T


G.711 specification.

257

Example: Quantization and Pulse Code


Modulation (PCM)
Quantization
Quantizing error or noise
Approximations mean it is impossible to recover
original exactly
4 bit system divides amplitude range to 16 levels
8 bit sample gives 256 levels
8-bit quality comparable with analog speech
transmission in PSTN
8000 samples per second of 8 bits each gives
64kbps = digital speech channel
258

129
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Diagrammatic representation of 3 bit PCM


4
CODE No. 7 3.5
3
CODE No. 6 2.5
2
CODE No. 5 1.5
1
CODE No. 4 0.5
0
CODE No. 3 -0.5
-1
CODE No. 2 -1.5
-2
CODE No. 1 -2.5
-3
CODE No. 0 -3.5
-4

SAMPLE VALUE 0.0 3.35 1.75 - 0.25 -1.4 -2.3 -3.5


NEAREST Q LEVEL 0.5 3.5 1.5 -0.5 -1.5 -2.5 -3.5
QUANT ERROR +0.5 +0.15 -0.25 +0.25 +0.1 +0.2 0.0
CODE NUMBER 4 7 5 3 2 1 0 259
ENCODED BITS 100 111 101 011 010 001 000

Pulse Code Modulation (PCM)

260

130
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Single channel simplex PCM transmission


system
Quantization
Companding

261

PCM block diagram

pulse pulse code


amplitude modulated
modulated signal
sampling
(pam) signal (pcm)
clock

quantizer digitized
sampling voice
and
circuit signal
compander

analog
voice voice band
signal filter

262

131
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Quantization

263

Memoryless quantization

264

132
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Uniform quantizer

265

Example: Uniform quantizer

266

133
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Uniform quantization
Amplitude quantizing: Mapping samples of a continuous
amplitude waveform to a finite set of amplitudes.
Out

In
Average quantization noise power
Quantized

Signal peak power


values

Signal power to average


quantization noise power

267

Example
Derive quantization noise for uniform quantization in case of
signal x[-1,1] and the number of quantization levels is M.

Quantization error is

where x is the exact value of the sample and xj is corresponding


quantized value.

The mean square value of the quantized error is

where j is quantization interval and p(x) pdf of x


268

134
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Quantization levels xj and non-uniform


quantization intervals j

0-level

269

Example

If Max{j}<< then

In whole range

270

135
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example
Note:

If Max{j} << then

So

P(xj) is the probability of


the level xj

271

Example

If j= = constant j, then

Note x[-1,1], so =2/M


and

where M is the number of levels and n is the number of codeword bits.

Note that when M is doubled the mean square of quantization error or


quantization noise power is reduced to one quarter. 272

136
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: SNRqdB
Derive SNRqdB for uniform quantization in case of uniformly distributed
and normalized signal |x(t)|1 and the number of quantization levels is M.

The average signal power is

For uniformly distributed signal |x(t)|1

273

Example: SNRqdB
So

and

Because B=nW, for binary system and uniform signal

where B is channel bandwidth and W is signal bandwidth.

In PCM SNR increases exponentially as a function of channel bandwidth.


274

137
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Compressor + expander = compander

275

Quantization error
Quantizing error: The difference between the input and
output of a quantizer e(t ) = x (t ) x (t )

Process of quantizing noise


Quantizer
Model of quantizing noise
y = q ( x)
AGC x(t ) x (t )
x(t ) x (t )
x
e(t )

+
e(t ) =
x (t ) x (t )
276

138
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Non-linear quantization

Use of equal quantization intervals throughout the entire


dynamic range of an input analog signal (for low & high
energy signals) results in:
Low level signals have a low SNRq.
High level signals have a high SNRq.

Most voice signals are of low levels.


Thus, efficient way to improve voice signal quality at
lower signal levels is use a non-uniform (non-linear)
quantization process

277

Non-linear PCM and companding


Speech signals consist predominantly of small amplitude signals and the large
amplitude signals occur with much smaller probability.
Hence it is logical that the smaller amplitudes are quantized with more precision.
This means that the step size is maintained small for the region where the signal
amplitude is small.
Correspondingly, the step size for the large signals are made large.
This will of course result in large quantization error in case of large signals.
But this is tolerable since the large signal amplitudes do not occur very often.
This process of varying the step size during the encoding process is called
compressing and the corresponding receiver will do expanding to reverse the
distortion introduced at the encoder.
This process is called companding.
This type PCM is known as non-linear or logarithmic PCM.

278

139
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Compressor and expander

279

Non-uniform quantization
It is done by uniformly quantizing the compressed signal.
At the receiver, an inverse compression characteristic, called
expansion is employed to avoid signal distortion.

compression+expansion companding

y = C (x) x
x(t ) y (t ) y (t ) x (t )

x y
Compress Quantize Expand
Transmitter Channel Receiver
280

140
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Non-linear quantization
The voltage range between the lowest level and the
highest level is divided into segments in a non-linear
manner logarithmic

The lower the voltage levels, the smaller the range of


a segment.

The range of a segment gets larger for higher voltage


levels

The number of steps for each segment is the same

281

Companding
During the companding process, input analog signal
samples are compressed into logarithmic segments and
then each segment is quantized and coded using uniform
quantization.

Companding (compression and expansion) increases SNR


performance (minimize quantization noise) while keeping
the number of bits used for quantization constant.

282

141
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Speech compression

=255

Log.
Lin.

A=87.6

283

Piecewise linearized A-curve


Number of the
Equation of the segment Segment
segment

284

142
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Piecewise linearized A-curve


The logarithmic part is replaced with piecewise linear segments

285

Transfer characteristics of a compander


Vo
Compression
CODE 1111

CODE 1101

1.2
Vi
1.2

CODE 0010

CODE 0000

286
expanding

143
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Differential quantizier

287

1-point DPCM

288

144
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Baseband transmission
To transmit information through physical channels, PCM
sequences (codewords) are transformed to pulses (waveforms).
Each waveform carries a symbol from a set of size M.
Each transmit symbol represents k =log2 M bits of the PCM words.
PCM waveforms (line codes) are used for binary symbols (M=2).

M-ary pulse modulation are used for non-binary symbols


(M>2). Eg: M-ary PAM.
For a given data rate, M-ary PAM (M>2) requires less bandwidth than
binary PCM.
For a given average pulse power, binary PCM is easier to detect than M-
ary PAM (M>2).

289

Example: 8-ary PAM vs. binary


PAM

290

145
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Binary PAM and 4-ary PAM

Binary PAM 4-ary PAM


(rectangular pulse) (rectangular pulse)

3B
A.
11
1 B
T
T T 01
T -B 00 T T

0 10
-A. -3B

291

Other PCM waveforms: Examples

PCM waveforms category:

Nonreturn-to-zero (NRZ)
Return-to-zero (RZ)

+V 1 0 1 1 0 +V 1 0 1 1 0
NRZ -V Manchester -V

Unipolar-RZ +V Miller +V
0 -V
+V +V
Bipolar-RZ 0 Dicode NRZ 0
-V -V
0 T 2T 3T 4T 5T 0 T 2T 3T 4T 5T

292

146
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

PCM waveforms: Selection criteria

Criteria for comparing and selecting PCM


waveforms:
Spectral characteristics (power spectral
density and bandwidth efficiency)
Bit synchronization capability
Error detection capability
Interference and noise immunity
Implementation cost and complexity

293

Summary: Baseband formatting and


transmission
Digital info. Bit stream Pulse waveforms
(Data bits) (baseband signals)
Textual Format
source info.
Pulse
Analog Sample Quantize Encode modulate
info.

Sampling at rate Encoding each q. value to


f s = 1 / Ts l = log 2 L bits
(sampling time=Ts) (Data bit duration Tb=Ts/l)

Quantizing each sampled Mapping every m = log 2 M data bits to a


value to one of the symbol out of M symbols and transmitting
L levels in quantizer. a baseband waveform with duration T

Information (data- or bit-) rate: Rb = 1 / Tb [bits/sec]


Symbol rate : R = 1 / T [symbols/sec]
Rb = mR 294

147
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Codec

sampling
quantizing
encoding

Analog Digital
signal signal
Sampler Quantizer Encoder

295

Typical digital passband transmitting


system

M U
Analog Digital O P
signal C signal D C
O U O RF signal
D TDM L N
E Digital A V
C Base Band T E
O R
R T

296

148
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Appendix: Sampling and


Quantization

A more detailed review

297

Ideal (or impulse) sampling

298

149
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Illustration of ideal sampling

299

Spectrum of the sampled waveform

300

150
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reconstruction of m(t)

301

Bandlimited interpolation

302

151
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling theorem
A signal having no frequency components above W Hertz is
completely described by specifying the values of the signal at
periodic time instants that are separated by at most 1/(2W)
seconds.

303

Natural sampling

304

152
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Illustration of natural sampling

305

Signal reconstruction in natural sampling

306

153
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Flat-top sampling

307

Spectrum of ms(t) in flat-top sampling

308

154
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equalization

309

Pulse modulation

310

155
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

PWM & PPM waveforms with a sinusoidal


message

311

Quantization

312

156
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Memoryless quantization

313

Uniform quantizer

314

157
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Input and output of a midrise uniform


quantizer

315

Signal-to-quantization Noise Ratio (SNRq)

316

158
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Signal-to-quantization Noise Ratio (SNRq)

317

Signal-to-quantization Noise Ratio (SNRq)

318

159
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Optimal quantizer

319

Optimal quantizer

320

160
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example of optimal quantizer design

321

Lloyd-Max conditions and iterative


algorithm

322

161
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Robust quantizers

323

-law and A-law companders

324

162
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

SNRq of non-uniform quantizers

325

SNRq of non-uniform quantizers

326

163
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

SNRq of -law compander

327

SNRq of -law compander

328

164
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

8-bit quantizer for the Gaussian-


distributed message

One sacrifices performance for larger input power levels to obtain a 329
performance that remains robust over a wide range of input levels.

SNRq with 8-bit -law quantizer (L = 256,


= 255)

330

165
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Differential quantizers

331

Linear predictor

332

166
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Normal equations (or the Yule-Walker


Equations)

333

Reconstruction of m[n] from the


differential samples

334

167
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reconstruction of m[n] from the


differential samples

335

Intersymbol Interference
(ISI)
Pulse shaping and
equalization

336

168
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Symbols and signals

337

Digital pulse amplitude modulation

338

169
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Digital pulse amplitude modulation

339

Symbols and bits

340

170
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Bandwith constraint

341

Inter-Symbol Interference (ISI)

342

171
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Attenuation and dispersion effects: ISI

Inter-symbol interference (ISI)

343

Inter-Symbol Interference (ISI)

Inter-Symbol interference (ISI) seems to be an


unavoidable phenomenon of both wired and wireless
communication systems.
Sent

Received

344

172
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Ideal and sent shape


Next figure shows a data sequence, 1,0,1,1,0, which we wish to send.
This sequence is in form of square pulses.
Square pulses are nice as an abstraction but in practice they are hard to
create and also require far too much bandwidth.
So we shape them as shown in the dotted line.
The shaped version looks essentially like a square pulse and we can quickly
tell what was sent even visually.
Advantage of (an arbitrary) shaping at this point is that it reduces
bandwidth requirements and can actually be created by the hardware.

345

Symbols are spread by the medium


Next figure shows each symbol as it is received.
We can see what the transmission medium creates a tail of energy that lasts
much longer than intended.
The energy from symbols 1 and 2 goes all the way into symbol 3.
Each symbol interferes with one or more of the subsequent symbols.
The circled areas show areas of large interference.

346

173
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Received vs. transmitted signal


Next figure shows the actual signal seen by the receiver.
It is the sum of all these distorted symbols.
Compared to the dashed line that was the transmitted signal, the received
signal looks quite different.
The receiver actually sees the value of the amplitude at the timing instant
only (the little yellow dot in the picture).
Notice that for symbol 3, this value is approximately half of the transmitted
value, which makes this particular symbol more susceptible to noise and
incorrect interpretation and this phenomena is the result of this symbol
delay and smearing.

347

ISI
This spreading and smearing of symbols such that the energy
from one symbol effects the next ones in such a way that the
received signal has a higher probability of being interpreted
incorrectly is called Inter-Symbol-Interference or ISI.
ISI can be caused by many different reasons.
It can be caused by filtering effects from hardware or frequency
selective fading, and from non-linearity effects.
Communication system designs for both wired and wireless
nearly always need to incorporate some way of controlling ISI.

348

174
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

ISI effects: Band-limited filtering of


channel
ISI due to filtering effect of the
communications channel (e.g. wireless
channels)
Channels behave like band-limited filters

H c ( f ) = H c ( f ) e j c ( f )

Non-constant amplitude Non-linear phase

Amplitude distortion Phase distortion


349

Inter-Symbol Interference (ISI)


ISI appears in the detection process due to the filtering
effects of the system
Overall equivalent system transfer function

H ( f ) = Ht ( f )H c ( f )H r ( f )
creates echoes and hence time dispersion
causes ISI at sampling time
ISI effect

z k = sk + nk + i si
ik

350

175
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Inter-symbol interference (ISI): Model


Baseband system model
x1 x2
{xk } Tx filter Channel r (t ) Rx. filter
zk
{xk }
ht (t ) hc (t ) hr (t ) Detector
t = kT
T Ht ( f ) Hc ( f ) Hr ( f )
x3 T n(t )
Equivalent model

x1 x2
{xk } Equivalent system
h(t )
z (t ) zk
{xk }
Detector
t = kT
T H( f )
x3 T n (t )
filtered noise
H ( f ) = Ht ( f )H c ( f )H r ( f )
351

What can we do about ISI?


The main problem is that energy, which we wish to confine to one symbol, leaks
into others.
So one of the simplest things we can do to reduce ISI is to just slow down the
signal.
Transmit the next pulse of information only after allowing the received signal has
damped down.
The time it takes for the signal to die down is called delay spread, whereas the
original time of the pulse is called the symbol time.
If delay spread is less than or equal to the symbol time then no ISI will result.
Slowing down the bit rate was the main way ISI was controlled on the old days.
Of course, in our march for ever higher bit rates, slowing down the data rate is an
easy but an unacceptable solution.
Nowadays digital electronic allows us to do signal processing controlling ISI and
transmission speeds increase accordingly.

352

176
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Pulse shaping

353

Pulse shaping to reduce ISI

Goals and trade-off in pulse-shaping


Reduce ISI
Efficient bandwidth utilization
Robustness to timing error (small side lobes)

354

177
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Why not sinc pulse

355

Vestigial-symmetry theorem and raised


cosine pulse

356

178
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Pulse shaping
The main tool used to counter ISI is pulse shaping.
How can pulse shaping help control ISI?
The secret lies in the digital demodulation process used.
When the timing pulse slices the signal to determine the value
of the signal at that instant, it does not care what the signal
looked like before or after it.
So if there was some way we could keep the symbols from
interfering in such a way that they do not affect the amplitude
at the slicing instant, we can counter ISI successfully.

357

Only sampling moments are important

Look at the wildly bouncing signal below.


However, the receiver only sees the points at the timing pulses (shown
below the signal) and rest of the variation has no effect.
So as long at these points, we can reduce the effect of adjacent symbols,
thats all we need to do to mitigate the effect.

We only care about


what the signal does
at the moment of
sampling. What it
does in between is
unimportant.
358

179
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Nyquist bandwidth constraint


Nyquist bandwidth constraint (on equivalent system):
The theoretical minimum required system bandwidth to
detect Rs [symbols/s] without ISI is Rs/2 [Hz].
Equivalently, a system with bandwidth W=1/2T=Rs/2 [Hz]
can support a maximum transmission rate of 2W=1/T=Rs
[symbols/s] without ISI.

1 R R
= s W s 2 [symbol/s/Hz]
2T 2 W
Bandwidth efficiency, R/W [bits/s/Hz] :
An important measure in DCs representing data
throughput per Hz of bandwidth.
Showing how efficiently the bandwidth resources are used
by signaling techniques. 359

Equivalent system: Ideal Nyquist pulse


(filter)
Ideal Nyquist filter Ideal Nyquist pulse
H( f ) h(t ) = sinc(t / T )
T 1

0 f 2T T 0 T 2T t
1 1
2T 2T
1
W= 360
2T

180
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Starting case: Square pulse shape


We will start by looking at the use of a square pulse.
It is an intuitive shape and we want to see what if anything is wrong with
using it.
Lets define some terms
Ts = symbol time, 1 second in the example below.
Rs, the symbol rate is inverse of symbol time, Rs = 1/ Ts.
R is directly related to bandwidth such that larger the symbol rate, the more
bandwidth is required.

The square pulse in time-domain 361

Spectrum of the square pulse

Since the symbol time is 1 second, the symbol rate is


1 symbol per second.
The frequency response of this square pulse (its
Fourier transform) is given by the equation

where Ts = symbol time (1 sec)

362

181
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Square pulse sinc function

The frequency
response of the
square pulse is a
sinc function.

Lowpass
bandwidth is one
half of the
bandpass case.

363

Lowpass and bandpass bandwidth

In the previous case, the symbol time is 1 second.


The symbol rate hence is also equal to 1.
The frequency response of the square pulse is in the shape of a
sinc function (sinx/x).
It has a maximum value of AT and it crosses the zero at
integer multiples of R.
The lowpass bandwidth which is defined as the distance from
origin to the first zero crossing, is equal to twice the symbol
rate 1 Hz.
The bandpass case is twice that.
This lowpass, bandpass business may be confusing but we can
understand it better if we realize that bandwidth is always
measured on the positive axis.

364

182
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Square pulses and corresponding


frequency functions

The effect of square


pulse symbol times
and their frequency
response

365

Pulse bandwidth

A narrow pulse has a wide frequency response.


A wide pulse has smaller bandwidth.
For each pulse, the bandwidth which we measure is
only on the positive half and is equal and its symbol
rate in Hz.
The important thing to note at this point is that a
square pulse of symbol rate R has a bandwidth of R
Hz (for bandpass signal it is twice that.)

366

183
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

An important relationship

Bandwidth of a square pulse


= R for lowpass signals,
= 2R for bandpass.
The frequency response of the square pulse goes on
forever.
This is not a good thing, because it would interfere
with others and interference is not allowed by the
authorities.

367

Disadvantages of the square pulse

1. The ideal square pulse is difficult to create in time


domain because of rise time and a decay time.
2. Its frequency response goes on forever and decays
slowly.
The second lobe is only 13 dB lower than the first
one.
3. It is very sensitive to ISI.

368

184
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Duality

If a square pulse gives us a sinc function in the


frequency domain, then we could use a sinc function
as a pulse shape in time domain and get a square
wave frequency response.
We could use a pulse that is shaped like a sinc
function instead of a square pulse and get that very
nice boxcar spectrum, with nothing spilling outside
the bandwidth.

369

Example

A sequences of bits (1011) by shaping the bits as sinc


pulses

370

185
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Comparison between square and sinc pulse

Using the sinc pulse cuts the bandwidth requirement


to one-half compared to the square pulse.

371

Nyquist bandwidth
The bandwidth achieved by the sinc pulse is called the Nyquist
bandwidth.
It requires only 1/2 Hz per symbol.
Can we find something even better?
It turns out that we have not been able to find any other shape
that can improve on this.
It is an ultimate limit for perfect reconstruction of the signal.
Band-limited spectrum in frequency domain with no energy
going to waste and small total bandwidth requirement seems to
be great!
But not so great however, because a sinc pulse is actually no
more possible to build than is a square pulse.

372

186
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Disadvantages of sinc pulse


1. In time domain a true sinc pulse is of infinite length with tails
extending to infinity so the energy can theoretically continue
to add up even after the signal has ended.
We can only design an approximation to the real sinc pulse of
a finite length.
But truncation leads to an imperfect pulse that does not have a
true sinc pattern and allows ISI to leak in.
2. The pulse tails that fall in the adjacent symbols decay at the
rate of 1/x so if there is some error in timing, this pulse is not
very forgiving.
It requires near-perfect timing to achieve decent performance.

373

Nyquist pulses (filters)

Nyquist pulses (filters):


Pulses (filters) which result in no ISI at the sampling time.
Nyquist filter:
Its transfer function in frequency domain is obtained by
convolving a rectangular function with any real even-
symmetric frequency function
Nyquist pulse:
Its shape can be represented by a sinc(t/T) function
multiply by another time function.
Example of Nyquist filters: Raised-Cosine filter
374

187
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Raised Cosine filter


Raised-Cosine filter
A Nyquist pulse (No ISI at the sampling time)

1 for | f |< 2W0 W


| f | +W 2W0
H ( f ) = cos 2 for 2W0 W <| f |< W
4 W W0
0 for | f |> W
cos[2 (W W0 )t ]
h(t ) = 2W0 (sinc(2W0t ))
1 [4(W W0 )t ]2
W W0
Excess bandwidth: W W Roll-off factor r =
W0
0 r 1
0

375

Raised Cosine (RC) filter: Nyquist pulse


approximation

| H ( f ) |=| H RC ( f ) | h(t ) = hRC (t )


1 r=0 1

r = 0.5
0.5 0.5 r =1
r =1 r = 0.5
r =0

1 3 1 0 1 3 1 3T 2T T 0 T 2T 3T
T 4T 2T 2T 4T T

Rs
Baseband W sSB= (1 + r ) Passband W DSB= (1 + r ) Rs
2
376
r = roll-off factor

188
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Raised cosine
Nyquist offered ways to build (realizable) shapes that had the
same good qualities as the sinc pulse and less of the
disadvantages.
One class of pulses he proposed are called the raised cosine
pulses.
They are really a modification of the sinc pulse.
The sinc pulse has a bandwidth of W0, where W0 is specified
as
W0 = 1/2T
The raised cosine pulses have an adjustable bandwidth which
can be varied from W0 to 2W0.
We want to get as close to W0, which is called the Nyquist
bandwidth.

377

Roll-off factor
The factor r related the achieved bandwidth to the ideal bandwidth W0 as

W W0
Roll-off factor r =
W0
0 r 1
where W0 is Nyquist bandwidth, and W is the utilized bandwidth.
The factor r is called the roll-off factor.
It indicates how much bandwidth is being used over the ideal bandwidth.
The smaller this factor, the smaller bandwidth and the more efficient the
scheme.
The percentage over the minimum required W is called the excess
bandwidth.
It is 100% for roll-off of 1.0 and 50% for roll-off of 0.5.

378

189
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Roll-off

The alternate way to express the utilized bandwidth is

The typical roll-off values used for communications


range from .2 to .4.
Obviously we want to use as small a roll-off as
possible, since this gives the smallest bandwidth.

379

Sinc and cosine parts


How the class of raised cosine pulse is defined in time
domain?
Time domain presentation includes product of two parts
The first part is the sinc pulse.
The second part is a cosine correction applied to the sinc pulse
to make it behave better.
The bandwidth is now adjustable.
It can be anywhere from 1/2 Rs to Rs.
It is greater than the Nyquist bandwidth by a factor (1+ r).
For r = 0, the impulse response of RC filter reduces to the sinc
pulse.

380

190
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Raised cosine impulse response

Roll-off
factor r=

381

Frequency response of the raised cosine


pulses of Rs = 1

Roll-off
factor r=

382

191
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example of pulse shaping


Raised Cosine pulse at the output of matched filter
Amp. [V]

Baseband received waveform at


the matched filter output
(zero ISI)

t/T

383

Root-raised cosine
The whole raised cosine can be applied at once at the transmitter but in
practice it has been found that concatenating two filters each with a root
raised cosine response (called split-filtering) works better.
So to implement the raised cosine response, we split the filtering in two
parts to create a matched set.
In frequency domain, we take the square root of the frequency response
hence the name root-raised cosine.
Split filtering of raised cosine response, a root-raised cosine filter at the transmitter and
one at the receiver, giving a total response of a raised cosine.

384

192
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Square-root raised cosine pulse

385

Impact of AWGN only

386

193
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Square-root raised cosine pulse

387

Example of pulse shaping


Square-root Raised-Cosine (SRRC) pulse shaping
Amp. [V]

Baseband tr. Waveform

Third pulse

t/T
First pulse
Second pulse

Data symbol

388

194
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Monitoring transmission quality using eye


diagram

389

Eye diagram

The optimum sampling time


corresponds to the
maximum eye opening.
ISI at that time partially = Sensivity to timing
closes the eye and thereby error
reduces the noise margin.
If synchronization is
derived from the zero-
crossings, as it usually is,
zero-crossing distortion
produces jitter and results
in non-optimum sampling 390
times.

195
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Eye pattern
Eye pattern: Display on an oscilloscope which sweeps
the system response to a baseband signal at the rate
1/T (T symbol duration) the superposition of
successive symbol intervals
Distortion
due to ISI
Noise margin
amplitude scale

Sensitivity to
timing error

Timing jitter
time scale 391

Example of eye pattern: BPAM, SRRC pulse

Perfect (ideal) channel (no noise and no ISI)

392

196
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example of eye pattern: BPAM, SRRC pulse

AWGN (Eb/N0=20 dB) and no ISI

393

Example of eye pattern: BPAM, SRRC pulse

AWGN (Eb/N0=10 dB) and no ISI

394

197
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example of eye pattern with ISI:


BPAM, SRRC pulse
Distorted (non-ideal) channel and no noise
hc (t ) = (t ) + 0.7 (t T )

395

Example of eye pattern with ISI:


BPAM, SRRC pulse
AWGN (Eb/N0=20 dB) and ISI
hc (t ) = (t ) + 0.7 (t T )

396

198
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example of eye pattern with ISI:


Binary-PAM, SRRC pulse
AWGN (Eb/N0=10 dB) and ISI
hc (t ) = (t ) + 0.7 (t T )

397

Multipath: Power-delay profile


Power

path-1
path-2
path-3
multi-path path-2
propagation
Path Delay

path-1

path-3
Mobile Station (MS)
Base Station (BS)

Channel Impulse Response:


Channel amplitude |h| correlated at delays .
Each tap value at kTs Rayleigh distributed
(actually the sum of several sub-paths)

398

199
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Power delay profile (WLAN/indoor)

399

Multipath: Time-dispersion => frequency


selectivity
The impulse response of the channel is correlated in
the time-domain (sum of echoes)
Manifests as a power-delay profile, dispersion in channel
autocorrelation function A()
Equivalent to selectivity or deep fades in the
frequency domain
Delay spread:
~ 50ns (indoor) 1s (outdoor/cellular).
Coherence Bandwidth:
Bc = 500kHz (outdoor/cellular) 20MHz (indoor)
Implications: High data rate: symbol smears onto the
adjacent ones (ISI).

400

200
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multipath intensity profile

Multipath
effects
~ O(1s)

401

Doppler: Non-stationary impulse response

Set of multipaths
changes ~ O(5 ms)

402

201
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Doppler: Dispersion (frequency) => time-selectivity


The Doppler power spectrum shows dispersion/flatness ~ Doppler
spread (100-200 Hz for vehicular speeds)
Equivalent to selectivity or deep fades in the time domain
correlation envelope.
Each envelope point in time-domain is drawn from Rayleigh
distribution. But because of Doppler, it is not IID, but correlated for
a time period ~ Tc (correlation time).
Doppler Spread: Ds ~ 100 Hz (vehicular speeds at 1GHz)
Coherence Time: Tc = 2.5-5ms.
Implications: A deep fade on a tone can persist for 2.5-5 ms!
Closed-loop estimation is valid only for 2.5-5 ms.
Note: A collection of random
variables is independent and
identically distributed (IID) if
each random variable has the
same probability distribution as
the others and all are mutually
independent. White noise is an
example of IID.

403

Time-varying (fading) channel impulse response


Note: IID refers to
sequences of
random variables.
"Independent and
identically
distributed"
implies an
element in the
sequence is
independent of
the random
variables that
came before it.

Note 1: At each tap, channel gain |h| is a Rayleigh distributed r.v.. The
random process is not IID.
Note 2: Response spreads out in the time-domain (), leading to inter-
symbol interference and deep fades in the frequency domain:
frequency-selectivity caused by multi-path fading
Note 3: Response completely vanish (deep fade) for certain values of t:
Time-selectivity caused by doppler effects (frequency-domain 404
dispersion/spreading)

202
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Dispersion-selectivity duality I

405

Dispersion-selectivity duality II

406

203
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fading terminology
Flat fading: no multipath ISI effects.
E.g. narrowband, indoors
Frequency-selective fading: multipath ISI effects.
E.g. broadband, outdoor.

Slow fading: no doppler effects.


E.g. indoor WiFi home networking
Fast fading: doppler effects, time-selective channel
E.g. cellular, vehicular

Broadband cellular + vehicular => Fast + frequency-


selective
407

Inter-Symbol-Interference (ISI) due to


multipath fading
Transmitted signal:

Received Signals:
Line-of-sight:

Reflected:

The symbols add up


on the channel Delays
Distortion!

Multipath Radio Channel

204
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

What is an equalizer?

We use it for music in everyday life!


Eg: default settings for various types of
music to emphasize bass, treble etc
Essentially we are setting up a (f-domain)
filter to cancel out the channel multipath
filtering effects

409

Equalization

Step 1 waveform to sample transformation Step 2 decision making

Demodulate & Sample Detect

z (T ) Threshold m i
r (t ) Frequency Receiving Equalizing
comparison
down-conversion filter filter

For bandpass signals Compensation for


channel induced ISI

Received waveform Baseband pulse


Baseband pulse Sample
(possibly distorted)
(test statistic)

410

205
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equalization: Channel is a LTI filter


ISI due to filtering effect of the
communications channel (e.g. wireless
channels)
Channels behave like band-limited filters

H c ( f ) = H c ( f ) e j c ( f )

Non-constant amplitude Non-linear phase

Amplitude distortion Phase distortion


411

Pulse shaping and equalization principles


No ISI at the sampling time

H RC ( f ) = H t ( f ) H c ( f ) H r ( f ) H e ( f )

Square-Root Raised Cosine (SRRC) filter and Equalizer


H RC ( f ) = H t ( f ) H r ( f )
Taking care of ISI
H r ( f ) = H t ( f ) = H RC ( f ) = H SRRC ( f ) caused by tr. filter

1
He ( f ) = Taking care of ISI
Hc ( f ) caused by channel

Equalizer: enhance weak frequencies, dampen strong frequencies to flatten


the spectrum
Since the channel Hc(f) changes with time, we need adaptive equalization,
i.e. re-estimate channel & equalize 412

206
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equalization: Slow fading channel


Example of a (somewhat) frequency selective, slowly changing (slow
fading) channel for a mobile user

413

Equalization: Fast fading channel


Example of a highly frequency-selective, fast changing (fast fading)
channel for a mobile user

414

207
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equalizing filters
Baseband system model
a1
a (t kT ) Tx filter
k Channel r (t ) Equalizer Rx. filter z (t ) z k {a k }
k ht (t ) hc (t ) he (t ) hr (t ) Detector
t = kT
Ta a Ht ( f ) Hc ( f ) He ( f ) Hr ( f )
2 3
n(t )

Equivalent model H ( f ) = H t ( f )H c ( f )H r ( f )
a1
a (t kT )
k
Equivalent system z (t ) x(t ) Equalizer z (t )
zk {ak }
k h(t ) he (t ) Detector
t = kT
Ta a H( f ) He ( f )
2 3 n (t )
filtered (colored) noise
n (t ) = n(t ) hr (t )
415

Equalizer types

416

208
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Recursive Least Squares (RLS) filters

The Recursive least squares (RLS) adaptive filter is


an algorithm which recursively finds the filter
coefficients that minimize a weighted linear least
squares cost function relating to the input signals.
This in contrast to the least mean squares (LMS)
algorithms that aim to reduce the mean square error.
In the derivation of the RLS, the input signals are
considered deterministic, while for the LMS they are
considered stochastic.
Compared to most of its competitors, the RLS
exhibits extremely fast convergence.
However, this benefit comes at the cost of high
computational complexity, and potentially poor
tracking performance when the filter to be estimated
(the "true system") changes.
417

Filter coefficients
The idea behind RLS filters is to minimize a cost function C by
appropriately selecting the filter coefficients wn , updating the
filter as new data arrives.
The error signal e(n) and desired signal d(n) are defined in the
negative feedback diagram below:

The error implicitly depends on the filter coefficients through


the estimate :
418

209
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cost function
The weighted least squares error function C the cost function
we desire to minimize being a function of e(n) is therefore
also dependent on the filter coefficients:

where 0<1 is the "forgetting factor" which gives exponentially


less weight to older error samples.

419

Linear equalizer
A linear equalizer effectively inverts the channel.

n(t)
Equalizer
Channel
1
Hc(f) Heq(f)
Hc(f)

The linear equalizer is usually implemented as a


tapped delay line.
On a channel with deep spectral nulls, this equalizer
enhances the noise (both signal and noise pass thru equalizer).

poor performance on frequency-selective


fading channels
420

210
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Noise enhancement with spectral nulls

421

Decision Feedback Equalizer (DFE)


DFE
n(t)
x(t) ^
x(t)
Forward +
Hc(f)
Filter -

Feedback
Filter

The DFE determines the ISI from the previously detected


symbols and subtracts it from the incoming symbols.
This equalizer does not suffer from noise enhancement
because it estimates the channel rather than inverting it.
The DFE has better performance than the linear
equalizer in a frequency-selective fading channel.
The DFE is subject to error propagation if decisions are
made incorrectly.
=> doesnt work well with low SNR.
Note. Optimal non-linear: MLSE (complexity grows exponentially
with delay spread) 422

211
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equalization by transversal filtering


Transversal filter:
A weighted tap delayed line that reduces the effect of
ISI by proper adjustment of the filter taps.
N
z (t ) = c x(t n )
n= N
n n = N ,..., N k = 2 N ,...,2 N

x (t )

c N cN +1 c N 1 cN

z (t )

Coefficient
adjustment 423

Training the filter

424

212
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Transversal equalizing filter


Zero-forcing equalizer:
The filter taps are adjusted such that the equalizer output is forced to be
zero at N sample points on each side:

Adjust 1 k =0
z (k ) =
{cn }nN= N 0 k = 1,..., N

Mean Square Error (MSE) equalizer:


The filter taps are adjusted such that the MSE of ISI and noise power at
the equalizer output is minimized. (note: noise is whitened before filter)

Adjust
{c n }nN= N
[
min E ( z (kT ) ak ) 2 ]
425

Equalizer
The ideal equalizer, an exact inverse system for the channel, is almost
always unrealizable.
With enough prior information about the channel, very good
approximations can be realized.
Using filter theory, the combination of channel and equalizer can be
made a near distortionless system.
Less distortion higher order filter, more delay.
Equalization can be done at receiver, transmitter or both.
However, the channel often is not known precisely at the time of
system design and needs to be fine-tuned during operation.
One approach to simplify the design of the equalizer is to focus on
eliminating ISI, rather than a complete inverse system.
At the input to the decision device, we simply try to enforce the zero-
ISI condition
peq(t0) = 1 and peq(t0 + nD) = 0 for n = 1, 2, ....

213
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Zero-forcing equalizer
Consider a transversal-filter equalizer positioned between the receiver
filter and the decision device.

427

Zero-forcing equalizer
When the system commences operation, the values of pR(t0 - kD)
for k = -2N, ... , 2N are measured during a training phase.
The system of equations (1) is then a set of 2N+ 1 linear equations
in 2N + 1 unknowns: the tap gains c[n] easily solved.
Typically the equalizer (and the solution of the linear equa-
tions) is implemented digitally.
This approach attempts to zero out as much of the ISI as possible:
hence called a zero-forcing equalizer.
A side effect may be noise enhancement: the noise power input
to the decision device may increase.
In many scenarios, the channel response may vary with time.
In this case, we may need to periodically suspend transmission
of data while the equalizer is re-trained.
More advanced equalizers are able to update continuously or
428
track the channel.
This is called adaptive equalization.

214
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Effect on BER: AWGN only

In a Gaussian channel (no fading) BER <=> Q(S/N)


erfc(S/N)

Typical BER vs. S/N curves


BER

Frequency-selective channel
(no equalization)

Gaussian
channel Flat fading channel
(no fading)

S/N

429

Effect on BER: Flat Fading

Flat fading: BER = BER ( S N z ) p ( z ) dz


z = signal power level

Typical BER vs. S/N curves


BER

Frequency-selective channel
(no equalization)

Gaussian
channel Flat fading channel
(no fading)

S/N

430

215
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Effect on BER: ISI/Frequency selective


channel
Frequency selective fading <=> irreducible BER floor!

Typical BER vs. S/N curves


BER

Frequency-selective channel
(no equalization)

Gaussian
channel Flat fading channel
(no fading)

S/N

431

Effect on BER: using equalization

Diversity (e.g. multipath diversity) <=> improved


performance

Typical BER vs. S/N curves


BER

Gaussian Frequency-selective channel


channel (with equalization)
Flat fading channel
(no fading)

S/N

432

216
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complexity and Adaptation

Nonlinear equalizers (DFE, MLSE) have


better performance but higher complexity

Equalizer filters must be FIR


Can approximate IIR Filters as FIR filters
Truncate or use MMSE criterion

Channel response needed for equalization


Training sequence used to learn channel
Tradeoffs in overhead, complexity, and delay
Channel tracked during data transmission
Based on bit decisions
Cant track large channel fluctuations
433

Equalization: Summary
Equalizer equalizes the channel response in frequency domain to
remove ISI
Can be difficult to design/implement,
Can get noise enhancement (linear EQs) or error propagation
(decision feedback EQs)

434

217
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Statistical Tools for


Telecommunications

Probability & Stochastic


processes

Introduction

218
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Elementary probability concepts

Experiment, sample space and event

219
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Formal definition of probability: Axioms

Probability measure: Important


properties

220
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Union Bound

A B

P(A B) P(A) + P(B)

P(A1 A2 AN) i= 1..N P(Ai)

Applications:
Getting bounds on BER (bit-error rates),
Bounding the tails of probability distributions

Experiment, outcome, probability, event


and sample space
Think of probability as modeling an experiment
E.g.: tossing a coin!
The set of all possible outcomes is the sample space:
S
Any subset A of S is an event

Classic Experiment:
Tossing a die:S = {1,2,3,4,5,6}
Any subset A of S is an event:
A = {the outcome is even} = {2,4,6}

221
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Probability of events: Axioms


P is the probability mass function if it maps each
event A, into a real number P(A), and:
i.) P ( A ) 0 for every event A S

ii.) P(S) = 1

iii.) If A and B are mutually exclusive events then,

A B
P ( A B ) = P ( A ) + P (B )

A B =

Probability of events
In fact for any sequence of pair-wise-mutually-
exclusive events, we have

A1, A2 , A3 ,... (i.e. Ai A j = 0 for any i j )

Ai A j = , and UA i =
S.
i =1

A1

A2
Ai P An = P ( An )
A
n =1 n =1
j An

222
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Conditional probability

Total probability theorem

223
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Conditional probability and


independence
P ( A | B ) = (conditional) probability that the
outcome is in A given that we know the
outcome in B
P ( AB )
P( A | B) = P (B ) 0
P (B )

Note that: P ( AB ) = P (B )P ( A | B ) = P ( A )P (B | A )

Events A and B are independent if P(AB) = P(A)P(B).

Also: P ( A | B ) = P ( A) and P (B | A ) = P (B )

Random variables

224
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random variable as a measurement


Thus a random variable can be thought of as a
measurement (yielding a real number) on an experiment
Maps events to real numbers
We can then talk about the cdf, pdf, and define the
mean/variance and other moments

Cumulative Distribution Function (CDF or


cdf)

225
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cumulative Distribution Function


The cumulative distribution function (CDF) for a random
variable X is

FX ( x) = P( X x) = P({s S | X ( s ) x})
Note that FX ( x ) is non-decreasing in x, i.e.

x1 x2 Fx ( x1 ) Fx ( x2 )
Also lim Fx ( x) = 0 and lim Fx ( x) = 1
x x

Plots of cdf

226
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cumulative Distribution Function


(CDF)
1

0 .9 L o g n o rm a l(0 ,1 )
G a m m a (.5 3 ,3 )
0 .8 E x p o n e n tia l(1 .6 )
W e ib u ll(.7 ,.9 )
0 .7 P a re to (1 ,1 .5 )

0 .6
F(x)

0 .5

0 .4

0 .3

0 .2
median
0 .1

0
0 2 4 6 8 10 12 14 16 18 20
x

Emphasizes skews, easy identification of median/quartiles,


converting uniform rvs to other distribution rvs

Complementary CDFs (CCDF)


0
10

-1
10

-2
10
log(1-F(x))

-3
10 L o g n o rm a l(0 ,1 )
G a m m a (.5 3 ,3 )
E x p o n e n tia l(1 .6 )
W e ib u ll(.7 ,.9 )
-4
10 P a re to II(1 ,1 .5 )
P a re to I(0 .1 ,1 .5 )

-1 0 1 2
10 10 10 10
lo g (x )

Useful for focusing on tails of distributions:


Line in a log-log plot => heavy tail

227
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Probability Density Function (PDF or pdf)

Histogram: Plotting frequencies

Class Freq.
Count 15 but < 25 3
5 25 but < 35 5
35 but < 45 2
Frequency 4

Relative 3
frequency Bars
2
Percent
1

0 15 25 35 45 55

Lower Boundary

228
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Probability distribution function (pdf):


Continuous version of histogram

a.k.a. frequency histogram, p.m.f (for discrete r.v.)

Continuous probability density


function
Mathematical formula
Frequency
Shows all values, x, and
frequencies, f(x) (Value, Frequency)

f(x) is not probability f(x)


Properties

f (x )dx = 1
x
All X a b
(Area Under Curve)

f ( x ) 0, a x b Value

229
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Continuous-valued random variables

Thus, for a continuous random variable X, we can


define its probability density function (pdf)

Note that since FX ( x) is non-decreasing in x we


have
f X ( x) 0 for all x.

Example: Uniform random variable

230
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gaussian (or normal) random variable

Gaussian random variable


(a) Emg (electromyography) signal

231
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gaussian random variable


(b) Histogram and pdf fits

probability density functions (pdf)


1 .5
L o g n o rm a l(0 ,1 )
G a m m a (.5 3 ,3 )
E x p o n e n tia l(1 .6 )
W e ib u ll(.7 ,.9 )
P a re to (1 ,1 .5 )

1
f(x)

0 .5

0
0 0 .5 1 1 .5 2 2 .5 3 3 .5 4 4 .5 5
x

Emphasizes main body of distribution, frequencies,


various modes (peaks), variability, skews

232
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Functions of random variables

Numerical data properties

Central Tendency
(Location)

Variation (Dispersion)

Shape

233
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Numerical data:
Properties & measures

Numerical Data
Properties

Central
Variation Shape
Tendency
Mean Range Skew
Median Inter-quartile Range
Mode Variance
Standard Deviation

Expectation of random variables

234
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Expectation of a random variable:


E[X]
The expectation (average) of a (discrete-valued) random variable X is

X = E ( X ) = xP( X = x) = xPX ( x)
x =

Expectation
The expectation (average) of a continuous random variable X
is given by

E( X ) = xf

X ( x)dx

Note that this is just the continuous equivalent of the discrete


expectation

E ( X ) = xPX ( x)
x =

235
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Other Measures: Median and mode

Median = F-1 (0.5), where F = CDF


Aka 50% percentile element
Order the values and pick the middle element
Used when distribution is skewed
Considered a robust measure

Mode: Most frequent or highest probability value


Multiple modes are possible
Need not be the central element
Mode may not exist (e.g. uniform distribution)

236
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Indices/Measures of spread/dispersion

Why care?

You can drown in a river of average depth 15 cm!


Lesson: The measure of uncertainty or dispersion may matter more than
the index of central tendency

Expectation of random variables

237
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

ACpower = Totalpower DCpower =


variance

Variance, standard deviation, coefficient


of variation, SIQR
Variance: second moment around the mean:
2 = E[(X-)2]
Standard deviation =

Coefficient of Variation (C.o.V.) = /

SIQR= Semi-Inter-Quartile Range (used with median


= 50th percentile)
(75th percentile 25th percentile)/2

238
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multiple random variables

Multiple random variables

239
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multiple random variables

Multiple random variables

240
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Correlation coefficient

Covariance and correlation: Measures of


dependence
Covariance: =

For i = j, covariance = variance!


Independence => covariance = 0 (not vice-versa!)

Correlation (coefficient) is a normalized (or scaleless) form


of covariance:

Between 1 and +1.


Zero => no correlation (uncorrelated).
Note uncorrelated DOES NOT mean independent!

241
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random vectors and sum of r.v.s

Random vector = [X1, , Xn], where Xi = r.v.


Covariance matrix:
K is an nxn matrix
Kij = Cov[Xi,Xj]
Kii = Cov[Xi,Xi] = Var[Xi]

Sum of independent r.v.s


Z=X+Y
PDF of Z is the convolution of PDFs of X and Y
Can use transforms!

Characteristic function
The distribution of a random variable X can be determined from its
characteristic function, defined as

It captures all the moments, and is related to the IFT of pdf:


We see that the characteristic function X() of X(t) is the inverse
Fourier transform of the distribution pX(x) evaluated at f = /(2). Thus
we can obtain pX(x) from X() as

This will become significant in finding the distribution for sums of random
variables.

242
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Important (discrete) random


variable: Bernoulli
The simplest possible measurement on an experiment:
Success (X = 1) or failure (X = 0).
Usual notation:

PX (1) = P( X = 1) = p PX (0) = P( X = 0) = 1 p
A discrete random variable that takes two values 1 and 0 with
probabilities p and 1-p.
Good model for a binary data source whose output is 1 or 0.
Can also be used to model the channel errors.

Bernoulli random variable

243
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Binomial random variable

Binomial distribution

P(X)
.6
.4 n = 5 p = 0.1
.2
.0 X
Mean 0 1 2 3 4 5

= E ( x ) = np
P(X) n = 5 p = 0.5
Standard Deviation .6
.4
.2
= np (1 p) .0 X
0 1 2 3 4 5

244
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Binomial distribution
Binomial can looks like skewed or normal

Depends upon
p and n !

Binomials for different p, N =20


Distribution of Blocks Experiencing k losse s out of N
Distribution of Blocks Experiencing k losses out of N

25.00%
30.00%

25.00% 20.00%

20.00%
Number of Blocks
Number of Blocks

15.00%

15.00%

10.00%
10.00%

5.00%
5.00%

0.00% 0.00%
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Num ber of Losses out of N = 20 Number of Losses out of N = 20

10% PER 30% PER


Distribution of Blocks Experiencing k losses out of N

Npq = 1.8 20.00% Npq = 4.2


As Npq >> 1, better approximated by normal 18.00%

16.00%
distribution near the mean: 14.00%

symmetric, sharp peak at mean, exponential-square


Number of Blocks

12.00%

(e-x^2) decay of tails


10.00%

8.00%

(pmf concentrated near mean) 6.00%

50% PER
4.00%

Npq = 5
2.00%

0.00%
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Num ber of Losses out of N = 20

245
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Important random variable:


Poisson
A Poisson random variable X is defined by its PMF: (limit of binomial)

x
P( X = x) = e x = 0,1, 2,...
x!
where > 0 is a constant
It can be shown that


PX ( x) = 1
x =0
and E(X) =
Poisson random variables are good for counting frequency of occurrence:
like the number of calls that arrive to a switchboard in one hour (busy
hour), or the number of packets that arrive to a router in one second.

Important continuous random variable:


Exponential
Used to represent time, e.g. until the next arrival
Has PDF e x for x 0
X f ( x) = {
0 for x < 0

for some >0


Properties:

1

0
f X ( x)dx = 1 and E ( X ) =

246
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gaussian/Normal distribution
Normal distribution:
Completely characterized by
mean () and variance (2)

Q-function: one-sided tail of


normal pdf

erfc(): two-sided tail.


So:

Normal distribution: Why?


Uniform distribution
looks nothing like
bell shaped (Gaussian)!
Large spread ()!

CENTRAL LIMIT TENDENCY!

Sum of r.v.s from a uniform distribution after


very few samples looks remarkably normal
BONUS: it has decreasing !

247
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Central Limit Theorem


Let X1, X2, X3, , XN denote N mutually independent random variables
whose individual distributions are not known and they are not necessarily
Gaussian distributed.
The theorem establishes that the sum of the N random variables (say, X) is
a random variable which tends to follow a Gaussian (or, normal)
distribution as N .
Further,
a) The mean of the sum random variable X is the sum of the mean values
of the constituent random variables and
b) the variance of the sum random variable X is the sum of the variance
values of the constituent random variables X1, X2, X3, , XN .
The Central Limit Theorem is very useful in modeling and analyzing
several situations in the study of electrical communications.
However, one necessary condition to look for before invoking Central
Limit theorem is that no single random variable should have significant
contribution to the sum random variable.

Gaussian distribution

Rapidly dropping tail probability

Why? Doubly exponential PDF (e-z^2 term)


A.k.a: Light tailed (not heavy-tailed).
No skew
Fully specified with just mean and variance (2nd order)

248
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Error functions

Error functions

249
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Error functions

Q and Phi () functions


Phi function: CDF of standard Gaussian
Q function: Complementary CDF of standard Gaussian

250
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Q and Phi function properties

By the symmetry of the N(0,1) density, we have: Q(x) = (x)

From their definitions, we have: Q(x) = 1 (x)

Combining, we get:

Note: we can express Q and Phi functions with any real-valued arguments in terms
of the Q function with positive arguments alone.
Simplifies computation, enables use of bounds and approximations
for Q functions with positive arguments.

Q function has exponentially decaying


tails
Asymptotically tight bounds for large arguments

Very Important Conclusion: The asymptotic behavior of the Q function is

The rapid decay of the Q function implies, for example, that: Q(1) + Q(4) Q(1)

Design implications: We will use this to identify dominant events causing errors

Useful bound for analysis (and works for small arguments):

251
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Plots of Q function and its bounds


Asymptotically tight

Useful for analysis


and for small arguments

Note the rapid decay: y-axis


has log scale

Gaussian distribution

252
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gaussian distribution

Height and spread of Gaussian can vary

Gaussian R.V.

Standard Gaussian :

Tail: Q(x)
tail decays exponentially!

Gaussian property preserved


with linear transformations:

253
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Standardized normal distribution

X
Z=
Normal Standardized normal
distribution distribution

=1

X = 0 Z
One table!

Obtaining the probability

Standardized normal
probability table (portion)
Z .00 .01 .02 =1
0.0 .0000 .0040 .0080
.0478
0.1 .0398 .0438 .0478
0.2 .0793 .0832 .0871
= 0 .12 Z
0.3 .1179 .1217 .1255 Shaded area
Probabilities exaggerated

254
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: P(X 8)

X 85
Z= = = .30
10
Normal Standardized Normal
Distribution Distribution

= 10 =1
.5000
.3821
.1179

=5 8 X =0 .30 Z
Shaded area exaggerated

Q-function:
Tail of normal
distribution

Q(z) = P(Z > z) = 1 P[Z < z]

255
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sampling from non-normal populations

Central tendency
Population distribution
x =
= 10
Dispersion

x =
n = 50 X
Sampling distribution
Sampling with
n=4 n =30
replacement X = 5 X = 1.8

X- = 50 X

Central Limit Theorem (CLT)

As sample
size gets
large
enough
(n 30) ...

256
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Central Limit Theorem (CLT)


As sample x =
size gets n
sampling
large distribution
enough becomes
(n 30) ... almost normal.

X
x =

Comment on CLT
Central limit theorem works if original distribution are not
heavy tailed
Need to have enough samples.
E.g. with multipaths, if there is not rich enough
scattering, the convergence to normal may have not
happened yet.
Moments converge to limits.
Trouble with aggregates of heavy tailed distribution
samples
Rate of convergence to normal also varies with
distributional skew, and dependence in samples

257
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Jointly Gaussian random variables (or


Gaussian random vectors)
Multiple random variables defined on a common
probability space are also called random vectors
same probability space means we can talk about joint
distributions
A random vector is Gaussian (or the random variables
concerned are jointly Gaussian) if any linear combination
is a Gaussian random variable
These arise naturally when we manipulate Gaussian noise
Correlation of Gaussian noise
Multiple samples of filtered Gaussian noise
Joint distribution characterized by mean vector and
covariance matrix
Analogous to mean and variance for scalar Gaussian

Jointly Gaussian distribution (bivariate)

258
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Joint pdf for x = y = 1 and x,y = 0

Joint pdf for x = y = 1 and x,y = 0.95

259
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multivariate Gaussian pdf

Mean vector and covariance matrix

Lets first review these for arbitrary random


vectors m x 1 random vector
Mean Vector (m x 1)

Covariance Matrix (m x m)

(i,j)th entry

Compact
representation

260
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Properties of covariance
Covariance unaffected when we add constants

Adding constants changes the mean but not the covariance.


So we can always consider zero mean versions of random variables when
computing covariance.

Common scenario: Mean is due to signal, covariance is due to noise.


So we can often ignore the signal when computing covariance.

Covariance is a bilinear function (i.e., multiplicative constants pull out)

Quantities related to covariance

Variance of a random variable is its covariance with itself

Correlation coefficient is the normalized covariance

261
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Mean and covariance evolve separately under


affine transformations

Mean of Y depends only on the mean of X

Covariance of Y depends only on the covariance of X (and does not


depend on the additive constant b)

Back to Gaussian random vectors

is a Gaussian random vector if any linear combination


is a Gaussian random variable
A Gaussian random vector is completely characterized by its
mean vector and covariance matrix.
Notation:

Why? Consider the characteristic function of X (which specifies


its distribution)
X ( u) = E [e j ( u1 X 1 +...+ un X n ) ] = E [e ju X ]
T

But the linear combination uT X ~ N(uT mX ,uT C X u) is a Gaussian random


variable whose distribution depends only on the mean vector and covariance matrix of X
Thus, the characteristic function, and hence the distribution, of X depends only on its
mean vector and covariance matrix.

262
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Joint Gaussian density


Exists if and only if covariance matrix is invertible. If so, is given by:

How would we compute expectation of p(X), where X is a Gaussian random


vector?
Integrating over multiple dimensions is tedious. Instead we can use Monte Carlo simulations.
--start with samples of independent N(0,1) random variables
--transform to random vector with desired joint Gaussian stats
--evaluate function
--average over runs

Often we deal with a linear combination (e.g., sample at output of a filter), which
are simply scalar Gaussian, so we do not need multidimensional integration.

Independence and uncorrelatedness

Two random variables are uncorrelated if their covariance is zero.

Independent random variables are uncorrelated

Uncorrelated random variables are not necessarily independent

Uncorrelated, jointly Gaussian, random variables are independent


Diagonal covariance matrix means inverse is also diagonal,
and joint density decomposes into product of marginals.

263
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gaussian vectors (real-valued)


Collection of i.i.d. standard Gaussian r.v.s:

Euclidean distance from the origin to w

The density f(w) depends only on the magnitude of w, i.e. ||w||2

Orthogonal transformation O (i.e., OtO = OOt = I) preserves the magnitude of a


vector

Gaussian random vectors


Linear transformations of the standard Gaussian vector:

pdf: has covariance matrix K = AAt in the quadratic form instead of 2

When the covariance matrix K is diagonal, i.e., the component random


variables are uncorrelated. Uncorrelated + Gaussian => independence.
White Gaussian vector => uncorrelated, or K is diagonal
Whitening filter => convert K to become diagonal (using eigen-
decomposition)

Note: Normally AWGN noise has infinite components, but it is projected


onto a finite signal space to become a Gaussian vector.

264
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex Gaussian R.V: Circular Symmetry


A complex Gaussian random variable X whose real and
imaginary components are i.i.d. gaussian
satisfies a circular symmetry property:
ejX has the same distribution as X for any .
ej multiplication: rotation in the complex plane.
We shall call such a random variable circularly symmetric
complex Gaussian, denoted by CN(0, 2), where 2 = E[|X|2].

Complex Gaussian: Summary

265
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Complex Gaussian vectors: Summary

We will often see equations like:

Here, we will make use of the fact


that projections of w are complex Gaussian, i.e.:
h can describe the complex channel

Related distributions

X = [X1, , Xn] is Normal


||X|| is Rayleigh ( e.g., magnitude of a complex gaussian channel X1 +
jX2 )
||X||2 is Chi-Squared with n-degrees of freedom
When n = 2, chi-squared becomes exponential (e.g., power in
complex gaussian channel: sum of squares)

266
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random processes

Interpretation of random process


A random process X(A,t) can be viewed as a function of two
variables: an event A and time t.
In the next figure there are N sample functions of time,
{Xj(t)}.
Each of the sample functions can be regarded as the output of
a different noise generator.
For a specific event Aj we have a single time function X(Aj,t)
= Xj(t) (i.e., a sample function).
The totality of all sample functions is called an ensemble.
For a specific time tk, X(A,tk) is a random variable X(tk) whose
value depends on the event.
Finally, for a specific event, A = Aj, and a specific time t=tk,
X(Aj,tk) is simply a number.
For notational convenience we often shall designate the
random process by X(t), and let the functional dependence
upon event A be implicit.

267
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random noise process

Random sequences and random processes

268
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random processes

Random process
A random process is a collection of time functions, or signals,
corresponding to various outcomes of a random experiment.
For each outcome, there exists a deterministic function, which is called a
sample function or a realization.

Random
variables
Real number

Sample functions
or realizations
(deterministic
function)
time (t)

269
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Examples of random processes

Examples of random processes

270
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random process: Definition


A random process is defined by all its joint CDFs

for all possible sets of sample times

tn
t2
t0 t1

Random processes

271
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Classification of random processes

(weak sense stationarity)

Example: Stationary vs. nonstationary

The outside temperature of the house is an example of a nonstationary


random process, as the expected temperature in the summer is warmer than
in the winter.
The temperature in your refrigerator can be modelled as a stationary
random process.

272
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Stationarity
If time-shifts (any value T) do not affect its joint CDF

tn+T
tn
t2 t +T t2+T
t0 + T 1
t0 t1

Wide (Weak) sense stationarity (WSS)

Many nonstationary random processes have the property that


the mean and autocorrelation functions are independent of
time.
Such random processes are referred to as wide-sense stationary
(WSS).

Note: All strict-sense stationary random processes are also wide-


sense stationary, but not all wide-sense stationary random
processes are strict-sense stationary.
Note: All WSS Gaussian random processes are also stationary in
the strict sense.

273
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

LTI systems: WSS in WSS out

Keep only above two properties (2nd order stationarity)


Dont insist that higher-order moments or higher order joint
CDFs be unaffected by lag T

With LTI systems, we will see that WSS inputs lead to WSS
outputs,
In particular, if a WSS process with PSD SX(f) is passed through a linear
time-invariant filter with frequency response H(f), then the filter output
is also a WSS process with power spectral density |H(f)|2SX(f).

Gaussian w.s.s. = Gaussian stationary process (since it only has


2nd order moments)

Stationarity: Summary
Strictly stationary: If none of the statistics of the random
process are affected by a shift in the time origin.

Wide sense stationary (WSS): If the mean and


autocorrelation function do not change with a shift in the
origin time.

Cyclostationary: If the mean and autocorrelation function


are periodic in time.

274
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Statistical averages or joint moments

Example: Mean value or the 1st moment

275
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Mean-squared value or the 2nd


moment

Correlation

276
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Power Spectral Density (PSD) of a random


process

Time averaging and ergodicity

277
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Random processes and LTI systems

Ergodicity
Time averages = Ensemble averages
[i.e. ensemble averages like mean/autocorrelation can be computed as time-
averages over a single realization of the random process]
A random process: ergodic in mean and autocorrelation (like w.s.s.) if
and

278
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Power Spectral Density (PSD)


For deterministic signals, the power spectrum is usually found by taking the Fourier
transform of the signal. For stationary random processes, the power spectrum
(spectral density) is found by taking the Fourier transform of the autocorrelation
function.

1. SX(f) is real and SX(f) 0


2. SX(-f) = SX(f)
3. AX(0) = SX() d

Power spectrum
For a deterministic signal x(t), the spectrum is well defined: If X ( )
represents its Fourier transform, i.e., if
+
X ( ) = x(t )e jt dt ,
then | X ( ) |2 represents its energy spectrum.
This follows from Parsevals theorem since the signal energy is given by

+ +
x (t )dt = 21 | X ( ) | d = E.
2 2

Thus | X ( ) |2 represents the signal energy in the band ( , + )


| X ( )|2
X (t ) Energy in( , + )

0 t 0
+

279
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Spectral density: Summary


Energy signals:

Energy spectral density (ESD):

Power signals:

Power spectral density (PSD):

Random process:
Power spectral density (PSD):

Note: We have used f for and Gx for Sx

Properties of autocorrelation function

For real-valued (and WSS for random signals):


1. Autocorrelation and spectral density form a Fourier
transform pair RX() SX()
2. Autocorrelation is symmetric around zero RX(-) = RX()
3. Its maximum value occurs at the origin |RX()| RX(0)
4. Its value at the origin is equal to the average power or
energy

280
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Autocorrelation: Summary
Autocorrelation of an energy signal

Autocorrelation of a power signal

For a periodic signal:

Autocorrelation of a random signal

For a WSS process:

Signal transmission with linear systems


(filters)
Input Output
Linear system

Deterministic signals:

Random signals:

Ideal distortionless transmission:


All the frequency components of the signal not only arrive with an identical
time delay, but also amplified or attenuated equally.

281
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Deterministic systems with stochastic inputs

Deterministic systems

Memoryless Systems Systems with Memory

Y ( t ) = g [ X ( t )]

Time-varying Time-Invariant Linear systems


systems systems Y ( t ) = L[ X ( t )]

Linear-Time Invariant
(LTI) systems
+
X (t ) h(t ) Y ( t ) = h ( t ) X ( ) d
+
LTI system = h ( ) X ( t ) d .

282
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

LTI systems
WSS input is good enough

Noise

283
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Noise in communication systems


Noise in communcation systems is often described by a zero-mean
Gaussian random process, n(t).
This process is stationary and its PSD is flat, hence, it is called white
noise.
Observations at different times, no matter how close, are uncorrelated
(and therefore independent, since the process is Gaussian), i.i.d.
Gaussian.

[w/Hz]

Power spectral
density

Autocorrelation
function
Probability density function

White Gaussian Noise (WGN)


White:
Similar to white light contains equal amounts of all frequencies in the
visible band of EM spectrum
Power spectral density (PSD) is constant, i.e. flat, for all frequencies of
interest (from dc to 1012 Hz)
Autocorrelation is a delta function => two samples, no matter however
close, are uncorrelated.
N0/2 to indicate two-sided PSD
Zero-mean gaussian completely characterized by its variance (2)
Variance of filtered noise is finite = N0/2
Gaussian + uncorrelated => i.i.d.
Affects each symbol independently: memoryless channel
Practically: if bandwith of noise is much larger than that of the system:
white Gaussian noise approximation is good enough

Note: Colored noise exhibits correlations at positive lags

284
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Shannon capacity with AWGN

Approximation:
log2(1+x) x for small x

Application: Shannon capacity with AWGN noise:


Bits-per-Hz = C/B = log2(1+ SNR)
If we can increase SNR linearly when SNR is small (e.g.
cell-edge) we get a linear increase in capacity.

When SNR is large, of course increase in SNR gives only a


diminishing return in terms of capacity: log (1+ SNR)
C/B = log2(1+ SNR) log2(SNR) , when SNR>>

Gaussian random process


Random process: collection of random variables on a common
probability space.
(simple generalization of random vectors: instead of the number of random
variables being finite, we can have countably or uncountably many of them.)

Gaussian random process: any linear combination of samples is a


Gaussian random variable.
is a Gaussian random process
is a Gaussian random variable

for any choice of number of samples, sampling times


and combining coefficients

are jointly Gaussian


for any choice of number of samples, sampling times
and combining coefficients

285
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Characterizing a Gaussian random


process

Statistics of Gaussian random process completely specified by


mean function and autocorrelation/autocovariance function

Why? Need to be able to specify the statistics of any collection of samples.


Since these are jointly Gaussian, only need their means and covariances.

WSS Gaussian random processes are stationary.

Why?
Gaussian random process characterized by second order stats.
If second order stats are shift-invariant, then we cannot distinguish
statistically between shifted versions of the random process.

White Gaussian Noise (WGN)

Real-valued WGN: zero mean, WSS, Gaussian random process with

Sn ( f ) = N 0 /2 = 2 Rn ( ) = (N 0 /2) ( ) = 2 ( )
Two-sided PSD
(need to integrate over both positive and negative frequencies to get the power)

(need to integrate only over physical band of


One-sided PSD: N0 positive frequencies)

Complex-valued WGN: Real and imaginary components are i.i.d.


real valued WGN

286
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Why we use a physically unrealizable


noise model
WGN has infinite power
Not physically realizable
Actual receiver noise is bandlimited and has finite power
OK and convenient to assume WGN at receiver input
Receiver noise PSD is relatively flat over typical
receiver bandwidths
Receiver always performs some form of bandlimiting
(e.g., filtering, correlation), at the output of which we
have finite power
Output noise statistics with WGN as input and
bandlimited noise at input are identical
Why is WGN more convenient?
Impulsive autocorrelation function makes computation
of output second order stats much easier

Modeling using WGN


Physical baseband system: corrupted by
real-valued WGN
Replace bandlimited noise by infinite-power
WGN at input to receiver
Physical passband system: complex
envelope of passband receiver noise
modeled as complex-valued WGN
Replace bandlimited noise by infinite-power
WGN at input to receiver

287
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Modeling using WGN: the big picture

How much is N0?

Ideal receiver at room temperature

Boltzmanns constant

room temperature (usually set to 290K)

Raised this using the receiver Noise Figure


for noise figure of F dB

E.g. if F=6 dB and B=20MHz

288
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Noise power computation


Communication theory can work with signal-to-noise
ratios
But we do need absolute numbers when calculating the
link budget: need to calculate required signal power
based on the actual value of noise power

Example: B = 20 MHz bandwidth, receiver noise figure (F) of 6 dB


Noise power

Noise power in dBm

White noise process & LTI systems


White noise

Example


Example

Basic detection and estimation concepts


AWGN Channel and hypothesis testing

One of M signals sent: s_i(t), i = 1, ..., M.

Receiver has to decide between one of M hypotheses based on the received
signal, which is modeled as

  y(t) = s_i(t) + n(t)  under hypothesis H_i,

where n(t) is white Gaussian noise.

Need to learn some detection theory first, before we can deal with
this hypothesis testing problem.

Likelihood principle

Experiment:
  Pick Urn A or Urn B at random (say Urn A holds 1 black and 2 white balls,
  Urn B holds 2 black and 1 white ball, matching the likelihoods used below).
  Select a ball from that urn. The ball is black.
What is the probability that the selected urn is A?


Likelihood principle

Write out what you know!


P(Black | UrnA) = 1/3
P(Black | UrnB) = 2/3
P(Urn A) = P(Urn B) = 1/2
We want P(Urn A | Black).
Intuition: Urn B is more likely than Urn A (given that the ball is black). But
by how much?
This is an inverse probability problem.
Solution technique: Use Bayes Theorem.

Likelihood principle
Bayes manipulations:
P(Urn A | Black) = P(Urn A and Black) /P(Black)
Decompose the numerator and denominator in terms of the probabilities we know.

P(Urn A and Black) = P(Black | UrnA)*P(Urn A)


P(Black) = P(Black| Urn A)*P(Urn A) + P(Black| UrnB)*P(UrnB)

We know all these values.


P(Urn A and Black) = 1/3 * 1/2
P(Black) = 1/3 * 1/2 + 2/3 * 1/2 = 1/2
P(Urn A and Black) /P(Black) = 1/3 = 0.333
Notice that it matches our intuition that Urn A is less likely, once we have seen
black.

The information that the ball is black has CHANGED !


From P(Urn A) = 0.5 to P(Urn A | Black) = 0.333
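
The same computation as a few lines of Python (illustrative):

# Posterior for the urn example via Bayes' theorem.
p_black_given_A = 1 / 3
p_black_given_B = 2 / 3
p_A = p_B = 1 / 2

p_black = p_black_given_A * p_A + p_black_given_B * p_B   # total probability = 1/2
p_A_given_black = p_black_given_A * p_A / p_black         # Bayes' theorem
print(p_A_given_black)   # 0.333...: seeing black lowers P(Urn A) from 0.5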


Likelihood detection concepts

Hypotheses: Urn A or Urn B ?


Observation: Black
Prior probabilities: P(Urn A) and P(Urn B)
Likelihood of Black given choice of Urn: {aka forward probability}
P(Black | Urn A) and P(Black | Urn B)
Posterior Probability: of each hypothesis given evidence
P(Urn A | Black) {aka inverse probability}
Likelihood Principle (informal): All inferences depend ONLY on
The likelihoods P(Black | Urn A) and P(Black | Urn B), and
The priors P(Urn A) and P(Urn B)
Result is a probability (or distribution) model over the space of possible hypotheses.

Maximum Likelihood (intuition)


Recall:
P(Urn A | Black) = P(Urn A and Black) /P(Black) =
P(Black | UrnA)*P(Urn A) / P(Black)

P(Urn? | Black) is maximized when P(Black | Urn?) is maximized.


Maximization over the hypotheses space (Urn A or Urn B)

P(Black | Urn?) = likelihood


=> Maximum Likelihood approach to maximizing posterior probability


Maximum Likelihood

Max likelihood hypothesis: the hypothesis that has the highest (maximum)
likelihood of explaining the data observed.

Maximum Likelihood (ML) mechanics

Independent observations (like "Black"): X1, ..., Xn

Hypothesis: theta

Likelihood function: L(theta) = P(X1, ..., Xn | theta) = prod_i P(Xi | theta)
  {independence => multiply individual likelihoods}

Log-likelihood: LL(theta) = sum_i log P(Xi | theta)

Maximum likelihood: maximize LL(theta) by taking the derivative with respect
to theta, setting it to zero and solving for theta.
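
A toy numerical sketch of these mechanics (an assumed Gaussian model with
known variance; not from the notes), where the ML estimate coincides with the
sample mean:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)   # data with true theta = 2

def log_likelihood(theta, x, sigma=1.0):
    # LL(theta) = sum_i log P(x_i | theta) for a N(theta, sigma^2) model
    return (-0.5 * np.sum((x - theta) ** 2) / sigma**2
            - x.size * np.log(sigma * np.sqrt(2 * np.pi)))

thetas = np.linspace(0, 4, 401)
ll = [log_likelihood(t, x) for t in thetas]
print(thetas[int(np.argmax(ll))], x.mean())     # both close to 2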


Back to urn example


In our urn example, we are asking:
Given the observed data ball is black
which hypothesis (Urn A or Urn B) has
the highest likelihood of explaining this
observed data?
Ans from above analysis: Urn B

Not just urns and balls: Detection of signal in AWGN

Hypothesis testing framework


Want to determine which of M possible hypotheses best explains an observation.
Three ingredients:
  Hypotheses H_1, ..., H_M
  Observation Y taking values in an observation space
  (assume finite-dimensional--good enough for our purpose)
  Statistical relationship between hypotheses and observation, expressed
  through the conditional densities p(y | H_i) of the observation given
  each hypothesis

Fourth ingredient needed for Bayesian hypothesis testing

Prior probabilities


Decision rule
A decision rule is a mapping from the observation space to the set of hypotheses

Can also view it as a partition of the observation space into M disjoint


regions:


[Figure: observation space partitioned into M disjoint decision regions.]

Basic Gaussian example

0 or 1 sent:
  H0 : Y ~ N(0, sigma^2)    ("0" sent)
  H1 : Y ~ N(m, sigma^2)    ("1" sent)

Conditional densities:
  p(y | H0) = (1 / sqrt(2 pi sigma^2)) exp( -y^2 / (2 sigma^2) )
  p(y | H1) = (1 / sqrt(2 pi sigma^2)) exp( -(y - m)^2 / (2 sigma^2) )

Could model a noisy sample at the output of an equalizer.


Basic Gaussian example (contd.)

Sensible rule: split the difference, i.e. decide "1" if y > m/2, else decide "0".
Would this rule make sense if we know for sure that 0 was sent?
What if we know beforehand that 0 was sent with probability 0.75?
What if the noise is not Gaussian or additive?

Need a systematic framework for deriving good decision rules.


First step: define the performance metrics of interest

Performance metrics for evaluating decision rules
Conditional error probabilities: P(e | H_i) = P(decide some H_j, j != i | H_i)

Average error probability: P_e = sum_i P(H_i) P(e | H_i)

Error probs for the sensible rule in the basic Gaussian example:

  Conditional error probs:
    P(e | H0) = P(Y > m/2 | H0) = Q( m / (2 sigma) )
    P(e | H1) = P(Y < m/2 | H1) = Q( m / (2 sigma) )

  Average error prob: P_e = Q( m / (2 sigma) ), regardless of the prior probs


Maximum Likelihood (ML) rule


Choose the hypothesis that maximizes the conditional density of the observation:

  delta_ML(y) = arg max_i p(y | H_i)
Check: Sensible rule for the basic Gaussian example is the ML rule

ML rule seems like a good idea. Is there anything optimal about it?

Minimizes error probability if all hypotheses are equally likely

Asymptotically optimal when observations can be trusted more and


more (e.g., high SNR, large number of samples).

Minimum Probability of Error (MPE) rule
(which turns out to be the Maximum A Posteriori Probability (MAP) rule)

Minimize the average probability of error (assume the prior probabilities of
the hypotheses are known).

Let's derive it. Convenient to consider maximizing the prob of a correct decision.

For any given decision rule, the decision regions Gamma_1, ..., Gamma_M
partition the observation space.

Conditional prob of correct decision: P(correct | H_i) = P(Y in Gamma_i | H_i)

Average prob of correct decision:
  P_c = sum_i P(H_i) * integral over Gamma_i of p(y | H_i) dy

Consider any potential observation y:
If we put it in the i-th decision region (y in Gamma_i), then our reward
(contribution to the prob of correct decision) is P(H_i) p(y | H_i).

MPE rule: choose i to maximize this contribution.


MPE rule (contd.)

We have derived the MPE rule to be: delta_MPE(y) = arg max_i P(H_i) p(y | H_i)

1) MPE rule is equivalent to the Maximum A Posteriori Probability (MAP) rule.

   Posterior probability of hypothesis i given the observation:
     P(H_i | y) = P(H_i) p(y | H_i) / p(y)

   Can rewrite the MPE rule as: delta_MPE(y) = arg max_i P(H_i | y)

2) MPE rule reduces to the ML rule for equal priors (P(H_i) = 1/M):
   we can drop P(H_i) from the maximization if it does not depend on i.

Likelihood Ratio Test (LRT)


For binary hypothesis testing, the MPE rule specializes to:
  decide H1 if P(H1) p(y | H1) > P(H0) p(y | H0), else decide H0.

Can rewrite as a Likelihood Ratio Test (LRT):
  L(y) = p(y | H1) / p(y | H0), compared against the threshold P(H0) / P(H1).

Often we take the log on both sides to get the Log LRT (LLRT):
  log L(y)  vs  log( P(H0) / P(H1) )

Note: Comparing the likelihood ratio to a threshold is a common feature of optimal
decision rules resulting from many different criteria--MPE, ML, Neyman-Pearson
(the radar problem, trading off false alarm vs. miss probabilities). The threshold
changes based on the criterion.
The likelihood ratio summarizes all the information relevant to the hypothesis
testing problem; that is, it is a sufficient statistic.
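
As an illustration, a small Monte Carlo sketch of the LRT/MPE rule for the
basic Gaussian example (the parameter values are arbitrary):

import numpy as np

m, sigma = 1.0, 0.5
p0, p1 = 0.75, 0.25                      # unequal priors shift the threshold
rng = np.random.default_rng(1)

sent1 = rng.random(100_000) < p1         # True = H1 sent, with prior p1
y = rng.normal(0.0, sigma, sent1.size) + m * sent1

# Log-LR for this model: log p(y|H1) - log p(y|H0) = (m*y - m^2/2)/sigma^2
llr = (m * y - m**2 / 2) / sigma**2
decide1 = llr > np.log(p0 / p1)          # MPE/MAP rule; threshold 0 gives ML
print("error rate:", np.mean(decide1 != sent1))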


Likelihood ratio for the basic Gaussian example

  log L(y) = (m y - m^2/2) / sigma^2

Compare the log LR with zero to get the sensible (ML) rule: decide H1 if y > m/2.
Note that the inequalities are reversed when m < 0.

Irrelevant statistics

Consider the following hypothesis testing problem, with a two-component observation:
  Y1 = s_i + N1   (signal plus noise under hypothesis H_i)
  Y2 = N2         (noise only)

When can we throw Y2 away without performance degradation?
That is, when is Y2 irrelevant to our decision?
Intuition for two scenarios:
If the noises are independent, Y2 should be irrelevant, since it
contains no signal contribution.
If the noises are equal (extreme case of highly correlated), Y2 is
very relevant: subtract it out from Y1 to get perfect detection!

Need a systematic way of recognizing irrelevant statistics


Irrelevant statistics

BEGIN PROOF
Conditional densities are all that are relevant. Under the given conditions,
  p(y1, y2 | H_i) = p(y1 | H_i) p(y2 | y1, H_i) = p(y1 | H_i) p(y2 | y1).
These depend on the hypothesis only through the first observation. END PROOF

Relation to sufficient statistics: f(Y) is a sufficient statistic if


Y is irrelevant for hypothesis testing with (f(Y), Y). That is, once
we know f(Y), we have all the information needed for our decision, and
no longer need the original observation Y.

Irrelevant statistics: Example

If N1 and N2 are independent (and N2 is independent of the signal), then
p(y2 | y1, H_i) = p(y2), so the theorem condition is (more than) satisfied
and Y2 is irrelevant.

This argument is applied when deriving optimal receivers for signaling in
AWGN: noise components outside the signal space can be discarded.


Big Picture: Detection under AWGN

Baseband digital link


Unipolar binary error probability

Decision threshold


Error probabilities and Q-function

Additive White Gaussian Noise (AWGN)
Thermal noise is described by a zero-mean Gaussian random process,
n(t) that adds on to the signal => additive
Its PSD is flat, hence, it is called white noise.
Autocorrelation is a spike at 0: uncorrelated at any non-zero
lag

[Figure: flat power spectral density S_n(f) = N0/2 in W/Hz (flat => white);
autocorrelation function: a spike at tau = 0 (uncorrelated at non-zero lag);
Gaussian probability density function.]
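
A discrete-time sketch of these two WGN properties (illustrative only):

import numpy as np

rng = np.random.default_rng(2)
N0 = 2.0                                   # two-sided PSD N0/2 = 1
n = rng.normal(0, np.sqrt(N0 / 2), 2**16)

print("R(0) ~", np.mean(n * n))            # ~N0/2 = 1: spike at zero lag
print("R(1) ~", np.mean(n[:-1] * n[1:]))   # ~0: uncorrelated at non-zero lag

psd = np.abs(np.fft.fft(n)) ** 2 / n.size  # periodogram estimate
print("mean PSD ~", psd.mean())            # ~N0/2 = 1: flat ("white")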


Detection of signal in AWGN


Detection problem:
Given the observation vector z, perform a mapping from z to an estimate m_hat
of the transmitted symbol m_i, such that the average probability of error in
the decision is minimized.

[Block diagram: m_i -> Modulator -> s_i -> (+ noise n) -> z -> Decision rule -> m_hat]

Binary PAM + AWGN

[Signal space: s1 = +sqrt(Eb) and s2 = -sqrt(Eb) on the basis function psi_1(t),
with conditional densities p_z(z | m1) and p_z(z | m2) centered on them.]

Signal s1 or s2 is sent; z is received.

Additive white Gaussian noise (AWGN) => the likelihoods p_z(z | m1) and
p_z(z | m2) are bell-shaped pdfs around s1 and s2.

ML detection => at any point on the z-axis, see which likelihood curve has the
higher (maximum) value and select the corresponding signal (s1 or s2):
this simplifies into a nearest-neighbor rule.


Effect of noise in signal space


The cloud falls off exponentially (gaussian).
Vector viewpoint can be used in signal space, with a random noise vector w

Maximum Likelihood (ML) detection: Scalar case

Likelihoods: p(z | u_A) and p(z | u_B), Gaussian and centered on u_A and u_B.

Assuming both symbols equally likely, u_A is chosen if p(z | u_A) > p(z | u_B).

Log-likelihood => a simple distance criterion: choose the symbol closest to z,
i.e. u_A if |z - u_A| < |z - u_B|.


AWGN detection for Binary PAM

[Signal space as before: s1 = +sqrt(Eb), s2 = -sqrt(Eb) on basis psi_1(t).]

  Pe(m1) = Pe(m2) = Q( (||s1 - s2|| / 2) / sqrt(N0 / 2) )

  PB = PE(2) = Q( sqrt(2 Eb / N0) )
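
The bit error probability formula can be checked with a short Monte Carlo
sketch (assuming antipodal signaling and the usual N0/2 noise variance):

import numpy as np
from math import erfc, sqrt

def Q(x):
    # Gaussian tail probability Q(x)
    return 0.5 * erfc(x / sqrt(2))

EbN0_dB = 6.0
EbN0 = 10 ** (EbN0_dB / 10)
Eb, N0 = 1.0, 1.0 / EbN0

rng = np.random.default_rng(3)
bits = rng.integers(0, 2, 1_000_000)
s = np.sqrt(Eb) * (2 * bits - 1)                 # s1 = +sqrt(Eb), s2 = -sqrt(Eb)
z = s + rng.normal(0, np.sqrt(N0 / 2), s.size)   # noise variance N0/2
ber = np.mean((z > 0).astype(int) != bits)       # nearest-neighbor (ML) decision
print("simulated:", ber, " theory:", Q(sqrt(2 * Eb / N0)))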

AWGN nearest neighbor detection

Projection onto the signal directions (subspace) is called matched filtering to


get the sufficient statistic
Error probability is the tail of the normal distribution (Q-function), based
upon the mid-point between the two signals


Detection in AWGN: Summary

Vector detection


Detection vs. estimation

In detection we have to decide which symbol was transmitted


sA or sB
This is a binary (0/1, true/false or yes/no) type answer,
with an associated error probability

In estimation, we have to output an estimate of a transmitted


signal h.
This estimate is a complex number, not a binary answer.
Typically, we try to estimate the complex channel h, so that
we can use it in coherent combining (matched filtering)

Estimation in AWGN: MMSE

Need: an estimate x_hat(y) of x from a noisy observation y.

Performance criterion: mean-squared error (MSE), E[ (x - x_hat(y))^2 ].

The optimal estimator is the conditional mean of x given the observation y:
  x_hat(y) = E[ x | y ]
This gives the Minimum Mean-Square Error (MMSE).

It satisfies the orthogonality property:
  the error is uncorrelated with the observation, E[ (x - x_hat) y ] = 0.

But the conditional mean is a non-linear operator.
  It becomes linear if x is also Gaussian.
  Else, we need to find the best linear approximation (LMMSE)!


LMMSE

We are looking for a linear estimate: x_hat = c y.

The best linear estimator, i.e. the weighting coefficient c, is
  c = E[x^2] / E[y^2] = sigma_x^2 / (sigma_x^2 + sigma_w^2)   (for y = x + w)

We are weighting the received signal y by the transmit signal energy as a
fraction of the received signal energy.

The corresponding error (MMSE) is
  MMSE = (1 - c) sigma_x^2 = sigma_x^2 sigma_w^2 / (sigma_x^2 + sigma_w^2)
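
A quick numerical check of the LMMSE coefficient and the orthogonality
property (a sketch with arbitrary variances, not from the notes):

import numpy as np

rng = np.random.default_rng(4)
sx2, sw2 = 2.0, 0.5
x = rng.normal(0, np.sqrt(sx2), 1_000_000)
y = x + rng.normal(0, np.sqrt(sw2), x.size)

c = sx2 / (sx2 + sw2)
err = x - c * y
print("empirical MSE:", np.mean(err**2))        # ~ sx2*sw2/(sx2+sw2) = 0.4
print("theory MMSE:  ", sx2 * sw2 / (sx2 + sw2))
print("E[err * y] ~ 0:", np.mean(err * y))      # orthogonality property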

Linear Algebra Tools for Advanced Telecommunications


What is linear and algebra?


Properties satisfied by a line through the origin (one-dimensional case):
  A directed arrow from the origin (v) on the line, when scaled by a
  constant (c), remains on the line (cv).
  Two directed arrows (u and v) on the line can be added to create a
  longer directed arrow (u + v) on the same line.

This is nothing but arithmetic with symbols!
  Algebra: generalization and extension of arithmetic.
  Linear operations: addition and scaling.

Abstract and generalize!
  Line -> vector space having N dimensions.
  Point -> vector with N components, one in each of the N dimensions
  (basis vectors).
  Vectors have: length and direction.
  Basis vectors: span or define the space and its dimensionality.
  Linear function transforming vectors -> matrix.
    The function acts on each vector component and scales it;
    add up the resulting scaled components to get a new vector!
  In general: f(cu + dv) = c f(u) + d f(v)


Vector

Think of a vector as a directed line segment in N dimensions!

  v = [a b c]^T   (a column vector with components a, b, c)

A vector has length and direction.

Basic idea: convert geometry in higher dimensions into algebra!
  Once you define a basis along each dimension (x-, y-, z-axis),
  a vector becomes an N x 1 (column) matrix: v = [a b c]^T.
  Geometry starts to become linear algebra on vectors like v!


Examples of geometry becoming algebra

Lines are vectors through the origin, scaled and translated: y = mx + c.
Intersection of lines can be modeled as addition of vectors: solution of
linear equations.
Linear transformations of vectors can be associated with a
matrix A, whose columns represent how each basis vector is
transformed.
Ellipses and conic sections:
  ax^2 + 2bxy + cy^2 = d
  Let x = [x y]^T and let A be the symmetric matrix with rows [a b] and [b c].
  x^T A x = d   {quadratic form equation for the ellipse!}
  This becomes convenient at higher dimensions.
Note how a symmetric matrix A naturally arises from such a
homogeneous multivariate equation.


Scalar vs. matrix equations

Line equation: y = mx + c
Matrix equation: y = Mx + c

Second order equations:
  x^T M x = d
  y = (x^T M x) u + M x
involve quadratic forms like x^T M x.

Vector addition: A+B


A+B = ( x1 , x2 ) + ( y1 , y2 ) = ( x1 + y1 , x2 + y2 )

[Figure: head-to-tail method to combine vectors: A + B = C.]


Scalar product: av

av = a( x1 , x2 ) = (ax1 , ax2 )


Change only the length (scaling), but keep direction fixed.

Note: a matrix operation (Av) can change length, direction and also dimensionality!


Vectors: Magnitude (length) and phase (direction)

  v = ||v|| * (v / ||v||)   (unit vector => pure direction)

Alternate representations:
  Polar coordinates: (||v||, theta)
  Complex numbers: ||v|| e^(j*theta)

Inner (dot) product: v.w or wTv

  v.w = (x1, x2).(y1, y2) = x1 y1 + x2 y2

The inner product is a SCALAR!

  v.w = ||v|| ||w|| cos(theta)

  v.w = 0  <=>  v is orthogonal to w

If the vectors v, w are columns, then the dot product is w^T v.

Inner products, norms, signal space


Signals modeled as vectors in a vector space: signal space
To form a signal space, first we need to know the inner
product between two signals (functions):
Inner (scalar) product (generalized for functions):

  <x(t), y(t)> = integral of x(t) y*(t) dt

  = cross-correlation between x(t) and y(t)

Properties of the inner product:
  <a x(t), y(t)> = a <x(t), y(t)>
  <x(t), a y(t)> = a* <x(t), y(t)>
  <x(t) + y(t), z(t)> = <x(t), z(t)> + <y(t), z(t)>

Signal space
The distance in signal space is measured by calculating the norm.
What is a norm?
Norm of a signal (generalization of length):

  ||x(t)|| = sqrt( <x(t), x(t)> ) = sqrt( integral of |x(t)|^2 dt ) = sqrt(E_x)
           = "length" of x(t)

  ||a x(t)|| = |a| ||x(t)||

Norm between two signals:

  d_{x,y} = ||x(t) - y(t)||

We refer to the norm between two signals as the Euclidean distance between them.
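
A discrete-time sketch of these definitions (sampled signals standing in for
the continuous ones):

import numpy as np

dt = 1e-3
t = np.arange(0, 1, dt)
x = np.cos(2 * np.pi * 5 * t)
y = np.sin(2 * np.pi * 5 * t)

inner = np.sum(x * np.conj(y)) * dt               # <x, y> ~ integral x(t) y*(t) dt
Ex = np.sum(np.abs(x) ** 2) * dt                  # signal energy
dist = np.sqrt(np.sum(np.abs(x - y) ** 2) * dt)   # Euclidean distance

print(inner)         # ~0: sine and cosine at the same frequency are orthogonal
print(np.sqrt(Ex))   # ~sqrt(0.5): "length" of x(t)
print(dist)          # ~1.0: sqrt(Ex + Ey) for orthogonal signals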

Example of distances in signal space

[Figure: three signal vectors s1 = (a11, a12), s2 = (a21, a22), s3 = (a31, a32)
with energies E1, E2, E3, and a received vector z = (z1, z2), in the
(psi_1(t), psi_2(t)) plane, with distances d_{s1,z}, d_{s2,z}, d_{s3,z}.]

The Euclidean distance between the signals z(t) and s_i(t):

  d_{si,z} = ||s_i(t) - z(t)|| = sqrt( (a_i1 - z1)^2 + (a_i2 - z2)^2 ),  i = 1, 2, 3

Detection in AWGN noise: pick the closest signal vector.

Bases and orthonormal bases

Basis (or axes): frame of reference.

Basis: a space is totally defined by a set of vectors; any point is a linear
combination of the basis.

Ortho-normal: orthogonal + normal
  x = [1 0 0]^T    x . y = 0
  y = [0 1 0]^T    x . z = 0
  z = [0 0 1]^T    y . z = 0

Note:
  Orthogonal: dot product is zero.
  Normal: magnitude is one.


Projections and orthogonal basis


Get the component of the vector on each axis:
dot-product with unit vector on each axis!

Note: This is what the Fourier transform does!
It projects a function onto an infinite number of orthonormal basis functions
(the phasors e^(j*omega*t), or e^(j*2*pi*n*t/T) for the Fourier series), and
adds the results up (to get an equivalent representation in the frequency domain).

CDMA codes are orthogonal, and projecting the composite received signal
on each code helps extract the symbol transmitted on that code.

Orthogonal projections: CDMA, Spread Spectrum (SS)

[Figure: spread-spectrum transmitter and receiver; the baseband spectrum is
spread by code A into the radio spectrum, with several codes A, B, C sharing
the band over time.]

Each code is an orthogonal basis vector => the signals sent are orthogonal.


Matrix

A matrix is a set of elements, organized into rows and columns:

  A = [ a  b ]
      [ c  d ]


Matrix (geometrically)
A matrix represents a linear function acting on vectors:
  Linearity (a.k.a. superposition): f(au + bv) = a f(u) + b f(v)
  f transforms the unit x-axis basis vector i = [1 0]^T to [a c]^T
  f transforms the unit y-axis basis vector j = [0 1]^T to [b d]^T
  f can be represented by the matrix A with [a c]^T and [b d]^T as columns.
    Why? f(w) = f(mi + nj) = A [m n]^T
  Column viewpoint: focus on the columns of the matrix!

  A = [ a  b ]
      [ c  d ]

[Figure: unit basis vectors [1,0]^T and [0,1]^T mapped to [a,c]^T and [b,d]^T.]

Linear functions f: rotate and/or stretch/shrink the basis vectors.


Matrix operating on vectors


A matrix is like a function that transforms the vectors of a plane.
A matrix operating on a general point transforms its x- and y-components.
System of linear equations: the matrix is just the bunch of coefficients!

  [ a  b ] [ x ]   [ x' ]        x' = ax + by
  [ c  d ] [ y ] = [ y' ]        y' = cx + dy

Vector (column) viewpoint:
  New basis vector [a c]^T is scaled by x, and added to
  new basis vector [b d]^T scaled by y,
  i.e. a linear combination of the columns of A gives [x' y']^T.

Vector spaces, dimension, span


Another way to view Ax = b, is that a solution exists for all vectors b that lie in the
column space of A,
i.e. b is a linear combination of the basis vectors represented by the columns of
A
The columns of A span the column space
The dimension of the column space is the column rank (or rank) of matrix A.

In general, given a bunch of vectors, they span a vector space.


The dimension of the space is maximal only when the vectors are linearly
independent of the others.
Subspaces are vector spaces with lower dimension that are a subset of the
original space

Note: Linear channel codes (eg: Hamming, Reed-Solomon, BCH) can be viewed as
k-dimensional vector sub-spaces of a larger N-dimensional space.
k-data bits can therefore be protected with N-k parity bits
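
To make the subspace view concrete, here is a small sketch using the (7,4)
Hamming code; the generator matrix G below is one standard systematic choice,
not taken from these notes:

import numpy as np

# Codewords form a k = 4 dimensional subspace (over GF(2)) of the N = 7 dim space.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

data = np.array([1, 0, 1, 1])   # 4 data bits
codeword = data @ G % 2         # 7 bits: 4 data + 3 parity
print(codeword)                 # the subspace holds 2^4 = 16 of the 2^7 = 128 vectors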


Forward Error Correction (FEC): e.g. Reed-Solomon RS(N,K)

[Figure: an RS(N,K) block of size N = K data packets + (N-K) FEC packets sent
over a lossy network; any K of the N received packets recover the K data packets.]

This is linear algebra in action: design an appropriate K-dimensional vector
sub-space out of an N-dimensional vector space.

Matrices: Scaling, rotation, identity


Pure scaling, no rotation => diagonal matrix (note: the x- and y-axes can be
scaled differently!)
Pure rotation, no stretching => orthogonal matrix Q
Identity (do nothing) matrix = unit scaling, no rotation!

Scaling:   [ r1  0  ]    maps [1,0]^T -> [r1,0]^T and [0,1]^T -> [0,r2]^T
           [ 0   r2 ]

Rotation:  [ cos(theta)  -sin(theta) ]    maps [1,0]^T -> [cos, sin]^T
           [ sin(theta)   cos(theta) ]    and  [0,1]^T -> [-sin, cos]^T


Scaling

  [ r  0 ]    a.k.a. dilation (r > 1), contraction (r < 1)
  [ 0  r ]

Rotation

  [ cos(theta)  -sin(theta) ]
  [ sin(theta)   cos(theta) ]


Reflections

Reflection can be about any line or point.
  Complex conjugate: reflection about the x-axis (i.e. flip the phase theta to -theta).
  Reflection => two times the projection distance from the line.
  Reflection does not affect magnitude.

[Figure: reflection about a line and the induced matrix.]

Orthogonal projections: Matrices

[Figure: projection matrices.]


2D translation

  P' = (x + t_x, y + t_y) = P + t

[Figure: point P translated by the vector t = (t_x, t_y) to P'.]

Basic matrix operations

Addition, subtraction, multiplication: creating new matrices (or functions).

  [ a b ]   [ e f ]   [ a+e  b+f ]
  [ c d ] + [ g h ] = [ c+g  d+h ]      Just add elements

  [ a b ]   [ e f ]   [ a-e  b-f ]
  [ c d ] - [ g h ] = [ c-g  d-h ]      Just subtract elements

  [ a b ]   [ e f ]   [ ae+bg  af+bh ]
  [ c d ] * [ g h ] = [ ce+dg  cf+dh ]  Multiply each row by each column


Multiplication

Is AB = BA? Maybe, but maybe not!

  [ a b ] [ e f ]   [ ae+bg ... ]         [ e f ] [ a b ]   [ ea+fc ... ]
  [ c d ] [ g h ] = [ ...   ... ]   but   [ g h ] [ c d ] = [ ...   ... ]

Matrix multiplication AB: apply transformation B first, and then transform
using A!
Multiplication is NOT commutative!

Note: If A and B both represent pure rotations, or both pure scalings,
they can be interchanged (i.e. AB = BA).

Multiplication as composition

[Figure: applying two transformations in different orders gives different results!]


Inverse of a matrix

Identity matrix: AI = A

      [ 1 0 0 ]
  I = [ 0 1 0 ]
      [ 0 0 1 ]

Some matrices have an inverse, such that: A A^-1 = I.
The inverse exists only for square matrices that are non-singular
(they map an N-dim space to another N-dim space bijectively).

Inversion is tricky: (ABC)^-1 = C^-1 B^-1 A^-1

Determinant of a matrix

Used for inversion. If det(A) = 0, then A has no inverse.

  A = [ a b ]       det(A) = ad - bc
      [ c d ]

  A^-1 = ( 1 / (ad - bc) ) [  d  -b ]
                           [ -c   a ]

Note: Determinant criterion for space-time code design.
A good code exploiting time diversity should maximize the minimum
product distance between codewords. The coding gain is determined by the
minimum of the determinant over codewords.


Projection: Using inner products

  p = a (a^T x),   assuming a is a unit vector: ||a||^2 = a^T a = 1

Projection: Using inner products

  p = a (a^T b) / (a^T a)

Note: the error vector e = b - p is orthogonal (perpendicular) to p,
i.e. the inner product (b - p)^T p = 0.

Orthogonalization principle: after projection, the difference or error is
orthogonal to the projection.

Note: We can use this idea to find a least-squares line that minimizes
the sum of squared errors (i.e. min e^T e).

This is also used in detection under AWGN noise to get the test statistic.
Idea: project the noisy received vector y onto the (complex) transmit vector h:
matched filter / maximal-ratio combining (MRC). See the sketch below.
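
A tiny numeric illustration of the projection and the orthogonality principle
(the values are chosen arbitrarily):

import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

p = a * (a @ b) / (a @ a)   # projection of b onto the line spanned by a
e = b - p                   # error vector
print(p)                    # [2. 1.]: (a.b)/(a.a) = 5/5 = 1, so p = a
print(e @ p)                # ~0: the error is orthogonal to the projection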

(Cauchy-)Schwarz inequality and the matched filter

Inner product (a^T x) <= product of norms (i.e. ||a|| ||x||)
  Projection length <= product of individual lengths.
  This is the Schwarz inequality!
  Equality happens when a and x are in the same direction
  (i.e. cos(theta) = 1, when theta = 0).

Application: matched filter
  Received vector y = x + w (zero-mean AWGN)
    Note: w is infinite dimensional.
  Project y onto the subspace formed by the finite set of transmitted
  symbols x: y_hat.
  y_hat is said to be a sufficient statistic for detection, i.e. we reject the
  noise dimensions outside the signal space.
  This operation is called matching to the signal space (projecting).
  Now pick the x which is closest to y_hat in distance (ML detection = nearest
  neighbor).


Receiver without matched filter

[Figure: transmitted signal vs. received signal.]

Signal + AWGN noise will not reveal the original transmitted sequence:
the noise power is high relative to the power of the desired signal (low SNR).

If the receiver were to sample this signal at the correct times, the
resulting binary message would have a lot of bit errors.


Matched filter

Consider the received signal as a vector r, and the transmitted signal vector as s
Matched filter projects the r onto signal space spanned by s (matches it)

The filtered signal can now be safely sampled by the receiver at the correct
sampling instants, resulting in a correct interpretation of the binary message.
The matched filter is the filter that maximizes the signal-to-noise ratio;
it can be shown that it also minimizes the BER. It is a simple projection
operation (see the sketch below).
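
A minimal simulation sketch (rectangular pulses and an arbitrary noise level
assumed) showing the matched filter's gain over naive single-sample detection:

import numpy as np

rng = np.random.default_rng(5)
sps = 8                                      # samples per symbol
bits = rng.integers(0, 2, 200)
pulse = np.ones(sps)                         # rectangular transmit pulse
tx = np.repeat(2 * bits - 1, sps).astype(float)
rx = tx + rng.normal(0, 2.0, tx.size)        # low SNR: raw samples unreliable

mf = np.convolve(rx, pulse[::-1])            # filter matched to the pulse
samples = mf[sps - 1 :: sps][: bits.size]    # sample at the end of each symbol
print("MF errors:   ", np.sum((samples > 0).astype(int) != bits))

naive = (rx[sps // 2 :: sps][: bits.size] > 0).astype(int)
print("naive errors:", np.sum(naive != bits))   # many more errors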

Matched filter and repetition coding

[Figure: received vector y = h x1 + w; h x1 only spans a 1-dim space, so
project y onto h/||h||. Multiply by the conjugate of h => cancel the phase!]


Symmetric, Hermitian, positive definite

Symmetric: A = A^T
  Symmetric => square matrix.
Complex vectors/matrices:
  The transpose of a vector or matrix with complex elements must involve a
  conjugate transpose, i.e. flip the phase as well.
  For example: ||x||^2 = x^H x, where x^H refers to the conjugate transpose of x.
Hermitian (for complex elements): A = A^H
  Like a symmetric matrix, but must also conjugate each element (i.e. flip
  its phase); i.e. symmetric, except for the flipped phase.
  Note: we will use A* instead of A^H for convenience.
Positive definite: symmetric, and its quadratic forms are strictly positive
for non-zero x:
  x^T A x > 0
  Geometry: bowl-shaped minimum at x = 0.


Orthogonal, unitary matrices: Rotations


Rotations and reflections: orthogonal matrices Q
  Pure rotation => changes vector direction, but not magnitude (no scaling effect).
  Retains dimensionality, and is invertible.
  The inverse rotation is simply Q^T.

Unitary matrix U: complex elements, rotation in the complex plane.
  Inverse: U^H (note: conjugate transpose).

Note:
  Gaussian noise exhibits isotropy, i.e. invariance to direction. So any
  rotation Q of a Gaussian vector w yields another Gaussian vector Qw.
  Circular symmetric (c-s) complex Gaussian vector w => a complex rotation
  with U yields another c-s Gaussian vector Uw.

Note: The Discrete Fourier Transform (DFT) matrix is both unitary and symmetric.
  The DFT is nothing but a complex rotation, i.e. the signal viewed in a basis
  that is a rotated version of the original basis.
  The FFT is just a fast implementation of the DFT. It is fundamental in OFDM.

Quadratic forms: x^T A x

Linear:
  y = mx + c generalizes to the vector equation
  y = Mx + c  (y, x, c are vectors, M = matrix)

Quadratic expressions in 1 variable: x^2
  Vector expression: x^T x (a projection!)
  Quadratic forms generalize this, by allowing a linear transformation A as well.

Multivariable quadratic expression: x^2 + 2xy + y^2
  Captured by a symmetric matrix A, and the quadratic form x^T A x.

Note: The Gaussian vector formula has a quadratic-form term in its exponent:
  exp[ -0.5 (x - mu)^T K^-1 (x - mu) ]
Similar to the 1-variable Gaussian exp( -0.5 (x - mu)^2 / sigma^2 ):
  K^-1 (the inverse covariance matrix) instead of 1/sigma^2
  A quadratic form involving the vector (x - mu) instead of (x - mu)^2

Rectangular matrices

Linear system of equations: Ax = b
  More or fewer equations than unknowns => not necessarily full rank.
  If A has full column rank, we can modify the equation as:
    A^T A x = A^T b
  Now (A^T A) is square, symmetric and invertible.
    x = (A^T A)^-1 A^T b now solves the system of equations!
  This solution is called the least-squares solution: project b onto the
  column space and then solve.
  (A^T A)^-1 A^T is sometimes called the pseudo-inverse.

Note: (A^T A) or (A* A) will appear often in communications math (MIMO). They
will also appear in the SVD (singular value decomposition).
The pseudo-inverse (A^T A)^-1 A^T will appear in decorrelator receivers for MIMO.
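
A minimal sketch of the least-squares/pseudo-inverse recipe above (the data
values are made up for illustration):

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])     # 3 equations, 2 unknowns (fit a line y = x0 + x1*t)
b = np.array([0.1, 1.1, 1.9])

x = np.linalg.inv(A.T @ A) @ A.T @ b           # pseudo-inverse applied to b
print(x)                                       # ~[0.13, 0.9]
print(np.linalg.lstsq(A, b, rcond=None)[0])    # same answer via library routine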


Invariants of matrices: Eigenvectors

Consider an NxN matrix (or linear transformation) T.
An invariant input x of a function T(x) is nice because it does not change
when the function T is applied to it,
  i.e. solve this eqn for x: T(x) = x.
We allow (positive or negative) scaling, but want invariance of direction:
  T(x) = lambda x.
There are multiple solutions to this equation, up to the rank of the matrix T.
If T is full rank, then we have a full set of solutions.
These invariant solution vectors x are eigenvectors, and the characteristic
scaling factors lambda associated with each x are eigenvalues.

Example (reflection about the x-axis, T = diag(1, -1)):
  E-vectors:
  - Points on the x-axis are unaffected: [1 0]^T
  - Points on the y-axis are flipped: [0 1]^T
    (but this is equivalent to scaling by -1!)
  E-values: 1, -1 (also on the diagonal of the matrix)

Eigenvectors
Eigenvectors are even more interesting because any vector in the domain of
T can now be viewed in a new coordinate system formed with the invariant
eigen directions as a basis.
The operation of T(x) is now decomposable into simpler operations on x,
which involve projecting x onto the eigen directions and applying the
characteristic (eigenvalue) scaling along those directions

Note: In Fourier transforms (associated with linear systems):
  The unit-length phasors e^(j*omega*t) are the eigenvectors! And the
  frequency response is composed of the eigenvalues!
  Why? Linear systems are described by differential equations (i.e. d/dt and
  higher orders).
  Recall d(e^(j*omega*t))/dt = j*omega * e^(j*omega*t):
  j*omega is the eigenvalue and e^(j*omega*t) the eigenvector (actually, an
  eigenfunction).


Eigenvalues and eigenvectors

Eigenvectors (for a square m x m matrix S):

  S v = lambda v,  v != 0
  v = (right) eigenvector, lambda = eigenvalue

How many eigenvalues are there at most?
  (S - lambda I) v = 0 only has a non-zero solution v if det(S - lambda I) = 0.
  This is an m-th order equation in lambda which can have at most m distinct
  solutions (roots of the characteristic polynomial). The eigenvalues can be
  complex even though S is real.

Diagonal (eigen) decomposition

Let S = [ 2 1 ] ;  lambda_1 = 1, lambda_2 = 3.
        [ 1 2 ]

The eigenvectors [1 -1]^T and [1 1]^T form U = [  1  1 ]
                                               [ -1  1 ]

Inverting, we have U^-1 = [ 1/2  -1/2 ]      Recall: U U^-1 = I.
                          [ 1/2   1/2 ]

Then, S = U Lambda U^-1 = [  1  1 ] [ 1 0 ] [ 1/2  -1/2 ]
                          [ -1  1 ] [ 0 3 ] [ 1/2   1/2 ]
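
The same decomposition can be verified numerically (a sketch using NumPy's
eigh, which returns orthonormal eigenvectors for a symmetric matrix):

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, U = np.linalg.eigh(S)
print(vals)                        # [1. 3.]
print(U @ np.diag(vals) @ U.T)     # reconstructs S (for symmetric S, U^-1 = U^T)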


Example

Let's divide U (and multiply U^-1) by sqrt(2). Then,

  S = [  1/sqrt(2)  1/sqrt(2) ] [ 1 0 ] [ 1/sqrt(2)  -1/sqrt(2) ]
      [ -1/sqrt(2)  1/sqrt(2) ] [ 0 3 ] [ 1/sqrt(2)   1/sqrt(2) ]

    = Q Lambda Q^T   (Q orthogonal: Q^-1 = Q^T)

Geometric view: Eigenvectors

Homogeneous (2nd order) multivariable equations:
  represented in matrix (quadratic) form x^T A x = d with a symmetric matrix A.

Eigenvector decomposition: A = Q Lambda Q^T.

Geometry: principal axes of the ellipse x^T A x = d.
  Symmetric A => orthogonal e-vectors!
    Same idea in Fourier transforms: the e-vectors are the frequencies.
  Positive definite A => positive real e-values!


Why do eigenvalues/vectors matter?

Eigenvectors are invariants of A:
  they do not change direction when operated on by A.

Recall d(e^(lambda*t))/dt = lambda * e^(lambda*t):
  e^(lambda*t) is an invariant function for the linear operator d/dt, with
  eigenvalue lambda.

E.g., a pair of differential eqns can be written as dy/dt = Ay, where y = [v u]^T.

Substitute y = e^(lambda*t) x into the equation dy/dt = Ay:

  lambda e^(lambda*t) x = A e^(lambda*t) x

This simplifies to the eigenvalue vector equation: A x = lambda x.

Solutions of multivariable differential equations correspond to solutions of
linear algebraic eigenvalue equations!

Eigen decomposition

Every square matrix A with distinct eigenvalues has an eigen decomposition:
  A = S Lambda S^-1
where S is a matrix of eigenvectors and
Lambda is a diagonal matrix of distinct eigenvalues: Lambda = diag(lambda_1, ..., lambda_N).

Follows from the definition of eigenvector/eigenvalue:
  A x = lambda x
Collect all these N eigenvectors into a matrix S:
  A S = S Lambda
or, if S is invertible (if the e-values are distinct):
  A = S Lambda S^-1


Eigen decomposition: Symmetric A

Every square, symmetric matrix A can be decomposed into a product of a
rotation (Q^T), a scaling (Lambda) and an inverse rotation (Q):
  A = Q Lambda Q^T
The idea is similar to A = S Lambda S^-1,
but the eigenvectors of a symmetric matrix A are orthogonal and form
an orthogonal basis transformation Q.
  For an orthogonal matrix Q, the inverse is just the transpose Q^T.

This is why we like symmetric (or Hermitian) matrices: they admit a nice
decomposition.
  We like positive definite matrices even more: they are symmetric and
  have all eigenvalues strictly positive.
  Many linear systems are equivalent to symmetric/Hermitian or positive
  definite transformations.

Fourier methods = Eigen decomposition

Applying transform techniques is just eigen decomposition!

Discrete/finite case (DFT/FFT):
  C = F Lambda F*, where F is the (complex) Fourier matrix, which
  happens to be both unitary and symmetric, and multiplication with F is
  rapid using the FFT.
  Applying F = DFT, i.e. transform to the frequency domain, i.e.
  rotate the basis to view C in the frequency basis.
  Applying Lambda is like applying the complex gains/phase
  changes to each frequency component (basis vector).
  Applying F* inverts back to the time domain (IDFT or IFFT).


Fourier/Eigen decomposition

Continuous case:
  Any function f(t) can be viewed as an integral (sum) of scaled,
  time-shifted impulses: f(t) = integral of f(tau) delta(t - tau) dtau.
  h(t) is the response the system gives to an impulse (the impulse response).
  For linear time-invariant (LTI) systems, the response to f(t) is the
  convolution of f(t) with the impulse response h(t): f(t)*h(t).
  Convolution is messy in the time domain, but becomes a simple
  multiplication in the frequency domain: F(s)H(s).

[Figure: Input -> Linear system -> Output]

Fourier/Eigen decomposition

Transforming an impulse response h(t) to the frequency domain gives H(s), the
characteristic frequency response.
  This is a generalization of multiplying by a Fourier matrix F.
  H(s) captures the eigenvalues (i.e. scalings) corresponding to each
  frequency component s.
  Doing convolution now becomes a matter of multiplying eigenvalues
  for each frequency component, and then transforming back (i.e. like
  multiplying with the IDFT matrix F*).

The eigenvectors are the orthogonal harmonics, i.e. the phasors e^(jkx).
  Every harmonic e^(jkx) is an eigenfunction of every derivative and every
  finite difference, which are linear operators.
  Since dynamic systems can be written as differential/difference
  equations, eigentransform methods convert them into simple
  polynomial equations!


Applications in random vectors/processes

Covariance matrix K for random vectors X:
  Generalization of variance; K_ij is the covariance between the
  components x_i and x_j:
    K = E[ (X - mu)(X - mu)^T ]
  K_ij = K_ji: K is a real, symmetric matrix, with orthogonal eigenvectors!
  K is positive semi-definite. When K is full rank, it is positive definite.
White <=> no off-diagonal correlations:
  K is diagonal, and has the same variance in each element of the diagonal.
  E.g.: Additive White Gaussian Noise (AWGN).
Whitening filter: eigen decomposition of K + normalization of each
eigenvalue to 1!
(Auto)correlation matrix R = E[X X^T]
Random vectors X, Y uncorrelated <=> E[X Y^T] = 0 <=> orthogonal
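
A small numerical sketch of coloring and whitening via the eigen decomposition
of K (the matrix K below is an arbitrary example):

import numpy as np

rng = np.random.default_rng(6)
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                 # target covariance (symmetric, pos. def.)

vals, Q = np.linalg.eigh(K)
A = Q @ np.diag(np.sqrt(vals))             # "coloring" transform: K = A A^T
x = A @ rng.normal(size=(2, 100_000))      # correlated samples, covariance ~ K

W = np.diag(1 / np.sqrt(vals)) @ Q.T       # whitening filter
y = W @ x
print(np.cov(y))                           # ~ identity: y is white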

Gaussian random vectors

Linear transformations of the standard Gaussian vector: x = A w + mu.

pdf: has the covariance matrix K = A A^T in the quadratic form instead of sigma^2.

When the covariance matrix K is diagonal, the component random variables are
uncorrelated. Uncorrelated + Gaussian => independence.
  White Gaussian vector => uncorrelated, i.e. K is diagonal.
  Whitening filter => convert K to become diagonal (using eigen decomposition).

Note: normally AWGN noise has infinitely many components, but it is projected
onto a finite signal space to become a Gaussian vector.


Digital Modulation

Basic concepts

Modulation
Placing baseband signals on high frequency carriers using the
process of modulation facilitates the long distance
transmission of data, voice and video signals.
Modulation:
The signal processing technique where, at the transmitter
one signal (the modulating signal) modifies a property of
another signal (the carrier signal) so that a composite wave
(the modulated wave) is formed.
Demodulation:
At the receiver, the modulating signal is recovered from the
modulated wave.


Bandwidth

The bandwidth of the modulated wave is equal to, or greater


than the bandwidth of the modulating signal.
Since the modulated wave has a higher frequency it can be
launched from practical sized antennas, cables or waveguides
Each symbol represents a specific sequence of bits and the
symbol set covers all possible bit combinations.
The maximum symbol rate is determined by the passband of the
bearer and associated equipment.


Analogue modulation

Analogue modulation combines a higher frequency


sinusoidal carrier with a lower frequency signal
carrying the message.
Such carriers can be modulated in three distinct ways
Amplitude A can be varied according to the message
Amplitude modulation
Frequency f can be varied according to the message signal
Frequency modulation
Phase can also be varied with the message signal.
Phase modulation
Note that frequency and phase modulation are
referred to as angle modulation.


What is digital modulation?

Digital modulation combines a high frequency


sinusoidal carrier signal and a digital data stream to
create a modulated wave that assumes a limited
number of states.
As for analogue modulation, we can modulate the
wave in sympathy with the digital data stream in three
basic ways:
Amplitude A can be varied in sympathy with the message
Amplitude modulation
Frequency f can be varied according to the message signal
Frequency modulation
Phase can also be varied with the message signal.
Phase modulation


Why digital modulation?

Most communication systems can be classified into


one of three different categories:
Bandwidth efficient
Ability of system to accommodate data within a prescribed
bandwidth
Power efficient
Reliable sending of data with minimal power requirements
Cost efficient
System needs to be economical in the context of its use


Why digital modulation?

Digital modulation provides better information


capacity, higher data security, better quality
communications.

Industry trends:


Why digital modulation?

Another layer of complexity in many new systems is


multiplexing.
Two principal types of multiplexing (or multiple access)
used only in digital systems are
TDMA (Time Division Multiple Access) and
CDMA (Code Division Multiple Access).
These are two different ways to add diversity to
signals allowing different signals to be separated from
one another.


Transmitting information

A pure carrier is generated at the transmitter.


The carrier is modulated with the information to be
transmitted.
Any reliably detectable change in signal
characteristics can carry information.
At the receiver the signal modifications or changes are
detected and demodulated.

Modulation


Polar display

Polar display - magnitude and phase


represented together
A simple way to view amplitude
and phase is with the polar
diagram.
The carrier becomes a frequency
and phase reference and the
signal is interpreted relative to
the carrier.
The signal can be expressed in
polar form as a magnitude and a
phase.


Polar display

Magnitude is represented as the


distance from the centre and
phase is represented as the
angle.
Amplitude modulation (AM)
changes only the magnitude of
the signal.
Phase modulation (PM)
changes only the phase of the
signal. Amplitude and phase
modulation can be used together.
Frequency modulation (FM)
looks similar to phase
modulation, though frequency
is the controlled parameter,
rather than relative phase.


I/Q formats

In digital communications,
modulation is often expressed in
terms of I and Q.
This is a rectangular representation
of the polar diagram.
On a polar diagram, the I axis lies
on the zero degree phase reference,
and the Q axis is rotated by 90
degrees.
The signal vector's projection onto
the I axis is its I component and
the projection onto the Q axis is its
Q component.


I and Q in transmitter

I/Q diagrams are useful since they mirror the way in


which digital communication signals are created using
an I/Q modulator.
In the transmitter, I and Q signals are mixed with the
same local oscillator.
A 90-degree phase shifter is placed on one of the paths.
Signals that are at 90 degrees are said to be orthogonal to
each other, or in quadrature.


Transmitter side

Signals that are in quadrature are independent and do


not interfere with each other.
This simplifies digital radios and similar devices

Receiver side

On the receiver side, the combined signals


are easily separated out


Why use I/Q?

Digital modulation is easy to accomplish with I/Q


modulators.
Most modulators map data onto a number of discrete
points on the I-Q plane.
Points are known as constellation points.
As the signal moves from one point to another,
simultaneous amplitude and phase modulation usually
takes place.
Difficult to achieve in conventional phase modulators.


Application areas

Modulation format     Application
MSK, GMSK             GSM
BPSK                  Deep space telemetry, cable modems
QPSK and DQPSK        Satellite, CDMA, TETRA
OQPSK (Offset QPSK)   CDMA, satellite
FSK, GFSK             DECT, paging, AMPS, CT2
VSB                   North American digital TV
8PSK                  Satellite, aircraft
16 QAM                Microwave digital radio, modems, DVB-C, DVB-T
32 QAM                Terrestrial microwave, DVB-T
64 QAM                DVB-C, modems
256 QAM               Modems, digital video (USA)

Terrestrial Trunked Radio (TETRA) is a professional mobile radio and two-way
transceiver specification. TETRA was specifically designed for use by
government agencies, emergency services (police forces, fire departments,
ambulance), transport services and the military.
TETRA is a European Telecommunications Standards Institute (ETSI) standard.

Digital modulation

The modulating signal m(t) is a digital signal given by


Binary line codes
or
Multi-level line codes
Correspondingly, the bandpass signals are also given
by
Binary line codes
or
Multi-level line codes


Binary signal format example: Unipolar
We shall illustrate a number of binary signal formats
or line codes in the following examples.
Unipolar
A 1 is represented by a current of 2A signal units and a 0 is
represented by a current of zero signal units.


Binary signal format example: Unipolar
Unipolar actually can occur in two forms:
  Non return to zero (NRZ)
    Current maintained for the entire bit period (time slot).
    In a long sequence with equally likely 1s and 0s, the power is
    (1/2)(2A)^2 = 2A^2 signal watts.
  Return to zero (RZ)
    Current maintained for only a fraction of the time slot. If we assume
    that the current is maintained for 1/2 of the time slot and the symbols
    are equally likely, then the power in this case is (1/2) x 2A^2 = A^2
    signal watts.
Consider the sequence 101100111000 and view the following diagrams to
compare the two cases.

Binary signal format example: Unipolar
With Non Return to Zero operation:
Long sequences of 0s produce periods where there is no current
generated
Long sequences of 1s produce periods where positive current is
generated
When the 1s and 0s are equally likely, the mean value is A signal
units.
Each of the above conditions can cause problems for an
electronic receiver:
When a constant current or no current flows there is no timing
information and synchronization is difficult.

Unipolar (Non Return to Zero)


Binary signal format example: Unipolar

With Return to Zero operation:


Long sequences of 0s produce periods where there is no
current generated
Long sequences of 1s produce periods where positive current
is generated for a fraction of the time and hence a change can
be detected by the receiver.
When the 1s and 0s are equally likely and the pulses are T/2 wide, the
mean value is A/2 signal units.
So RZ eliminates the timing problem, but not the
problem of long term level shifts.


Binary signal format example: Bipolar

Bipolar operation:
  A 1 is represented by a current of +A signal units.
  A 0 is represented by a current of -A signal units.
Two modes of operation, once again:
  Non return to zero (NRZ)
    Currents maintained for the entire time slot.
    Power needed for equally likely symbols is A^2 signal watts.
  Return to zero (RZ)
    Currents maintained for a fraction of the time slot.
    Power needed for equally likely symbols is A^2/2 signal watts.


Binary signal format example: Bipolar

Bipolar Non Return to Zero

Bipolar Return to Zero


Binary signal format example: Bipolar

Long strings of 1s or 0s produce constant currents in


NRZ bipolar and these represent a problem for
electronic circuits once again.
For RZ bipolar, these problems are basically
eliminated because the receiver detects the return to
zero in each pulse period.
When 1s and 0s are equally likely, the mean signal
value is just zero.


Binary signal format example: Biphase
Biphase (or Manchester)
A 1 is a positive current of amplitude A signal units that
changes to a negative current pulse of equal magnitude and a
0 is a negative pulse that changes to a positive current pulse
of equal magnitude.
The change-over occurs at the midpoint of the timeslot.
This type of coding is used between equipment that operates
at a high speed and requires close synchronization.


Binary signal format example: AMI

Alternate Mark Inversion (AMI)


1s are represented by return to zero current pulses of equal
magnitude A that alternate between positive and negative.
0s are represented by the absence of current pulses.
Power requirements are A2/4 which is half of RZ bipolar and
one eighth of NRZ bipolar.
Since the polarity alternates, almost all the power is contained
within a bandwidth equal to the bit rate expressed in Hz.
With a pulse shape that is approximately the same as a raised cosine, AMI is
used extensively in carrier systems (the line codes are sketched below).
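
The line codes above can be sketched in a few lines of Python (conventions as
described in these slides; two samples per bit for brevity):

import numpy as np

bits = np.array([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0])   # 101100111000
A = 1.0

uni_nrz = np.repeat(2 * A * bits, 2)             # unipolar NRZ: 1 -> 2A, 0 -> 0
bip_nrz = np.repeat(A * (2 * bits - 1), 2)       # bipolar NRZ: 1 -> +A, 0 -> -A

# Manchester: a 1 is +A then -A within the slot; a 0 is -A then +A
manch = np.ravel(np.column_stack((A * (2 * bits - 1), -A * (2 * bits - 1))))

# AMI: 1s alternate +A/-A (RZ half-slot pulses), 0s are zero
ami = np.zeros(2 * bits.size)
polarity = A
for i, b in enumerate(bits):
    if b:
        ami[2 * i] = polarity     # pulse in the first half of the slot
        polarity = -polarity
print(uni_nrz, bip_nrz, manch, ami, sep="\n")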


Binary signal format example: 2B1Q

Two binary, one quaternary (2B1Q)


Four signal levels (+3, +1, -1, -3) each represent a pair of bits.
Of each pair, the first bit determines whether the level is positive
or negative (1 = positive, 0 = negative).

101100111000


Comments on 2B1Q signalling

2B1Q signaling is used for ISDN basic rate services (at 160 kbps) and
digital subscriber loop services.

For long sequences of 1s and 0s, or alternating 1s and 0s (i.e. 1010101010),
2B1Q signaling produces constant currents and synchronization is impossible.

Since the power spectral densities of 2B1Q, AMI and raised-cosine signaling
are narrower, they are employed in bandwidth-limited environments such as
telephone connections.

Manchester is used in LANs and other applications where precise
synchronization is important and bandwidth is available.

Bit rate and symbol rate


To understand and compare different modulation format efficiencies, it is
important to first understand the difference between bit rate and symbol rate.

The signal bandwidth needed for the communications channel depends on the
symbol rate, not on the bit rate. (Ignore sync and error control.)

Bit rate:
  Bit rate is the frequency of the system bit stream.
  Take, for example, a radio with an 8-bit sampler, sampling at 10 kHz for
  voice. The bit rate, the basic bit stream rate in the radio, would be eight
  bits multiplied by 10k samples per second, or 80 kbits per second.

Symbol rate:
  If symbols are generated at a rate of r per second to create a baseband
  signal with a bandwidth of W Hz, then Nyquist has shown that r <= 2W.
  For a double-sideband modulated wave whose transmission bandwidth is
  BT Hz, BT = 2W, so that r <= BT.


Bit rate and symbol rate


The state diagram opposite represents QPSK (more
details later).
Notice that for each constellation point two bits are
transmitted.
If only one bit was being transmitted per symbol, then
in the previous example the symbol and bit rates would
be identical at 80kbits per second.
For the QPSK example, the symbol rate will be 40 ksymbols per second
(half the 80 kbit/s bit rate).
Symbol rate is sometimes called the baud rate.   [Figure: QPSK state diagram]
Note that the baud rate is not the same as the bit rate. (These
terms are often confused.)
If more bits can be sent with each symbol, then the
same amount of data can be sent in a narrower
spectrum.
This is why modulation formats that are more complex
and use a higher number of states can send the same
information over a narrower piece of the RF spectrum.


Bandwidth requirements

Consider the two modulation schemes depicted in the


figures below:

BPSK 8PSK
One bit per symbol 3 bits per symbol
Bit rate = Symbol rate Symbol rate = 1/3 Bit rate

An example of how symbol rate influences spectrum requirements can be seen in
eight-state Phase Shift Keying (8PSK), shown above on the right. It is a
variation of PSK. There are eight possible states that the signal can
transition to at any time. The phase of the signal can take any of eight
values at any symbol time. Since 2^3 = 8, there are three bits per symbol.
This means the symbol rate is one third of the bit rate.


Digital modulation basics

The bit rate defines the rate at which information is


passed.
The baud (or signaling) rate defines the number of
symbols per second.
Each symbol represents n bits, and has M signal states, where M = 2^n.
This is called M-ary signaling.
The maximum rate of information transfer through a baseband channel is
given by:

  Capacity = 2 W log2(M) bits per second

where W = bandwidth of the modulating baseband signal.
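
A quick numeric illustration of this formula (the bandwidth value is chosen
arbitrarily):

import math

W = 20e3   # baseband bandwidth [Hz]
for M in (2, 4, 16, 64):
    capacity = 2 * W * math.log2(M)        # bits per second
    print(M, capacity / 1e3, "kbit/s")     # 40, 80, 160, 240 kbit/s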


Symbol clock
The symbol clock represents the frequency and exact
timing of the transmission of the individual symbols.
At the symbol clock transitions, the transmitted carrier
is at the correct I/Q (or magnitude/phase) value to
represent a specific symbol (a specific point in the
constellation).


Binary bandpass signaling examples


Binary keying

Binary keying definition:


The bits in a message stream switch the modulation
parameters (amplitude, frequency and phase) from one state
to another. This process is called binary keying.
Binary keying is a process that makes the values of amplitude,
phase or frequency of the carrier signal change in sympathy
with the values of the bits in the binary signal stream.
Basic actions can be classified as:
ASK Amplitude Shift Keying
PSK Phase Shift Keying
FSK Frequency Shift Keying


Binary Amplitude Shift Keying (BASK)
The transmitted signal for BASK is a sinusoid whose amplitude is changed by
on-off keying (OOK), so that a 1 is represented by the presence of a signal
and a 0 is represented by the absence of a signal.

The modulated pulse can be described mathematically when a 1 is present as

  s(t) = A cos(2 pi fc t),  0 <= t <= Tb,

where Tb is the bit duration (in seconds).

When a 0 is present we have s(t) = 0.

Double Side Band - Suppressed Carrier (DSB-SC)

The Double Side Band - Suppressed Carrier (DSB-SC) signal is essentially an
AM signal that has a suppressed discrete carrier.
This signal is given by the following equation:

  s(t) = Ac m(t) cos(omega_c t)

where m(t) is assumed to have a zero dc level for the suppressed-carrier case.


On-off Keying (OOK)

[Figure: OOK waveform.]

On-off keying is also known as Amplitude Shift Keying (ASK).
The graph above shows a time-domain representation of binary amplitude
shift keying.


Binary or Bi-Phase Shift Keying (BPSK)

One of the simplest forms of digital modulation is Binary or Bi-Phase Shift
Keying (BPSK).
One application where this is used is deep space telemetry.
The phase of a constant-amplitude carrier signal moves between zero and 180
degrees.
On an I and Q diagram, the I state has two different values.
There are two possible locations in the state diagram, so a binary one or
zero can be sent.

[Figure: BPSK constellation - one bit per symbol, bit rate = symbol rate.]


Binary Phase-Shift Keying

This is illustrated in the chart above. Notice the 180-degree phase shifts
indicated by the arrow.


Binary Phase-Shift Keying

The following equations describe the waveforms for BPSK. Note that it can
also be referred to as phase-reversal keying (PRK).

Let

  s(t) = Ac cos[ omega_c t + Dp m(t) ]

where m(t) is the polar data waveform given in the figure below:


Binary Phase-Shift Keying

Typically, m(t) has peak values of +-1 and Dp = pi/2 radians; thus

  s(t) = -Ac m(t) sin(omega_c t)

BPSK is equivalent to DSB-SC with a polar data waveform.

The complex envelope is given by

  g(t) = j Ac m(t)


Quadrature Phase Shift Keying (QPSK)

A more common type of phase modulation is Quadrature Phase Shift Keying (QPSK).

QPSK is used extensively in applications including:
  CDMA (Code Division Multiple Access) cellular service,
  wireless local loop,
  Iridium (a voice/data satellite system) and
  DVB-S (Digital Video Broadcasting - Satellite).

QPSK is effectively two independent BPSK systems (I and Q), and therefore
exhibits the same performance but twice the bandwidth efficiency.

[Figure: QPSK state diagram.]

Quadrature Phase Shift Keying


(QPSK)
Quadrature Phase Shift Keying can be filtered using
raised cosine filters to achieve excellent out of band
suppression.


Nyquist & Root raised cosine filters

The Nyquist bandwidth is the


minimum bandwidth that can be
used to represent a signal.
It is important to limit the spectral
occupancy of a signal, to improve
bandwidth efficiency and remove
adjacent channel interference.
Root raised cosine filters allow an
approximation to this minimum
bandwidth.


Types of Quadrature Phase Shift Keying

[Figure: Conventional QPSK, Offset QPSK and pi/4-QPSK constellations.]

Conventional QPSK has transitions through zero (i.e. 180-degree phase
transitions). A highly linear amplifier is required.
In Offset QPSK, the transitions on the I and Q channels are staggered.
Phase transitions are therefore limited to 90 degrees.
In pi/4-QPSK the set of constellation points is toggled for each symbol, so
transitions through zero cannot occur. This scheme produces the lowest
envelope variations.
All QPSK schemes require linear power amplifiers.

QPSK and OQPSK

[Figure, left: constellation diagram for QPSK with Gray coding - each adjacent
symbol differs by only one bit. Right: OQPSK - the signal doesn't cross zero,
because only one bit of the symbol changes at a time.]

Offset QPSK (OQPSK)

Offset quadrature phase-shift keying (OQPSK) is a variant of phase-shift keying modulation using 4 different values of the phase to transmit.
Taking four values of the phase (two bits) at a time to construct a QPSK symbol can allow the phase of the signal to jump by as much as 180° at a time.
When the signal is low-pass filtered (as is typical in a transmitter), these phase-shifts result in large amplitude fluctuations, an undesirable quality in communication systems.
By offsetting the timing of the odd and even bits by one bit-period, or half a symbol-period, the in-phase and quadrature components will never change at the same time.
In the constellation diagram it can be seen that this will limit the phase-shift to no more than 90° at a time.
This yields much lower amplitude fluctuations than non-offset QPSK and is sometimes preferred in practice.
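A minimal sketch of the half-symbol stagger (illustrative only; rectangular pulses and an example oversampling factor):

import numpy as np

def oqpsk_iq(bits, sps=8):
    # Oversampled I/Q streams; Q is delayed by half a symbol so that
    # I and Q never switch at the same instant (max 90-degree phase steps).
    i_sym = 2 * np.asarray(bits[0::2]) - 1
    q_sym = 2 * np.asarray(bits[1::2]) - 1
    i = np.repeat(i_sym, sps)
    q = np.repeat(q_sym, sps)
    q = np.concatenate([q[:sps // 2], q])[:i.size]   # half-symbol offset
    return i, q

i, q = oqpsk_iq([0, 1, 1, 1, 0, 0, 1, 0])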


Difference of the phase between QPSK and OQPSK

π/4-QPSK

This final variant of QPSK uses two identical constellations which are rotated by 45° (π/4 radians, hence the name) with respect to one another.
Usually, either the even or odd symbols are used to select points from one of the constellations and the other symbols select points from the other constellation.
This also reduces the phase-shifts from a maximum of 180°, but only to a maximum of 135°, so the amplitude fluctuations of π/4-QPSK are between OQPSK and non-offset QPSK.
One property this modulation scheme possesses is that if the modulated signal is represented in the complex domain, it does not have any paths through the origin.
In other words, the signal does not pass through the origin.
This lowers the dynamical range of fluctuations in the signal, which is desirable when engineering communications signals.

[Figure: dual constellation diagram for π/4-QPSK. This shows the two separate constellations with identical Gray coding but rotated by 45° with respect to each other.]

QPSK Summary

Quadrature means that the signal shifts between phase states that are separated by 90 degrees (π/2 radians). The signal shifts in increments of 90 degrees from 45° to 135°, −45°, or −135°.
These points are chosen as they can be easily implemented using an I/Q modulator.
Only two I values and two Q values are needed, and this gives two bits per symbol.
There are four states because 2² = 4.
It is therefore a more bandwidth-efficient type of modulation than BPSK: twice as efficient.


Frequency Shift Keying (FSK)

Frequency modulation and phase modulation are closely related.


Frequency Shift Keying

Discontinuous-phase FSK, where ω1 = mark frequency and ω2 = space frequency.


Frequency Shift Keying


FSK

Continuous-phase FSK, where …


Frequency Shift Keying


In FSK, the frequency of the carrier is changed as a function of the
modulating signal (data) being transmitted. The amplitude is unchanged.
In Binary FSK (BFSK or 2FSK), a 1 is represented by one frequency
and a 0 is represented by another frequency.

The bandwidth occupancy of FSK depends on the spacing of the two symbols. A frequency spacing of 0.5 times the symbol rate is typically used.
FSK can be expanded to an M-ary scheme, employing multiple frequencies as different states.


Applications for FSK

FSK (Frequency Shift Keying) is used in many applications including cordless and paging systems.
Some of the cordless systems include DECT (Digital Enhanced Cordless Telephone) and CT-2 (Cordless Telephone 2).

CT-2 is a second-generation cordless telephone system that allows users to roam away from their home base stations and receive service in public places. Away from the home base station, the service is one-way outbound from the phone to a telepoint that is within range.


Binary Frequency Shift Keying

Here the modulated wave is a sinusoid of constant amplitude whose presence at one frequency means a "1" is present, and if another frequency is present then this means a "0" is present.
When signal "1" is present, the pulse can be described as:

When signal "0" is present, the pulse can be described as:


Binary Frequency Shift Keying


Minimum Shift Keying

Since a frequency shift produces an advancing or retarding phase, frequency shifts can be detected by sampling the phase at each symbol period.
Phase shifts of (2N + 1)·π/2 radians are easily detected with an I/Q demodulator.
At even-numbered symbols, the polarity of the I channel conveys the transmitted data.
At odd-numbered symbols, the polarity of the Q channel conveys the data.
This orthogonality between I and Q simplifies detection algorithms and hence reduces power consumption in a mobile receiver.
MSK is used in the GSM (Global System for Mobile Communications) cellular standard.


Minimum Shift Keying

The minimum frequency shift which yields orthogonality of I and Q is that which results in a phase shift of π/2 radians per symbol (90 degrees per symbol).
FSK with this deviation is called MSK (Minimum Shift Keying). The deviation must be accurate in order to generate repeatable 90-degree phase shifts.
A phase shift of +90 degrees represents a data bit equal to "1", while −90 degrees represents a "0".
The peak-to-peak frequency shift of an MSK signal is equal to half of the bit rate.
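These two properties can be checked with a short phase-accumulation sketch (illustrative Python; parameter values are arbitrary):

import numpy as np

def msk_phase(bits, sps=16):
    # Accumulate phase at +/- (pi/2) per symbol: frequency deviation
    # +/- Rb/4, i.e. a peak-to-peak shift of half the bit rate.
    data = 2 * np.asarray(bits) - 1                   # 0 -> -1, 1 -> +1
    dphi = np.repeat(data, sps) * (np.pi / 2) / sps   # per-sample increment
    return np.cumsum(dphi)                            # continuous phase

phase = msk_phase([1, 1, 0, 1, 0, 0])
# At symbol boundaries the phase has moved by exactly +/- 90 degrees:
print(np.degrees(phase[15::16]).round(1))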


Comments on FSK and MSK

FSK and MSK produce constant-envelope carrier signals, which have no amplitude variations.
This is a desirable characteristic for improving the power efficiency of transmitters.
Amplitude variations can exercise nonlinearities in an amplifier's amplitude-transfer function, generating spectral re-growth, a component of adjacent channel power.
Therefore, more efficient amplifiers (which tend to be less linear) can be used with constant-envelope signals, reducing power consumption.


Comments on FSK and MSK

MSK has a narrower spectrum than wider deviation forms of FSK.


The width of the spectrum is also influenced by the waveforms
causing the frequency shift.
If those waveforms have fast transitions, then the
spectrum of the transmitter will be broad.
In practice, the waveforms are filtered with a Gaussian filter,
resulting in a narrow spectrum.
In addition, the Gaussian filter has no time-domain overshoot, which
would broaden the spectrum by increasing the peak deviation.
MSK with a Gaussian filter is termed GMSK (Gaussian MSK).


Differential PSK

Recovery of the data stream from a PSK-modulated wave requires synchronous demodulation.
The receiver must reconstruct the carrier exactly so that it can detect changes in the phase of the received signal.
Differential PSK eliminates the need for the synchronous carrier in the demodulation process, and this has the effect of simplifying the receiver.
At the transmitter, we process the data stream to give a modulated wave where the phase changes by π radians whenever a "1" appears in the stream.
It remains constant whenever a "0" appears in the stream.
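A sketch of the differential encoding and decoding logic (illustrative; an XOR convention in which a "1" toggles the transmitted phase):

def dpsk_encode(data, initial=0):
    # Differentially encode: the output bit flips whenever the data bit is 1,
    # which the modulator turns into a 180-degree (pi radian) phase change.
    encoded, state = [], initial
    for bit in data:
        state ^= bit           # 1 -> toggle phase, 0 -> keep phase
        encoded.append(state)
    return encoded

def dpsk_decode(encoded, initial=0):
    # Recover data by comparing consecutive received phases only.
    prev, data = initial, []
    for bit in encoded:
        data.append(bit ^ prev)
        prev = bit
    return data

data = [1, 0, 1, 1, 0]
assert dpsk_decode(dpsk_encode(data)) == data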


DPSK
Differential Phase-Shift Keying
Binary data are first differentially encoded and then passed to
the BPSK modulator.
Example:


DPSK

Thus we see that the receiver only needs to detect phase changes. It does not need to search for specific phase values.


Example: DPSK


Digital Modulation

A more sophisticated approach to modulation and demodulation/detection

Digital communication system

[Block diagram: Information source → Source encoder → Channel encoder → Interleaving → Modulator → Channel → Demodulator → Deinterleaving → Channel decoder → Source decoder → Information sink.]
Annotations from the figure:
- Information: analog sources are characterized by bandwidth and dynamic range, digital sources by bit rate.
- Source encoder/decoder: maximization of the information transferred.
- Channel encoder/decoder: message protection and channel adaptation (convolutional and block coding).
- Interleaving/deinterleaving: fights against burst errors.
- Modulator/demodulator: M-PSK/FSK/ASK etc., depending on channel bandwidth and characteristics; determines transmitted power and the bandpass/baseband signal bandwidth. In baseband systems these blocks are missing: baseband means that no carrier-wave modulation is used for transmission.
- Channel: wired/wireless, constant/variable, linear/nonlinear; adds noise and interference, so the received signal may contain errors, and the output delivered to the information sink is the message estimate.

Digital communication system as an application of theories

Modulation and demodulation/detection

Modulation
Transform digital data into an analog signal that
can be transmitted or stored (the real world is
analog, not digital).
Demodulation/detection
The received signal contains information about the
transmitted data but is corrupted by noise.
Estimate what data was sent, aiming at minimum
possible probability of making mistakes.


Electrical communication system

Modulation

Modulation

Signal, waveform, modulation and demodulation

Geometry of signal set

Basis waveforms

Inner product and norm

Signal space

Basis of signal space

Linear independency

Comparison: Waveforms vs. vectors

Gram-Schmidt

Signal space examples

Example: PAM

Example: MPAM and 2PAM

Example: PPM

Example: PPM

Example: Bi-orthogonal signals

Example: PSK

Example: PSK

Example: Fourier series

Example: Sampling expansion

Demodulation and detection

Modulation/demodulation

Receiver

Transmission over an AWGN channel

Optimal demodulation

Correlator demodulator

Equivalent Gaussian (vector) channel

Matched filtering + sampling = correlation

Equivalency of matched filter and correlator

Optimum detection: MAP decision rule

Optimum detection: ML decision rule

Example

Error probability

Equivalence of the original waveform problem and the discrete vector problem

Error probability for two signals

Q-function

Examples

Example: OOK

Example: Two-pole signaling

Example: Orthogonal signaling

Pe vs. SNR

Example: Four signals

Error probability with two signals

Error probability with two signals

Union bound for ML decisions

Note

Union Bound: for events A and B,

P(A ∪ B) ≤ P(A) + P(B)

and in general

P(A1 ∪ A2 ∪ … ∪ AN) ≤ Σi=1..N P(Ai)

Applications:
Getting bounds on BER;
in general, bounding the tails of probability distributions.
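A toy numeric check of the bound (a made-up two-dice example, not from the notes):

import itertools

# Sample space: two fair dice; A1 = "first die shows 6", A2 = "second die shows 6".
outcomes = list(itertools.product(range(1, 7), repeat=2))
A1 = {o for o in outcomes if o[0] == 6}
A2 = {o for o in outcomes if o[1] == 6}

p_union = len(A1 | A2) / len(outcomes)            # exact: 11/36
p_bound = (len(A1) + len(A2)) / len(outcomes)     # union bound: 12/36
assert p_union <= p_bound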

398
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Approximation with dominating term(s)

797

Digital Modulation

Appendix: Basic signal space


and orthogonalization
concepts

798

399
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Vector space concepts

799

Vector space concepts

800

400
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Signal space concepts

801

Signal space concepts

802

401
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Gram-Schmidt (G-S) orthogonalization

803

G-S orthogonalization

804

402
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

G-S orthogonalization

805

Example

806

403
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example (contd.)

807

Example (contd.)

808

404
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Summary

809

Proof of G-S-O procedure

810

405
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Proof of G-S-O procedure

811

Note

812

406
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary ASK (PAM)

813

M-ary ASK (PAM)

814

407
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary ASK (PAM)

815

M-ary PSK

816

408
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary PSK

817

M-ary PSK

Quarternary Phase Shift Keying (QPSK)

818

409
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary PSK

819

M-ary QAM

820

410
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary QAM

821

M-ary QAM
Signal space diagram (M=16)

822

411
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary QAM
Signal space diagram (M=8)

823

M-ary QAM

824

412
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary FSK

825

M-ary FSK

826

413
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

M-ary FSK
Signal space diagram (M = 2)

827

M-ary FSK

828

414
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multicarrier Systems (OFDM)

OFDM, COFDM, DMT


Orthogonal frequency-division multiplexing (OFDM), essentially
identical to coded OFDM (COFDM) and discrete multi-tone modulation
(DMT), is a frequency-division multiplexing (FDM) scheme utilized as a
digital multi-carrier modulation method.
A large number of closely-spaced orthogonal sub-carriers are used to
carry data.
The data is divided into several parallel data streams or channels, one for
each sub-carrier.
Each sub-carrier is modulated with a conventional modulation scheme such
as quadrature amplitude modulation (QAM) or phase-shift keying (PSK) at
a low symbol rate, maintaining total data rates similar to conventional
single-carrier modulation schemes in the same bandwidth.
OFDM has developed into a popular scheme for wideband digital
communication, whether wireless or over wirelines, used in applications
such as digital television and audio broadcasting, wireless networking and
broadband internet access.


Advantages of OFDM
The primary advantage of OFDM over single-carrier schemes is its ability
to cope with severe channel conditions (for example, attenuation of high
frequencies in a long copper wire, narrowband interference and
frequency-selective fading due to multipath) without complex
equalization filters.
Channel equalization is simplified because OFDM may be viewed as
using many slowly-modulated narrowband signals rather than one rapidly-
modulated wideband signal.
The low symbol rate makes the use of a guard interval between symbols
affordable, making it possible to handle time-spreading and eliminate
intersymbol interference (ISI).
This mechanism also facilitates the design of single frequency networks
(SFNs), where several adjacent transmitters send the same signal
simultaneously at the same frequency, as the signals from multiple distant
transmitters may be combined constructively, rather than interfering as
would typically occur in a traditional single-carrier system.


Multicarrier modulation

As is known from single-carrier communication systems, nonideal channels introduce intersymbol interference (ISI), which degrades the performance compared with the ideal channel.
The degree of performance degradation depends on the frequency response of the channel.
Typically, the complexity of the receiver increases as the spread of the ISI increases.
An alternative approach to the design of a bandwidth-efficient communication system in the presence of channel distortion is to subdivide the available channel bandwidth into a number of narrow sub-channels such that the frequency response of each subchannel is nearly flat.
The multicarrier modulation method has been used in a variety of applications (e.g., DAB, DVB-T/H, 3GPP-LTE, ADSL and HDSL).


Capacity of multicarrier modulation in a linear channel

Let's suppose that C(f) is the frequency response of a nonideal bandlimited channel with bandwidth W.
Noise is supposed to be additive white Gaussian noise (AWGN) with PSD G.
The band is divided into a number N of equispaced subbands of bandwidth Δf = W/N, where Δf is small enough that |C(f)|²/G ≈ constant within each subband.

Hartley-Shannon (H-S) law capacity

Max C with P(f)


P distribution

It can be stated that a multicarrier modulation scheme that divides the available bandwidth into subbands of relatively narrow width provides a solution that could yield transmission rates close to the capacity of the channel.
The signal in each sub-band may be controlled independently (coding, modulation) at a synchronous symbol rate of Δf = W/N, i.e. with symbol duration 1/Δf = N/W.
If Δf is small enough, the equalizer contains only one tap to correct amplitude and phase distortion.
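The optimum power allocation referred to on these slides is the classical water-filling solution. Below is an illustrative Python sketch (the channel gains and power budget are made-up values) that computes Pi = max(0, μ − G/|Ci|²) per subband and bisects on the water level μ until the power budget is met:

import numpy as np

def water_filling(gains, noise, total_power, iters=60):
    # gains = |C(f_i)|^2 per subband, noise = noise power per subband.
    inv_snr = noise / gains
    lo, hi = inv_snr.min(), inv_snr.max() + total_power
    for _ in range(iters):                    # bisect on the water level mu
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv_snr).sum() < total_power:
            lo = mu
        else:
            hi = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - inv_snr)

gains = np.array([1.0, 0.5, 0.1, 0.01])       # example |C|^2 values
p = water_filling(gains, noise=np.ones(4), total_power=4.0)
print(p, p.sum())                             # strong subbands get more power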


Heavy distortion media

A very suitable application of multicarrier modulation is digital transmission over copper-wire subscriber loops, because of their large amplitude distortion.
In this kind of wire, the attenuation, which increases rapidly as a function of frequency, makes it extremely difficult to achieve a high transmission rate with a conventional single modulated carrier and an equalizer at the receiver.
The dominant noise in transmission over subscriber lines is crosstalk interference from signals carried on other telephone lines located in the same cable.
This interference power is frequency dependent, which can be taken into account when allocating the power to subcarriers.

The ISI penalty in the performance can be large in a wireless system.
Multicarrier modulation with optimum power distribution provides the potential for a higher transmission rate.

Transmitter
Idealized system model: a simple idealized OFDM system model suitable for a time-invariant AWGN channel (transmitter side).

Transmitter
An OFDM carrier signal is the sum of a number of orthogonal sub-carriers, with
baseband data on each sub-carrier being independently modulated commonly using
some type of quadrature amplitude modulation (QAM) or phase-shift keying
(PSK).
This composite baseband signal is typically used to modulate a main RF carrier.
Input signal s[n] is a serial stream of binary digits.
By inverse multiplexing, these are first demultiplexed into N parallel streams, and
each one mapped to a (possibly complex) symbol stream using some modulation
constellation (QAM, PSK, etc.).
Note that the constellations may be different, so some streams may carry a higher
bit-rate than others.
An inverse FFT (IFFT) is computed on each set of symbols, giving a set of complex
time-domain samples.
These samples are then quadrature-mixed to passband in the standard way.
However, the real and imaginary components are first converted to the analogue
domain using digital-to-analogue converters (DACs); the analogue signals are then
used to modulate cosine and sine waves at the carrier frequency, fc.
These signals are then summed to give the transmission signal s(t).
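The baseband part of this chain compresses into a few lines (a simplified sketch: QPSK on every subcarrier, no DACs or RF stage; the bit-to-symbol mapping is an assumed convention):

import numpy as np

def ofdm_tx_symbol(bits, n_sc=64):
    # Serial bits -> n_sc parallel QPSK symbols -> IFFT -> one block of
    # complex baseband time-domain samples (no CP, DAC or RF stage here).
    b = np.asarray(bits).reshape(n_sc, 2)
    syms = ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)
    return np.fft.ifft(syms)

x = ofdm_tx_symbol(np.random.default_rng(0).integers(0, 2, 128))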


Receiver
The receiver picks up the signal r(t) , which is then
quadrature-mixed down to baseband using cosine and sine
waves at the carrier frequency.
This also creates signals (mirror-images) centered on 2fc , so
low-pass filters are used to reject these.
The baseband signals are then sampled and digitised using
analogue-to-digital converters (ADCs).
Next FFT is used to convert signals back to the frequency
domain.
This returns N parallel streams, each of which is converted to a
binary stream using an appropriate symbol detector.
These streams are then re-combined into a serial stream, which
is an estimate of the original binary stream at the transmitter.
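The matching receiver-side sketch (again idealized: no channel, no noise, perfect synchronization), including a loopback check against the transmit mapping used above:

import numpy as np

def ofdm_rx_symbol(samples):
    # FFT back to the frequency domain; detect each QPSK subcarrier from
    # the signs of its real and imaginary parts (inverse of the 1 - 2b map).
    syms = np.fft.fft(samples)
    bits = np.empty((syms.size, 2), dtype=int)
    bits[:, 0] = syms.real < 0
    bits[:, 1] = syms.imag < 0
    return bits.ravel()

rng = np.random.default_rng(0)
tx_bits = rng.integers(0, 2, 128)
b = tx_bits.reshape(64, 2)
syms = ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)
assert np.array_equal(ofdm_rx_symbol(np.fft.ifft(syms)), tx_bits)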


Receiver
Idealized system model: a simple idealized OFDM system model suitable for a time-invariant AWGN channel (receiver side).

OFDM principle: Transmitting end


Multicarrier spectrum


Spectra

With rectangular pulse shaping, the amplitude spectrum of each subcarrier is sinc-shaped, so spectral overlapping occurs between adjacent subcarriers.

OFDM transmitter

The multiplexing operations in the transmitter can be implemented by using IDFT (Inverse Discrete Fourier Transform) operations.
In practice, the IDFT is implemented by using IFFTs (Inverse Fast Fourier Transforms).

OFDM principle: Receiving end


OFDM Receiver

The multiplexing operations in the receiver can be implemented by using DFT (Discrete Fourier Transform) operations.
In practice, the DFT is implemented by using FFTs (Fast Fourier Transforms).

FFT based system

FFT based system

FFT based system

Orthogonality

In OFDM, the sub-carrier frequencies are chosen so that the sub-carriers are orthogonal to each other, meaning that cross-talk between the sub-channels is eliminated and inter-carrier guard bands are not required.
This greatly simplifies the design of both the transmitter and the receiver; unlike conventional FDM, a separate filter for each sub-channel is not required.
The orthogonality requires that the sub-carrier spacing is Δf = k/TU Hz, where TU seconds is the useful symbol duration (the receiver-side window size), and k is a positive integer, typically equal to 1.
Therefore, with N sub-carriers, the total passband bandwidth will be B ≈ N·Δf (Hz).
The orthogonality also allows high spectral efficiency, with a total symbol rate near the Nyquist rate for the equivalent baseband signal (i.e. near half the Nyquist rate for the double-sideband physical passband signal), because almost the whole available frequency band can be utilized.
OFDM generally has a nearly 'white' spectrum, giving it gentle electromagnetic interference properties with respect to other co-channel users.

FFT based system

FFT based system

Avoiding ISI

Guard interval for elimination of ISI


One key principle of OFDM is that since low symbol rate modulation
schemes (i.e., where the symbols are relatively long compared to the
channel time characteristics) suffer less from intersymbol interference
caused by multipath propagation, it is advantageous to transmit a number of
low-rate streams in parallel instead of a single high-rate stream.
Since the duration of each symbol is long, it is feasible to insert a guard
interval between the OFDM symbols, thus eliminating the intersymbol
interference.
The guard interval also eliminates the need for a pulse-shaping filter, and it
reduces the sensitivity to time synchronization problems.
The cyclic prefix, which is transmitted during the guard interval, consists of
the end of the OFDM symbol copied into the guard interval, and the guard
interval is transmitted followed by the OFDM symbol.
The reason that the guard interval consists of a copy of the end of the
OFDM symbol is so that the receiver will integrate over an integer number
of sinusoid cycles for each of the multipaths when it performs OFDM
demodulation with the FFT.
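The key consequence, that the linear channel becomes a circular convolution once the cyclic prefix is stripped, can be verified directly (an illustrative sketch with a made-up 3-tap channel and example sizes):

import numpy as np

N, CP = 64, 16                              # FFT size and prefix length
h = np.array([1.0, 0.5, 0.25])              # made-up 3-tap multipath channel

rng = np.random.default_rng(1)
x = np.fft.ifft(np.exp(2j * np.pi * rng.random(N)))   # one OFDM symbol
tx = np.concatenate([x[-CP:], x])           # copy the symbol tail into the guard
rx = np.convolve(tx, h)[CP:CP + N]          # channel, then discard the prefix

# With the CP, linear convolution acts as circular convolution, so each
# subcarrier sees a single complex gain H[k] (the basis of one-tap equalization):
H = np.fft.fft(h, N)
assert np.allclose(np.fft.fft(rx), H * np.fft.fft(x))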

Channel responses

Channel estimation

Inaccurate synchronization and frequency offset
OFDM requires very accurate frequency synchronization between the
receiver and the transmitter; with frequency deviation the sub-carriers will
no longer be orthogonal, causing inter-carrier interference (ICI) (i.e.,
cross-talk between the sub-carriers).
Frequency offsets are typically caused by mismatched transmitter and
receiver oscillators, or by Doppler shift due to movement.
While Doppler shift alone may be compensated for by the receiver, the
situation is worsened when combined with multipath, as reflections will
appear at various frequency offsets, which is much harder to correct.
This effect typically worsens as speed increases, and is an important factor
limiting the use of OFDM in high-speed vehicles.
Several techniques for ICI suppression are suggested, but they may
increase the receiver complexity.


Example
If one sends a million symbols per second using conventional single-carrier
modulation over a wireless channel, then the duration of each symbol
would be one microsecond or less.
This imposes severe constraints on synchronization and necessitates the
removal of multipath interference.
If the same million symbols per second are spread among one thousand
sub-channels, the duration of each symbol can be longer by a factor of a
thousand (i.e., one millisecond) for orthogonality with approximately the
same bandwidth.
Assume that a guard interval of 1/8 of the symbol length is inserted
between each symbol.
Intersymbol interference can be avoided if the multipath time-spreading
(the time between the reception of the first and the last echo) is shorter than
the guard interval (i.e., 125 microseconds).
This corresponds to a maximum difference of 37.5 kilometers between the
lengths of the paths.
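The arithmetic behind these numbers (a one-line sketch):

c = 3e8                       # speed of light, m/s
symbol = 1e-3                 # 1 ms symbol duration from the example
guard = symbol / 8            # 125 microseconds
print(guard * c / 1e3)        # 37.5 km maximum path-length difference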


OFDM and linear distortion

OFDM and linear distortion

Simplified equalization
The effects of frequency-selective channel conditions, for
example fading caused by multipath propagation, can be
considered as constant (flat) over an OFDM sub-channel if the
sub-channel is sufficiently narrow-banded (i.e., if the number
of sub-channels is sufficiently large).
This makes equalization far simpler at the receiver in OFDM
in comparison to conventional single-carrier modulation.
The equalizer only has to multiply each detected sub-carrier
(each Fourier coefficient) by a constant complex number, or a
rarely changed value.


Channel coding and interleaving


OFDM is invariably used in conjunction with channel coding (forward error
correction, FEC), and almost always uses frequency and/or time interleaving.
Frequency (subcarrier) interleaving increases resistance to frequency-selective
channel conditions such as fading.
For example, when a part of the channel bandwidth fades, frequency interleaving
ensures that the bit errors that would result from those subcarriers in the faded part
of the bandwidth are spread out in the bit-stream rather than being concentrated.
Similarly, time interleaving ensures that bits that are originally close together in the
bit-stream are transmitted far apart in time, thus mitigating against severe fading as
would happen when travelling at high speed.
However, time interleaving is of little benefit in slowly fading channels, such as for
stationary reception, and frequency interleaving offers little to no benefit for
narrowband channels that suffer from flat-fading (where the whole channel
bandwidth fades at the same time).
The reason why interleaving is used on OFDM is to attempt to spread the errors out
in the bit-stream that is presented to the error correction decoder, because when
such decoders are presented with a high concentration of errors the decoder is
unable to correct all the bit errors, and a burst of uncorrected errors occurs.
Note that a similar design of audio data encoding makes compact disc (CD) playback robust.

Channel coding and interleaving


A classical type of error correction coding used with OFDM-based systems
is convolutional coding, often concatenated with Reed-Solomon or BCH
coding.
Usually, additional interleaving (time and frequency interleaving) in
between the two layers of coding is implemented.
The choice for Reed-Solomon coding as the outer error correction code is
based on the observation that the Viterbi decoder used for inner
convolutional decoding produces short error bursts when there is a high
concentration of errors, and Reed-Solomon codes are inherently well-suited
to correcting bursts of errors.
Newer systems adopt near-optimal types of error correction codes that use
the turbo decoding principle, where the decoder iterates towards the desired
solution.
Examples of such error correction coding types include turbo codes and
LDPC codes, which perform close to the Shannon limit for the Additive
White Gaussian Noise (AWGN) channel.

Low-density parity-check (LDPC) and turbo codes are capacity-approaching codes, which means that practical constructions exist that allow codes to closely approach the channel capacity, the theoretical maximum for the code rate at which reliable communication is still possible given a specific noise level.


OFDM extended with multiple access


OFDM in its primary form is considered as a digital modulation technique, and not
a multi-user channel access method, since it is utilized for transferring one bit
stream over one communication channel using one sequence of OFDM symbols.
However, OFDM can be combined with multiple access using time, frequency or
coding separation of the users.
In Orthogonal Frequency Division Multiple Access (OFDMA), frequency-division
multiple access is achieved by assigning different OFDM sub-channels to different
users.
OFDMA supports differentiated quality of service by assigning different numbers of sub-carriers to different users in a similar fashion as in CDMA, and thus complex packet scheduling or Media Access Control (MAC) schemes can be avoided.
In Multi-carrier code division multiple access (MC-CDMA), also known as OFDM-
CDMA, OFDM is combined with CDMA spread spectrum communication for
coding separation of the users.
Co-channel interference can be mitigated, meaning that manual fixed channel
allocation (FCA) frequency planning is simplified, or complex dynamic channel
allocation (DCA) schemes are avoided.


Example


Introduction to Information
and Coding Theory

Basic concepts

Information and coding theory

Information sources and source coding:
information measures, entropy;
represent source data efficiently in digital form.
Channel capacity and coding:
channel capacity, limits;
use redundant bits to counteract transmission errors.


Digital communication system

[Block diagram repeated from the Digital Modulation part: Information source → Source encoder → Channel encoder → Interleaving → Modulator → Channel → Demodulator → Deinterleaving → Channel decoder → Source decoder → Information sink, with the same annotations.]

Digital communication system as an application of theories

Some probability basics

P(not a) = 1 − P(a)
P(a or b) = P(a) + P(b) − P(a and b),
where a and b are events.
We will often denote P(a and b) by P(a, b).
If P(a, b) = 0, we say a and b are mutually exclusive.

Conditional probability

P(a|b) is the probability of a, given that we know b.
The joint probability of both a and b is given by:
P(a, b) = P(a|b) P(b).
Since P(a, b) = P(b, a), we have Bayes' Theorem:

P(a|b) P(b) = P(b|a) P(a),
or
P(a|b) = P(b|a) P(a) / P(b)

Independence

If two events a and b are such that
P(a|b) = P(a),
we say that the events a and b are independent.
Note that from Bayes' Theorem, we will also have that
P(b|a) = P(b),
and furthermore,
P(a, b) = P(a|b) P(b) = P(a) P(b).
This last equation is often taken as the definition of independence.

Required properties of information measure

We will want our information measure I(p) to have several properties:
1. Information is a non-negative quantity: I(p) ≥ 0.
2. If an event has probability p = 1, we get no information from the occurrence of the event: I(1) = 0.
3. If two independent events occur (whose joint probability is the product of their individual probabilities), then the information we get from observing the events is the sum of the two pieces of information: I(p1·p2) = I(p1) + I(p2). (This is the critical property.)
4. We will want our information measure to be a continuous and monotonic function of the probability, meaning that slight changes in probability should result in slight changes in information.


Derivation of information measure

We can therefore derive the following:

1. I(p²) = I(p·p) = I(p) + I(p) = 2·I(p)

2. Thus, further, I(pⁿ) = n·I(p) (by induction)

3. I(p) = I((p^(1/m))^m) = m·I(p^(1/m)), so I(p^(1/m)) = (1/m)·I(p), and thus in general I(p^(n/m)) = (n/m)·I(p)

4. And thus, by continuity, we get, for 0 < p ≤ 1 and a real number a > 0:

I(p^a) = a·I(p)

Information measure

We can find a simple expression which satisfies the previous properties.
This is

I(p) = −log_b(p) = log_b(1/p)

for any base b.
The base b determines the units we are using.
We can change the units by changing the base, using the formula, for bases b1, b2 and x > 0,

log_b2(x) = log_b1(x) / log_b1(b2)

and therefore

I_b2(p) = I_b1(p) / log_b1(b2)


Units of information

Thus, using different bases for the logarithm results in information measures which are just constant multiples of each other, corresponding with measurements in different units:
log2 units are bits (from "binary");
loge units are nats (from "natural logarithm");
log10 units are hartleys, after an early scientist in the field of transmission techniques.

Note: Unless we want to emphasize the units, we need not bother to specify the base for the logarithm, and will write log(p). Typically, however, we will think in terms of log2(p).

Example

A) Flipping a fair coin once will give us events h(ead) and t(ail), each with probability 1/2, and thus a single flip of a coin gives us −log2(1/2) = 1 bit of information (whether it comes up h or t).
B) Flipping a fair coin n times (or, equivalently, flipping n fair coins) gives us −log2((1/2)^n) = log2(2^n) = n·log2(2) = n bits of information.

Example

We could enumerate a sequence of 50 flips as, for example:
hthhtththhht…
or, using 1 for h and 0 for t, the 50 bits
101100101110…
Thus n flips of a fair coin give us n bits of information, and take n binary digits to specify.
That these two are the same reassures us that our definition of information measure is good (enough).

Average amount of information


Suppose now that we have n symbols {a1, a2, . . . , an},
and some source is providing us with a stream of these
symbols.
Suppose further that the source emits the symbols with
probabilities {p1, p2, . . . , pn}, respectively.
For now, we also assume that the symbols are emitted
independently (successive symbols do not depend in
any way on past symbols).
What is the average amount of information we get
from each symbol we see in the stream?



Average amount of information

If we observe the symbol ai, we will get log(1/pi) information from that particular observation.
In a long run (say N) of observations, we will see (approximately) N·pi occurrences of symbol ai.
Thus, in the N (independent) observations, we will get total information I of

I = Σi N·pi·log(1/pi)

The average information per symbol observed will be

I/N = Σi pi·log(1/pi)

Note

Note that

lim (p→0) p·log(1/p) = 0,

so we can define pi·log(1/pi) to be 0 when pi = 0.


Entropy of the distribution


We have defined information strictly in terms of the probabilities of events.
Therefore, let us suppose that we have a set of probabilities (a probability distribution) P = {p1, p2, . . . , pn}.
We define the entropy of the distribution P by:

H(P) = Σi pi·log(1/pi)

For a continuous probability distribution P(x),

h(P) = ∫ P(x)·log(1/P(x)) dx
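A direct transcription of this definition (a small Python sketch; base-2 logarithm, so the result is in bits):

import math

def entropy(probs, base=2):
    # H(P) = sum_i p_i * log(1/p_i); terms with p_i == 0 contribute 0.
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit  (fair coin)
print(entropy([1.0, 0.0]))     # 0.0 bits (no uncertainty)
print(entropy([0.25] * 4))     # 2.0 bits (maximum for four outcomes)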

Review of some probability concepts

The joint probability is denoted by pX,Y(x, y).

The conditional probability density function of X given (the occurrence of) the value y of Y can be written as

pX|Y(x|y) = pX,Y(x, y) / pY(y)

where pX,Y(x, y) gives the joint probability of X and Y, while pY(y) gives the marginal density for Y (pY(y) > 0).

Definition of information

Here the message signal is modeled as a random process.
We begin by considering observations of a random variable:
Each observation gives a certain amount of information.
But rare observations give more information than usual ones.

Example: The statement "The sun will rise next morning" gives very little information (high probability).
The statement "San Francisco will be destroyed tomorrow morning by an earthquake" gives a lot of information (low probability).

Definition: (Self-)information

Observing a random variable X that takes its values from the set of possible outcomes X = {x1, x2, …, xK}, the (self-)information of observation xm is defined as

I(xm) = −log2(pX(xm)) = log2(1/pX(xm))

where pX(xm) are the probabilities of the outcomes.


Interpretation

It is easy to see that I(·) ≥ 0 for 0 ≤ pX ≤ 1.
For a rare event the probability p(·) is small and the information is large.
For a usual event the probability p(·) ≈ 1 and the information is small.

In case of two independent random variables X and Y, Y = {y1, y2, …, yN}, the information of the joint event (xm, yn) becomes

I(xm, yn) = −log2(pXY(xm, yn)) = −log2(pX(xm)) − log2(pY(yn)) = I(xm) + I(yn)

Thus in case of independent events, the information is additive, which makes sense intuitively.

Information source

Source data: a speech signal, an image, a computer file, ...


In practice source data is time-varying and unpredictable.
Bandlimited continuous-time signals (e.g. speech) can be
sampled into discrete time and reproduced nearly without
loss (quantization noise).

A source is a discrete-time stochastic process {Xn}


Properties of information source

Entropy and information

Entropy

Entropy

Entropy has the following interpretations:
Average information obtained from an observation.
Average uncertainty about X before the observation.
Entropy is a measure of uncertainty.
The more we know about something, the lower the entropy.

Why the term "entropy"?
Thermodynamics (mid 19th century): the amount of unusable heat in a system.
Statistical physics (late 19th century): log(complexity of current system state) ≈ the amount of disorder in the system.


Binary entropy function

Binary entropy function h(p)

Example

Example

Theorem

We have H(X) = 0 when exactly one of the probabilities is one and all
the rest are zero. We have H(X) = log(L) only when all of the events
have the same probability 1/L. That is, the maximum of the entropy
function is the log() of the number of possible events, and occurs
when all the events are equally likely.

Example
How much information can a student get from a
single grade?
First, the maximum information occurs if all grades
have equal probability.
E.g., in a pass/fail class, on average half should pass
if we want to maximize the information given by the
grade.
The maximum information the student gets from a
grade will be:
Pass/Fail: 1 bit.
Grades 1, 2, 3, 4, 5: log2(5) ≈ 2.3 bits.


Example: Entropy of English text (memoryless model)

Average entropy

A probability mass function (pmf); the size of the set is also called its cardinality.

Note

A probability mass function (pmf) is a function


that gives the probability that a discrete random
variable is exactly equal to some value.

A pmf differs from a probability density


function (pdf) in that the values of a pdf,
defined only for continuous random variables, are
not probabilities as such.

The integral of a pdf over a range of possible


values (a, b] gives the probability of the random
variable falling within that range.

Comment
It is important to recognize that our definitions of information
and entropy depend only on the probability distribution.
In general, it would not make sense for us to talk about the
information or the entropy of a source without specifying the
probability distribution.
It can certainly happen that two different observers of the same
data stream have different models of the source, and thus
associate different probability distributions to the source.
The two observers will then assign different values to the
information and entropy associated with the source.


Example on comment

Two people listening to the same lecture can get very


different information from the lecture depending on
their backgrounds.
For example, without appropriate background, one
person might not understand anything at all, and
therefore have as probability model a completely
random source, and therefore get much more
information [!!??] than the listener who understands
quite a bit, and can therefore anticipate much of what
goes on, and therefore assigns non-equal probabilities
to successive words.

Conditional entropy
Now given two random variables X and Y, the
conditional entropy of X given Y is denoted as
H(X|Y) and measures
average information obtained from observing X given that
the value of Y is known
average uncertainty about the observation X given that the
value of Y is known
how much extra information one still needs to supply on
average to communicate X given that the other party knows
Y
Thus the conditional entropy measures the statistical
dependence between X and Y in information theoretic
sense.


Conditional entropy

(cf. = confer, "compare")

Conditional entropy

The conditional uncertainty of the discrete random variable X with L outcomes given the discrete random variable Y with M outcomes is the quantity

H(X|Y) = Σm=1..M Σl=1..L p(xl, ym)·log(1/p(xl|ym))


Theorem


Joint entropy H(XY) = H(X,Y)

The joint entropy of two random variables X, Y is the amount of information needed on average to specify both their values.

Chain rule

Example

Example

Example

Example: Dice

Let's consider the vector-valued random variable XY; we get 12 outcomes in total.

Example: Dice

Theorem

Entropy rate

Recap: Entropy properties


Entropy measures the amount of information in a
random variable or the length of the message required
to transmit the outcome.
Joint entropy is the amount of information in two (or
more) random variables.
Conditional entropy is the amount of information in
one random variable, given we already know the
other.
Entropy rate is per-word or per-character entropy.


Reduction of uncertainty due to an observation

Symmetry in the reduction of uncertainty

Mutual information

The mutual information basically measures the amount of information which Y contains about X (or vice versa):

I(X;Y) = H(X) − H(X|Y)


Note

Statistically independent X and Y
⇒ I(X;Y) = 0
⇒ entropy is additive for independent variables.

I(X;X) = H(X) − H(X|X) = H(X),
i.e. the self-information is the entropy.
Example

Example

Definition: Mutual information

Information measures

Continuous variables: Differential entropy

Shannon theorems and channel capacity

Model of communication systems


In 1948, Claude Shannon laid the foundations for
modern information, coding, and communication
theory.
He developed a general model for communication
systems, and a set of theoretical tools for analyzing
such systems.
His basic model consists of three parts: a sender (or
source), a channel, and a receiver (or sink).



Communication with source and channel coding

Shannon's general model also includes encoding and decoding elements, and noise within the channel ⇒ equivalent noiseless channel.

Transmission channels

Cables
wire pairs (e.g., ordinary telephone line)
coaxial cable
waveguide (metallic waveguide and optical fiber)
More or less free space radio transmission
broadcasting
point-to-point microwave transmission
satellite position transmission
cell networks
(Portable magnetic/electronic/optical memory equipment)


Channel models

Channel models of the random phenomena introduced by the physical channel are needed.

Examples: a discrete channel; a linear additive noise channel.

Discrete information transmission system


Probability concepts of transmission

Several types of symbol probabilities will be needed to deal with the two alphabets here, and we'll use the notation defined as follows:
P(xi) is the probability that the source selects symbol xi for transmission;
P(yj) is the probability that symbol yj is received at the destination;
P(xi, yj) is the joint probability that xi is transmitted and yj is received;
P(xi|yj) is the conditional probability that xi was transmitted given that yj is received;
P(yj|xi) is the conditional probability that yj is received given that xi was transmitted.

Example

We'll assume, for simplicity, that the channel is time-invariant and memoryless, so the conditional probabilities are independent of time and previous symbol transmissions.
The conditional probabilities P(yj|xi) then have special significance as the channel's forward transition probabilities.
By way of example, the figure depicts the forward transitions for a noisy channel with two source symbols and three destination symbols.
If this system is intended to deliver yj = y1 when xi = x1 and yj = y2 when xi = x2, then the symbol error probabilities are given by P(yj|xi) for j ≠ i.

[Figure: forward transition probabilities for a noisy discrete channel]

Shannon's model

In Shannon's discrete model, it is assumed that the source provides a stream of symbols selected from a finite alphabet A = {a1, a2, . . . , an}, which are then encoded.
The code is sent through the channel and possibly disturbed by noise.
At the other end of the channel, the receiver will decode, and derive information from the sequence of symbols.

Note: Sending information from one place to another is equivalent to sending information from one time to another time, and thus Shannon's theory applies equally well to information storage questions as to information transmission questions.

Shannon's model

Given a source of symbols and a channel with noise (more precisely, a probability model for these elements), we can talk about the capacity of the channel.
The general model Shannon worked with involved two sets of symbols, the input symbols and the output symbols.
Let us say the two sets of symbols are A = {a1, a2, . . . , an} and B = {b1, b2, . . . , bm}.
Note that we do not necessarily assume the same number of symbols in the two sets.
Given the noise in the channel, when symbol bj comes out of the channel, we cannot be sure which ai was put in.
The channel is characterized by the set of probabilities {P(ai|bj)}.


Mutual information
We can then consider various related information and entropy
measures.
First, we can consider the information we get from observing a
symbol bj.
Given a probability model of the source, we have an a priori
estimate P(ai) that symbol ai will be sent next.
Upon observing bj, we can revise our estimate to P(ai|bj).
The change in our information (the mutual information) will be given by:

I(ai; bj) = log(P(ai|bj) / P(ai))

Mutual information: Properties

If ai and bj are independent (i.e., if


P(ai, bj) = P(ai) P(bj)), then
I(ai; bj) = 0.



Average mutual information


What we actually want is to average the mutual information
over all the symbols:

and from these,


Average mutual information: Properties

Also we have:
I(A;B) ≥ 0,
and
I(A;B) = 0
if and only if A and B are independent.


Entropy: Definitions and properties

Mutual information and entropies

Channel capacity

If we are given a channel, we could ask what is the


maximum possible information that can be
transmitted through the channel.
We could also ask what mix of the symbols {ai} we
should use to achieve the maximum.
In particular, using the definitions above, we can
define the channel capacity to be:

943

Shannon's main theorem


For any channel, there exist ways of encoding input symbols
such that we can simultaneously utilize the channel as closely
as we wish to the capacity, and at the same time have an error
rate as close to zero as we wish.
This is actually quite a remarkable theorem.
We might naively guess that in order to minimize the error rate, we
would have to use more of the channel capacity for error
detection/correction, and less for actual transmission of information.
Shannon showed that it is possible to keep error rates low and
still use the channel for information transmission at (or near)
its capacity.



Shannon's channel coding theorem

Unfortunately, Shannon's theorem has a couple of problematic points.
The first is that the proof is non-constructive.
It doesn't tell us how to construct the coding system to optimize channel use, but only tells us that such a code exists.
The second is that in order to use the capacity with a low error rate, we may have to encode very large blocks of data.
This means that if we are attempting to use the channel in real time, there may be time lags while we are filling buffers.
There is thus still much work possible in the search for efficient coding schemes.

Application: Source coding

Consider a memoryless information source producing an output signal to be represented by a bit stream.
The source coder doing this must use a unique bit stream for each of the possible messages (message streams).
This is called lossless source coding.
Also lossy coding techniques exist.
Different source coding variants:
encoding each source output individually vs. treating many consecutive outputs as a whole;
fixed- vs. variable-length code words in the coded stream;
known vs. unknown statistics of the source.
Question:
What's the minimum number of bits that can be used?


Discrete Memoryless Source (DMS) and typical sequence

Examples: Typical sequences


Possible sequences


Coding problem

We need NH bits to enumerate all different typical sequences.
That is, H bits/source symbol.
A source code: (1) observe a sequence; (2a) if it is a typical sequence, produce and store/transmit its NH-bit index; (2b) if it is a non-typical sequence, declare an error; (3) reproduce the source sequence from the stored/transmitted index.
This code has rate R = H bits/source symbol.
As N → ∞ the code works without errors.

Source Coding Theorem

The (lossless) Source Coding Theorem: For a source with entropy rate H, a lossless source code of rate R exists as long as R > H. For R < H no lossless source code can be found.

H measures the information content of the source, in the sense that H bits per symbol are required to describe its output!

Source coding: Example


One very simple but rather efficient code is obtained by coding two consecutive source bits as follows: assign the shortest codeword to the most probable event.

It is also easy to decode a bit stream constructed in this way since no codeword is a prefix of another.
Now in this code, 0.645 bits per source bit are used on average (as opposed to 1 bit per sample of the trivial code): the average code length to transmit 2 source bits is 0.81·1 + 0.09·2 + 0.09·3 + 0.01·3 = 1.29, so the average code length per source bit is 1.29/2 = 0.645.
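The arithmetic can be checked, and compared with the entropy bound, in a few lines (a sketch; the pair probabilities correspond to an i.i.d. binary source with P(0) = 0.9, and the codeword set {0, 10, 110, 111} is one prefix-free choice consistent with the stated lengths):

import math

probs   = [0.81, 0.09, 0.09, 0.01]   # bit pairs for an assumed P(0) = 0.9 source
lengths = [1, 2, 3, 3]               # e.g. codewords 0, 10, 110, 111

avg_len_per_pair = sum(p * l for p, l in zip(probs, lengths))
print(avg_len_per_pair / 2)          # 0.645 bits per source bit

# Entropy bound per source bit: h(0.9) = h(0.1)
h = -(0.9 * math.log2(0.9) + 0.1 * math.log2(0.1))
print(h)                             # about 0.469 < 0.645, so R > H as required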


Example

The Shannon source coding theorem relates the uncertainty of the source output to the probability of typical long sequences of source symbols.

Typical sequence

Let's assume that a source is a discrete-time stochastic process {Xn} and the Xn are independent and identically distributed binary digits 0 and 1 with probabilities p and 1 − p.


Example

Assuming that p < 1/2, the most probable output sequence consists of only ones.
But such a sequence is not a typical sequence.

Suppose we bet on horses: it is likely that we lose, but it would be very unlikely that we lose all the time; such an all-losing sequence is not typical!

Probability of the typical sequence

956

478
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Amount of the typical sequences

957

Typical set

958

479
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

959

960

480
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example
Choose a smaller , namely =0.046 [ 5% of
h(1/3)], and increase the length of the
sequences.

961

Meaningful sequences

If we consider a source that outputs text, then the


typical long sequences are the sequences of
"meaningful" text while the nontypical sequences are
simply garbled text.
What is meant by "meaningful" is determined by the
structure of the language; that is, by its grammar,
spelling rules etc.

481
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Meaningful text

963

Meaningful fraction

964

482
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Coding of the typical sequences

965

Digital communication system

966

483
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Source coding theorem

967

Codewords

968

484
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Code rate

969

Transmission rate

The transmission rate Rt is measured in bits/second


and is obtained by multiplying the code rate R by the
number of transmitted channel symbols/second.
If the duration of the pulse for a symbol is T seconds,
then we obtain the transmission rate as

485
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

The (3,2) block code B = {000, 011, 101, 110}


consists of M = 2K= 4 codewords and has rate

R = log(M)/K = K/N = 2/3.

971

(N,K) block code

972

486
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

973

Separate source and channel coding

974

487
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Binary Symmetric Channel (BSC)

975

Channel coding theorem

976

488
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Jointly typical sequences

By combining
and the previous equations we obtain

977

Cardinality of jointly typical set

978

489
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fan

979

Fans

nonoverlapping fans in the following figure.

980

490
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fans

981

Number of distinguishable messages

Each fan can represent a message.

982

491
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Channel capacity

983

Non-overlapping fans and correct


decoding

984

492
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

R>C

985

Channel coding theorem

986

493
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Reversed fans

987

Coding strategy

988

494
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Proof of validity of strategy

989

Proof of validity of strategy

990

495
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Proof of validity of strategy

991

Example: Binary Symmetric Channel (BSC)

992

496
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Channel capacity of BSC

993

Discrete time Gaussian channel

994

497
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Infinite capacity Infinity energy

995

Limitation: Signaling energy

996

498
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Capacity of Gaussian channel

997

Bandlimited noisy channel

998

499
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Transmitted signal power

999

Signal and noise energies/sample

1000

500
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Channel capacity

1001

Channel capacity

1002

501
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Capacity of bandlimited Gaussian


channel

1003

Without bandwith limitation

1004

502
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Without bandwith limitation

1005

Energy/bit

1006

503
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Capacity without bandwith limitation

1007

Shannon limit

1008

504
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Shannon limit and reliable


communication

1009

Example: Bandlimited Gaussian


channel

1010

505
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Bandlimited Gaussian


channel

1011

Example: Bandlimited Gaussian


channel

1012

506
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Bandlimited Gaussian


channel

1013

Example: Bandlimited Gaussian


channel

1014

507
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Bandlimited Gaussian


channel

1015

Losless source coding

1016

508
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Source coding

1017

Lossless source coding: Concepts

1018

509
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Desired properties of a source code

1019

Instantaneous Prefix property

1020

510
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Classifications

1021

Shannon Source Coding Theorem


(SCT)
A discrete source with entropy rate H [bits/source
symbol], and a lossless source code of rate R
[bits/source symbol].
In 1948 Claude Shannon showed, based on typical
sequences, that a lossless (errorfree) code exists as
long as R > H.
A lossless code does not exist for any R < H.
The theorem does not say how to design practical
coding schemes.

Note: Shannon SCT is an "existence/non-existence theorem" (as many


results in information theory are).

511
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Inequalites

1023

SCT for uniquely decodable codes

1024

512
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Source coding

1025

Source coding

1026

513
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Prefix

1027

Prefix-free code

1028

514
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1029

Example

1030

515
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Leaves and nodes

1031

Path length lemma

1032

516
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Average codeword length = average leaves


depth

1033

Example

1034

517
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Huffman tree

1035

Example: Huffman

1036

518
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Huffman

1037

Example: Huffman

1038

519
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Average codeword length vs. uncertainty


of the source

1039

Optimality of Huffman code

1040

520
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Coding source symbols pairwise

1041

Huffman codes

1042

521
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Huffman codes

1043

Lempel-Ziv-Welch

The LZW algorithm is a so-called universal source-


coding algorithm, which means that we do not need
to know the source statistics.
The algorithm is easy to implement and for long
sequences it approaches the uncertainty of the source;
it is asymptotically optimum.

522
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Lempel-Ziv algoritm

1045

Lossy source coding

1046

523
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Lossy source coding

1047

Distortion measures

1048

524
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Rate-distortion function

1049

Distortion-rate function

1050

525
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Examples

1051

Quantization

1052

526
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Scalar quantizer

1053

MSE and SQNR

1054

527
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Uniform/Linear quantization

1055

Quantization noise

1056

528
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Optimal non-uniform quantization

1057

Compressor and expander

1058

529
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Speech compression

1059

Waveform coding: PCM

1060

530
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Waveform coding: DPCM

1061

1-point DPCM

1062

531
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multipoint DPCM

1063

1064

532
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Analysis-syntesis technique

1065

Some examples

1066

533
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Channel coding (Error correction)

1067

Main classes of channel coding

1068

534
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Binary field

1069

Addition

1070

535
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Linear code

1071

Received word = codeword + error

1072

536
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Errors and Hamming distance

1073

Minimum distance

1074

537
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Hamming (7,4)


In Table 1 we specify an encoder mapping for the
(7,4) Hamming code with M = 24 = 16 codewords.

Table 1:

1075

Example

1076

538
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1077

Example

1078

539
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1079

Example

1080

540
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1081

Example

1082

541
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

G-matrix

1083

Generation of codewords

1084

542
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1085

Parity-checking procedure

1086

543
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Parity-checking procedure

1087

Parity-check matrix

1088

544
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

GHT=0

1089

Convolutional codes

1090

545
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Encoder for convolutional


code

1091

Encoder state

The state of a system is a compact description of


its past history such that it, together with the
present input, suffices to determine the present
output and the next state.

For our convolutional encoder we choose the state


to be the contents of its memory element;
that is, at time t the state is

546
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example

1093

State transition diagram

1094

547
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Trellis-diagram

1095

Viterbi-algorithm

1096

548
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: Viterbi decoding

1097

Example: Evolution of subpaths through


trellis

1098

549
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Error correction capability

1099

Error correction capability

1100

550
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Free distance and capability of error


correction

1101

Diversity

551
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Introduction to diversity
Basic Idea
Send same bits over independent fading paths
Independent fading paths obtained by time, space, frequency, or polarization
diversity
Combine paths to mitigate fading effects

Tb

t
Multiple paths unlikely fade simultaneously
1103

Diversity gain

AWGN case: BER vs SNR:


(any modulation scheme, only the constants differ)
Note: Here is received SNR

Rayleigh Fading without diversity:

Rayleigh Fading with diversity:

(MIMO):

Note: Diversity is a reliability theme, not a capacity/bit-rate one.


For capacity: need more degrees-of-freedom (i.e. symbols/s) & packing of bits/symbol
1104
(e.g. MQAM).

552
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time diversity

1105

Time diversity
Time diversity can be obtained by interleaving and coding over
symbols across different coherent time periods.

Channel: time
diversity/selectivity,
but correlated across
successive symbols

(Repetition) Coding
without interleaving: a full
codeword lost during fade

Interleaving: of sufficient
depth: (> coherence time)
At most 1 symbol of codeword
lost!
Coding alone is not sufficient!

553
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Forward Error Correction (FEC):


Eg: Reed-Solomon RS(N,K)
K of N Recover K
RS(N,K) received data packets!

FEC (N-K)

Block
Size Lossy Network
(N)

Data = K

Block: of sufficient size: (> coherence time), 1107


else need to interleave, or use with hybrid ARQ

Hybrid ARQ/FEC model


Packets Sequence Numbers
CRC or Checksum
Proactive FEC
Timeout
ACKs
Status Reports
NAKs,
SACKs
Bitmaps
Retransmissions
Packets
Reactive FEC

1108

554
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: GSM

The data of each user are sent over time slots of length 577 s
Time slots of the 8 users together form a frame of length 4.615 ms
Voice: 20 ms frames, rate convolution coded = 456 bits/voice-frame
Interleaved across 8 consecutive time slots assigned to that specific user:
0th, 8th, . . ., 448th bits are put into the first time slot,
1st, 9th, . . ., 449th bits are put into the second time slot, etc.
One time slot every 4.615 ms per user, or a delay of ~ 40 ms (ok for voice).
The 8 time slots are shared between two 20 ms speech frames.

1109

Repetition code: Diversity analysis

After interleaving over L coherence time periods,

Repetition coding: for all

where and

This is classic vector detection in white Gaussian


noise.
1110

555
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Repetition coding: Matched filtering

hx1 only spans a 1-dim space


(similar to MPAM, with
random channel gains instead!)

||h||

1111
Multiply by conjugate => cancel phase!

Repetition coding: Fading analysis


BPSK Error probability:

Average over ||h||2 i.e. over Chi-squared distribution,


L-degrees of freedom! Repetition coding gets full diversity,
but sends only one symbol every L
symbol times.
i.e. trades off bit-rate for
reliability (better BER)

1112

556
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Chi-square distribution
In probability theory and statistics, the chi-square distribution
(also chi-squared or -distribution) with k degrees of freedom
is the distribution of a sum of the squares of k independent
standard normal random variables.
It is one of the most widely used probability distributions in
hypothesis testing, or in construction of confidence intervals
If X1, ..., Xk are independent, standard normal random variables,
then the sum of their squares

is distributed according to the chi-square distribution with k


degrees of freedom denoted as

The chi-square distribution has one parameter: k a positive integer that specifies the
1113
number of degrees of freedom (i.e. the number of Xis)

Diversity gain: Intuition


Typical error (deep fade) event probability:
In other words, ||h|| < ||w||/||x||
i.e. ||hx|| < ||w||
(i.e. signal x is attenuated to be of the order of noise w)

Chi-Squared pdf of

1114

557
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Key note: Deep fades become rarer


Deep fade Error event

Note: this graph plots


reliability (i.e. BER vs SNR)

Repetition code trades


off information rate
(i.e. poor use of deg-of-freedom)
1115

Antenna diversity

1116

558
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Antenna diversity

Receive Transmit Both


(SIMO) (MISO) (MIMO)

1117

Antenna diversity: Rx

Receive Transmit Both


(SIMO) (MISO) (MIMO)

1118

559
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Receive diversity

Same mathematical structure as repetition


coding in time diversity (!), except that
there is a further power gain (aka array
gain).
Optimal reception is via matched filtering/MRC
(Maximal Ratio combiner)
(a.k.a. receive beamforming).

1119

Array gain vs. diversity gain


Diversity Gain: There are multiple independent channels between
the transmitter and receiver, and diversity gain is a product of the
statistical richness of those channels

Array gain is not caused by statistical diversity between the


different channels but coherent combination of the actual energy
received by each of the antennas.
Even if the channels are completely correlated, as might happen in a
line-of-sight (LOS) system, the received SNR increases linearly with
the number of receive antennas.
Eg: Correlated flat-fading:

Single Antenna SNR:

Adding all received paths:

1120

560
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Receive diversity: Selection combining

Recall: Bandpass vs. matched filter analogy.


Pick max signal, but dont fully combine signal
power from all taps. Diminishing returns from
more taps.

1121

Receive Beamforming: Maximal Ratio Combining


(MRC)
Weight each branch

SNR:

MRC idea: Branches with better signal energy should be enhanced,


whereas branches with lower SNRs given lower weights
1122

561
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Equivalence of MRC and matched filtering


Maximal Ratio Combining (MRC) or Beamforming is just
matched filtering in the spatial domain!
Generalization of this f-domain picture, for combining multi-tap
signal

Weight each branch

SNR: 1123

Selection diversity vs. MRC

1124

562
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Antenna diversity: Tx

Receive Transmit Both


(SIMO) (MISO) (MIMO)

1125

Transmit diversity

If transmitter knows the channel, send:

maximizes the received SNR by in-phase addition of


signals at the receiver (transmit beamforming), i.e.
closed-loop Tx diversity.

1126

563
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Spacetime coding (STC)


Spacetime coding is a technique used in wireless
communications to transmit multiple copies of a data stream
across a number of antennas and to exploit the various received
versions of the data to improve the reliability of data-transfer.
The fact that the transmitted signal must traverse a difficult
environment with scattering, reflection, refraction and so on and
may then be further corrupted by thermal noise in the receiver
means that some of the received copies of the data will be
'better' than others.
This redundancy results in a higher chance of being able to use
one or more of the received copies to correctly decode the
received signal.
In fact, spacetime coding combines all the copies of the
received signal in an optimal way to extract as much information
from each of them as possible.

1127

Space-Time Block Coding (STBC)


STC involves the transmission of multiple redundant
copies of data to compensate for fading and thermal
noise in the hope that some of them may arrive at the
receiver in a better state than others.
In the case of STBC in particular, the data stream to
be transmitted is encoded in blocks, which are
distributed among spaced antennas and across time.
While it is necessary to have multiple transmit
antennas, it is not necessary to have multiple receive
antennas, although to do so improves performance.
This process of receiving diverse copies of the data
is known as diversity reception

1128

564
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Antenna Diversity: Tx+Rx = MIMO

Receive Transmit Both


(SIMO) (MISO) (MIMO)

1129

Wireless Overview

565
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Digital communication system


Wireless channel with RX and TX antennas
and between them more or less free-space
Noise
Transmitted Received Received
Info. signal signal info.
Transmitter Channel Receiver
Source
SOURCE
User

Transmitter

Source Channel
Formatter Modulator
encoder encoder

Receiver

Source Channel
Formatter Demodulator
decoder decoder

What is wireless communication?


Any form of communication that does not require the
transmitter and receiver to be in physical contact through
guided media
Electromagnetic wave propagated through free-space
RF, Microwave, IR, Optical
Simplex: one-way communication (e.g., radio, TV)
Half-duplex: two-way communication but not simultaneous
(e.g., push-to-talk radios)
Full-duplex: two-way communication (e.g., cellular phones)
Frequency-division duplex (FDD)
Time-division duplex (TDD)
1132

566
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Why wireless?
Characteristics
Mostly radio transmission
New protocols for data transmission are needed

Advantages
Spatial flexibility in radio reception range
Ad hoc networks without former planning
No problems with wiring (e.g. historical buildings, fire protection, esthetics)
Robust against disasters like earthquake, fire and careless users (which
remove and break connectors and cut wires)

Disadvantages
Generally lower transmission rates for higher numbers of users
Often proprietary, standards are often restricted
Many national regulations, global regulations are evolving slowly
Restricted frequency range, interferences of frequencies

Nevertheless, in the last 30 years, it has really been a wireless revolution 1133

Radio wave propagation

Propagation of the radio wave in free space


depends heavily on the frequency of the
signal and obstacles in its path.
There are some major effects on signal
behavior
Reflection and multipath
Diffraction or shadowing
scattering
Building and vehicle penetration
Fading of the signal
Interference 1134

567
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Factors affecting wave propagation

(1) direct signal


(2) diffraction
(3) vehicle penetration
(4) interference
(5) building penetration

1135

Path loss, shadowing, fading


Variable & rapid decay of signal due to environment, multi-
paths, mobility

568
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Fading channel

Wireless channel is very different!


Wireless channel is very different from a wired channel.
Not a point-to-point link: EM signal propagates in patterns determined by the
antenna gains and environment
Noise adds on to the signal (AWGN)
Signal strength falls off rapidly with distance (especially in cluttered
environments): large-scale fading.
Shadowing effects make this large-scale signal strength drop-off non-isotropic.
Fast fading leads to huge variations in signal strength over short distance, times,
or in the frequency domain.
Interference due to superimposition of signals, leakage of energy can raise the
noise-floor and fundamentally limit performance:
Self-interference (inter-symbol, inter-carrier), co-channel interference (in a
cellular system with high frequency reuse), cross-system interference
(microwave ovens vs. WiFi vs. bluetooth)
Results:
Variable capacity
Unreliable channel: errors, outages
Variable delays
Capacity is shared with interferers.

569
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Wireless systems
Cellular
With a big emphasis on voice communication
Terrestrial microwave and satellite systems
WiFi
Local networks over wireless, with infrastructure
E.g., 802.11a,b,g,n
WiMAX
Internet provider last mile replacement
Ad Hoc Network
Local networks over wireless, without infrastructure
Sensor network
Radar and radio telescope system 1139

Cellular systems
Geographic region divided into cells
Frequencies/timeslots/codes reused at spatially-separated
locations.
Base stations/MTSOs (Mobile Telephone Switching Offices)
coordinate handoff and control functions
Shrinking cell size increases capacity, as well as networking
burden
Note: Co-channel interference (between same-color cells
below).

BASE
STATION
MTSO

1140

570
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cellular phone networks


Los Angeles

BS
BS

Internet
New York
MTSO MTSO
PSTN

BS

1141
PSTN - Public ServiceTelephone Network

Inside the BS/MTSO is buzzwords bonanza!

1142

571
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Wireless generations
First Generation (1G): Analog 25 or 30 kHz FM,
voice only, mostly vehicular communication
Second Generation (2G): Narrowband TDMA and
CDMA, voice and low bit-rate data, portable
units.
Third Generation (3G): Wideband TDMA and
CDMA, voice and high bit-rate data, portable
units
Fourth Generation (4G and beyond 2015): true
broadband wireless: advanced versions of
WiMAX, 3G LTE, 802.11 a/b/g/n, UWB together
in form of intelligent cognitive radio

1143

Generations

Other Tradeoffs:
Rate Rate vs. Coverage
4G Rate vs. Delay
Rate vs. Cost
3G Rate vs. Energy

2G

Mobility
1144

572
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

LTE: Long-Term Evolution

Based upon OFDM, OFDMA, MIMO


Longer term objective is to support
up to peak data rate of 200 Mbps
with a high average spectral
efficiency

Rule of thumb: the actual capacity (Mbps per channel per sector) in a
multi-cell environment for most wireless
1145
technologies is about 20% to
30% of the peak theoretical data rate.

Wireless evolution to 4G

573
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

4G/IMT-Advanced
International Mobile Telecommunications (IMT)-Advanced Standard
are requirements issued by the ITU-R of the International
Telecommunication Union (ITU) in 2008 for what is marketed as 4G (Or
sometimes as 4,5G mobile phone and Internet access service.
4G provides, in addition to the usual voice and other services of 3G, mobile
broadband Internet access, for example to laptops with wireless modems, to
smartphones, and to other mobile devices.
Potential and current applications include amended mobile web access, IP
telephony, gaming services, high-definition mobile TV, video
conferencing, 3D television, and cloud computing.
4.5G provides better performance than 4G systems, as an interim step
towards deployment of full 5G capability.
The technology includes:
LTE Advanced
MIMO

5G
5G denotes the next major phase of mobile telecommunications standards
beyond the current 4G/IMT-Advanced standards.
5G network requirements could be as:
Spectral efficiency should be significantly enhanced compared to 4G.
Coverage should be improved.
Signaling efficiency enhanced.
Latency should be significantly reduced compared to LTE.
Data rates of several tens of Mb/s should be supported for tens of thousands of
users.
1 Gbit/s to be offered, simultaneously to tens of workers on the same office
floor.
Several hundreds of thousands of simultaneous connections to be supported for
massive sensor deployments.

574
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

5G

A road map would be:


Detailed requirements ready and initial system design in 2017
Standards ready end of 2018
Trials start in 2018
Commercial system ready in 2020
The launch of 5G will happen on an operator and country
specific basis in 2020.
In addition to simply providing faster speeds 5G networks will
also need to meet the needs such as the Internet of Things
(IOT).

Classification

Wireless (vs. wired) communication medium


Cellular (vs. meshed vs. MANETs) architectures for
coverage, capacity, QoS, mobility, auto-configuration,
infrastructure support
Mobile (vs. fixed vs. portable) implications for devices
WPAN (vs. WLAN vs. WMAN) network scope, coverage,
mobility
Technologies/Standards/Marketing Alliances: 802.11,
UWB, Bluetooth, Zigbee, 3G, GSM, CDMA, OFDM, MIMO,
WiMAX

Mobile Ad-hoc NETwork (MANET) - ad hoc: no backbone infrastructure


1150
Note: Wireless Body Area Network (WBAN)

575
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Wireless standards

IEEE 802.15.4 Sensors RFID


(Zigbee Alliance)

RAN
IEEE 802.22
WAN
IEEE 802.20
IEEE 802.16e

IEEE 802.16d MAN ETSI HiperMAN


WiMAX & HIPERACCESS

IEEE 802.11 LAN ETSI


Wi-Fi Alliance HiperLAN

IEEE 802.15.3 PAN ETSI


UWB, Bluetooth
HiperPAN
Wi-Media
1151

Wireless LANs: WiFi/802.11


Based on the IEEE 802.11a/b/g/n family of standards, and is
primarily a local area networking technology designed to provide
in-building or campus broadband coverage.
IEEE 802.11a/g peak physical layer data rate of 54 Mbps and
indoor coverage over a distance of 30 m.
Beyond buildings: municipal WiFi, Neighborhood Area
Networks (NaN), hotspots
Much higher peak data rates than 3G systems, primarily since it
operates over a larger bandwidth (20 MHz).
Its MAC scheme CSMA (Carrier Sense Multiple Access) is
inefficient for large numbers of users
The interference constraints of operating in the license-
exempt band is likely to significantly reduce the actual
capacity of outdoor Wi-Fi systems.
Wi-Fi systems are not designed to support high-speed
mobility.
Wide availability of terminal devices
802.11n: MIMO techniques for range extension and higher bit
rates 1152

576
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Some wireless LAN standards (Wi-Fi)


802.11b
Standard for 2.4GHz ISM band
Frequency hopped spread spectrum
Up to 11 Mbps
802.11a
Standard for 5GHz
OFDM
Up to 54 Mbps
HiperLAN in Europe
802.11g
Standard in both 2.4 GHz and 5 GHz bands
OFDM
Speeds up to 54 Mbps

Frequency Hopping Spread Spectrum (FHSS) - the total frequency band is split into a number of channels.
The broadcast data is spread across the entire frequency band by hopping between the channels in a
pseudorandom fashion.
OFDM - Orthogonal Frequency Division Multiplexing is a multi carrier transmission technique capable of
supporting high speed services whilst still being bandwidth efficient. It achieves this by forcing multiple
1153
sub-carriers together. However, to ensure these adjacent sub-carriers do not cause excessive
interference, they must be orthogonal or 90 to one another.

IEEE 802.11n
Over-the-air (OTA): 200 Mbps; MAC layer (MAC-SAP*): 100Mbps
Microcells, neighborhood area networks (NANs)
PHY
MIMO/multiple antenna techniques
Advanced FEC, (Forward Error Correction)
10, 20 & 40MHz channels widths
Higher order modulation/coding

1154

577
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

WLAN network architecture


Basic Service Set (BSS): a set of stations which
communicate with one another

Ad-hoc network Infrastructure Mode

Only direct communication


Stations communicate with AP
possible
AP provides connection to wired network
No relay function
(e.g. Ethernet)
Stations not allowed to communicate
directly
Some similarities with cellular
1155

WLAN network architecture


ESS: a set of BSSs interconnected by a distribution system (DS)

Local Area Network (e.g .Ethernet)

ESS Extended Service Set

578
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

IEEE 802.15 (WPANs)

802.15.1 adoption of Bluetooth standard into


IEEE
802.15.2 coexistence of WPANs and WLANs
in the 2.4GHz band
802.15.3 high rate WPAN (UWB)
802.15.4 low rate WPAN (Zigbee)

802.15: Wireless Personal Area Network

less than 10 m diameter


replacement for cables
(mouse, keyboard, S
headphones) radius of
M
ad hoc: no backbone coverage

infrastructure S
S
master/slaves:
slaves request permission to
send (to master)
master grants requests
802.15: evolved from
Bluetooth specification
2.4-2.5 GHz radio band
1158
up to 1 Mbps

579
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Bluetooth: WPAN (piconet)

Cable replacement RF technology (low cost)


Short range (typically <10 m, extendable to 100 m)
2.4 GHz ISM band (crowded!)
Data rate 1 Mbit/s
Widely supported by telcos, PC and consumer
electronics companies

What is UltraWideBand (UWB)?


Time-domain behavior Frequency-domain behavior
Narrowband
Communication

0 1 0 1

Frequency
Modulation
2.4 GHz
Communication
Ultrawideband

1 0 1
Impulse
Modulation

time 3 frequency 10 GHz

(FCC Min=500MHz)

Communication occupies more than 500 MHz of spectrum:


baseband or 3.6-10.1 GHz range. Strict power limits.

580
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Why is UWB interesting?


UWB is an impulse radio: sends pulses of tens of picoseconds (10-12) to
nanoseconds (10-9)
Duty cycle of only a fraction of a percent; carrier is not necessarily needed
Uses a lot of bandwidth (GHz); Low probability of detection
Excellent ranging capability; Synchronization (accurate/rapid) an issue.
Multipath highly resolvable: good and bad
Can use OFDM or Rake receiver to get around multipath problem.
Low power transmitters -- 100 times lower than Bluetooth for same
range/data rate
Very high data rates possible Gbps at ~10 m under current regulations
7.5 GHz of free spectrum in the U.S.
FCC legalized UWB for commercial use
Spectrum allocation overlays existing users, but its allowed power
level is very low to minimize interference
Apps: Wireless USB,
480 Mbps, 10m,

UWB spectrum

Bluetooth,
802.11b 802.11a
GPS
PCS

Emitted Cordless Phones


Signal Microwave Ovens
Power

-41 dBm/MHz
UWB
Spectrum
1.6 1.9 2.4 3.1 5 10.6
Frequency (GHz)

Worldwide regulations differ from US --- Japan, EU, Asia

1162

581
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

IEEE 802.15.4 / ZigBee radios

Low-Rate WPAN
Very low power consumption (no recharge for months or
years!), up to 255 devices
Data rates of 20, 40, 250 kbps
Star clusters or peer-to-peer operation
CSMA-CA channel access
Frequency of operation in ISM bands
Home automation, consumer electronics applications,
RFID/tagging applications (goods supply-chain)

WiMAX fixed and mobile


WiMAX Fixed / Nomadic WiMAX Mobile
802.16d or 802.16 802.16e
Usage: Backhaul, Wireless DSL Usage: Long-distance mobile
Frequencies: 2.5GHz, 3.5GHz wireless broadband
and 5.8GHz (Licensed and L- Frequencies: 2.5GHz
free) Description: Wireless
Description: wireless connections to laptops, PDAs and
connections to homes, handsets when outside of Wi-Fi
businesses, and other WiMAX hotspot coverage
or cellular network towers

WiMAX - Worldwide Interoperability for Microwave Access - the radio interface


1164
within two broad radio bands 2 10 GHz and 11 66 GHz

582
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Wide area: Satellite systems

Cover very large areas


Different orbit heights
GEOs (~40000 km), LEOs
(~2000 km), MEOs (~9000km)
Dish antennas, or bulky handsets
Optimized for one-way transmission
Location positioning
GPS systems
Satellite Radio
Radio ( DAB) and (SatTV) broadcasting
Most two-way systems struggling or bankrupt
Expensive alternative to terrestrial cellular system
Trucking fleets, journalists in wild areas, oil rigs

Paging systems
Broad coverage for short messaging
Message broadcast from all base stations
High Tx power (hundreds of watts to kilowatts),
low power pagers
Simple terminals
Optimized for 1-way transmission
Answer-back hard
Overtaken by cellular
obsolete

1166

583
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Radio spectrum and its efficient


utilization

Crowded spectrum: FCC chart

1168

584
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

EM Spectrum for telecom

Most spectra licensed; license can be very


expensive (cellular);
Infrared, ISM (Industrial, Scientific,
Medical) band, and amateur radio bands are
license-free

Licensed and unlicensed spectrum

Licensed
Cell phones, police & fire radio, taxi dispatch, etc.
Unlicensed
All unlicensed bands impose power limits
Industrial, Scientific, Medical (ISM) bands
e.g. (900MHz, 2.4GHz, 5.8GHz)
Unlicensed Personal Communication System (UPCS)
e.g. 1.910-1.920 GHz and 2.390-2.400 GHz
Unlicensed National Information Infrastructure
(UNII) bands
e.g. 5.2GHz

585
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Example: 2.4 GHz interference


Micro-wave oven
Bluetooth
802.11b/g WLAN
Cordless phone
Analog video link

Radio/TV/Wireless allocations: 30 MHz-30 GHz

1172

586
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Open spectrum: ISM and UNII Bands


ISM: Industrial, Scientific & Medical Band UNII
UNII: Unlicensed National Information Infrastructure band
ISM
ISM

1 2 3 4 5 GHz

1173

802.11/802.16 spectrum
UNII

International International
US Japan ISM
Licensed ISM Licensed Licensed
Licensed

1 2 3 4 5 GHz

802.16a has both licensed and license-free options


ISM: Industrial, Scientific & Medical Band Unlicensed band
UNII: Unlicensed National Information Infrastructure band Unlicensed band

1174

587
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Summary: Key pieces of licensed and unlicensed


spectrum
Upper
Low/Mid UNII
WCS ISM MMDS Intl Intl UNII and ISM
License Exempt

Licensed

New Spectrum

2 3 4 5 GHz

1175

Spectrum allocation methods

Auctions: raise revenue, market-based, but may shut out


smaller players; upfront cost depress innovation (lower
equipment budget).
Beauty contest: best technology wins. Faster deployments,
monopolies/oligopolies.
Unlicensed: power limits. (WiFi, some Wimax)
Underlay: primary vs. secondary users. Stricter power limits
for secondary: hide in a wider band under the noise floor
(UWB)
Cognitive radio: primary user has priority. Secondary user
can use greater power, but has to detect and vacate the
spectrum when primary users come up.
1176

588
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Counter-attacking the challenges


Turn disadvantages into advantages!
Resources associated with a fading channel: 1) diversity; 2) number of degrees of
freedom; 3) received power.
Cellular concept: reuse frequency and capacity by taking advantage of the fact that
signal fades with distance. Cost: cells, interference management
Multiple access technologies: CDMA, OFDMA, CSMA, TDMA: share the spectrum
amongst variable number of users within a cell
Diversity i.e. use performance variability as an ally by having access to multiple
modes (time, frequency, codes, space/antennas, users) and combining the signal
from all these modes
Directional/Smart/Multiple Antenna Techniques (MIMO): use spatial diversity, spatial
multiplexing.
Adaptive modulation/coding/power control per-user within a frame in low-SNR regime
Multi-hop/Meshed wireless networks with micro-cells
Interference: still the biggest challenge.
Interference estimation and cancellation techniques (e.g., multi-user) may be
key in the future.
CDMA: interference averaging.
Opportunistic beamforming using intelligent antennas

Cellular concept: Spatial reuse

Note: With CDMA


or WiMAX there can
be frequency
1178 reuse of 1

589
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cells in reality

Cellular model vs. reality: shadowing and variable


large-scale propagation due to environment

1179

Interference in cellular networks


Assume the asynchronous users sharing the
same bandwidth and using the same radio base
station in each coverage area or cell.
Intra-cell/co-channel interference due to the
signal from the other users in the home cell.
Inter-cell/adjacent channel interference due to
the signal from the users in the other cell.
Interference due to the thermal noise.

Methods for reducing interference:


Frequency reuse: in each cell of cluster
pattern different frequency is used
By optimizing reuse pattern the problems
of interference can be reduced
significantly, resulting in increased
capacity.
Reducing cell size: in smaller cells the
frequency is used more efficiently: cell
sectoring, splitting
Multilayer network design (overlays):
macro-, micro-, picocells

590
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cell splitting increases capacity

1181

Trend towards smaller cells


Driving forces:
Need for higher capacity in areas with high user density
Reduced size and cost of base station electronics.
[Large cells require 1 million base stations]
Lower height/power, closer to street.

Issues:
Mobiles traverse a small cell more quickly than a large cell.
Handoffs must be processed more quickly.
Location management becomes more complicated, since there are
more cells within a given area where a mobile may be located.
May need wireless backhaul
Wireless propagation models dont work for small cells.
Microcellular systems are often designed using square or triangular
cell shapes, but these shapes have a large margin of error

1182

591
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Sectoring improves S/I

Capacity increase 3X.


Each sector can reuse time and code slots.
Interference is reduced by sectoring, since users
only experience interference from the sectors at
their frequency.
1183

Sectoring: Tradeoffs

More antennas.

Even though intersector handoff is simpler compared


to intercell handoff, sectoring also increases the
overhead due to the increased number of inter-
sector handoffs.

In channels with heavy scattering, desired power can


be lost into other sectors, which can cause inter-
sector interference as well as power loss

1184

592
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Cell sizes: Multiple layers

Global
Satellite

Suburban Urban
In-Building

Picocell
Microcell
Macrocell

Basic Phone
Smart Phone
Laaptop

1185

Cell breathing: CDMA networks

1186

593
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Capacity planning: Multi-cell issues,


coverage-capacity-quality tradeoffs
Coverage and Range
Required site-to-site distance
in [m]
Capacity:
kbps/cell/MHz for data
Quality
Service dependent

Delay and packet loss rate


important for data services

Interference due to spectrum


reuse in nearby cells.

1187

Handover (Handoff)
Handover :
Cellular system tracks mobile stations in order to maintain their
communication links.
When mobile station goes to neighbor cell, communication link switches
from current cell to the neighbor cell.
Hard handover :
In FDMA or TDMA cellular system, new communication establishes after
breaking current communication at the moment doing handover.
Communication between MS and BS breaks at the moment switching
frequency or time slot.

switching

Cell B Cell A

Hard handover : connect (new cell B) after break (old cell A)

594
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Soft handover
Soft handover :
In CDMA cellular system, communication does not break even at the
moment doing handover, because switching frequency or time slot is
not required.

transmitting same signal from both BS A


and BS B simultaneously to the MS

Cell B
Cell A

Soft handover: break (old cell A) after connect (new cell B)

Mobility/Handover in umbrella cells

Avoids multiple handoffs.


1190

595
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Overlay wireless networks: Mobility &


Handover

1191

Duplexing methods for radio links

Base Station

Forward link
Reverse link

Mobile Station

1192

596
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Frequency Division Duplex (FDD)

Forward link frequency and reverse link frequency is different


In each link, signals are continuously transmitted in parallel.

Forward link (F1)


Reverse link (F2) Base Station

Mobile Station

1193

Example of FDD systems

Mobile Station Base Station

Transmitter BPF BPF Transmitter


F1 F2

Receiver BPF BPF Receiver


F2 F1

BPF: Band Pass Filter

1194

597
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Time Division Duplex (TDD)

Forward link frequency and reverse link frequency is the same.


In each link, signals take turns just like a ping-pong game.

Forward link (F1)

Reverse link (F1)


Base Station

Mobile Station

1195

Example of TDD Systems

Mobile Station Base Station

Transmitter Transmitter

BPF BPF
Receiver F1 F1 Receiver

Synchronous Switches

BPF: Band Pass Filter

1196

598
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multiplexing: Outline

Single link:
Channel partitioning (TDM, FDM, WDM)
vs. Packets/Queuing/Scheduling
Series of links:
Circuit switching vs. packet switching
Statistical Multiplexing (leverage randomness)
Multiplexing gain
Distributed multiplexing (MAC protocols)
Channel partitioning: TDMA, FDMA, CDMA
Randomized protocols: Aloha, Ethernet (CSMA/CD)
Taking turns: distributed round-robin: polling, tokens

1197

Multiplexing: TDM

1198

599
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Multi-carrier: FDM and OFDM


Ch.1 Ch.2 Ch.3 Ch.4 Ch.5 Ch.6 Ch.7 Ch.8 Ch.9 Ch.10

Conventional multicarrier techniques frequency

Ch.2 Ch.4 Ch.6 Ch.8 Ch.10


Ch.1 Ch.3 Ch.5 Ch.7 Ch.9
Saving of bandwidth

50% bandwidth saving

Orthogonal multicarrier techniques frequency

Actually these are sinc-pulses in frequency domain.


Symbols are longer duration
1199in time-domain, and can
eliminate ISI caused by dispersion due to multipaths

Multipath propagation & ISI


Time dispersive channel
Reflections from walls, etc.
Impulse response:

p ( )

[ns]
Problem with high rate data
transmission:
multipath delay spread is of the
order of symbol time
inter-symbol-interference (ISI)

1200

600
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Inter-Symbol-Interference (ISI) due to


Multipath fading

Transmitted signal:

Received Signals:
Line-of-sight:

Reflected:

The symbols add up on the


channel Delays
Distortion!

1201

OFDM: Parallel Tx on narrow bands


Channel
Channel impulse
transfer function
response Time
Frequency
(Freq.selective fading)

1 Channel (serial) Signal is


Frequency
broadband

2 Channels Frequency

8 Channels
Frequency

Channels are
narrowband
(flat fading, ISI)
1202

601
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

MIMO: Spatial diversity, spatial multiplexing


with multiple antennas

Example: Simple selection diversity (Rx only), Diversity Gains.. 1203

SISO, MISO, SIMO, MIMO, SDMA

SISO
Single Input,
Single Output

MISO
Multiple Input,
Single Output
SIMO
Single Input,
Multiple Output

MIMO
Multiple Input,
SDMA Multiple Output
1204

602
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Adaptive antenna gains (Tx or Rx)


Diversity
differently fading paths
fading margin reduction
no gain when noise-
limited
Coherent Gain
energy focusing
improved link budget
reduced radiation

Interference Mitigation
energy reduction
enhanced capacity
improved link budget

Enhanced Rate/Throughput
co-channel streams
increased capacity
increased data rate
1205

Multiple Access Control (MAC)

Base Station

Forward link
Reverse link
Mobile Station

Mobile Station
Mobile Station Mobile Station

1206

603
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

MAC protocols: a taxonomy


Channel Partitioning: TDMA, FDMA
divide channel into pieces (time slots, frequency)
allocate piece to node for exclusive use

Random Access: Aloha, Ethernet CSMA/CD, WiFi CSMA/CA


allow collisions
recover from collisions
Wireless: inefficiencies arise from hidden terminal problem, residual
interference
Cannot support large numbers of users and at high loads

Taking turns: Token ring, distributed round-robin, CDMA, polling


Coordinate shared access using turns to avoid collisions.
Achieve statistical multiplexing gain & large user base, but complexity
CDMA can be loosely classified here (orthogonal code = token)
OFDMA with scheduling also in this category 1207

MAC protocols: Efficiency


Channel partitioning MAC protocols:
share channel efficiently at high load
inefficient at low load: delay in channel access, 1/N
bandwidth allocated even if only 1 active node!

Random access MAC protocols


efficient at low load: single node can fully utilize channel
high load: collision overhead

Taking turns protocols


look for best of both worlds!

1208

604
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Channel partitioning
MAC protocols
TDMA: time division multiple access
Access to channel in "rounds"
Each station gets fixed length slot (length = pkt
trans time) in each round
Unused slots go idle
Example: 6-station LAN, 1,3,4 have pkt, slots
2,5,6 idle
Does not statistical multiplexing gains here

1209

TDMA overview

A
B f0
C B A C B A C B A C B A

C
Time

1210

605
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

FDMA overview

C C
f2

B B f1

A A f0

Time

Need substantial guard bands: inefficient


1211

CDMA
spread spectrum

Base-band Spectrum Radio Spectrum

Code B
Code A
B

B
Code A A
A

B C C
B B C
A A A B
A C
B

Time
Sender Receiver

1212

606
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

OFDMA
OFDMA: a mix of FDMA/TDMA: (OFDM modulation)
Sub channels are allocated in the frequency domain,
OFDM symbols allocated in the time domain.
Dynamic scheduling leverages statistical multiplexing gains,
and allows adaptive modulation/coding/power control, user
diversity
t TD M A

T D M A \O F D M A
m

N
1213

Summary of multiple access

FDMA
power

TDMA
power

CDMA
power

1214

607
Lecture notes Telecommunications Engineering II by Jorma Kekalainen

Wireless is hot, but note


The many advantages of wireless are evident to all
anywhere, anytime, unwired access to the global phone
network or wireless Internet via a highly portable
lightweight device.

But if you are not mobile user, it is often more efficient to go


wired (especially optical)
Nearly interference free
If you need more bandwidth: just add a bunch of fibers
As fiber is much cheaper than digging and resurfacing
streets, put in more fiber than you would ever need (dark
fiber)
Often only the last mile is wireless

1215

608

You might also like