Modelling A 4g Lte System in Matlab

Modeling a 4G LTE System in MATLAB
Idin Motedayen-Aval
Senior Applications Engineer
MathWorks
Idin.motedayen-aval@mathworks.com
© 2012 The MathWorks, Inc.1

Agenda
 4G LTE and LTE Advanced

– True Global standard
– True Broadband mobile communications
– How it was achieved?
– What are the challenges?
 MATLAB and communications system design
– Modeling and simulation
– Simulation acceleration
– Path to implementation
 Case study:
– Physical layer modeling of an LTE system in MATLAB
 Summary
2
Part 1:
Modeling & simulation
© 2012 The MathWorks, Inc.3

4G LTE and LTE Advanced
4
Distinguishing Features
 Motivation
– Very high capacity & throughput
– Support for video streaming, web browsing, VoIP, mobile apps
 A true global standard
– Contributions from all across globe
– Deployed in AMER, EMEA, APLA
5
Distinguishing Features
 A true broadband mobile standard
– From 2 Mbps (UMTS)
– To 100 Mbps (LTE)
– To 1 Gbps (LTE Advanced)
6
How is this remarkable advance possible?
 Integration of enabling technologies with sophisticated

mathematical algorithms
– OFDM
– MIMO (multiple antennas)
– Turbo Coding
 Smart usage of resources and bandwidth

– Adaptive modulation
– Adaptive coding
– Adaptive MIMO
– Adaptive bandwidth
7
What MATLAB users care about LTE?
 Academics
– Researchers contributing to future standards
– Professors
– Students
 Practitioners
– System Engineers
– Software designers
– Implementers of wireless systems
 Challenge in interaction and cooperation between these
two groups
 MATLAB is their common language
8
Challenges:
From specification to implementation
 Simplify translation from

specification to a model as
blue-print for implementation
 Introduce innovative
proprietary algorithms
 System-level performance
evaluation
 Accelerate large scale
simulations
 Address gaps in the
implementation workflow
9
Where does MATLAB fit in addressing these
challenges?
 MATLAB and Communications
System Toolbox are ideal for LTE
algorithm and system design
 MATLAB and Simulink provide an
environment for dynamic & large
scale simulations
 Accelerate simulation with a
variety of options in MATLAB
 Connect system design to
implementation with
– C and HDL code generation
– Hardware-in-the-loop verification
10
Communications System Toolbox
Over 100 algorithms for
– Modulation, Interleaving, Channels, Source Coding
– Error Coding and Correction
– MIMO, Equalizers, Synchronization
– Sources and Sinks, SDR hardware
Algorithm libraries in MATLAB Algorithm libraries in Simulink
11
Existing demos of Work in-progress

Communications System toolbox
12
Case Study:
Downlink physical layer of LTE (Release 10)
codewords layers antenna ports
Modulation Resource element OFDM signal

Scrambling mapper
mapper generation
Layer
Precoding
mapper
Scrambling mapper
mapper generation
© 2012 The MathWorks, Inc.

13
What we will do today
1. Start modeling in MATLAB

2. Iteratively incorporate components such as Turbo Coding,
MIMO, OFDM and Link Adaptation
3. Put together a full system model with MATLAB & Simulink
4. Accelerate simulation speed in MATLAB
5. Path to implementation
14
Modeling and Simulation

15
LTE Physical layer model in standard
 Downlink Shared Channel

– Transport channel
Turbo Channel
Coding – Physical channel
MIMO OFDMA
Adaptation
of everything
Reference: 3GPP TS 36 211 v10 (2010-12)
16
LTE Physical layer model in MATLAB
Turbo Channel
Coding
MIMO OFDMA

Scrambling mapper
mapper generation
Layer
Precoding
mapper
Adaptation
of everything Scrambling mapper
mapper generation
17
>> Open LTE system model
Turbo Channel
Coding
MIMO OFDMA

Scrambling mapper
mapper generation
Layer
Precoding
mapper
Adaptation
of everything Scrambling mapper
mapper generation
18
Overview of Turbo Coding
 Error correction & coding

technology of LTE
 Performance: Approach
channel capacity (Shannon
bound)
 Evolution of convolutional
coding
 Based on an iterative
decoding scheme
19
MATLAB Demo
=comm.ConvolutionalEncoder(
= comm.ViterbiDecoder(…
„InputFormat‟, „Hard‟,…
20
MATLAB Demo
= comm.QPSKDemodulator(
„DecisionMethod‟,‟Log-Likelihood ratio‟
=comm.ViterbiDecoder…
„InputFormat‟, „Soft‟,…
21
MATLAB Demo
= comm.TurboEncoder
= comm.QPSKDemodulator(
„DecisionMethod‟,‟Log-Likelihood ratio‟
= comm.TurboDecoder(…
„NumIterations‟, 6,…
22
DL-SCH transport channel processing
a0 , a1 ,..., a A 1
Transport block
CRC attachment
b0 , b1 ,..., bB 1
Code block segmentation

Code block CRC attachment
cr 0 , cr1 ,..., cr K r 1
Channel coding
d r(i0) , d r(1i ) ,..., d r(i)D

r 1
Rate matching
er 0 , er1 ,..., er Er 1
Code block
concatenation
f 0 , f1 ,..., f G 1
3GPP TS 36.212 v10.0.0

>>commpccc
23
Downlink Shared Channel - transport channel
a0 , a1 ,..., a A 1
Transport block
CRC attachment
Turbo Channel
b0 , b1 ,..., bB 1 Coding
MATLAB Function block
Code block segmentation
Code block CRC attachment
cr 0 , cr1 ,..., cr K r 1
Channel coding
d r(i0) , d r(1i ) ,..., d r(i)D

r 1
Rate matching
er 0 , er1 ,..., er Er 1
Code block
concatenation
f 0 , f1 ,..., f G 1
24
OFDM Overview
 Orthogonal Frequency Division Multiplexing

– Multicarrier modulation scheme (FFT-based)
 Split available spectrum into uniform intervals called sub-carriers
– Transmit data independently at each sub-carrier
 Most important feature
– Robust against multi-path fading
– Using low-complexity frequency-domain equalizers
25
OFDM & Multi-path Fading
 Multi-path propagation leads to frequency selective fading

 Frequency-domain equalization is less complex and perfectly
matches OFDM
 We need to know channel response at each sub-carrier – pilots
H(ω)
Frequency-selective fading
*ℎ2 , 𝑑2 }
G(ω)H(ω)
Multi-path
1
*ℎ0 , 𝑑0 }
ω
𝑁
*ℎ1 , 𝑑1 } y(n) = 𝑛=0 ℎ𝑛 x( 𝑛 − 𝑑𝑛 )
𝑌(𝜔) = 𝐻 𝜔 𝑋(𝜔) ω
Frequency-domain equalization If G(ω𝑘 ) ≈ 𝐻 −1 (ω𝑘 ) G(ω𝑘 ) Y(ω𝑘 ) ≈ 𝑋(ω𝑘 )
26
How Does LTE Implement OFDM?
…
Frequency
Nmax=2048
Interpolate
vertically
(Frequency)
Resource
element
Pilots
Resource
block
Interpolate
horizontally (Time)
Resource
grid
…
0.5 1 1.5 2
Time (msec)
27
How to Implement LTE OFDM in MATLAB
Depending on
Channel Bandwidth
switch
prmLTEPDSCH.Nrb = 6*numRb + mod(v+vsh, 6) + 1;
Transmitter:
Case 25, N=512; Place pilots in regular intervals
Set Frequency-domain
FFT size
Case 100, N=2048;
= mean([hp(:,1,1,n) hp(:,3,1,n)])
= mean([hp(k,2,1,:) hp(k,4,1,:)])
X=ifft(tmp,N,1); Receiver:
Estimate channel by
Create OFDM signal interpolating in time &
frequency
28
MIMO Overview
 Multiple Input Multiple Output

 Using multiple transmit and receive Y = H*X + n
antennas
h1,1
h2,1
H= h1,2
ℎ11 ℎ12 ℎ13 ℎ14
X Y
ℎ21 ℎ22 ℎ23 ℎ24
ℎ31 ℎ32 ℎ33 ℎ34
ℎ41 ℎ42 ℎ43 ℎ44
h4,4
Channel
Multiple Input Multiple Output
29
Where is MIMO being used?
 Several wireless standards

– 802.11n: MIMO extensions of WiFi as of 2008
– 802.16e: As of 2005 in WiMax Standard
– 3G Cellular: 3GPP Release 6 specifies transmit diversity mode
– 4G LTE
 Two main types of MIMO
– Spatial multiplexing
– Space-Time Block Coding (STBC)
30
Space-Time Block Codes (STBC)
 STBCs insert redundant data at transmitter

 Improves the BER performance
 Alamouti code (2 Tx, 2 Rx) is one of simplest
examples of orthogonal STBCs
31
Spatial Multiplexing
 MIMO technique used in LTE standard
 Divide the data stream into independent sub-streams

and use multiple transmit antennas
 MIMO is one of the main reasons for boost in data rates

– More transmit antennas leads to higher capacity
 MIMO Receiver essentially solves this system of linear

equations
𝑌 = 𝐻𝑋 + 𝑛
32
MIMO-OFDM overview Y=𝐻∗𝑋+𝑛
What if 2 rows are

linearly dependent?
𝐻=
ℎ1 ℎ2 ℎ3 ℎ4
ℎ1 ℎ2 ℎ3 ℎ4
ℎ31 ℎ32 ℎ33 ℎ34
X ℎ41 ℎ42 ℎ43 ℎ44
Y
Dimension =4; Rank =3;
H = singular (not invertible)
𝐻 = 𝑈𝐷𝑉 𝐻
subcarrier 𝑤𝑘 Singular Value Decomposition
To avoid singularity:
1. Precode input with pre-selected V
Time 𝑡𝑛 2. Transmit over antennas based on Rank
33
Adaptive MIMO:
Closed-loop Pre-coding and Layer Mapping
Base station
Layer
Precoding(*) OFDM
Mapping
Channel Rank
Estimation
Rank Indicator
Precoder Matrix
Precoder Matrix Indicator Estimation mobile
34
Adaptive MIMO in MATLAB
 In Receiver:
 Detect V = Rank of the H Matrix V=
 = Number of layers prmLTEPDSCH.numLayers;
Switch V
 In Transmitter: (next frame)

 Based on number of layers
 Fill up transmit antennas with case 4
out =
available rank complex(zeros(inLen1/2, v));
out(:,1:2) = reshape(in1, 2,
inLen1/2).';
out(:,3:4) = reshape(in2, 2,
inLen2/2).';
35
Cell-Specific Reference Signal Mapping
• Null transmissions allow for separable channel estimation at Rx

• Use clustering or interpolation for RE channel estimation
Figure 6.10.1.2-1, 3GPP TS 36.211 v10.0.0

R0 R0
One antenna port
R0 R0 4 CSR, 0 nulls /slot /RB

=> 8 CSR/160 RE
R0 R0
R0 R0
l0 l6 l0 l6
Resource element (k,l)
R0 R0 R1 R1
Two antenna ports
R0 R0 R1 R1
Not used for transmission on this antenna port
4 CSR, 4 nulls /slot /RB
=> 8 CSR /152 RE
R0 R0 R1 R1
Reference symbols on this antenna port
R0 R0 R1 R1
l0 l6 l0 l6 l0 l6 l0 l6
R0 R0 R1 R1 R2 R3
Varies per antenna port:
Four antenna ports
R0 R0 R1 R1 R2 R3 4 CSR, 8 nulls /slot /RB

=> 8 CSR / 144 RE
R0 R0 R1 R1 R2 R3
R0 R0 R1 R1 R2 R3 2 CSR, 10 nulls /slot /RB

l0 l6 l0 l6 l0 l6 l0 l6 l0 l6 l0 l6 l0 l6 l0 l6
=> 4 CSR / 144 RE
even-numbered slots odd-numbered slots even-numbered slots odd-numbered slots even-numbered slots odd-numbered slots even-numbered slots odd-numbered slots
Antenna port 0 Antenna port 1 Antenna port 2 Antenna port 3 36

Link Adaptation Overview
 Examples of link adaptations
– Adaptive modulation
 QPSK, 16QAM, 64QAM
– Adaptive coding
 Coding rates from (1/13) to (12/13)
– Adaptive MIMO
 2x1, 2x2, …,4x2,…, 4x4, 8x8
– Adaptive bandwidth
 Up to 100 MHz (LTE-A)
37
Turbo Channel
Coding
MIMO OFDMA
Adaptation
of everything
38
Put it all together …
39
References
• Standard documents – 3GPP link
• 3GPP TS 36.{201, 211, 212, 213, 101, 104, 141}, Release 10.0.0, (2010-12).
• 3GPP TS 36.{201, 211, 212, 213, 101, 104}, Release 9.0.0, (2009-12).
• Books
• Dahlman, E.; Parkvall, S.; Skold, J., “4G LTE/LTE-Advanced for Mobile Broadband”, Elsevier,
2011.
• Agilent Technologies, “LTE and the evolution to 4G Wireless: Design and measurement
challenges”, Agilent, 2009.
• Selected papers
• Min Zhang; Shafi, M.; Smith, P.J.; Dmochowski, P.A., “Precoding Performance with Codebook
Feedback in a MIMO-OFDM System”, Communications (ICC), 2011 IEEE International
Conference on, pp. 1-6.
• Simonsson, A.; Qian, Y.; Ostergaard, J., “LTE Downlink 2X2 MIMO with Realistic CSI: Overview
and Performance Evaluation”, 2010 IEEE WCNC, pp. 1-6.
40
Summary
 MATLAB is the ideal language for LTE

modeling and simulation
 Communications System Toolbox extend
breadth of MATLAB modeling tools
 You can accelerate simulation with a
– Parallel computing, GPU processing,
MATLAB to C
 Address implementation workflow gaps
with
– Automatic MATLAB to C/C++ and HDL
code generation
41
Part 2:
Simulation acceleration

42
Why simulation acceleration?
 From algorithm exploration to system design

– Size and complexity of models increases
– Time needed for a single simulation increases
– Number of test cases increases
– Test cases become larger
 Need to reduce simulation time during design
 Need to reduce time for large scale testing during

prototyping and verification
43
MATLAB is quite fast
 Optimized and widely-used libraries

– BLAS: Basic Linear Algebra Subroutines (multithreaded)
– LAPACK: Linear Algebra Package
 JIT (Just In Time) Acceleration

– On-the-fly multithreaded code generation for increased speed
 Built-in support for vector and matrix operations
 Parallel computing support to utilize additional cores

– Parallel Computing Toolbox
– MATLAB Distributed Computing Server
– GPU support
44
Simulation acceleration options in MATLAB
System
Objects
>> Demo
Parallel commacceleration
User’s Code Computing
MATLAB to C
GPU
processing
45
Parallel Simulation Runs
Worker
TOOLBOXES
Worker Worker
BLOCKSETS
Worker
Task 1 Task 2 Task 3 Task 4
Time Time
>> Demo 46
Summary
 matlabpool  available workers

 No modification of algorithm
 Use parfor loop instead of for loop
 Parallel computation or simulation
leads to further acceleration
 More cores = more speed
47
Simulation acceleration options in MATLAB
System
Objects
Parallel
User’s Code Computing
MATLAB to C
GPU
processing
48
What is a Graphics Processing Unit (GPU)
 Originally for graphics acceleration, now also used for

scientific calculations
 Massively parallel array of integer and

floating point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores
 Dedicated high-speed memory
49
Why would you want to use a GPU?
 Speed up execution of computationally intensive

simulations
 For example:
– Performance: A\b with Double Precision
50
Options for Targeting GPUs
1) Use GPU with MATLAB built-in functions
Greater Control
Ease of Use
2) Execute MATLAB functions elementwise

on the GPU
3) Create kernels from existing CUDA code

and PTX files
51
Data Transfer between MATLAB and GPU
% Push data from CPU to GPU memory

Agpu = gpuArray(A)
% Bring results from GPU memory back to CPU

B = gather(Bgpu)
52
GPU Processing with
Communications System Toolbox
GPU System objects
 Alternative implementation
comm.gpu.TurboDecoder
for many System objects
comm.gpu.ViterbiDecoder
take advantage of GPU
comm.gpu.LDPCDecoder
processing
comm.gpu.PSKDemodulator
 Use Parallel Computing comm.gpu.AWGNChannel
Toolbox to execute many
communications algorithms
directly on the GPU
 Easy-to-use syntax
 Dramatically accelerate
simulations
53
Example: Turbo Coding
 Impressive coding gain
 High computational complexity
 Bit-error rate performance as a function of number of
iterations
„NumIterations‟, numIter,…
54
Acceleration with GPU System objects
Version Elapsed time Acceleration
CPU 8 hours 1.0
1 GPU 40 minutes 12.0
Cluster of 4 11 minutes 43.0

GPUs
 Same numerical results
=
comm.gpu.TurboDecoder(…
„NumIterations‟,
„NumIterations‟, N,…
N,…
=
= comm.AWGNChannel(…
comm.gpu.AWGNChannel(…
55
Key Operations in Turbo Coding Function
CPU GPU Version 1
% Turbo Encoder % Turbo Encoder

hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),.. hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),..
'InterleaverIndices', intrlvrIndices) 'InterleaverIndices', intrlvrIndices)
% AWG Noise % AWG Noise
hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance'); hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance');
% BER measurement % BER measurement
hBER = comm.ErrorRate; hBER = comm.ErrorRate;
% Turbo Decoder % Turbo Decoder
hTDec = comm.TurboDecoder(… hTDec = comm.gpu.TurboDecoder(…
'TrellisStructure',poly2trellis(4, [13 15], 13),... 'TrellisStructure',poly2trellis(4, [13 15], 13),...
'InterleaverIndices', intrlvrIndices,'NumIterations', numIter); 'InterleaverIndices', intrlvrIndices,'NumIterations', numIter);
ber = zeros(3,1); %initialize BER output ber = zeros(3,1); %initialize BER output
%% Processing loop %% Processing loop
while ( ber(1) < MaxNumErrs && ber(2) < MaxNumBits) while ( ber(1) < MaxNumErrs && ber(2) < MaxNumBits)
data = randn(blkLength, 1)>0.5; data = randn(blkLength, 1)>0.5;
% Encode random data bits % Encode random data bits
yEnc = step(hTEnc, data); yEnc = step(hTEnc, data);
%Modulate, Add noise to real bipolar data %Modulate, Add noise to real bipolar data
modout = 1-2*yEnc; modout = 1-2*yEnc;
rData = step(hAWGN, modout); rData = step(hAWGN, modout);
% Convert to log-likelihood ratios for decoding % Convert to log-likelihood ratios for decoding
llrData = (-2/noiseVar).*rData; llrData = (-2/noiseVar).*rData;
% Turbo Decode % Turbo Decode
decData = step(hTDec, llrData); decData = step(hTDec, llrData);
% Calculate errors % Calculate errors
ber = step(hBER, data, decData); ber = step(hBER, data, decData);
end end
56
Profile results in Turbo Coding Function
CPU GPU Version 1

<0.01 hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),.. <0.01 hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),..
<0.01 hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance'); <0.01 hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance');
<0.01 hBER = comm.ErrorRate; <0.01 hBER = comm.ErrorRate;
% Turbo Decoder % Turbo Decoder
<0.01 hTDec = comm.TurboDecoder(… 0.02 hTDec = comm.gpu.TurboDecoder(…
'TrellisStructure',poly2trellis(4, [13 15], 13),... 'TrellisStructure',poly2trellis(4, [13 15], 13),...
'InterleaverIndices', intrlvrIndices,'NumIterations', numIter); 'InterleaverIndices', intrlvrIndices,'NumIterations', numIter);
<0.01 ber = zeros(3,1); %initialize BER output <0.01 ber = zeros(3,1); %initialize BER output
0.30 data = randn(blkLength, 1)>0.5; 0.28 data = randn(blkLength, 1)>0.5;
2.33 yEnc = step(hTEnc, data); 2.38 yEnc = step(hTEnc, data);
0.05 modout = 1-2*yEnc; 0.05 modout = 1-2*yEnc;
1.50 rData = step(hAWGN, modout); 1.45 rData = step(hAWGN, modout);
0.03 llrData = (-2/noiseVar).*rData; 0.04 llrData = (-2/noiseVar).*rData;
330.54 decData = step(hTDec, llrData); 98.18 decData = step(hTDec, llrData);
0.17 ber = step(hBER, data, decData); 0.17 ber = step(hBER, data, decData);
end end
57
Key Operations in Turbo Coding Function
CPU GPU Version 2

hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),.. hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),..
hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance'); hAWGN = comm.gpu.AWGNChannel ('NoiseMethod', 'Variance');
hBER = comm.ErrorRate; hBER = comm.ErrorRate;
% Turbo Decoder % Turbo Decoder - setup for Multi-frame or Multi-user processing
hTDec = comm.TurboDecoder('TrellisStructure',poly2trellis(4, [13 15], 13),... numFrames = 30;
'InterleaverIndices', intrlvrIndices,'NumIterations', numIter); hTDec = comm.gpu.TurboDecoder('TrellisStructure',poly2trellis(4, [13 15], 13),...
'InterleaverIndices', intrlvrIndices,'NumIterations',numIter,…
’NumFrames’,numFrames);
data = randn(blkLength, 1)>0.5; data = randn(numFrames*blkLength, 1)>0.5;
yEnc = step(hTEnc, data); yEnc = gpuArray(multiframeStep(hTEnc, data, numFrames));
modout = 1-2*yEnc; modout = 1-2*yEnc;
rData = step(hAWGN, modout); rData = step(hAWGN, modout);
llrData = (-2/noiseVar).*rData; llrData = (-2/noiseVar).*rData;
decData = step(hTDec, llrData); decData = step(hTDec, llrData);
ber = step(hBER, data, decData); ber=step(hBER, data, gather(decData));
end end
58
Profile results in Turbo Coding Function
CPU GPU Version 2

<0.01 hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),.. <0.01 hTEnc = comm.TurboEncoder('TrellisStructure',poly2trellis(4, [13 15], 13),..
<0.01 hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance'); 0.03 hAWGN = comm.gpu.AWGNChannel ('NoiseMethod', 'Variance');
<0.01 hBER = comm.ErrorRate; <0.01 hBER = comm.ErrorRate;
% Turbo Decoder % Turbo Decoder - setup for Multi-frame or Multi-user processing
<0.01 hTDec = comm.TurboDecoder(… 0.01 numFrames = 30;
'TrellisStructure',poly2trellis(4, [13 15], 13),... 0.01 hTDec = comm.gpu.TurboDecoder('TrellisStructure',…
'InterleaverIndices', intrlvrIndices,'NumIterations', numIter); poly2trellis(4, [13 15], 13),'InterleaverIndices', intrlvrIndices,
'NumIterations',numIter, ’NumFrames’,numFrames);
0.30 data = randn(blkLength, 1)>0.5; 0.22 data = randn(numFrames*blkLength, 1)>0.5;
2.33 yEnc = step(hTEnc, data); 2.45 yEnc = gpuArray(multiframeStep(hTEnc, data, numFrames));
0.05 modout = 1-2*yEnc; 0.02 modout = 1-2*yEnc;
1.50 rData = step(hAWGN, modout); 0.31 rData = step(hAWGN, modout);
0.03 llrData = (-2/noiseVar).*rData; 0.01 llrData = (-2/noiseVar).*rData;
330.54 decData = step(hTDec, llrData); 20.89 decData = step(hTDec, llrData);
0.17 ber = step(hBER, data, decData); 0.09 ber=step(hBER, data, gather(decData));
end end
59
Things to note when targeting GPU
 Minimize data transfer between CPU and GPU.
 Using GPU only makes sense if data size is large.
 Some functions in MATLAB are optimized and can be

faster than the GPU equivalent (eg. FFT).
 Use arrayfun to explicitly specify elementwise

operations.
60
Acceleration Strategies Applied in MATLAB
Technology /
Option Product
1. Best Practices in Programming
MATLAB, Toolboxes,
• Vectorization & pre-allocation
System Toolboxes
• Environment tools. (i.e. Profiler, Code Analyzer)
2. Better Algorithms
• Ideal environment for algorithm exploration MATLAB, Toolboxes,
• Rich set of functionality (e.g. System objects) System Toolboxes
3. More Processors or Cores Parallel Computing

• High level parallel constructs (e.g. parfor, matlabpool) Toolbox,
• Utilize cluster, clouds, and grids MATLAB Distributed
Computing Server
4. Refactoring the Implementation MATLAB, MATLAB Coder,
• Compiled code (MEX) Parallel Computing
• GPUs, FPGA-in-the-Loop Toolbox
61
Summary
 MATLAB is the ideal language for LTE

MATLAB to C
with
code generation
62
Part 3: Path to implementation (C and HDL)

63
LTE Downlink processing
Advanced
channel coding
MIMO OFDM
Adapt
Everything
64
Why Engineers translate MATLAB to C today?
Integrate MATLAB algorithms w/ existing C environment

using source code or static libraries
Prototype MATLAB algorithms on desktops as

standalone executables
Accelerate user-written MATLAB algorithms
Implement C/C++ code on processors or hand-off to

software engineers
65
Automatic Translation of MATLAB to C
e t
ra
i te
Algorithm Design and
Code Generation in
verify
MATLAB a cce /
lerate
With MATLAB Coder, design engineers can
• Maintain one design in MATLAB

• Design faster and get to C/C++ quickly
• Test more systematically and frequently
• Spend more time improving algorithms in MATLAB
66
MATLAB Language Support for
Code Generation
visualization
variable-sized data
Java
struct functions
malloc
sparse numeric
global
System objects
nested functions complex arrays
fixed-point
classes persistent
graphics
67
Supported MATLAB Language
Features and Functions
 Broad set of language features and functions/system
objects supported for code generation.
Matrices and Programming

Data Types Functions
Arrays Constructs
• Matrix operations • Complex numbers • Arithmetic, relational, • MATLAB functions and sub-functions
• N-dimensional arrays • Integer math and • Variable length argument lists
• Subscripting • Double/single-precision logical operators • Function handles
• Frames • Fixed-point arithmetic • Program control
• Persistent variables • Characters (if, for, while, switch ) Supported algorithms
• Global variables • Structures • > 400 MATLAB operators and
• Numeric classes functions
• Variable-sized data • > 200 System objects for
• System objects • Signal processing
• Communications
• Computer vision
68
Path to implementation
 Bring it all to Simulink
 Elaborate your design

– Model System-level inaccuracies
– Compensate
– Add fixed-point
– Generate HDL
69
MATLAB to Hardware

70
Why do we use FPGAs?
We are going
 Customized interfaces to Memory to focus on
Memory
peripherals Memory
this use case
today
 High-speed communication Bridge
interfaces to other
processors
Analog I/O
FPGA ARM
 Finite state machines, Digital I/O
digital logic, timing and
memory control
DSP
 High speed, highly parallel DSP
Algorithms
DSP Algorithms
71
Separate Views of DSP Implementation
System Designer FPGA Designer
Algorithm Design System Test Bench RTL Design RTL Verification
Fixed-Point Environment Models IP Interfaces Behavioral Simulation
Timing and Control Logic Analog Models Hardware Architecture Functional Simulation
Architecture Exploration Digital Models Static Timing Analysis
Algorithms / IP Algorithms / IP Timing Simulation
Back Annotation
Implement Design
FPGA Requirements Synthesis
Map
Hardware Specification
Test Stimulus
Place & Route Hardware
72
What we will learn today
MATLAB to Hardware
 Convert design to fixed-point

– Use Fixed-point Tool
– Use NumericTypeScope in MATLAB
– Verify against floating-point design
 Serialize design
 Implementation using HDL Coder
– Verify through software and/or hardware co-simulation
73
OFDM Transmitter
74
MATLAB to Hardware
ifft(x, 2048, 1)
 Issue #1
– „x‟ is 2048x4 matrix
– MATLAB does 2048-pt FFT along first dimension
– Output is also 2048x4
– Cannot process samples this way in hardware!
– Serialize design
 Issue #2
– MATLAB does double-precision floating-point arithmetic
– Floating point is expensive in hardware (power and area)
– Convert to fixed-point
75
HDL Workflow
 Floating Point Model System

– Satisfies System Requirements Requirements
 Executable Specification
– MATLAB and/or Simulink Model
Continuous Verification
Floating Point
 Model Elaboration Model
– Develop Hardware Friendly Architecture in Simulink
– Convert to Fixed-Point
 Determine Word Length Model
 Determine Binary Point Location Elaboration
 Implement Design
– Generate HDL code using Simulink HDL Coder Implement
– Import Custom and Vendor IP Code Generation
 Verification
– Software co-simulation with HDL simulator
Verification
– Hardware co-simulation
78
Divide and conquer
Save simulation data to use in development
79
MATLAB-based FFT
 Can be done
 Resources
– 120 LUTs (1%)
– 32 Slices (1%)
– 0 DSP48s
 Need 2048-point FFT

 32-point FFT chokes
synthesis tool!
80
Re-implement using Simulink blocks
compare against original code
81
Convert to fixed-point
compare against original code
82
What you just saw
 Simulink Fixed-Point to
model fixed-point data types
– Word lengths
– Fraction lengths
 Fixed-Point Tool
– monitoring signal
min/max, overflow
– optimization of
data types
83
Use original testbench to optimize settings
84
Analyze BER to determine word length
 Anything beyond
8 bits is “good
enough”
85
Recall
ifft(x, 2048, 1)
 Issue #1
– „x‟ is 2048x4 matrix
– MATLAB does 2048-pt FFT along first dimension
– Output is also 2048x4
– Cannot process samples this way in hardware!
– Serialize design
 Issue #2
– MATLAB does double-precision floating-point arithmetic
– Floating point is expensive in hardware (power and area)
– Convert to fixed-point
86
Serial & Fixed-point
“HDL ready”
87
Automatically Generate HDL Code
Simulink HDL Coder
88
What You Just Saw - Workflow Advisor
Select ASIC, FPGA, Or FPGA Board Target

Prepare Model For HDL Code Generation
Generate HDL Code
Physical Design and Critical Path Highlighting
Program FPGA
89
What You Just Saw – Generated HDL Code
Readable, Portable
HDL Code
90
LTE Frame Structure (FDD)
0.5 ms
#0 #1 #2 #3
….. #19
10ms
 1 Frame = 10ms
 10 sub-frames per frame
 2 slots per sub-frame (1 slot = .5ms)
 7 OFDM symbols per slot
– 2048 subcarriers in our simulation
– IFFT output sample time
0.5𝑚𝑠
7 = 3.4877 × 10−8 𝑠 𝐎𝐑 28.672𝑀𝐻𝑧
2048
(30.72MHz after cyclic prefix)

91
Frame and Slot Structure
30.72 MHz = 15000 * 2048

25 * cdma2000 = 8 * W-CDMA = 30.72 MHz
 If we do 2048-length FFT, we can have 15 of
them per millisecond
 But we need a cyclic prefix
=> Choose to have 14 symbols/ms (1 sub-frame)
– Split into two slots of 7 symbols
92
Review: Automatic HDL Code Generation
 Readable, portable HDL code

 Target ASIC and FPGA
 Standard Simulink libraries
 Push-button programming of
Xilinx and Altera FPGA
 Optimize for area and speed
 Code traceability between model
and code
93
FPGA Prototyping
 Shorter iteration cycles MATLAB® and Simulink®
System and Algorithm Design
– Automatic HDL code generation
– Integrated HDL verification
Behavioral Simulation
 Flexible automatic HDL

Synthesizable
Code generation RTL
Back Annotation
– Speed Optimization Implement Design Verification

– Area Optimization Synthesis
Functional Simulation
– Power saving options Map

Static Timing Analysis
Timing Simulation
– Resource utilization Place & Route
– Validation models
FPGA Hardware
94
HDL Coder
Generate VHDL and Verilog Code for FPGA and ASIC designs
New: MATLAB to HDL

MATLAB Simulink
 Automatic floating-point to
fixed-point conversion
HDL  HDL resource optimizations

Coder and reports
 Algorithm-to-HDL traceability
 Integration with simulation &

Verilog and VHDL synthesis tools
95
HDL Verifier
Verify VHDL and Verilog code using cosimulation and FPGAs
New:
MATLAB Simulink FPGA Hardware-in-the Loop Verification
HDL
Verifier
 Support for 15 Altera and

Xilinx FPGA boards
ModelSim® and Incisive®
Xilinx® and Altera® boards • Use with:
• HDL Coder
• Hand-written HDL code
96
HDL Workflow
 Floating Point Model System

– Satisfies System Requirements Requirements
 Executable Specification
– MATLAB and/or Simulink Model
Continuous Verification
Floating Point
 Model Elaboration Model
– Develop Hardware Friendly Architecture
– Convert to Fixed-Point
 Determine Word Length Model
 Determine Binary Point Location Elaboration
 Implement Design
– Generate HDL code using HDL Coder Implement
– Import Custom and Vendor IP Code Generation
 Verification
– Software co-simulation with HDL simulator
– Hardware co-simulation Verification
97
Summary
 MATLAB is an ideal language for LTE

MATLAB to C
with
code generation
98
Thank You
Q&A
99

Modelling A 4g Lte System in Matlab

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Modelling A 4g Lte System in Matlab

Uploaded by

Copyright:

Available Formats

Modeling a 4G LTE System in MATLAB

© 2012 The MathWorks, Inc.1

 4G LTE and LTE Advanced

© 2012 The MathWorks, Inc.3

 Integration of enabling technologies with sophisticated

 Smart usage of resources and bandwidth

 Simplify translation from

Algorithm libraries in MATLAB Algorithm libraries in Simulink

Existing demos of Work in-progress

Modulation Resource element OFDM signal

© 2012 The MathWorks, Inc.

1. Start modeling in MATLAB

© 2012 The MathWorks, Inc.

 Downlink Shared Channel

Reference: 3GPP TS 36 211 v10 (2010-12)

codewords layers antenna ports

Modulation Resource element OFDM signal

codewords layers antenna ports

Modulation Resource element OFDM signal

 Error correction & coding

Code block segmentation

cr 0 , cr1 ,..., cr K r 1

d r(i0) , d r(1i ) ,..., d r(i)D

er 0 , er1 ,..., er Er 1

3GPP TS 36.212 v10.0.0

cr 0 , cr1 ,..., cr K r 1

d r(i0) , d r(1i ) ,..., d r(i)D

er 0 , er1 ,..., er Er 1

 Orthogonal Frequency Division Multiplexing

 Multi-path propagation leads to frequency selective fading

 Multiple Input Multiple Output

 Several wireless standards

 STBCs insert redundant data at transmitter

 MIMO technique used in LTE standard

 Divide the data stream into independent sub-streams

 MIMO is one of the main reasons for boost in data rates

 MIMO Receiver essentially solves this system of linear

What if 2 rows are

subcarrier 𝑤𝑘 Singular Value Decomposition

 In Transmitter: (next frame)

• Null transmissions allow for separable channel estimation at Rx

Figure 6.10.1.2-1, 3GPP TS 36.211 v10.0.0

R0 R0 4 CSR, 0 nulls /slot /RB

Resource element (k,l)

R0 R0 R1 R1 R2 R3 4 CSR, 8 nulls /slot /RB

R0 R0 R1 R1 R2 R3 2 CSR, 10 nulls /slot /RB

Antenna port 0 Antenna port 1 Antenna port 2 Antenna port 3 36

 MATLAB is the ideal language for LTE

© 2012 The MathWorks, Inc.

 From algorithm exploration to system design

 Need to reduce simulation time during design

 Need to reduce time for large scale testing during

 Optimized and widely-used libraries

 JIT (Just In Time) Acceleration

 Built-in support for vector and matrix operations

 Parallel computing support to utilize additional cores

Task 1 Task 2 Task 3 Task 4

 matlabpool  available workers

 Originally for graphics acceleration, now also used for

 Massively parallel array of integer and

 Dedicated high-speed memory

 Speed up execution of computationally intensive

1) Use GPU with MATLAB built-in functions