
Linear Prediction Speech Compression

Danang University of Technology Advanced Program in Digital Systems


Instructor: Ph.D m Vn n

Objectives:
+ Understand the theory of LP based on the Wiener filter.
+ Understand the theory of LPC-based speech compression.
+ Learn how to design the LPC analysis filter and the LPC synthesis filter.
+ Be able to explain the Matlab program (given the script and the algorithm of the lpc command).

Theory of Linear Prediction (LP) based on Wiener Filter:

Linear prediction is an important problem in many signal processing applications. It is concerned with estimating x(n+1) as a linear combination of the current and previous values of the signal x(n).

Theory of Linear Predictive Coding (LPC) based Speech Compression:

Introduction:

Linear predictive coding (LPC) is defined as a digital method for encoding an analog signal in which a particular value is predicted by a linear function of the past values of the signal. Under normal circumstances, speech is sampled at 8000 samples/second with 8 bits used to represent each sample, which gives a rate of 64000 bits/second. Linear predictive coding reduces this to 2400 bits/second. At this reduced rate the speech has a distinctive synthetic sound and there is a noticeable loss of quality. However, the speech is still audible and can still be easily understood.

Speech compression is often referred to as speech coding, which is defined as a method for reducing the amount of information needed to represent a speech signal. Speech coding or compression is usually carried out with voice coders, or vocoders. There are two types of voice coders: waveform-following coders and model-based coders. LPC vocoders are model-based coders, which means that LPC coding is lossy even if no quantization errors occur.

LPC vocoders have four main attributes: bit rate, delay, complexity, and quality.
+ Bit rate: The linear predictive coder transmits speech at a bit rate of 2.4 kb/s, an excellent rate of compression.
+ Delay: Any delay that is greater than 300 ms is considered unacceptable.
+ Complexity: Because of its high compression rate, LPC is very complex and involves executing millions of instructions per second. The complexity affects both the cost and the power of the vocoder.
+ Quality: LPC sacrifices quality in order to achieve a low bit rate, and as a result the speech often sounds synthetic.
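The rates quoted above follow from simple arithmetic, sketched below in Matlab. The figure of 50 frames per second is an assumption made for this sketch (it corresponds to 20 ms frames), and the variable names are illustrative only.

```matlab
% Bit-rate arithmetic for the figures quoted above.
fs            = 8000;                 % sampling rate, samples/second
bitsPerSample = 8;                    % bits per sample
pcmRate       = fs * bitsPerSample;   % uncompressed rate, bits/second
lpcRate       = 2400;                 % LPC rate, bits/second
ratio         = pcmRate / lpcRate;    % compression ratio

framesPerSec  = 50;                   % assumed: one frame every 20 ms
bitsPerFrame  = lpcRate / framesPerSec;   % bit budget for each frame

fprintf('PCM rate: %d bit/s, ratio: %.1f:1, bits per frame: %d\n', ...
        pcmRate, ratio, bitsPerFrame);
```

Under these assumptions, all the parameters describing one frame have to fit in a few dozen bits, which is why only a small set of numbers per frame can be transmitted.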

Physical Model:

The process of speech production in humans can be summarized as air being pushed from the lungs, through the vocal tract, and out through the mouth to generate speech. The lungs can be thought of as the source of the sound, and the vocal tract can be thought of as a filter that produces the various types of sounds that make up speech.

Phonemes are defined as a limited set of individual sounds. There are two categories of phonemes, voiced and unvoiced sounds, that are considered by LPC when analyzing and synthesizing speech signals. Voiced sounds are usually vowels and often have high average energy levels and very distinct resonant or formant frequencies. It is generally known that women and children have higher pitched voices than men as a result of a faster rate of vibration during the production of voiced sounds. Unvoiced sounds are usually consonants and generally have less energy and higher frequencies than voiced sounds. Pitch is an unimportant attribute of unvoiced speech since there is no vibration of the vocal cords and there are no glottal pulses.

When people speak:
+ Air is pushed from your lungs through your vocal tract, and speech comes out of your mouth.
+ For certain voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice. Women and young children tend to have high pitch (fast vibration), while adult males tend to have low pitch (slow vibration).
+ For certain fricative and plosive (or unvoiced) sounds, your vocal cords do not vibrate but remain constantly open.
+ The shape of your vocal tract determines the sound that you make.
+ As you speak, your vocal tract changes its shape, producing different sounds. The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
+ The amount of air coming from your lungs determines the loudness of your voice.

Mathematical Model:

The particular source-filter model used in LPC is known as the linear predictive coding model. It has two key components: analysis (encoding) and synthesis (decoding). The analysis part of LPC involves examining the speech signal and breaking it down into segments or blocks. Each segment is then examined further to find the answers to several key questions:
+ Is the segment voiced or unvoiced?
+ What is the pitch of the segment?
+ What parameters are needed to build a filter that models the vocal tract for the current segment?

LPC analysis is usually conducted by a sender, who answers these questions and transmits the answers to a receiver. The receiver performs LPC synthesis by using the answers received to build a filter that, when provided with the correct input source, is able to accurately reproduce the original speech signal.

The relationship between the physical and the mathematical models:

Physical model                  Mathematical model
Vocal tract                     LPC filter
Air                             Innovations
Vocal cord vibration            Voiced
Vocal cord vibration period     Pitch period
Fricatives and plosives         Unvoiced
Air volume                      Gain

Design LPC Analysis filter & LPC Synthesis filter:

+ About Theory:

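Let $e_N^f(n)$ denote the forward prediction error of the Nth-order predictor, that is, the difference between x(n) and its estimate formed from the N previous samples. In view of the WSS property of x(n), the mean squared value of this error is independent of time. The optimum predictor (i.e., the optimum set of coefficients $a_{N,i}^{*}$) is the one that minimizes this mean squared value. The prediction error $e_N^f(n)$ can be regarded as the output of an FIR filter $A_N(z)$ in response to the WSS input x(n). See the above figure. The FIR filter transfer function is given by: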

$$A_N(z) = 1 + \sum_{i=1}^{N} a_{N,i}^{*}\, z^{-i}$$

The IIR filter $1/A_N(z)$ can therefore be used to reconstruct x(n) from the error signal $e_N^f(n)$. The conjugate sign on $a_{N,i}$ is for future convenience. Because $e_N^f(n)$ is the output of a filter in response to a WSS input, $e_N^f(n)$ is itself a WSS random process. $A_N(z)$ is called the prediction polynomial, although its output is only the prediction error.

Thus, linear prediction essentially converts the signal x(n) into the set of N numbers $\{a_{N,i}\}$ and the error signal $e_N^f(n)$. This fact is exploited in data compression applications. The technique of linear predictive coding (LPC) is the process of converting segments of a real-time signal into the small set of numbers $\{a_{N,i}\}$ for storage and transmission.
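The analysis and reconstruction steps described above can be sketched in a few lines of Matlab. This is only an illustration: the coefficient vector a and the test signal are made up, and the reconstruction is exact here because the error signal is passed on without any quantization and both filters start from zero initial conditions.

```matlab
% Toy illustration: analysis with A(z), reconstruction with 1/A(z).
a = [1 -0.9 0.5];               % hypothetical prediction polynomial [1 a1 a2]
x = sin(2*pi*0.05*(0:99)).';    % arbitrary test signal, 100 samples

e     = filter(a, 1, x);        % FIR analysis filter A(z): prediction error
x_rec = filter(1, a, e);        % IIR filter 1/A(z): reconstruction from the error

max_err = max(abs(x - x_rec));  % should be at rounding-error level
fprintf('max reconstruction error = %g\n', max_err);
```

The chosen a has both roots inside the unit circle, so the IIR filter 1/A(z) is stable. In a real coder the error signal would be replaced by a much cheaper description, which is where the loss of quality comes from.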

+ In practice:
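+ LPC Synthesis: The LPC synthesis filter is given by:

$$H(z) = \frac{1}{1 + \sum_{i=1}^{10} a_i z^{-i}}$$

which is equivalent to saying that the input-output relationship of the filter is given by the linear difference equation:

$$s(n) + \sum_{i=1}^{10} a_i\, s(n-i) = u(n)$$

where u(n) is the excitation (innovations) and s(n) is the synthesized speech. The order 10 matches the 10 unknowns of the analysis step, and the sign convention matches the prediction polynomial $A_N(z)$ above.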
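As a rough illustration of how such a synthesis filter is driven, the sketch below generates one frame of 160 samples (20 ms at 8000 samples/second) in Matlab. Everything in it is an assumption made for illustration: the filter is of order 2 rather than 10 for brevity, the gain is taken to simply scale the excitation, and the names a, G, voiced and T and their values are hypothetical.

```matlab
% Sketch of LPC synthesis for one 20 ms frame (160 samples at 8 kHz).
% All parameter values below are made up for illustration.
N      = 160;              % samples per frame
a      = [1 -1.2 0.8];     % hypothetical [1 a1 a2], low order for brevity
G      = 0.5;              % hypothetical gain
voiced = true;             % voiced/unvoiced decision for this frame
T      = 40;               % hypothetical pitch period in samples (200 Hz)

if voiced
    u = zeros(N, 1);
    u(1:T:N) = 1;          % voiced: impulse train with period T
else
    u = randn(N, 1);       % unvoiced: white noise excitation
end

s = filter(G, a, u);       % synthesis filter G / A(z) applied to the excitation
```

Setting voiced to false switches the excitation to white noise while leaving the filter untouched, which is the source-filter separation the model relies on.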

The LPC model can be represented in vector form as:

$$A = (a_1, a_2, \dots, a_{10}, G, V/UV, T)$$

A changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is equivalent to 160 samples. The digital speech signal is divided into frames of size 20 msec, so there are 50 frames/second. The model says that A is equivalent to

$$S = (s(0), s(1), \dots, s(159))$$

Thus the 160 values of S are compactly represented by the 13 values of A.

There is almost no perceptual difference in S if:
+ For voiced sounds (V): the impulse train is shifted (insensitive to phase change).
+ For unvoiced sounds (UV): a different white noise sequence is used.

LPC Synthesis: Given A, generate S (this is done using standard filtering techniques).
LPC Analysis: Given S, find the best A (this is described in the next section).

+ LPC Analysis
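The steps leading to the linear equations are not reproduced in this copy. A sketch of the usual derivation is given below, under assumptions stated here rather than taken from the original: the predictor has order 10, the frame is treated as zero outside its 160 samples (the autocorrelation method), and R(k) denotes the autocorrelation of the frame at lag k.

```latex
% Prediction error for one frame, with s(n) = 0 outside the frame
e(n) = s(n) + \sum_{i=1}^{10} a_i\, s(n-i)

% Total squared error to be minimized over a_1, ..., a_{10}
E = \sum_{n} e(n)^2

% Setting each partial derivative to zero
\frac{\partial E}{\partial a_k} = 2 \sum_{n} e(n)\, s(n-k) = 0,
\qquad k = 1, \dots, 10

% With R(k) = \sum_{n} s(n)\, s(n-k), this becomes
\sum_{i=1}^{10} a_i\, R(|i-k|) = -R(k), \qquad k = 1, \dots, 10
```

Each value of k gives one equation that is linear in the coefficients, and the coefficient matrix depends only on the lag difference |i-k|, so it is a symmetric Toeplitz matrix.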

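We now have 10 linear equations with 10 unknowns, which can be written in matrix form as:

$$\begin{bmatrix} R(0) & R(1) & \cdots & R(9) \\ R(1) & R(0) & \cdots & R(8) \\ \vdots & \vdots & \ddots & \vdots \\ R(9) & R(8) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{10} \end{bmatrix} = -\begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(10) \end{bmatrix}$$

where R(k) is the autocorrelation of the frame at lag k.

The above matrix equation could be solved using:
+ The Gaussian elimination method.
+ Any matrix inversion method (MATLAB).
+ The Levinson-Durbin recursion.

Matlab Program: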
load sp01VN.mat;
seg = x(6000:6320);

% extract forward linear prediction coefficients
coefficients = real(lpc(seg,12));

% estimate the residual error based on FIR prediction filter
residual = filter(coefficients,1,seg);
figure, plot(seg,'b'), hold on, plot(residual,'r'), axis tight

% synthesize signal based on the IIR inverse filter
syn_seg = filter(1,coefficients,residual);
figure, plot(seg,'b'), hold on, plot(residual,'r'), plot(syn_seg,'g'), axis tight

Code Explanation:

This is a typical example of using LPC in speech compression. First of all, we need to find the prediction coefficients, which is done with the Matlab command lpc(seg,12). We use the Matlab command open lpc to view the source code of the lpc function. Here is the source code:

function [a,e]=lpc(x,N);
error(nargchk(1,2,nargin))
if isempty(x)
    error('Input vector X should not be empty');
end
[m,n] = size(x);
if (n>1)&(m==1)
    x = x(:);
    [m,n] = size(x);
end
if nargin < 2,
    N = m-1;
elseif N < 0, % Check for N positive
    error('Order of the predictor should be a positive integer.');
end
if (N > m-1),
    % disp('Warning: zero-padding short input sequence')
    x(N+1,:)=zeros(1,n);
end
% Compute autocorrelation vector or matrix
X = fft(x,2^nextpow2(2*size(x,1)-1));
R = ifft(abs(X).^2);
R = R./m; % Biased autocorrelation estimate
[a,e] = levinson(R,N);
% Return only real coefficients for the predictor if the input is real
for k = 1:n,
    if isreal(x(:,k))
        a(k,:) = real(a(k,:));
    end
end
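The listing above computes a biased autocorrelation estimate through the FFT and hands it to levinson. The sketch below mirrors those two steps using base Matlab functions only, and checks the result against a direct solve of the Toeplitz system. It is an illustration under stated assumptions, not the code inside levinson: the recursion is written for real-valued data, the test signal is an arbitrary AR(2) process driven by randn, the order p = 4 is chosen for brevity, and all variable names are hypothetical.

```matlab
% Autocorrelation method by hand, using only base Matlab functions.
p = 4;                                      % prediction order (illustrative)
x = filter(1, [1 -0.9 0.5], randn(200,1));  % arbitrary real test signal
m = length(x);

% Biased autocorrelation estimate via the FFT, as in the listing above
X = fft(x, 2^nextpow2(2*m - 1));
R = real(ifft(abs(X).^2)) / m;              % R(k+1) holds lag k
r = R(1:p+1);

% Reference: solve the symmetric Toeplitz normal equations directly
a_direct = [1; -(toeplitz(r(1:p)) \ r(2:p+1))];

% Levinson-Durbin recursion (real-valued case)
a = 1;                                      % order-0 prediction polynomial
E = r(1);                                   % order-0 prediction error power
for k = 1:p
    kappa = -(r(k+1:-1:2).' * a) / E;       % reflection coefficient, order k
    a = [a; 0] + kappa * [0; flipud(a)];    % order update of the polynomial
    E = E * (1 - kappa^2);                  % updated prediction error power
end

fprintf('max |a - a_direct| = %g\n', max(abs(a - a_direct)));
```

Both routes solve the same equations. The recursion exploits the Toeplitz structure, so its cost grows with the square of the order, whereas a general solver grows with the cube.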

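The algorithm of this command is the one discussed above in Design LPC Analysis filter & LPC Synthesis filter. We summarize it again here. lpc uses the autocorrelation method of autoregressive (AR) modeling to find the filter coefficients. The generated filter might not model the process exactly even if the data sequence is truly an AR process of the correct order. This is because the autocorrelation method implicitly windows the data, that is, it assumes that signal samples beyond the length of x are 0. lpc computes the least squares solution to

$$X a = b$$

where

$$X = \begin{bmatrix} x(1) & 0 & \cdots & 0 \\ x(2) & x(1) & \ddots & \vdots \\ \vdots & x(2) & \ddots & 0 \\ x(m) & \vdots & \ddots & x(1) \\ 0 & x(m) & \ddots & x(2) \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & x(m) \end{bmatrix}, \quad a = \begin{bmatrix} 1 \\ a(2) \\ \vdots \\ a(p+1) \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

and m is the length of x. Solving the least squares problem via the normal equations

$$X^H X a = X^H b$$

leads to the Yule-Walker equations

$$\begin{bmatrix} r(1) & r(2)^{*} & \cdots & r(p)^{*} \\ r(2) & r(1) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r(2)^{*} \\ r(p) & \cdots & r(2) & r(1) \end{bmatrix} \begin{bmatrix} a(2) \\ a(3) \\ \vdots \\ a(p+1) \end{bmatrix} = \begin{bmatrix} -r(2) \\ -r(3) \\ \vdots \\ -r(p+1) \end{bmatrix}$$

where r = [r(1) r(2) ... r(p+1)] is an autocorrelation estimate for x computed using xcorr. The Yule-Walker equations are solved in O(p^2) flops by the Levinson-Durbin algorithm (see levinson).

After finding the coefficients, we can use them to find the residual error with the analysis filter. The inputs of the filter are the coefficients and the segment. To make this step easier to understand, the filter command can be replaced by the convolution command: the residual is the convolution of the coefficients with the signal. Note that conv returns the full convolution, which is longer than the segment by the prediction order; its first length(seg) samples are the same as the output of filter.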
% estimate the residual error based on FIR prediction filter
residual = conv(seg,coefficients);
figure, plot(seg,'b'), hold on, plot(residual,'r'), axis tight

Similarly, the synthesized signal can be found by deconvolving the residual by the coefficients. This recovers the segment exactly when the residual is the full-length output of conv.
% synthesize signal based on the IIR inverse filter
syn_seg = deconv(residual, coefficients);
figure, plot(seg,'b'), hold on, plot(residual,'r'), plot(syn_seg,'g'), axis tight
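The relationship between filter, conv and deconv described above can be checked on a short synthetic segment. The sketch below is only an illustration: the coefficient vector a and the cosine segment are made up and stand in for the lpc output and the speech segment, and all vectors are kept as columns so that the same lines also run in environments where deconv is sensitive to orientation.

```matlab
% Compare filter, conv and deconv on a short synthetic segment.
a   = [1; -0.9; 0.5];             % hypothetical prediction polynomial (column)
seg = cos(0.2*(0:49)).';          % stand-in for a speech segment, 50 samples
m   = length(seg);
p   = length(a) - 1;              % prediction order

res_filter = filter(a, 1, seg);   % residual from filter: m samples
res_conv   = conv(seg, a);        % residual from conv: m + p samples

% The first m samples of the conv output agree with the filter output
d1 = max(abs(res_conv(1:m) - res_filter));

% deconv undoes conv when it is given the full-length residual
syn_deconv = deconv(res_conv, a);
d2 = max(abs(syn_deconv - seg));

% filter(1, a, .) undoes filter(a, 1, .) on the truncated residual
syn_filter = filter(1, a, res_filter);
d3 = max(abs(syn_filter - seg));

fprintf('d1 = %g, d2 = %g, d3 = %g\n', d1, d2, d3);
```

The extra p samples returned by conv are the tail of the FIR filter response after the segment ends. deconv needs that tail to return a quotient of the original length, while the filter-based pair works on the truncated residual.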

After executing the Matlab code, we get the results:

In the figure, the left plot shows the original signal and the residual error. The right plot shows the synthesized signal and the residual signal. The Matlab code actually draws both the original signal and the synthesized signal in the right plot. However, because the synthesis is lossless here (the residual and the coefficients are used without quantization), the two signals are identical, so the green curve (synthesized signal) overlaps the original one.
