
PROJECT NAME: PITCH DETECTION

1. INTRODUCTION

A speech signal can be classified into voiced, unvoiced and silence regions. The majority of speech regions are voiced in nature, comprising
vowels, semivowels and other voiced components. In the time domain, the voiced regions of a speech signal look like a nearly periodic signal;
over a short interval, voiced speech segments may be treated as periodic for all practical analysis and processing. The periodicity
associated with such segments is described by the pitch period T0 in the time domain and by the pitch frequency, or fundamental frequency,
F0 in the frequency domain. Unless otherwise specified, the term pitch refers to the fundamental frequency F0.

Pitch is a perceptual quality that describes the highness or lowness of a sound. It is related to the frequencies contained in the signal:
increasing the frequency causes an increase in perceived pitch. The pitch period is the reciprocal of the fundamental frequency:

T0 = 1/F0
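
As a quick numerical check of this relation, the sketch below converts a fundamental frequency into a pitch period and into a number of samples. The 8 kHz sampling rate is the one used later in this project; the value F0 = 200 Hz is only an illustrative assumption.

```matlab
% Convert between fundamental frequency, pitch period and samples.
Fs = 8000;            % sampling frequency in Hz (value used in this project)
F0 = 200;             % assumed example fundamental frequency in Hz
T0 = 1 / F0;          % pitch period in seconds
Ns = round(T0 * Fs);  % pitch period expressed in samples
fprintf('F0 = %.2f Hz, T0 = %.4f s, Ns = %d samples\n', F0, T0, Ns);
```

The same relation links the F0, Ns and T0 values reported for each frame in the results.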

Since the properties of a speech signal change roughly every 5-30 ms, the best approach to determining and analysing these changes is to
divide the signal into short frames and analyse each frame separately; for example, the speech signal may be divided into frames of 20 ms.
There are several methods for pitch detection; two of the most widely used are cepstrum pitch determination and autocorrelation of the
speech signal.

2. PURPOSE OF THIS PROJECT


The purpose of this project is to use MATLAB to estimate the pitch of a speech signal of length 0.9 s (7200 samples) sampled at 8 kHz.
The signal is divided evenly into 45 frames of 20 ms (160 samples) each, and the pitch is then estimated in every frame.
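
The framing described here can be sketched as follows. This is a minimal illustration, not the project code: the random vector y is a placeholder standing in for the real speech samples, and the variable names are hypothetical.

```matlab
% Split a 0.9 s signal sampled at 8 kHz into 45 frames of 160 samples.
Fs        = 8000;                     % sampling frequency in Hz
frameDur  = 0.02;                     % frame duration in seconds
frameLen  = round(frameDur * Fs);     % 160 samples per frame
y         = randn(7200, 1);           % placeholder signal (assumption), not real speech
numFrames = floor(length(y) / frameLen);   % number of complete frames
% One frame per column: column k holds samples (k-1)*frameLen+1 .. k*frameLen
frames = reshape(y(1:numFrames*frameLen), frameLen, numFrames);
fprintf('%d frames of %d samples each\n', numFrames, frameLen);
```

Storing one frame per column makes it easy to process frame k as frames(:, k).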

3. ALGORITHM DESCRIPTION

This project is implemented using the autocorrelation algorithm. This algorithm was chosen over the others because autocorrelation is very
easy to implement and is most efficient at mid to low frequencies.
A commonly used method to estimate pitch is based on detecting the highest value of the autocorrelation function in the region of interest.
Given a discrete-time signal x(n), defined for all n, the autocorrelation function is generally defined as

$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n+m)$$

For a nonstationary signal such as speech, the concept of a long-time autocorrelation measurement as given by the equation above is not
really meaningful. It is therefore reasonable to define a short-time autocorrelation function, which operates on short segments of the
signal, as

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-m-1} [x(n+l)\, w(n)]\,[x(n+l+m)\, w(n+m)], \qquad 0 \le m \le M_o$$

where w(n) is an appropriate analysis window, m is the lag in samples, N is the number of signal samples in the section being analyzed,
Mo is the number of autocorrelation points to be computed and l is the index of the starting sample of the frame.
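
To make the short-time definition concrete, the sketch below evaluates the sum directly for one frame. It is an illustration under stated assumptions, not the project code: the test signal is a synthetic pulse train with a 50-sample period, the window is rectangular, Mo = 100 is an arbitrary choice, and the frame index l starts at 1 because MATLAB arrays are 1-based.

```matlab
% Short-time autocorrelation of one frame, computed directly from the sum.
N  = 160;                     % frame length in samples
Mo = 100;                     % largest lag to evaluate (assumed value)
l  = 1;                       % starting sample of the frame (1-based)
x  = zeros(400, 1);
x(1:50:end) = 1;              % synthetic pulse train, period 50 samples (assumption)
w  = ones(N, 1);              % rectangular window; another analysis window could be used
s  = x(l:l+N-1) .* w;         % windowed frame
R  = zeros(Mo+1, 1);          % R(m+1) holds the value for lag m
for m = 0:Mo
    % products of the windowed frame with itself shifted by m samples
    R(m+1) = sum(s(1:N-m) .* s(1+m:N)) / N;
end
[~, idx] = max(R(26:end));    % search lags 25..Mo, skipping the peak at lag 0
peakLag  = idx + 24;          % convert the index back to a lag in samples
fprintf('Largest peak away from lag 0 is at lag %d\n', peakLag);
```

For this pulse train the largest peak away from lag 0 falls at the pulse spacing, which is the quantity a pitch detector looks for.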

4. PROJECT IMPLEMENTATION

The sample speech signal MaoYiSheng was first divided into 45 equal frames of 160 samples each (short-time analysis, so that the signal can
be treated as stationary within each frame). The autocorrelation of each frame was then computed using the MATLAB command xcorr(x,x), and
the result was plotted for all 45 frames (amplitude against number of samples).
For pitch estimation, all negative autocorrelation values were set to zero using the find command, and the maximum value away from the
central peak was then located. The position of this maximum gives the corresponding number of samples Ns, which is used to calculate the
fundamental frequency and the pitch period.

The fundamental frequency is calculated as F0 = Fs/Ns

and the pitch period as T0 = 1/F0, or T0 = Ns/Fs seconds,

where Fs is the sampling frequency and Ns is the number of samples (lag) at the autocorrelation peak.
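
The peak search and the two conversions can be sketched on a synthetic frame as follows. This is an illustrative variant rather than the project code: it uses conv in place of xcorr (for a real-valued frame both give the same autocorrelation values), and it restricts the search to lags corresponding to an assumed pitch range of 60-400 Hz instead of zeroing negative values.

```matlab
% Pick the autocorrelation peak inside an assumed pitch range and convert it.
Fs = 8000;                               % sampling frequency in Hz
frame = zeros(160, 1);
frame(1:64:end) = 1;                     % synthetic pulse train, period 64 samples (assumption)
r = conv(frame, flipud(frame));          % full autocorrelation, lags -(N-1) .. N-1
r = r(length(frame):end);                % keep lags 0 .. N-1, so r(1) is lag 0
minLag = floor(Fs / 400);                % shortest lag searched (assumed 400 Hz upper limit)
maxLag = ceil(Fs / 60);                  % longest lag searched (assumed 60 Hz lower limit)
[~, idx] = max(r(minLag+1 : maxLag+1));  % search only the plausible lags
Ns = minLag + idx - 1;                   % peak lag in samples
F0 = Fs / Ns;                            % fundamental frequency in Hz
T0 = Ns / Fs;                            % pitch period in seconds
fprintf('Ns = %d, F0 = %.2f Hz, T0 = %.4f s\n', Ns, F0, T0);
```

Restricting the lag range is one possible way to keep very short lags from being reported as the pitch period.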

5. RESULTS AND COMMENTS

The results of this project are shown below for all 45 frames. The number of samples at the pitch peak, the fundamental frequency and the
pitch period are indicated for each frame; the plot of the original speech is also shown.

The pitch period varies from frame to frame, but the differences among frames are slight, and some frames have very similar pitch and
amplitude.

The properties of the signal change from frame to frame, which suggests that the more frames are taken, the better the estimate that can be
achieved.

Plots that change very rapidly indicate an unvoiced portion of the signal.

As the plot of the original signal shows, it would be hard to detect the pitch without dividing the signal into frames. After framing, the
longest pitch period (voiced speech region with high amplitude) is 14.1 ms, corresponding to F0 = 70.796 Hz, at frame 1, and the shortest
(unvoiced speech with low amplitude) is 0.3 ms, corresponding to F0 = 4000 Hz, at frames 30 and 33.
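
A simple way to separate such frames automatically is to test whether the estimated F0 falls inside a plausible pitch range. The sketch below applies this to frames 1, 30 and 33, using the Ns values reported for them in the per-frame results; the limits of 50 Hz and 500 Hz are assumptions chosen for illustration, not values from this project.

```matlab
% Flag frames whose estimated F0 lies outside an assumed speech pitch range.
Fs      = 8000;            % sampling frequency in Hz
frameNo = [1 30 33];       % frames discussed in the text
Ns      = [113 2 2];       % peak lags reported for those frames
F0      = Fs ./ Ns;        % fundamental frequency estimates in Hz
f0Min   = 50;              % assumed lower pitch limit in Hz
f0Max   = 500;             % assumed upper pitch limit in Hz
isPlausible = (F0 >= f0Min) & (F0 <= f0Max);   % true where F0 is inside the range
for i = 1:numel(frameNo)
    fprintf('Frame %d: F0 = %.2f Hz, plausible pitch: %d\n', ...
            frameNo(i), F0(i), isPlausible(i));
end
```

Under these assumed limits only frame 1 is kept as a plausible voiced frame, in line with the observation that frames 30 and 33 are unvoiced.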

[Figure: plot of the original speech before being framed into 45 frames]

KEYS: F0: fundamental frequency (Hz), Ns: number of samples at the pitch point, T0: pitch period (s)

[Figures: autocorrelation plot of each of the 45 frames, with the curves labelled "Lagging signal" and "Real signal". The values annotated
on each plot are listed below.]

Frame    F0 (Hz)     Ns     T0 (s)
1        70.796      113    0.0141
2        131.14      61     0.0076
3        131.14      61     0.0076
4        266.66      30     0.0037
5        126.98      63     0.0079
6        126.98      63     0.0079
7        126.98      63     0.0079
8        125         64     0.0080
9        129         62     0.0077
10       142.85      56     0.0070
11       148.14      54     0.0067
12       166.66      48     0.0060
13       177.77      45     0.0056
14       186.04      43     0.0054
15       200         40     0.0050
16       210.52      38     0.0047
17       210.52      38     0.0047
18       195.12      41     0.0051
19       186.04      43     0.0054
20       150.94      53     0.0066
21       121.21      66     0.0083
22       108.108     74     0.0092
23       97.56       82     0.0103
24       275.86      29     0.0036
25       89.88       89     0.0111
26       89.88       89     0.0111
27       90.90       88     0.0110
28       106.66      75     0.0094
29       1000        8      0.0010
30       4000        2      0.0003
31       2666.66     3      0.0004
32       1000        8      0.0010
33       4000        2      0.0003
34       195.12      41     0.0051
35       145.45      55     0.0069
36       137.93      58     0.0072
37       145.45      55     0.0069
38       148.148     54     0.0067
39       150.94      53     0.0066
40       153.84      52     0.0065
41       153.84      52     0.0065
42       160.00      50     0.0063
43       163.26      49     0.0061
44       163.26      49     0.0061
45       163.26      49     0.0061

6. MATLAB CODE
% Read the speech file (Fs is the sampling rate stored in the file)
[y,Fs] = audioread('C:\Users\SALEH\Desktop\signa procesing\ppt\PitchDetection\MaoYiSheng.wav');
fs = 8000;       % sampling frequency in Hz (expected to equal Fs)
frle = 160;      % number of samples per frame (20 ms at 8 kHz)
frsize = 45;     % number of frames
plot(y)
title('Original speech signal - student 2820170009');
ylabel('amplitude');
xlabel('samples');
for k = 1:frsize
    frame = y((k-1)*frle + 1 : frle*k);              % samples of frame k
    [sig, lag] = xcorr(frame, frame);                % autocorrelation and its lags
    figure;
    plot(lag, sig, 'b')
    grid on
    title('Autocorrelation - student 2820170009');
    ylabel('coefficient');
    xlabel('samples');
    sig(sig < 0) = 0;                                % set negative correlation values to zero
    center_peak_width = find(sig(frle:end) == 0, 1); % first zero after the centre
    if isempty(center_peak_width)
        center_peak_width = 1;                       % no zero found: mask only around lag 0
    end
    lo = max(frle - center_peak_width, 1);           % keep the masked range inside the vector
    hi = min(frle + center_peak_width, length(sig));
    sig(lo:hi) = min(sig);                           % suppress the central peak
    [max_val, loc] = max(sig);                       % largest remaining peak
    sampl = abs(lag(loc));                           % lag of that peak in samples, read from lag to avoid an index offset
    fo = fs/sampl;                                   % fundamental frequency in Hz
    period = sampl/fs;                               % pitch period in seconds (= 1/fo)
    fprintf('At sampling number of %d, fundamental frequency is %.2f and periodicity is %.4f\n', sampl, fo, period);
end

