Pitch

PROJECT NAME: PITCH DETECTION
1. INTRODUCTION
A speech signal can be classified into voiced, unvoiced and silence regions. The majority of speech regions are voiced, including
vowels, semivowels and other voiced components. In the time domain, the voiced regions of a speech signal look nearly periodic, so
over a short interval voiced speech segments may be treated as periodic for all practical analysis and processing. The
periodicity of such segments is described by the pitch period T0 in the time domain and by the pitch frequency, or fundamental
frequency, F0 in the frequency domain. Unless otherwise specified, the term pitch refers to the fundamental frequency F0.
Pitch is a perceptual quality that describes the highness or lowness of a sound. It is related to the frequencies contained in the signal:
increasing the frequency increases the perceived pitch. The pitch period is the reciprocal of the fundamental frequency:
T0 = 1/F0
Since the properties of a speech signal change roughly every 5-30 ms, the best way to determine and analyse these changes is to
divide the signal into short frames and analyse each frame separately; for example, the signal may be divided into frames of
20 ms each. There are several methods for pitch detection; the two most widely used are cepstrum pitch determination and
autocorrelation of the speech signal.
2. PURPOSE OF THIS PROJECT

The purpose of this project is to use MATLAB to estimate the pitch of a speech signal of length 0.9 s (7200 samples) sampled at
8 kHz. The signal is divided evenly into 45 frames of 20 ms (160 samples) each, and the pitch is estimated in every
frame.
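The framing arithmetic above can be checked with a short MATLAB sketch. It is only an illustration: the signal x is random noise standing in for the recording, and the names frameDur, frameLen, numFrames and frames are assumptions that do not appear in the project code.

```matlab
% Sketch: split a 0.9 s signal sampled at 8 kHz into 20 ms frames.
% x is a placeholder signal, not the project's recording.
fs = 8000;                                % sampling frequency in Hz
frameDur = 0.02;                          % frame duration in seconds
frameLen = round(frameDur * fs);          % samples per frame
x = randn(round(0.9 * fs), 1);            % placeholder: 0.9 s of noise
numFrames = floor(numel(x) / frameLen);   % number of whole frames

% Put one frame in each column; samples beyond the last whole frame are dropped.
frames = reshape(x(1:numFrames * frameLen), frameLen, numFrames);

fprintf('%d frames of %d samples each\n', numFrames, frameLen);
```

With these figures the signal divides exactly, so no samples are left over after the last frame.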
3. ALGORITHM DESCRIPTION
This project is implemented using the autocorrelation algorithm. This algorithm was chosen over the others because
autocorrelation is very easy to implement and is most effective at low to mid frequencies.
A commonly used method of estimating pitch is to detect the highest value of the autocorrelation function in the region of interest.
Given a discrete-time signal x(n), defined for all n, the autocorrelation function is generally defined as:
$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m)$$
For a nonstationary signal such as speech, a long-time autocorrelation measurement as given by the equation above is not
really meaningful. It is therefore reasonable to define a short-time autocorrelation function, which operates on short segments of
the signal:
$$R(m) = \frac{1}{N} \sum_{n=0}^{N-m} [x(n+l)\,w(n)]\,[x(n+l+m)], \qquad 0 \le m \le M_o$$
where w(n) is an appropriate analysis window, m is the lag in samples, N is the number of signal samples used in the
computation of R(m), Mo is the number of autocorrelation points to be computed and l is the index of the starting sample of the frame.
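The short-time definition above can be sketched directly in MATLAB. This is a minimal illustration, assuming a rectangular window (w(n) = 1) and summing only over the sample pairs that lie inside the frame. The 200 Hz test tone, the value chosen for Mo and the variable names are assumptions, not taken from the project.

```matlab
% Sketch: short-time autocorrelation of one frame, computed directly.
% Assumes a rectangular window w(n) = 1; x is a synthetic test tone.
fs = 8000;                       % sampling frequency in Hz
N  = 160;                        % samples in the analysis frame
Mo = 120;                        % largest lag to compute (assumed value)
l  = 1;                          % index of the first sample of the frame
x  = sin(2*pi*200*(0:799)'/fs);  % placeholder signal: 200 Hz tone

frame = x(l : l + N - 1);        % the N samples of the frame
R = zeros(Mo + 1, 1);            % R(m + 1) holds the value for lag m
for m = 0:Mo
    % Multiply the frame by a copy of itself shifted by m samples,
    % using only the N - m sample pairs that lie inside the frame.
    R(m + 1) = sum(frame(1:N - m) .* frame(1 + m:N)) / N;
end

fprintf('R(0) = %.4f (average power of the frame)\n', R(1));
```

Because MATLAB indexes from 1, lag m is stored in R(m + 1). The value at lag 0 is the largest in magnitude, which is why the pitch search has to look past the centre peak.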
4. PROJECT IMPLEMENTATION
The sample speech signal MaoYiSheng was first divided into 45 equal frames of 160 samples each (short-time analysis, so that the signal
can be treated as approximately stationary within each frame). The autocorrelation of each frame was then computed using the MATLAB
command xcorr(x,x), and the result was plotted for all 45 frames (amplitude against lag in samples).
For pitch estimation, all negative autocorrelation values were set to zero using the find command. The maximum value after the
centre peak was then located; its position gives the corresponding number of samples Ns, which is used to calculate the fundamental
frequency and the pitch period.
The fundamental frequency is calculated as: F0 = Fs/Ns
The pitch period is: T0 = 1/F0 = Ns/Fs seconds
where Fs is the sampling frequency and Ns is the number of samples (lag) at which the peak occurs.
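The peak-picking steps described above can be sketched on a single synthetic frame. This is an illustration under stated assumptions, not the project's script: the frame is a 200 Hz tone rather than speech, conv with a time-reversed copy is swapped in for xcorr so that no toolbox is needed, and the names pos, firstZero and idx are hypothetical.

```matlab
% Sketch: pick the pitch lag from the autocorrelation of one frame.
% The frame is a synthetic 200 Hz tone, not the project's recording.
fs = 8000;                               % sampling frequency in Hz
N  = 160;                                % samples per frame
frame = sin(2*pi*200*(0:N-1)'/fs);       % placeholder voiced frame

% Autocorrelation via conv (stand-in for xcorr); lag 0 is at index N.
sig = conv(frame, flipud(frame));

sig(sig < 0) = 0;                        % clip negative correlation values
pos = sig(N:end);                        % keep lags 0, 1, 2, ... only
firstZero = find(pos == 0, 1);           % first clipped value marks the end of the centre peak
pos(1:firstZero) = 0;                    % suppress the centre peak
[~, idx] = max(pos);                     % highest remaining peak
Ns = idx - 1;                            % pos(1) is lag 0, so lag = idx - 1

F0 = fs / Ns;                            % fundamental frequency in Hz
T0 = Ns / fs;                            % pitch period in seconds
fprintf('Ns = %d, F0 = %.2f Hz, T0 = %.4f s\n', Ns, F0, T0);
```

For this tone the peak should fall at a lag of 40 samples, since 8000/200 = 40. Because Ns is an integer, the estimate is quantised: at Fs = 8000 Hz, neighbouring lags of 40 and 41 samples correspond to 200 Hz and about 195.12 Hz.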
5. RESULT AND COMMENTS
The results of this project are shown in the plots below for all 45 frames. The number of samples at the pitch peak, the fundamental
frequency and the pitch period are indicated in the box on each frame's figure; the plot of the original speech is also shown.
The pitch period varies from frame to frame, but the differences among the frames are slight, and some frames have very close pitch
and amplitude.
The properties of the signal change from frame to frame, which suggests that the more frames are taken, the better the
estimate that can be achieved.
Plots that change very rapidly indicate unvoiced portions of the signal.
As the plot of the original signal shows, it would be hard to detect the pitch without dividing the signal into frames. After framing,
the longest pitch period (voiced speech region with high amplitude) is 14.1 ms, or 70.796 Hz, at frame 1, and the shortest (unvoiced
speech with low amplitude) is 0.3 ms, or 4000 Hz, at frames 30 and 33.
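One way to single out the frames whose estimate does not look like voiced speech is to compare each F0 with a plausible pitch range. The sketch below uses the Ns values reported for frames 1 to 45 in this section; the 50-400 Hz range and the names f0Min, f0Max and suspect are assumptions made for illustration and are not part of the project.

```matlab
% Sketch: flag frames whose estimated F0 is outside an assumed speech range.
% Ns holds the peak lags reported for frames 1 to 45 in this report.
Fs = 8000;                                 % sampling frequency in Hz
Ns = [113 61 61 30 63 63 63 64 62 56 54 48 45 43 40 38 38 41 43 53 ...
      66 74 82 29 89 89 88 75 8 2 3 8 2 41 55 58 55 54 53 52 52 50 49 49 49];
F0 = Fs ./ Ns;                             % fundamental frequency per frame in Hz
f0Min = 50;  f0Max = 400;                  % assumed plausible pitch range in Hz
suspect = find(F0 < f0Min | F0 > f0Max);   % frame numbers outside the range
fprintf('Frames outside %d-%d Hz: %s\n', f0Min, f0Max, mat2str(suspect));
```

A different range would flag a different set of frames, so the limits should be treated as a tunable choice rather than a fixed rule.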
[Figure: plot of the original speech before being framed into 45 frames]
[Figures: autocorrelation plot for each of the 45 frames, each annotated "Lagging signal" and "Real signal", with the values below shown in a box]
KEY: F0: fundamental frequency (Hz), Ns: number of samples at the pitch point, T0: pitch period (s)

Frame   F0 (Hz)    Ns    T0 (s)
1       70.796     113   0.0141
2       131.14     61    0.0076
3       131.14     61    0.0076
4       266.66     30    0.0037
5       126.98     63    0.0079
6       126.98     63    0.0079
7       126.98     63    0.0079
8       125        64    0.0080
9       129        62    0.0077
10      142.85     56    0.0070
11      148.14     54    0.0067
12      166.66     48    0.0060
13      177.77     45    0.0056
14      186.04     43    0.0054
15      200        40    0.0050
16      210.52     38    0.0047
17      210.52     38    0.0047
18      195.12     41    0.0051
19      186.04     43    0.0054
20      150.94     53    0.0066
21      121.21     66    0.0083
22      108.108    74    0.0092
23      97.56      82    0.0103
24      275.86     29    0.0036
25      89.88      89    0.0111
26      89.88      89    0.0111
27      90.90      88    0.0110
28      106.66     75    0.0094
29      1000       8     0.0010
30      4000       2     0.0003
31      2666.66    3     0.0004
32      1000       8     0.0010
33      4000       2     0.0003
34      195.12     41    0.0051
35      145.45     55    0.0069
36      137.93     58    0.0072
37      145.45     55    0.0069
38      148.148    54    0.0067
39      150.94     53    0.0066
40      153.84     52    0.0065
41      153.84     52    0.0065
42      160.00     50    0.0063
43      163.26     49    0.0061
44      163.26     49    0.0061
45      163.26     49    0.0061
6. MATLAB CODE
% Read the speech file; Fs is the sampling frequency stored in the file
[y,Fs] = audioread('C:\Users\SALEH\Desktop\signa procesing\ppt\PitchDetection\MaoYiSheng.wav');
fs = 8000;        % sampling frequency in Hz (also Fs)
frle = 160;       % number of samples per frame
frsize = 45;      % number of frames
frmedur = 0.02;   % frame duration in seconds, equivalent to 160 samples
plot(y)
title('Original signal speech-student 2820170009');
ylabel('amplitude');
xlabel('samples');
for k = 1:frsize  % process each frame in turn
    frame = y((k-1)*frle + 1 : frle*k);
    [sig, lag] = xcorr(frame, frame);  % autocorrelation; lag 0 is at index frle
    figure;
    plot(lag, sig, 'b')
    grid on
    title('autocorrelation-student 2820170009');
    ylabel('coefficient');
    xlabel('samples');
    sig(find(sig < 0)) = 0;  % set any negative correlation values to zero
    center_peak_width = find(sig(frle:end) == 0, 1);  % first zero after the centre
    if isempty(center_peak_width)
        continue  % no zero found, so no peak can be separated from the centre
    end
    sig(frle-center_peak_width : frle+center_peak_width) = min(sig);  % remove the centre peak
    [max_val, loc] = max(sig);
    sampl = abs(lag(loc));  % lag, in samples, of the highest remaining peak
    fo = fs/sampl;          % fundamental frequency in Hz
    period = 1/fo;          % pitch period in seconds, or period = sampl/fs
    % print the lag, fundamental frequency and period on the same line
    fprintf('At sampling number of %d, Fundamental frequency is %.2f and Periodicity is %.4f\n', sampl, fo, period);
end
