Audio Features
Commonly used audio features
Volume, pitch, spectrum, zero crossing rate, etc.
Our goal
These features can be perceived (more or less)
subjectively.
Our goal is to compute them quantitatively (and
objectively) for further processing and
recognition.
3. Frame-based Analysis
Query by singing/humming
Mel-frequency
cepstral coefficients
Frame Blocking
[Figure: waveform (amplitude vs. sample index) with one frame zoomed in, marking the frame, the overlap between adjacent frames, and the hop size]
Quiz!
frame size = hop size + overlap
[Figure: waveform (amplitude vs. sample index) with the fundamental period marked, and a single frame (amplitude vs. sample index within the frame) annotated with intensity (energy), pitch (frequency), and the second formant F2]
Quiz!
Frame-based Manipulation
For simplicity, we usually pack frames into a
matrix for easy manipulation in MATLAB:
[y, fs] = audioread('file.wav');
frameMat = enframe(y, frameSize, overlap);
frameMat = [Frame 1, Frame 2, ..., Frame n] (one frame per column)
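As a rough illustration of how frames can be packed into a matrix, the sketch below does the frame blocking by hand in base MATLAB. It is not the implementation of `enframe`; the toy signal and the names `hopSize` and `frameCount` are assumptions made for this example, and any samples that do not fill a complete final frame are simply dropped.

```matlab
% Sketch: pack a signal into overlapping frames, one frame per column.
y = (1:10)';                 % toy signal standing in for audio samples
frameSize = 4;               % samples per frame
overlap = 2;                 % samples shared by adjacent frames
hopSize = frameSize - overlap;   % frame size = hop size + overlap

% Number of complete frames that fit into the signal
frameCount = floor((length(y) - overlap) / hopSize);

frameMat = zeros(frameSize, frameCount);
for k = 1:frameCount
    startIndex = (k - 1) * hopSize + 1;
    frameMat(:, k) = y(startIndex : startIndex + frameSize - 1);
end
```

With this toy signal, the columns of `frameMat` hold samples 1 to 4, 3 to 6, 5 to 8, and 7 to 10, so each frame shares two samples with its neighbor.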
Introduction to Volume
Loudness of audio signals
Visual cue: Amplitude of vibration
Also known as energy or intensity
Absolute sum (easy computation): $vol = \sum_{i=1}^{n} |s_i|$
Quiz!
Sum of squares: $\sum_{i=1}^{n} s_i^2$
Volume Computation
To avoid DC bias (or DC drifting)
DC bias: The vibration is not around zero
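Computation (assuming constant DC bias):
Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
Sum of squares: $\sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$
Here median(s) is the value of $x$ that minimizes $\sum_{i=1}^{n} |s_i - x|$, and mean(s) is the value that minimizes $\sum_{i=1}^{n} (s_i - x)^2$.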
Quiz!
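The effect of removing a constant DC bias can be sketched on a synthetic frame. The signal below (a sine wave riding on an offset of 0.5) and all variable names are assumptions chosen for illustration; they are not taken from the volume01 to volume04 examples.

```matlab
% Toy frame: 8 full periods of a sine wave on top of a constant DC bias
n = 256;
frame = 0.5 + sin(2*pi*(0:n-1)'/32);

% Absolute-sum volume after subtracting the median
volAbs = sum(abs(frame - median(frame)));

% Sum-of-squares energy after subtracting the mean
energy = sum((frame - mean(frame)).^2);

% The same two measures without removing the bias, for comparison
volAbsRaw = sum(abs(frame));
energyRaw = sum(frame.^2);
```

Without the subtraction, the bias inflates both measures: here `energyRaw` is larger than `energy` by n times the squared bias.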
Examples of Volume
Functions for computing volume
Example: volume01
Example: volume02
Example: volume03
Perceived volume also depends on
Frequency
Equal loudness test
Timbre
Example: volume04
Characteristics
ZCR is higher for noise
and unvoiced sounds,
lower for voiced sounds.
Zero-justification is
required before
computing ZCR.
Usage
Quiz!
ZCR Computations
Two types of ZCR
definitions
If a zero-value sample is
counted as a zero crossing,
the ZCR is higher;
otherwise it is lower.
The above distinction
diminishes when using a
higher bit resolution.
Other consideration
ZCR with shift can be
used to distinguish
between unvoiced
sounds and silence.
But it is hard to set up
the right shift amount.
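The two ZCR definitions can be compared on a small frame. This is a minimal sketch: expressing the definitions as a sign test on the product of neighboring samples is an assumption of this example, and the toy frame and variable names are hypothetical.

```matlab
% Toy integer-valued frame whose mean is already zero
frame = [3 0 -2 -1 0 0 4 -4]';
frame = frame - mean(frame);      % zero justification (a no-op here)

% Product of each sample with its successor
neighborProduct = frame(1:end-1) .* frame(2:end);

% Definition 1: zero-value samples are not counted as crossings
zcr1 = sum(neighborProduct < 0);

% Definition 2: zero-value samples are counted as crossings
zcr2 = sum(neighborProduct <= 0);
```

For this frame the first definition counts only the jump from 4 to -4, while the second also counts every pair that involves a zero sample, so `zcr2` is larger than `zcr1`.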
Examples of ZCR
ZCR computing
Example: zcr01
Example: zcr02
Pitch
Definition
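Unit
$semitone = 69 + 12 \log_2(\mathrm{ff} / 440)$, where ff is the fundamental frequency in Hz
Quiz!
Quiz!
$pitch = 69 + 12 \log_2(\mathrm{ff} / 440) = 68.9352$ semitones
Quiz!
$pitch = 69 + 12 \log_2(\mathrm{ff} / 440) = 46.42$ semitones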
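The conversion between frequency and semitones, 69 + 12*log2(ff/440), can be sketched with two anonymous functions. The function names `freq2pitch` and `pitch2freq` are assumptions for this example rather than functions referenced by the slides.

```matlab
% Frequency in Hz to pitch in semitones (440 Hz maps to 69)
freq2pitch = @(ff) 69 + 12*log2(ff/440);

% Inverse mapping: pitch in semitones back to frequency in Hz
pitch2freq = @(pitch) 440 * 2.^((pitch - 69)/12);

pitchA4 = freq2pitch(440);    % reference tone
pitchA5 = freq2pitch(880);    % one octave above the reference
freqBack = pitch2freq(pitchA5);
```

Doubling the frequency adds 12 semitones, so 880 Hz lands one octave above the 440 Hz reference.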
More examples
1234
?????
Taiwanese
Tone sandhi
vs.
vs.
Quiz!
Quiz!
Pitch Perception
Age-related hearing loss: Frequencies vs. ages
[Figure: audible frequency (kHz) vs. age; labeled frequencies 21k, 17.4k, 15k, 12k, 8k; labeled ages 18, 24, 40, 50, 100]
Doppler effect
Shepard tone
Auditory illusion of a
tone that ascends or
descends in pitch
continuously
Overtone singing
Beat
Beat: An interference between two sounds of
slightly different frequencies
Quiz!
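$\cos(2\pi f_1 t) + \cos(2\pi f_2 t) = 2 \cos\left(2\pi \frac{f_1 + f_2}{2} t\right) \cos\left(2\pi \frac{f_1 - f_2}{2} t\right)$
Beat frequency $= |f_1 - f_2|$, not $|f_1 - f_2|/2$!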
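fs=8000;                  % sample rate (Hz)
duration=5;               % duration (seconds)
t=(1:duration*fs)/fs;     % time axis
y1=cos(2*pi*440*t)';      % 440 Hz tone
y2=cos(2*pi*444*t)';      % 444 Hz tone
sound((y1+y2)/2, fs);     % scale to [-1, 1] to avoid clipping, then play
Beat frequency = |440 - 444| = 4 Hz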
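A quick numerical check shows why the beat frequency is the full difference of the two frequencies rather than half of it. The sketch rewrites the sum of the two cosines as a fast carrier times a slow envelope using the standard sum-to-product identity; the variable names are assumptions for this example.

```matlab
fs = 8000; duration = 5;
f1 = 440; f2 = 444;
t = (1:duration*fs)/fs;

% Sum of the two tones
y = cos(2*pi*f1*t) + cos(2*pi*f2*t);

% Sum-to-product form: carrier at the average frequency,
% envelope at half the difference frequency
carrier = cos(2*pi*(f1 + f2)/2*t);
envelope = 2*cos(2*pi*(f1 - f2)/2*t);
maxError = max(abs(y - carrier.*envelope));

% Loudness follows abs(envelope), which peaks twice per envelope period,
% so the audible beat rate is the full difference frequency
beatFreq = abs(f1 - f2);
```

The envelope oscillates at 2 Hz, but its magnitude peaks twice per cycle, which gives 4 loudness pulses per second.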
Timbre
Timbre is represented by
Waveform within a fundamental period
Frame-based energy distribution over frequencies
Power spectrum (over a single frame)
Spectrogram (over many frames)
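The power spectrum of a single frame can be sketched with the built-in `fft`. The test tone, frame size, and variable names below are assumptions chosen so that the tone falls exactly on an FFT bin; real audio frames would usually be windowed first, which this sketch skips.

```matlab
fs = 8000;                       % sample rate (Hz)
frameSize = 256;                 % bin spacing is fs/frameSize = 31.25 Hz
t = (0:frameSize-1)'/fs;
frame = cos(2*pi*1000*t);        % 1000 Hz tone, exactly 32 bins up

% One-sided power spectrum of the frame
spectrum = fft(frame);
powerSpec = abs(spectrum(1:frameSize/2+1)).^2;
freq = (0:frameSize/2)'*fs/frameSize;

% Frequency at which the energy is concentrated
[~, peakIndex] = max(powerSpec);
peakFreq = freq(peakIndex);
```

Applying the same computation to every column of a frame matrix, for example `abs(fft(frameMat)).^2`, yields the frame-by-frame energy distribution that a spectrogram displays.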
Timbre Demo:
Real-time Spectrogram
Simulink model for real-time display of spectrogram
dspstfft_audio (Before MATLAB R2011a)
dspstfft_audioInput (R2012a or later)
[Figures: spectrum display and spectrogram display]