Basic Audio Feature

Basic Features of Audio Signals
Jyh-Shing Roger Jang

http://mirlab.org/jang
MIR Lab, CSIE Dept
National Taiwan Univ., Taiwan
Audio Features
Commonly used audio features
Volume, pitch, spectrum, zero crossing rate, etc.
Our goal
These features can be perceived (more or less) subjectively.
Our goal is to compute them quantitatively (and objectively) for further processing and recognition.
General Steps for Audio Analysis

1. Frame blocking
Frame duration of 20~40 ms or so
2. Frame-based feature extraction
Volume, zero-crossing rate, pitch, MFCC (mel-frequency cepstral coefficients), etc.
3. Frame-based analysis
Pitch vector for QBSH (query by singing/humming) comparison
MFCC for speech recognition via HMM (hidden Markov model) training and evaluation

Frame Blocking
[Figure: a waveform (amplitude vs. sample index) blocked into overlapping frames, with the frame size, hop size, and overlap marked]
Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec
In general, frame size = hop size + overlap.
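The frame-blocking arithmetic above can be checked with a small Python sketch (the slides use MATLAB; Python is used here only for illustration). The function names `frame_params` and `frame_count` are hypothetical, and `frame_count` assumes that a trailing partial frame is dropped rather than zero-padded.

```python
def frame_params(fs, frame_size, overlap):
    """Return (frame duration in sec, hop size in samples, frame rate in frames/sec)."""
    if not 0 <= overlap < frame_size:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_size")
    hop_size = frame_size - overlap      # frame size = hop size + overlap
    frame_duration = frame_size / fs     # seconds covered by one frame
    frame_rate = fs / hop_size           # frames started per second
    return frame_duration, hop_size, frame_rate


def frame_count(num_samples, frame_size, overlap):
    """Number of complete frames; a trailing partial frame is dropped (assumption)."""
    hop_size = frame_size - overlap
    if num_samples < frame_size:
        return 0
    return (num_samples - frame_size) // hop_size + 1


if __name__ == "__main__":
    print(frame_params(16000, 512, 192))  # (0.032, 320, 50.0)
    print(frame_count(16000, 512, 192))   # 49 complete frames in one second
```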
Audio Features in Time Domain

Three of the most prominent time-domain audio features in a frame (aka analysis window):
Intensity: amplitude of the waveform
Pitch: fundamental period (FP)
Timbre: waveform within an FP
[Figure: waveform of taiwan.wav (amplitude vs. sample index), and one frame of it (amplitude vs. sample index within the frame) with the intensity and fundamental period marked]
Audio Features in Frequency Domain

Frequency-domain audio features in a frame
Energy: Sum of power spectrum
Pitch: Distance between harmonics
Timbre: Smoothed spectrum
[Figure: power spectrum of a frame, with the energy, pitch frequency, first formant (F1), and second formant (F2) marked]
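The statement "energy = sum of power spectrum" can be checked numerically. The hypothetical Python sketch below uses a naive DFT (an FFT would be used in practice) to compute the power spectrum of one frame, then recovers the frame energy through Parseval's relation; for this unnormalized DFT the spectrum sum must be divided by the frame size N. Bin k corresponds to k*fs/N Hz.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns |X[k]|^2 for k = 0..N-1."""
    n = len(frame)
    spec = []
    for k in range(n):
        acc = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def energy_from_spectrum(frame):
    """Frame energy via Parseval: sum(x^2) == sum(|X|^2) / N."""
    return sum(power_spectrum(frame)) / len(frame)


def peak_bin(frame):
    """Index of the strongest bin in the first half of the spectrum, excluding DC."""
    spec = power_spectrum(frame)
    half = len(frame) // 2
    return max(range(1, half), key=lambda k: spec[k])
```

For a 256-sample frame of a 250 Hz sine at fs = 8000 Hz, the peak lands in bin 250*256/8000 = 8, and the energy equals the time-domain sum of squares.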
Frame-based Manipulation
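For simplicity, we usually pack frames into a matrix, one frame per column, for easy manipulation in MATLAB:
[y, fs] = audioread('file.wav');            % waveform y and sample rate fs
frameSize = 512; overlap = 192;             % e.g., the values from the frame-blocking example
frameMat = enframe(y, frameSize, overlap);  % frameMat = [Frame 1, Frame 2, ..., Frame n]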
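A hypothetical Python counterpart of the frame matrix, assuming the same layout (one frame per column) and assuming that leftover samples which do not fill a whole frame are dropped:

```python
def enframe(y, frame_size, overlap):
    """Split y into overlapping frames.

    Returns a list of rows so that frame_mat[r][c] is sample r of frame c,
    i.e. one frame per column, mirroring the MATLAB frame matrix.
    """
    hop = frame_size - overlap
    if frame_size <= 0 or hop <= 0:
        raise ValueError("need frame_size > 0 and 0 <= overlap < frame_size")
    # Number of complete frames; trailing samples that do not fill a frame are dropped.
    num_frames = 0 if len(y) < frame_size else (len(y) - frame_size) // hop + 1
    return [[y[c * hop + r] for c in range(num_frames)] for r in range(frame_size)]


if __name__ == "__main__":
    frame_mat = enframe(list(range(10)), 4, 2)
    print([row[1] for row in frame_mat])  # second frame (column 1): [2, 3, 4, 5]
```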
Introduction to Volume
Loudness of audio signals
Visual cue: Amplitude of vibration
Also known as energy or intensity
Two major ways to compute volume in a frame of $n$ samples $s_1, \dots, s_n$:
Volume: $vol = \sum_{i=1}^{n} |s_i|$
Easy computation
Energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
Better correlation with our perception
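A minimal Python sketch of the two volume measures (hypothetical function names). The `eps` term is an assumption added only to avoid log10(0) on an all-zero frame; it is not part of the formula above.

```python
import math


def volume_abs_sum(frame):
    """Volume: sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def volume_db(frame, eps=1e-12):
    """Energy in decibels: 10*log10(sum of squared samples).

    eps is an assumption that keeps log10 finite for a silent frame.
    """
    return 10 * math.log10(sum(s * s for s in frame) + eps)
```

Doubling the amplitude doubles the absolute-sum volume but adds only 10*log10(4), about 6.02 dB, to the energy.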
Volume: Perceived and Computed

Perceived volume is influenced by
Frequency
Timbre
Computed volume is influenced by

Microphone types
Microphone setups
Volume Computation
To avoid DC bias (or DC drifting)
DC bias: the vibration is not centered around zero
Computation (assuming a constant DC bias):
Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
Energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$
How to prove these identities?
$\mathrm{median}(s) = \arg\min_{x} \sum_{i=1}^{n} |s_i - x|$, where $s = \{s_1, s_2, \dots, s_n\}$
$\mathrm{mean}(s) = \arg\min_{x} \sum_{i=1}^{n} (s_i - x)^2$
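One way to answer the question above is to differentiate each cost with respect to $x$:

```latex
f(x) = \sum_{i=1}^{n} (s_i - x)^2, \qquad
f'(x) = -2 \sum_{i=1}^{n} (s_i - x) = 0
\;\Rightarrow\; x = \frac{1}{n} \sum_{i=1}^{n} s_i = \mathrm{mean}(s),
\qquad f''(x) = 2n > 0

g(x) = \sum_{i=1}^{n} |s_i - x|, \qquad
g'(x) = \#\{i : s_i < x\} - \#\{i : s_i > x\} \quad (x \notin \{s_1, \dots, s_n\})
```

$g$ is convex and piecewise linear: its slope is negative while fewer than half of the samples lie below $x$ and positive once more than half do, so its minimum is at the median (any point between the two middle samples when $n$ is even).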
Examples of Volume
Functions for computing volume
Example: volume01
Example: volume02
Example: volume03
Volume depends on
Frequency
Equal loudness test
Timbre
Example: volume04
Zero Crossing Rate

Zero crossing rate (ZCR)
The number of zero crossings in a frame.
Characteristics
ZCR is higher for noise and unvoiced sounds, and lower for voiced sounds.
Zero-justification is required before computing ZCR.
Usage
For endpoint detection, especially in detecting the start and end of unvoiced sounds.
To distinguish noise from unvoiced sounds, we usually add a shift before computing ZCR.
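A minimal Python sketch of ZCR with zero-justification and an optional shift (hypothetical function name). Zero-justifying by the mean (the median would also work) and implementing the shift as subtracting a constant before counting sign changes are assumptions; the slides' own zcrWithShift example may differ, and the value of `shift` has to be tuned per recording.

```python
def zcr(frame, shift=0.0):
    """Zero crossing count of one frame.

    1. Zero-justify: subtract the frame mean to remove DC bias (assumption).
    2. Shift: subtract `shift`, so small excursions (silence, low-level noise)
       stay on one side of zero and no longer produce crossings.
    3. Count strict sign changes s[i]*s[i+1] < 0 (zero samples are not counted).
    """
    if not frame:
        return 0
    mean = sum(frame) / len(frame)
    s = [x - mean - shift for x in frame]
    return sum(1 for a, b in zip(s, s[1:]) if a * b < 0)
```

Without zero-justification, a frame alternating between 3 and 1 would show no crossings at all; after subtracting its mean of 2 it crosses zero at every step.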
ZCR Computations
Two types of ZCR definitions
If a zero-valued sample is counted as a zero crossing, the ZCR is higher.
Otherwise, it is lower.
The above distinction diminishes at higher bit resolution, since exact zero-valued samples become rare.
Other considerations
ZCR with shift can be used to distinguish between unvoiced sounds and silence.
But it is hard to set the right shift amount.
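The two definitions, and the effect of bit resolution, can be compared with a small hypothetical Python sketch. `quantize` is an illustrative helper that rounds samples in [-1, 1] to signed integers at a given bit depth; the frame is assumed to be zero-justified already.

```python
def zcr_two_ways(frame):
    """Return (strict, with_zeros) ZCR counts for a zero-justified frame.

    strict:     only sign changes count (s[i]*s[i+1] < 0).
    with_zeros: a pair touching a zero-valued sample also counts
                (s[i]*s[i+1] <= 0), so this count is never smaller.
    """
    pairs = list(zip(frame, frame[1:]))
    strict = sum(1 for a, b in pairs if a * b < 0)
    with_zeros = sum(1 for a, b in pairs if a * b <= 0)
    return strict, with_zeros


def quantize(frame, bits):
    """Hypothetical helper: round samples in [-1, 1] to signed integers of the given bit depth."""
    scale = 2 ** (bits - 1) - 1
    return [round(x * scale) for x in frame]


if __name__ == "__main__":
    quiet = [0.004, 0.002, -0.001, -0.003, -0.002, 0.001, 0.003]
    print(zcr_two_ways(quantize(quiet, 8)))   # (0, 6): most samples round to 0
    print(zcr_two_ways(quantize(quiet, 16)))  # (2, 2): no exact zeros remain
```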
Examples of ZCR
ZCR computing
Example: zcr01
Example: zcr02
To use ZCR to distinguish between unvoiced sounds and environmental noise
Example: zcrWithShift
Pitch
Definition
Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is hertz (Hz).
Unit
More commonly, pitch is expressed in semitones, which can be converted from pitch in hertz:
$semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$
Piano roll via HTML5
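The conversion formula can be written as a pair of Python helpers (hypothetical names). This numbering matches MIDI note numbers, where A4 = 440 Hz is 69 and middle C is 60.

```python
import math


def hz_to_semitone(freq_hz):
    """Pitch in semitones: 69 + 12*log2(f/440)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69 + 12 * math.log2(freq_hz / 440)


def semitone_to_hz(semitone):
    """Inverse mapping: 440 * 2**((semitone - 69)/12)."""
    return 440 * 2 ** ((semitone - 69) / 12)
```

Each octave (a factor of 2 in frequency) adds exactly 12 semitones, so 880 Hz maps to 81.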
Pitch Computation for Tuning Forks

Pitch of tuning forks (code)
Six fundamental periods span sample indices 7 to 226 at a 16 kHz sample rate:
fp = (226 - 7)/6/16000 = 0.00228125 sec
ff = 1/fp = 438.356 Hz
pitch = 69 + 12*log2(ff/440) = 68.9352 semitones
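The same three steps (period, frequency, semitone) fit in one hypothetical Python function. It assumes the two sample indices mark the same point (e.g., a peak) exactly `num_periods` fundamental periods apart, as in the tuning-fork and speech examples.

```python
import math


def pitch_from_periods(first_idx, last_idx, num_periods, fs):
    """Return (fundamental period in sec, fundamental frequency in Hz, pitch in semitones)."""
    fp = (last_idx - first_idx) / num_periods / fs   # length of one fundamental period
    ff = 1 / fp                                      # fundamental frequency
    pitch = 69 + 12 * math.log2(ff / 440)            # Hz -> semitones
    return fp, ff, pitch
```

With the tuning-fork values (7, 226, 6 periods, 16000 Hz) it gives about 438.356 Hz and 68.9352 semitones; with the speech values (75, 477, 3 periods) about 119.403 Hz and 46.42 semitones.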
Pitch Computation for Speech

Pitch of speech (code)
Three fundamental periods span sample indices 75 to 477 at a 16 kHz sample rate:
fp = (477 - 75)/3/16000 = 0.008375 sec
ff = 1/fp = 119.403 Hz
pitch = 69 + 12*log2(ff/440) = 46.42 semitones
Tones in Mandarin Chinese

Some statistics about Mandarin Chinese
5401 characters, each associated with at least one base syllable and a tone
411 base syllables; most have 4 tones, but some have fewer, giving 1501 tonal syllables in total
Syllables with 3 or fewer tones
More examples
Taiwanese
Tone sandhi
Features Related to Tones

Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
(Put your hand on your throat and you can feel it.)
Tone recognition is mostly based on features obtained from pitch and volume.
Tones in Mandarin TTS

TTS: text-to-speech demo
Tone sandhi: phonological change occurring in tonal languages
3+3 → 2+3: when two tone-3 syllables occur in a row, the first is pronounced with tone 2
Mandarin Tone Practice

Sentences of All Tone 3

Tone Sandhi of 3+3
Pitch Change due to Fast Forward

If audio is played at a higher sample rate:
Pitch is higher
Duration is shorter
Pitch change due to sample rate change at playback:
Sample rate: fs → k*fs (at playback)
Duration: d → d/k
Fundamental frequency: ff → k*ff
Pitch: pitch → pitch + 12*log2(k)
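The playback rules can be written as one hypothetical Python function. As a quick check, k = 2 (double speed) raises the pitch by exactly 12 semitones, one octave, and halves the duration.

```python
import math


def playback_change(fs, duration, ff, pitch, k):
    """Effect of playing audio recorded at sample rate fs back at k*fs."""
    return {
        "fs": k * fs,                          # playback sample rate
        "duration": duration / k,              # k times faster -> k times shorter
        "ff": k * ff,                          # every period shrinks by k
        "pitch": pitch + 12 * math.log2(k),    # same change expressed in semitones
    }
```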
Pitch Perception
Age-related hearing loss: frequencies vs. ages
Low to high, high to low
Applications
Mosquito ringtone
As one grows old, the audible frequency bandwidth gets narrower.
[Figure: audible frequency limit (kHz) vs. age]
Other Things about Pitch

Some interesting phenomena about pitch
Beat
Music by beats
Doppler effect
Shepard tone: auditory illusion of a tone that ascends or descends in pitch continuously
Overtone singing
Have you tried these?
Inhale helium to produce a high (squeaky) pitch
Resonance: break a glass with the right pitch (just like pushing a swing)
How to create these effects in MATLAB?
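As one example for the question above, a continuously rising Shepard tone (the Shepard-Risset glissando) can be sketched in Python. This is one common construction, not necessarily the one behind the slide's demo: octave-spaced partials glide up together while a Gaussian loudness envelope over log-frequency fades them in at the bottom and out at the top. The function name and parameter defaults are assumptions.

```python
import math


def shepard_glissando(fs=16000, duration=2.0, base=20.0, octaves=8):
    """Minimal ascending Shepard-Risset glissando.

    Partials start at base, 2*base, 4*base, ... and all glide up by exactly one
    octave over `duration` seconds. Each partial is weighted by a Gaussian bump
    over log2-frequency, centered midway (in octaves) between the lowest and
    highest frequencies, so partials fade in at the bottom and out at the top.
    """
    top = base * 2 ** octaves                    # highest frequency reached at the end
    if top >= fs / 2:
        raise ValueError("highest partial must stay below the Nyquist frequency fs/2")
    center = math.log2(base) + octaves / 2       # center of the bump, in log2(Hz)
    n = int(fs * duration)
    out = [0.0] * n
    for j in range(octaves):
        phase = 0.0
        for i in range(n):
            freq = base * 2 ** (j + i / n)       # partial j glides up one octave
            weight = math.exp(-0.5 * (math.log2(freq) - center) ** 2)
            phase += 2 * math.pi * freq / fs     # accumulate phase from instantaneous frequency
            out[i] += weight * math.sin(phase)
    peak = max(abs(x) for x in out) or 1.0
    return [x / peak for x in out]               # normalize to [-1, 1]
```

To listen, the samples can be written to a WAV file (for example with Python's `wave` module) and looped; the phases do not line up exactly at the loop point, so a faint click may remain.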
Beat
Beat: An interference between two sounds of
slightly different frequencies
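$\cos(2\pi f_1 t) + \cos(2\pi f_2 t) = 2 \cos\left(2\pi \frac{f_1 + f_2}{2} t\right) \cos\left(2\pi \frac{f_1 - f_2}{2} t\right)$
Audible beat frequency = $|f_1 - f_2|$
Not $|f_1 - f_2|/2$! The slow factor completes one cycle every $2/|f_1 - f_2|$ seconds, but loudness follows its magnitude, which peaks twice per cycle.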
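fs=8000;                       % sample rate (Hz)
duration=5;                    % duration (sec)
t=(1:duration*fs)/fs;          % time axis (sec)
y1=cos(2*pi*440*t)';           % 440-Hz tone as a column vector
y2=cos(2*pi*444*t)';           % 444-Hz tone as a column vector
sound((y1+y2)/2, fs);          % scale by 1/2 so sound() does not clip
Beat frequency = |444 - 440| = 4 Hz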
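The identity and the $|f_1 - f_2|$ rule can be checked numerically with a Python sketch (hypothetical function names). `count_loudness_dips` counts the zeros of the slow factor, i.e., the moments when the two tones cancel; each one is a single audible beat.

```python
import math


def beat_signal(f1, f2, fs, duration):
    """Sum of two unit-amplitude cosines, scaled by 1/2 to stay within [-1, 1]."""
    n = int(fs * duration)
    return [0.5 * (math.cos(2 * math.pi * f1 * i / fs) + math.cos(2 * math.pi * f2 * i / fs))
            for i in range(n)]


def product_form(f1, f2, t):
    """Right-hand side of the identity: 2*cos(2*pi*(f1+f2)/2*t) * cos(2*pi*(f1-f2)/2*t)."""
    return (2 * math.cos(2 * math.pi * (f1 + f2) / 2 * t)
            * math.cos(2 * math.pi * (f1 - f2) / 2 * t))


def count_loudness_dips(f1, f2, fs=8000, seconds=1.0):
    """Count zeros of the slow factor cos(pi*(f1-f2)*t) within `seconds`.

    Sampling at (i + 0.5)/fs avoids landing exactly on a zero.
    """
    n = int(fs * seconds)
    slow = [math.cos(math.pi * (f1 - f2) * (i + 0.5) / fs) for i in range(n)]
    return sum(1 for a, b in zip(slow, slow[1:]) if a * b < 0)
```

For 440 Hz and 444 Hz the slow factor is a 2 Hz cosine, yet it reaches zero 4 times per second, matching the 4 Hz beat.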
Timbre
Timbre is represented by
Waveform within a fundamental period
Frame-based energy distribution over frequencies
Power spectrum (over a single frame)
Spectrogram (over many frames)
Frame-based MFCC (mel-frequency cepstral coefficients)
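A spectrogram is the power spectrum computed frame by frame. The hypothetical Python sketch below makes that concrete with a naive DFT and no window function; practical tools use an FFT and usually apply a window such as Hamming to each frame first.

```python
import cmath
import math


def spectrogram(y, frame_size, overlap):
    """Power spectra of successive frames.

    result[c][k] = |X_c[k]|^2 for frame c and bin k = 0..frame_size//2,
    where bin k corresponds to k*fs/frame_size Hz. Incomplete trailing
    frames are dropped (assumption); no window is applied.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    result = []
    for start in range(0, len(y) - frame_size + 1, hop):
        frame = y[start:start + frame_size]
        spec = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / frame_size)
                      for m in range(frame_size))
            spec.append(abs(acc) ** 2)
        result.append(spec)
    return result
```

For a signal at fs = 8000 Hz that switches from 1000 Hz to 2000 Hz, 64-sample frames show their peak moving from bin 8 to bin 16.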
Timbre Demo:
Real-time Spectrogram
Simulink model for real-time display of spectrogram
dspstfft_audio (before MATLAB R2011a)
dspstfft_audioInput (R2012a or later)
[Figure: spectrum and spectrogram displays]
