reconstruction
approach for robust speaker
recognition
By: RAJAT SHUKLA(1209531025)
SACHIN PAL(1209531027)
GUIDED BY: Mrs SHILPI SHUKLA
INTRODUCTION
Speaker recognition is the identification of a person
from the characteristics of their voice (voice biometrics). It is
also called voice recognition. Speaker recognition
(recognizing who is speaking) differs from speech
recognition (recognizing what is being said).
In automatic speaker and speech recognition, the lack of
robustness remains a major challenge.
HUMAN SPEECH
Human speech contains numerous discriminative
features that can be used to identify speakers.
Speech contains significant energy from zero
frequency up to around 5 kHz.
The objective of automatic speaker recognition is to
extract, characterize and recognize the information
about speaker identity.
The properties of the speech signal change markedly as a
function of time.
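Because the signal changes over time, it is usually analysed in short overlapping frames rather than as a whole. The sketch below illustrates this with NumPy on a synthetic signal. The 25 ms frame length, the 10 ms hop, the 8 kHz sample rate and the function names `frame_signal` and `short_time_energy` are illustrative assumptions, not values taken from these slides.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def short_time_energy(frames):
    """Energy of each frame (sum of squared samples)."""
    return np.sum(frames ** 2, axis=1)

# Synthetic stand-in for speech (assumption): 0.5 s of silence
# followed by 0.5 s of a 200 Hz tone, sampled at 8 kHz.
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 200 * t)])

frames = frame_signal(signal, sr)
energy = short_time_energy(frames)
```

The per-frame energy is zero over the silent half and rises once the tone starts, which is the kind of time variation that motivates frame-by-frame feature extraction.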
ROBUSTNESS
Robustness is the ability of a computer system to
cope with errors during execution.
Robustness can also be defined as the ability of an
algorithm to continue operating despite abnormalities
in input, calculations, etc.
In communication, robustness means that the speech
signal should arrive at the receiver's end with its
original strength.
NO DATA SHOULD BE LOST OR DAMAGED.
FEATURE RECONSTRUCTION
Feature extraction starts from an initial set of
measured data and builds derived values (features)
intended to be informative and non-redundant, facilitating
the subsequent learning and generalization steps and, in
some cases, leading to better human interpretation.
Feature extraction is related to dimensionality
reduction.
A commonly used dimensionality reduction technique is
PCA (Principal Component Analysis).
PCA
PCA was invented in 1901 by Karl Pearson.
The main purposes of principal component analysis
are to identify patterns in the data and to use those
patterns to reduce the dimensions of the dataset with
minimal loss of information.
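[Figure: PCA model. The data matrix X is described by the scores t1, t2 and the loading matrix P (P1, P2), with a residual E (noise). The principal components PC1 and PC2 are drawn in the X1-X2 plane.]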
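A minimal sketch of PCA may help make the roles of the scores, the loading matrix and the residual concrete. It uses the standard formulation X = T Pᵀ + E on mean-centred data, computed with NumPy's SVD. The function names (`pca_fit`, `pca_project`, `pca_reconstruct`) and the synthetic two-dimensional data are assumptions made for illustration only.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the loading matrix P (features x components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def pca_project(X, mean, loadings):
    """Scores T: coordinates of each sample along the principal components."""
    return (X - mean) @ loadings

def pca_reconstruct(T, mean, loadings):
    """Map scores back to the original feature space."""
    return T @ loadings.T + mean

# Synthetic data (assumption): points close to a line in the X1-X2 plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0]]) + 0.05 * rng.normal(size=(200, 2))

mean, P = pca_fit(X, n_components=1)
T = pca_project(X, mean, P)          # reduced data: 2 dimensions -> 1
X_hat = pca_reconstruct(T, mean, P)  # approximation using PC1 only
E = X - X_hat                        # residual, treated here as noise
```

Keeping only PC1 halves the dimensionality while the residual `E` stays small, which is what "minimal loss of information" means in this setting.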
Clean speech: extract Mel log-spectral features
Divide each log-spectral vector into 2 sub-bands (SB): SB1 is channel 1 to P/2, SB2 is channel P/2+1 to P
Execute reconstruction on SB1 and SB2 (increased robustness)
Recombine the reconstructed vectors
Speech output
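The flow above can be sketched roughly as follows. The slides do not state which reconstruction method is applied to each sub-band, so this sketch assumes a PCA-based one: each sub-band is projected onto a subspace learned from clean speech and mapped back. The function names, the number of components, and the random low-rank placeholder data used instead of real Mel log-spectral vectors are all hypothetical.

```python
import numpy as np

def fit_subband_pca(clean_subband, n_components):
    """Learn the mean and loading matrix of one sub-band from clean vectors."""
    mean = clean_subband.mean(axis=0)
    _, _, Vt = np.linalg.svd(clean_subband - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def reconstruct(vectors, mean, loadings):
    """Project onto the PCA subspace and map back (assumed reconstruction)."""
    return (vectors - mean) @ loadings @ loadings.T + mean

def subband_reconstruct(features, models):
    """Split into SB1 and SB2, reconstruct each, then recombine."""
    half = features.shape[1] // 2
    sb1 = features[:, :half]   # channels 1 to P/2
    sb2 = features[:, half:]   # channels P/2+1 to P
    rec1 = reconstruct(sb1, *models[0])
    rec2 = reconstruct(sb2, *models[1])
    return np.hstack([rec1, rec2])

# Placeholder data (assumption): low-rank random vectors standing in for
# clean Mel log-spectral features with P = 24 channels.
rng = np.random.default_rng(0)
P = 24
half = P // 2
clean = rng.normal(size=(300, 3)) @ rng.normal(size=(3, P))

models = [
    fit_subband_pca(clean[:, :half], n_components=3),
    fit_subband_pca(clean[:, half:], n_components=3),
]

noisy = clean + 0.1 * rng.normal(size=clean.shape)
restored = subband_reconstruct(noisy, models)

err_noisy = np.mean((noisy - clean) ** 2)
err_restored = np.mean((restored - clean) ** 2)
```

On this toy data the restored vectors are closer to the clean ones than the noisy input is, because the part of the noise lying outside each sub-band subspace is discarded. Treating the two halves separately means that a disturbance confined to one sub-band does not affect the reconstruction of the other.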