Chunjian Li
Aalborg University, Denmark
Introduction
Applications:
- Improving quality and intelligibility (hearing
aid, cockpit comm., video conferencing ...)
- Source coding (mobile phone, video
conferencing, IP phone ...)
- Pre-processor for other speech processing
applications (speech recognition, speaker
verification ...)
Introduction
Classification 1
- Single channel
- Multi-channel
* with acoustic barrier (Adaptive Noise Cancelling)
* without acoustic barrier (Array Processing)
Classification 2
- Spectrum subtraction (Power Spectral Subtraction, Amplitude
Spectral Subtraction, Autocorrelation Subtraction, Non-causal
Infinite Wiener Filtering)
- Parametric method (Iterative Wiener Filtering)
- Adaptive noise cancelling
- Adaptive comb filtering
Single Channel Speech
Enhancement
Stochastic Model
- Noise process: broadband (white),
stationary (or short-time stationary),
uncorrelated to speech, additive.
- Speech process: short-time stationary.
- Need short-time processing
$$\Gamma_y(\omega; m) = \Gamma_s(\omega; m) + \Gamma_d(\omega; m)$$
$$\hat{\Gamma}_s(\omega) = \Gamma_y(\omega) - \hat{\Gamma}_d(\omega)$$
$$|\hat{S}_s(\omega)| = \sqrt{N\,\hat{\Gamma}_s(\omega)}$$
$$\hat{S}_s(\omega) = |\hat{S}_s(\omega)|\, e^{j\varphi_y(\omega)} \qquad (1)$$
Generalization
Eq. (1) can be written as:
$$\hat{S}_s(\omega) = \left[\, |S_y(\omega)|^{\alpha} - |\hat{S}_d(\omega)|^{\alpha} \,\right]^{1/\alpha} e^{j\varphi_y(\omega)} \qquad (2)$$
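Eq. (2) can be sketched per frequency bin as follows. This is a minimal illustration rather than the author's implementation: the function name `spectral_subtract` and its arguments are hypothetical, and clamping negative differences to zero (half-wave rectification) is an assumption added so that the root is always defined. With alpha = 2 it performs power spectral subtraction, with alpha = 1 amplitude spectral subtraction.

```python
import cmath


def spectral_subtract(noisy_spectrum, noise_magnitude, alpha=2.0):
    """Generalized spectral subtraction for one frame.

    noisy_spectrum  : complex DFT coefficients S_y of the noisy frame
    noise_magnitude : estimated noise magnitudes |S_d|, one per bin
    alpha           : exponent in Eq. (2)
    """
    enhanced = []
    for y, d in zip(noisy_spectrum, noise_magnitude):
        diff = abs(y) ** alpha - d ** alpha
        # Assumption: negative differences are clamped to zero.
        magnitude = max(diff, 0.0) ** (1.0 / alpha)
        # The noisy phase is reused for the enhanced spectrum.
        enhanced.append(cmath.rect(magnitude, cmath.phase(y)))
    return enhanced
```

For a bin with S_y = 3 + 4j (magnitude 5) and an estimated noise magnitude of 3, alpha = 2 leaves a magnitude of 4 and alpha = 1 leaves a magnitude of 2, both with the original phase.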
Low complexity
Severe musical noise
Usually need further enhancement
- Smoothing in time and frequency; Rectification;
Oversuppressing ASS:
Oversuppressing PSS:
Smoothing in time:
Wiener Filtering
Orthogonality principle:
$$E\left\{\left[ s(n) - \sum_{q=-\infty}^{\infty} h(q)\, y(n-q) \right] y(n-k)\right\} = 0$$
$$\Rightarrow\; R_{ys}(k) = \sum_{q=-\infty}^{\infty} h(q)\, R_{yy}(k-q) \qquad \text{(Wiener-Hopf equation)}$$
Noncausal Wiener Filter
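Because the sum in the Wiener-Hopf equation runs over all q, a Fourier transform turns the convolution into a product. A short derivation, assuming (as in the stochastic model) additive noise d(n) that is uncorrelated with the speech, so that y(n) = s(n) + d(n); the symbols H and Γ_ys (cross power spectrum) are introduced here for illustration:

$$R_{ys}(k) = \sum_{q=-\infty}^{\infty} h(q)\, R_{yy}(k-q) \;\Longrightarrow\; \Gamma_{ys}(\omega) = H(\omega)\, \Gamma_y(\omega)$$
$$\Gamma_{ys}(\omega) = \Gamma_s(\omega), \qquad \Gamma_y(\omega) = \Gamma_s(\omega) + \Gamma_d(\omega)$$
$$H(\omega) = \frac{\Gamma_s(\omega)}{\Gamma_s(\omega) + \Gamma_d(\omega)}$$

The resulting filter is real and lies between 0 and 1, so it only attenuates; the phase of the noisy signal is left unchanged.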
The algorithm:
1. Estimate the LP coefficients from the noisy
observation samples. Estimate the noise
spectrum during nonspeech activity.
2. Estimate the waveform using the noncausal WF given
the current estimates of the LP coefficients and
the noise spectrum.
3. Re-estimate the LP coefficients given the current
estimate of the waveform.
4. Repeat steps 2 and 3 until some stopping criterion is
satisfied.
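The four steps above can be sketched for a single frame as follows. This is a rough illustration under stated assumptions, not the author's implementation: all function names are hypothetical, the LP coefficients come from the autocorrelation method with a Levinson-Durbin recursion, the speech spectrum is modelled as the all-pole spectrum err / |A|^2, the stopping criterion is simply a fixed iteration count, and a naive DFT is used to keep the example free of dependencies.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def lpc(x, order):
    """Autocorrelation-method LP analysis; returns (a, err) with a[0] = 1."""
    n = len(x)
    r = [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err = max(err * (1.0 - k * k), 1e-12)
    return a, err


def iterative_wiener(noisy, noise_psd, order=4, iterations=3):
    """noise_psd holds E|D(k)|^2 per DFT bin, estimated during nonspeech activity."""
    n = len(noisy)
    noisy_spectrum = dft(noisy)
    estimate = list(noisy)                  # step 1 starts from the noisy samples
    for _ in range(iterations):             # step 4: fixed count as stopping rule
        a, err = lpc(estimate, order)       # steps 1 and 3: LP coefficients
        a_spectrum = dft(a + [0.0] * (n - len(a)))
        speech_psd = [err / max(abs(ak) ** 2, 1e-12) for ak in a_spectrum]
        gain = [ps / (ps + pd) for ps, pd in zip(speech_psd, noise_psd)]
        # Step 2: noncausal Wiener filter applied in the frequency domain.
        estimate = idft([g * y for g, y in zip(gain, noisy_spectrum)])
    return estimate
```

For white noise of variance sigma^2 and a frame of N samples, `noise_psd` would be `[N * sigma**2] * N` in the scaling used here. Because the gain never exceeds 1, each iteration can only attenuate the noisy frame.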
Iterative Wiener Filtering
Comments:
- Convergence is not guaranteed; a heuristic stopping
criterion is needed
- Results in unrealistically sharp formants and pole
jittering
- Suffers from musical noise
- Needs some kind of smoothing
10 dB noisy sample:
Iterative WF:
Iterative WF with smoothing:
Further enhancement to IWF
Ephraim-Malah, 1984
The basis of the noise reduction
function of the MELPe coding standard
Consists of two parts: the decision-directed
method for estimating the a priori
speech spectrum, and the MMSE
Short-Time Spectral Amplitude (STSA)
estimator
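The decision-directed method is only named here, so the following is a minimal sketch of the recursion as it is usually stated, for a single frequency bin. The a priori SNR is a weighted sum of two terms: the previous frame's estimated amplitude squared divided by the noise variance, and the rectified instantaneous SNR (a posteriori SNR minus one). The function name `track_bin`, the smoothing weight 0.98, and the use of the Wiener gain xi / (1 + xi) as the amplitude estimator are assumptions made for illustration.

```python
def track_bin(noisy_magnitudes, noise_var, smoothing=0.98):
    """Decision-directed a priori SNR tracking for one frequency bin.

    noisy_magnitudes : |Y_k| for successive frames
    noise_var        : noise variance in this bin (assumed known)
    Returns the list of enhanced amplitudes, one per frame.
    """
    prev_amplitude = 0.0
    enhanced = []
    for r in noisy_magnitudes:
        gamma = r * r / noise_var                      # a posteriori SNR
        xi = (smoothing * prev_amplitude ** 2 / noise_var
              + (1.0 - smoothing) * max(gamma - 1.0, 0.0))  # a priori SNR
        gain = xi / (1.0 + xi)                         # assumption: Wiener gain
        prev_amplitude = gain * r
        enhanced.append(prev_amplitude)
    return enhanced
```

With a smoothing weight close to 1, an isolated noise peak in one frame raises the a priori SNR only slightly, which is what limits musical noise.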
MMSE STSA estimator
Assumptions:
- Stationary additive Gaussian noise with known spectrum.
- An estimate of the speech spectrum is available.
- Spectral components (DFT coefficients) are statistically independent
and each follows a Gaussian distribution (the DFT amplitude follows
a Rayleigh distribution).
- The DFT phase follows a uniform distribution and is independent of the
amplitude.
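$$p(Y_k \mid A_k, \alpha_k) = \frac{1}{\pi \lambda_d(k)} \exp\left\{ -\frac{1}{\lambda_d(k)} \left| Y_k - A_k e^{j\alpha_k} \right|^2 \right\}, \qquad p(A_k, \alpha_k) = \frac{A_k}{\pi \lambda_x(k)} \exp\left\{ -\frac{A_k^2}{\lambda_x(k)} \right\}$$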
Here $I_0(\cdot)$ and $I_1(\cdot)$ denote the modified Bessel functions of order zero
and one, and $v_k$ is defined by:
$$v_k = \frac{\xi_k}{1 + \xi_k}\, \gamma_k$$
MMSE STSA estimator
where $\xi_k$ and $\gamma_k$ are defined by:
$$\xi_k = \frac{\lambda_x(k)}{\lambda_d(k)}, \qquad \gamma_k = \frac{R_k^2}{\lambda_d(k)}$$
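The gain that ξ_k, γ_k and v_k feed into is not reproduced in the text above. The sketch below assumes the standard Ephraim-Malah form, G = Γ(1.5) (√v_k / γ_k) exp(−v_k/2) [(1 + v_k) I_0(v_k/2) + v_k I_1(v_k/2)], applied as Â_k = G R_k. The function names are hypothetical, and the Bessel functions are evaluated with a plain power series, which is adequate for moderate v_k but would overflow for very large arguments.

```python
import math


def bessel_i(order, x, terms=200):
    """Modified Bessel function I_order(x), order 0 or 1, via its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)    # m = 0 term
    total = term
    for m in range(1, terms):
        term *= half * half / (m * (m + order))
        total += term
        if term <= 1e-16 * total:
            break
    return total


def mmse_stsa_gain(xi, gamma):
    """MMSE STSA gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * math.exp(-v / 2.0) * bracket
```

For a fixed a priori SNR the gain falls as the a posteriori SNR rises: with ξ = 1, `mmse_stsa_gain(1.0, 0.1)` is well above 1 while `mmse_stsa_gain(1.0, 10.0)` is close to the Wiener gain ξ / (1 + ξ) = 0.5.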
Solid line: power subtraction; dashed line: Wiener filter.
The MMSE STSA. Rpost denotes the a priori SNR estimated without smoothing (the
instantaneous SNR).
Comments on the MMSE
STSA estimator
The gain curve moves smoothly between the power
subtraction curve and the Wiener curve. This transition is
controlled by the unsmoothed SNR estimate (Rpost).
For a given a priori SNR (Rprio), the larger Rpost, the stronger the attenuation.
This counter-intuitive behavior flattens the spurious
spectral peaks caused by noise in the low-SNR part of the
spectrum, whereas the WF tends to sharpen the spurious peaks in
the low-SNR part of the spectrum.
The phase of the noisy speech is used as the phase of the
enhanced speech, because the phase is assumed to be uniformly
distributed. An independent MMSE estimate of the
phasor has non-unity modulus and thus cannot be combined with
the MMSE STSA.
Suffers less from musical noise than the WF.
MMSE Log-Spectral Amplitude
Estimator
A modification to the MMSE STSA based on the fact that a distortion
measure based on the mean-square error of the log-spectra is more
suitable for speech processing.
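Minimize the distortion measure $E[(\log A_k - \log \hat{A}_k)^2]$
The MMSE LSA estimator can be shown to be:
$$\hat{A}_k = \exp\left(E[\ln A_k \mid Y_k]\right) = \frac{\xi_k}{1 + \xi_k} \exp\left( \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t}\, dt \right) R_k$$
where $v_k = \frac{\xi_k}{1 + \xi_k}\, \gamma_k$, and $\xi_k$ and $\gamma_k$ are the a priori SNR and the a
posteriori SNR as defined before.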
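The integral in the LSA estimator is the exponential integral E1(v_k), so the gain is cheap to evaluate once E1 is available. A minimal sketch, with hypothetical function names: E1 is computed here from its power series for small arguments and from its standard continued fraction (truncated at a fixed depth, an assumption of this sketch) for larger ones.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, depth=100):
    """Exponential integral E1(x) = integral from x to infinity of exp(-t)/t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 40):
            term *= -x / k                  # term = (-x)^k / k!
            total -= term / k
        return total
    # Continued fraction x + 1/(1 + 1/(x + 2/(1 + 2/(x + ...)))), evaluated backward.
    tail = x
    for n in range(depth, 0, -1):
        tail = x + n / (1.0 + n / tail)
    return math.exp(-x) / tail


def mmse_lsa_gain(xi, gamma):
    """MMSE LSA gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_e1(v))
```

When v_k is large, E1(v_k) is negligible and the gain reduces to the Wiener gain ξ_k / (1 + ξ_k); for small v_k the exponential factor raises the gain above it.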
MMSE Log-Spectral Amplitude
Estimator
Comparison of the suppression gains of MMSE STSA and MMSE LSA
Decision-Directed
Wiener Filter: MMSE LSA:
Noisy sample
(0 dB):
MMSE estimator with non-
Gaussian prior
How well does Gaussian model fit the real probability distribution of DFT
coefficients?