ESTIMATION AND DETECTION THEORY

All real signals are random because any measurable signal is corrupted by
noise. When unknown parameters are not directly measurable, we go for
estimation and estimate them from known, measurable quantities. So some
data or measurements are a prerequisite for initiating estimation. Since noise
is a zero-mean process, averaging can reduce the noise to a large extent. The
steps in estimation are: modeling (a physical or mathematical model), data
collection, estimation of the unknown parameters, and finally validation of the
results. An example of estimation is DSB-SC signal transmission and reception.
In discrete-time estimation, the noise samples differ with time, so the estimated
parameter also varies with time. Uncertainty about which bit was transmitted is
the corresponding problem in discrete systems. The pdf of the estimate depends
upon the noise model, the form of the estimation function and the original signal
structure.

The assumptions about noise are: the noise is Gaussian, which usually holds
due to the central limit theorem, and the noise is zero-mean and white, due to
which the noise samples are uncorrelated. Sometimes the whiteness assumption
is removed. If the noise is not zero mean, we can make it zero mean by
subtracting the mean. Since the noise is assumed to be Gaussian, x(n) will also
be Gaussian and the estimates will also be Gaussian (x(n) = s(n) + w(n)). The
desired properties of an estimate are that its mean, i.e. E(θ̂), should equal the
original parameter and that its variance should be small.

All the statistical information needed for an estimation problem is captured
by the parameterized pdf p(x;θ) (the likelihood function), where θ is the
unknown vector and is deterministic, and x is the N-dimensional random data
vector. This pdf is used to find the estimate or estimator function. (In the joint
pdf p(x,θ), both x and θ are random.)
We have to choose a suitable noise model to find a pdf that gives tractable
solutions (tractable means a solution which suits the problem with affordable
complexity). If θ strongly influences the pdf, we can find good estimates. In
classical estimation, θ is taken as deterministic and no a priori information
about θ is included. In Bayesian estimation, θ is taken as random and a priori
information is included. ML, MVU, LS, etc. are classical methods, and MMSE,
MAP, the Wiener filter, the Kalman filter, etc. are Bayesian methods. An example
is stock market prediction based on each value from the previous week
(classical) versus based on the distribution of those points (Bayesian); the
Bayesian approach cannot give sudden dips and hikes in the prediction.

Estimator performance is analyzed through the expectation of the estimate
(it should equal the original parameter) and its variance (it should be small),
and also through the expectation of the error (zero) and its variance (small).
For a DC level in noise, the sample mean estimator is an optimal estimator: its
expectation is the original DC level A and its variance is σ2/N. We could also use
the MSE to judge optimality, but the estimator that minimizes the MSE generally
depends on the unknown θ itself, so the MSE can only be used in simulation to
check the validity of an estimator. So the theoretical design metric cannot be the
MSE; instead we constrain the bias to zero and minimize the variance (see proof
in Notebook).
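
A minimal Monte Carlo sketch of this claim (the values of A, σ, N and the number
of trials below are made up for illustration), checking that the sample mean is
unbiased with variance σ2/N:

import numpy as np

rng = np.random.default_rng(0)
A_true, sigma, N, trials = 1.0, 1.0, 50, 10000          # illustrative values
x = A_true + sigma * rng.standard_normal((trials, N))   # trials realizations of x[n] = A + w[n]
A_hat = x.mean(axis=1)                                   # sample-mean estimate for each trial
print(A_hat.mean())                                      # close to A_true, so the estimator is unbiased
print(A_hat.var(), sigma**2 / N)                         # empirical variance matches sigma^2/N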

We can find many estimators, and among these, many may be unbiased; but
only a few will have minimum variance. Unbiased means the expectation of the
estimate equals the original parameter. MVU estimators can also be called
MMSE unbiased estimators. The MVU estimator sometimes does not exist, for
two possible reasons: there may not be any unbiased estimator, or none of the
unbiased estimators has uniformly minimum variance. So, to find out whether a
derived estimator is indeed the MVUE, or whether some other MVUE exists, we
use the Cramér-Rao Lower Bound (CRLB) and see if some estimator attains it,
or we restrict attention to linear unbiased estimators and find the MVLU. We
cannot always find the MVU estimator.

The CRLB is a lower bound on the variance of any unbiased estimator; none of
the unbiased estimators can have a variance less than the CRLB. The CRLB tells
us the best we can ever expect to be able to do with an unbiased estimator
(study the CRLB theorem). Some uses of the CRLB are feasibility studies (i.e.
can we meet our specifications), judgement of proposed estimators, sometimes
providing the form of the MVU estimator, and demonstrating the importance of
physical or signal parameters to the estimation problem. (If the variance is zero
then x is deterministic and there is no need for the CRLB.)

Using the curvature of the pdf, we measure the accuracy. The first derivative
gives the slope, the second derivative gives the curvature, and the sharpness is
measured using the curvature. If the curvature increases, the concentration of
the pdf increases, the accuracy increases and the variance decreases. This is why
we take the second derivative in the CRLB theorem. To find this curvature in
general, we average over the random data vector to get the average curvature.

In finding the CRLB, after finding the second derivative, if the result still
depends on x, it means the curvature is not uniform, so we average out x by
taking the expected value with respect to x; otherwise we skip this step. The
result may still depend on θ, so we evaluate it at each specific value of θ desired;
dependence on θ means the bound is non-uniform. The CRLB is defined for the
estimation problem and not for a particular estimator, so for a given estimation
problem the CRLB is unique. An estimator that is unbiased and attains the CRLB
is said to be an "Efficient Estimator". Not all estimators are efficient; not even
all MVU estimators are efficient. If the first-derivative factorization test fails, we
cannot find the efficient estimator this way (doubt: does one even exist?).
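
A minimal worked example of these steps (the standard DC level in WGN,
x[n] = A + w[n] with w[n] ~ N(0, σ2)): the log-likelihood is
ln p(x;A) = −(N/2)ln(2πσ2) − (1/(2σ2))∑n=0..N−1 (x[n] − A)2, so
∂ln p/∂A = (1/σ2)∑n(x[n] − A) = (N/σ2)(x̄ − A) and ∂2 ln p/∂A2 = −N/σ2.
The second derivative does not depend on x, so no averaging is needed, and the
CRLB is σ2/N. Since the first derivative factors as I(A)(x̄ − A), the sample mean
x̄ attains the bound and is the efficient (MVU) estimator.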

The sample mean estimator is an optimal estimator for a DC level in noise,
provided the noise is AWGN. So it is the MVUE for a DC level in AWGN. In the
phase estimation problem, the CRLB is 1/(N·SNR), but an efficient estimator
cannot be found. However, an estimator with variance tending to the CRLB as N
tends to infinity can be found. Such estimators are called asymptotically
efficient estimators (study the second equation for finding the CRLB from the
Notebook, marked star with red ink). I(θ) is called the Fisher Information.
Fisher is regarded as the father of modern statistics. The Fisher Information is a
measure of the expected goodness of the data for the purpose of making an
estimate. It has the properties expected of an information measure: it is ≥ 0 and
it is additive for IID observations.

The CRLB for signals is marked star. If the signal is very sensitive to a change in
the parameter, then the CRLB is small and we get a very accurate estimate.
Transformation of parameters: If α = g(θ), then CRLBα = (∂g(θ)/∂θ)2 · CRLBθ,
e.g. to find an estimate of the SNR (A2/σ2) from an estimate of the variance.
If θ has an efficient estimator, then α̂ = g(θ̂) is an efficient estimator of α if g(θ)
has the form g(θ) = aθ + b (proof marked star).
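
As an illustration of the transformation formula (the standard DC-level case with
known σ2): to estimate the power α = A2, g(θ) = θ2, so
CRLBα = (2A)2 · σ2/N = 4A2σ2/N. Since g is not of the form aθ + b, the natural
estimator x̄2 is not efficient (it is in fact biased, since E(x̄2) = A2 + σ2/N), but it
becomes asymptotically efficient as N grows.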

For the vector case, instead of the variance we take the covariance matrix and
the Fisher Information matrix. The diagonal elements of the inverse Fisher
Information matrix give the CRLB bounds. The expression for the (m,n)th
element of the Fisher Information matrix is marked star. The inverse Fisher
Information matrix is positive semidefinite (star). CRLB definition from the text
or Notes, marked star (in the Notes, after some pages).
The transformation-of-parameters formula for the vector parameter case is
obtained by finding the Jacobian (star).
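
For reference, the standard expressions being referred to are:
[I(θ)]mn = −E[∂2 ln p(x;θ)/∂θm∂θn], var(θ̂m) ≥ [I−1(θ)]mm, and for α = g(θ) the
bound becomes Cα̂ − (∂g/∂θ) I−1(θ) (∂g/∂θ)T ≥ 0 (positive semidefinite), where
∂g/∂θ is the Jacobian matrix of g.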

Linear Model with WGN: Since an MVU estimator is not always guaranteed, we
can define a class of models for which we can always find MVU estimators, i.e.
if the N-point data can be modeled as x = Hθ + W (where x is N×1, H is N×p,
θ is p×1, W is N×1), then an MVU estimator is guaranteed for this model and we
can find the MVU estimator and CRLB as usual. The MVU estimator is given by
θ̂ = (HTH)-1HTX and the variance or CRLB by σ2(HTH)-1 (derivation marked
star). (HTH)-1HT is the pseudo-inverse of H (since we cannot take the inverse of
H itself due to its dimensions). For (HTH)-1 to exist, H should have rank p (then
HTH is a p×p nonsingular matrix). θ̂ is a linear transformation of the Gaussian
vector x, so θ̂ also has a Gaussian distribution, i.e. θ̂ ~ N(θ, σ2(HTH)-1). Also the
estimator is efficient, because the first derivative of the log-likelihood can be
written in the factored form; if we can write it in that form, the variance equals
the CRLB itself. By taking E(θ̂), we can see that it is unbiased. Examples of
linear models are curve fitting, DC level in AWGN, line fitting, Fourier analysis,
system identification, etc. (equations and matrices for curve fitting, line fitting,
Fourier analysis and system identification marked star). In linear models with a
known component, i.e. if X = Hθ + S + W with S known, take
θ̂ = (HTH)-1HT(X − S). If the noise is coloured, then θ̂ = (HTC-1H)-1HTC-1X and
the CRLB or covariance is (HTC-1H)-1. For a DC level in AWGN, H is the all-ones
column vector.
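
A minimal numerical sketch of the linear-model MVU estimator for line fitting
s[n] = A + Bn (the true parameters, noise level and data below are made up for
illustration):

import numpy as np

rng = np.random.default_rng(0)
N, sigma = 100, 0.5
theta_true = np.array([1.0, 0.2])                     # illustrative intercept A and slope B
n = np.arange(N)
H = np.column_stack([np.ones(N), n])                  # N x 2 observation matrix
x = H @ theta_true + sigma * rng.standard_normal(N)   # x = H theta + w

theta_hat = np.linalg.inv(H.T @ H) @ H.T @ x          # MVU estimator (H^T H)^-1 H^T x
crlb = sigma**2 * np.linalg.inv(H.T @ H)              # covariance of theta_hat = CRLB
print(theta_hat)                                      # close to theta_true
print(np.diag(crlb))                                  # CRLB on var(A_hat) and var(B_hat)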

BLUE: Sometimes an MVU estimator does not exist or cannot be found. Till
now we have taken the pdf and derived all the estimators, but sometimes the pdf
may not be available, so we go for a suboptimal estimator, the BLUE. For the
BLUE, we need to know only the first and second moments of the pdf. The
estimator is restricted to be linear in the data here. There may be a large number
of linear estimators, but the best linear unbiased estimator, or BLUE, is the one
which is unbiased and has minimum variance. For a DC level in AWGN, the
BLUE is the MVU estimator itself, so the BLUE is optimal there; but for some
cases it is suboptimal. The prerequisites of the BLUE are: H must be
deterministic to get unbiasedness, the input sequence has to be deterministic,
and the noise must have zero mean with a known positive definite covariance
matrix C. Then θ̂BLUE = (HTC-1H)-1HTC-1X and the covariance is (HTC-1H)-1.
θ̂BLUE is weighted by C-1, so if a measurement is noisier it is given less weight,
and we get a better result. The BLUE is optimal (i.e. the MVUE) only if the noise
is Gaussian (derivation of BLUE, important points marked star). Also, here we
have assumed that the data model is linear; if it is actually not linear, then the
estimator is not optimal even if the noise is Gaussian. (In the linear model of the
previous paragraph the noise was assumed Gaussian, so the estimator with the
C-1 term there is still efficient; it is only when the noise is non-Gaussian that the
BLUE loses optimality.)
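
A short sketch of the BLUE for a DC level where each sample has a different
known noise variance (the variances below are made-up numbers); the noisier
samples get less weight than in the plain sample mean:

import numpy as np

rng = np.random.default_rng(1)
A_true = 2.0                                            # illustrative DC level
var = np.array([0.1, 0.1, 0.1, 4.0, 4.0])               # known noise variances (illustrative)
x = A_true + np.sqrt(var) * rng.standard_normal(var.size)

H = np.ones((var.size, 1))
C_inv = np.diag(1.0 / var)
A_blue = np.linalg.inv(H.T @ C_inv @ H) @ (H.T @ C_inv @ x)  # = sum(x/var) / sum(1/var)
print(A_blue, x.mean())                                 # BLUE vs the unweighted sample mean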

MLE: If the pdf is known, then first we try to find the MVUE, and if it cannot be
found we go for the MLE. So when the pdf is known, the MLE is the most
generally applicable estimator: it is optimal for large enough data records and it
is a "turn-the-crank" method. So the MLE is asymptotically unbiased and
optimal; for the MVU estimator we do not have such a method. For small data
records the MLE is not guaranteed optimal, and it can also be computationally
complex. It is found by finding the parameter value which maximizes the
log-likelihood function. An example of an MLE when the MVUE cannot be found
is a DC level where the variance of the noise equals the DC value A; in this case
the MLE is asymptotically efficient also. If a truly efficient estimator exists, then
the ML procedure finds it. Asymptotically, θ̂ML ~ N(θ, I-1(θ)). The data size N
needed for the asymptotic results to hold is found by Monte Carlo simulations.
(Gaussian fourth moment used in this example: E(A4) = μ4 + 3σ4 + 6μ2σ2.)
If we find the MLE for the phase estimation problem (here we cannot find the
MVUE), we get the implementation of a correlator; it comes out as tan-1 of the
ratio Q/I. (We find the MLE by maximizing the likelihood, which here reduces to
minimizing something like an error; see the phase estimation example to know
how (star).) In the invariance property of the MLE (i.e. the MLE for a
transformed parameter α = g(θ)), the transformation can be one-to-one or not.
If it is not one-to-one, then we have to take the modified likelihood function (i.e.
we take the likelihood over all the values of θ that map to α, find the maximum
of these, and then maximize that over α); see the text for the property (Theorem
7.2). If we do not have a closed-form expression for the MLE, we go for
numerical methods like a brute-force method (computing p(x; θ) on a fine grid of
values), a greedy maximization algorithm (but this may not converge, or will
sometimes converge only to a local maximum, so the initial guess is important),
or the Newton-Raphson method (equation star). For the vector parameter case
also study Theorem 7.3. The vector MLE is also asymptotically unbiased and
efficient, and the invariance property also holds. If the model is linear and
Gaussian, we need not go for the MLE because it will give the MVU estimator
itself, i.e. θ̂ = (HTC-1H)-1HTC-1X with covariance (HTC-1H)-1.
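
A minimal sketch of the correlator form of the ML phase estimator for
x[n] = A cos(2πf0n + φ) + w[n]; all signal values below are assumed for
illustration, and this is the approximate MLE valid when f0 is not near 0 or 1/2:

import numpy as np

rng = np.random.default_rng(2)
N, A, f0, phi_true, sigma = 200, 1.0, 0.12, 0.7, 0.5    # illustrative values
n = np.arange(N)
x = A * np.cos(2 * np.pi * f0 * n + phi_true) + sigma * rng.standard_normal(N)

I_corr = np.sum(x * np.cos(2 * np.pi * f0 * n))         # in-phase correlation I
Q_corr = np.sum(x * np.sin(2 * np.pi * f0 * n))         # quadrature correlation Q
phi_hat = np.arctan2(-Q_corr, I_corr)                   # MLE: -tan^-1(Q/I)
print(phi_hat)                                          # close to phi_true for large N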

Least Squares: If statistical information about the noise is not available, we
cannot use any of the above methods to find the estimator. In such cases, we go
for the LS method. Among the suboptimal estimators, LS is the most widely
used. In this, we treat the noise as a model error e[n] which need not be modeled
statistically. We minimize the error J = ∑n=0..N−1 (x[n] − s[n])2. If the error is
WGN, then LS gives the MVU estimator. In general we cannot establish that it is
an efficient estimator because we do not know the pdf or the MVU estimator.
Weighted LS uses J = ∑n=0..N−1 wn(x[n] − s[n])2, where wn is a weight used to
deemphasize less important (less reliable) samples. Weighted LS is better than
conventional LS when the relative reliability of the samples is known. LS can be
applied to both linear and non-linear cases. In the linear parameter-fitting case
the model is linear, i.e. x = Hθ + e with S = Hθ. We assume that H is full rank;
otherwise there are multiple parameter vectors mapping to the same S. Linear
does not mean fitting a line to the data; it means linear in the parameters, so S
can also take forms like S = A + Bn. For the linear case, the cost function can be
written as J(θ) = ∑n=0..N−1 (x[n] − s[n])2 = (x − Hθ)T(x − Hθ), which when
minimized gives HTHθ̂ = HTX, called the LS normal equations. So
θ̂ = (HTH)-1HTX and ŜLS = H(HTH)-1HTX. If the noise is white and Gaussian,
then the ML estimator coincides with LS. But for ML we can say whether it is
asymptotically efficient or optimal, whereas for LS we can do this only if
statistical information is available; LS by itself can never be claimed optimal,
since we go for LS precisely when statistical information is not available. The
resulting LS cost for linear LS is Jmin = XT[I − H(HTH)-1HT]X and
0 ≤ Jmin ≤ ||X||2. For weighted linear LS, J(θ) = (x − Hθ)TW(x − Hθ), which gives
θ̂WLS = (HTWH)-1HTWX and Jmin = XT[W − WH(HTWH)-1HTW]X. Even though
there is no true statistical justification, many people use the inverse covariance
matrix as the weight; this makes WLS look like BLUE.
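
A small sketch of the linear LS solution and the cost Jmin, computed both
directly and from the projection-matrix formula (the data and model below are
made up):

import numpy as np

rng = np.random.default_rng(3)
N = 50
n = np.arange(N)
H = np.column_stack([np.ones(N), n])                    # model s = A + B n
x = H @ np.array([1.0, 0.2]) + rng.standard_normal(N)   # made-up data; LS assumes no noise model

theta_ls, *_ = np.linalg.lstsq(H, x, rcond=None)        # solves the normal equations H^T H theta = H^T x
J_direct = np.sum((x - H @ theta_ls) ** 2)              # residual cost sum (x[n] - s[n])^2
P = H @ np.linalg.inv(H.T @ H) @ H.T                    # projection matrix P_H
J_formula = x @ (np.eye(N) - P) @ x                     # Jmin = X^T [I - H(H^T H)^-1 H^T] X
print(J_direct, J_formula)                              # the two values agree
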
In the geometry of LS, the important points to remember are that the error
vector is perpendicular to all the columns of H (and from this also we can derive
the LS estimate), i.e. HTԐ = 0, that Ŝ is the projection of X onto the range of H,
and that the projection matrix PH = H(HTH)-1HT is idempotent and symmetric.
LS minimizes the length of the error vector between the data and the signal
estimate, i.e. Ԑ = X − Ŝ, so Ŝ must lie right below X. If H has orthonormal
columns, then θ̂LS = (HTH)-1HTX = I-1HTX = HTX and θ̂i = hiTX, so
Ŝ = ∑i=1..P θ̂i hi = ∑i=1..P (hiTX) hi, i.e. we can first find the projection onto each
1-D subspace independently and then add these independently derived results
(figures marked star).
Order Recursive LS: Here we are trying to find the best model order for s[n] in
order to reduce Jmin to the required level. If the order is equal to the number of
data points, then the error is zero but the model becomes overly complex. So we
should find the order such that the error is just below the acceptable level.
Sometimes each portion of the data may need a different order (called adaptive
fitting). Increase the order only if the cost (error) reduction is significant. In
order-recursive LS, the recursion is based on the order, not on new data
arriving. If we have computed the order-p model, we can compute the
order-(p+1) model recursively: Hp+1 = [Hp hp+1]. If all the hi's are orthogonal,
then it is easy.
Sequential LS: Here the model order is fixed but the data length increases. We
find the general expression for this by first finding the expression for the DC-
level case and then writing the general result without proof (results marked star
in the Estimation Notebook and the Adaptive Notebook).
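
A minimal sketch of the sequential-LS idea for the DC-level case (the data below
are made up; the general gain/covariance update is the one marked in the
Notebook): each new sample updates the old estimate by a gain times the
prediction error.

import numpy as np

rng = np.random.default_rng(4)
x = 3.0 + 0.5 * rng.standard_normal(1000)     # DC level 3.0 in noise (illustrative)

A_hat = 0.0
for N, sample in enumerate(x, start=1):
    A_hat = A_hat + (sample - A_hat) / N      # new estimate = old estimate + (1/N) * prediction error
print(A_hat, x.mean())                        # sequential result equals the batch sample mean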

Bayesian Philosophy: In this we consider the parameter as random. We go for
this when we have some prior knowledge about the parameter. Here we consider
the joint pdf p(x, θ), and the variance of the estimator cannot depend on θ since
the expectation is taken w.r.t. the joint pdf; in Monte Carlo simulations each run
is done at a randomly chosen θ and the averaging is done over both the data and
the θ values. In the classical approach we considered θ as deterministic and the
variance of the estimate can depend on θ; also, in Monte Carlo simulations each
run is done at the same θ and the averaging is done over the data only. Bayesian
estimation is useful when the classical MVU estimator does not exist because of
non-uniformity of the minimum variance, and also to handle the signal
estimation problem. E.g.: the Wiener filter is a Bayesian method for this signal
estimation problem.
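
A standard illustration of how the prior enters (DC level in WGN with a Gaussian
prior A ~ N(μA, σA2)): the MMSE estimator is Â = α x̄ + (1 − α) μA with
α = σA2/(σA2 + σ2/N). For small N the prior knowledge dominates; as N grows,
α → 1 and the Bayesian estimate approaches the classical sample mean.
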
Summary: The MVU estimator is the best (optimal), and it is efficient when it
attains the minimum possible variance, i.e. the CRLB. If we can write the model
in linear form (with Gaussian noise), we can always find an efficient MVU
estimator. If it is not possible to find a linear model, or to find the MVU
estimator using the first-derivative factorization, or if the MVU estimator does
not exist, it is still possible to find an estimator: this is the BLUE, which is found
by restricting the estimator to be linear in the data. But it need not be optimal.
If it is to be optimal, then it should be an MVU estimator, or in other words the
MVU estimator has to be linear. The BLUE is optimal only if the noise is
Gaussian, because then the MVU estimator for the linear model is itself linear in
the data, so the linearity restriction loses nothing.
