ESTIMATION THEORY
The Principle of Least Squares
The best-fitting line is the one with the smallest sum of squared
residuals.
Assumptions:
- No uncertainty in x.
Regression
Linear Regression: Y = a + bX + u
Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u
Where:
Y= the variable that we are trying to predict
X= the variable that we are using to predict Y
a= the intercept
b= the slope
u= the regression residual.
In multiple regression the separate variables are differentiated by using subscripted
numbers.
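As a concrete illustration of these definitions, the sketch below fits Y = a + bX + u by least squares on a small made-up data set (the numbers are purely hypothetical):

```python
import numpy as np

# Hypothetical sample data for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares slope and intercept for Y = a + bX + u.
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

# u, the regression residuals; least squares minimizes sum(residuals**2).
residuals = Y - (a + b * X)
print(a, b, np.sum(residuals ** 2))
```

By construction the residuals sum to zero, since the fitted line passes through the point of means.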
Introduction:
This article attempts to take a close look at the process and techniques in Regression
Testing.
If a piece of software is modified for any reason, testing needs to be done to ensure
that it works as specified and that the change has not negatively impacted any
functionality that it offered previously. This is known as Regression Testing.
Regression Testing plays an important role in any scenario where a change has been
made to previously tested software code. Regression Testing is hence an important
aspect of various software methodologies where software changes and enhancements
occur frequently.
Any Software Development Project is invariably faced with requests for changing
Design, code, features or all of them.
Each change implies more Regression Testing needs to be done to ensure that the
System meets the Project Goals.
Why is Regression Testing important?
Each change can affect the quality and reliability of the system. Regression Testing,
which aims to verify that existing behavior remains intact, is therefore very important.
Every time a change occurs one or more of the following scenarios may occur:
- More Functionality may be added to the system
- More complexity may be added to the system
- New bugs may be introduced
- New vulnerabilities may be introduced in the system
- System may tend to become more and more fragile with each change
After the change the new functionality may have to be tested along with all the
original functionality.
With each change Regression Testing could become more and more costly.
To make the Regression Testing Cost Effective and yet ensure good coverage one or
more of the following techniques may be applied:
- Test Automation:
If the test cases are automated, they may be executed using scripts after
each change is introduced in the system. Executing test cases in this way helps
eliminate oversight and human error. It may also result in faster and cheaper execution
of test cases. However, there is a cost involved in building the scripts.
- Selective Testing:
Some teams choose to execute the test cases selectively. They do not execute all the
test cases during Regression Testing; they test only what they decide is relevant.
This helps reduce the testing time and effort.
Since Regression Testing verifies the software application after a change has been
made, everything that may be impacted by the change should be tested during
Regression Testing.
PARTIAL CORRELATION
Partial correlation analysis involves studying the linear relationship between two
variables after excluding the effect of one or more independent factors.
For example, study of partial correlation between price and demand would involve
studying the relationship between price and demand excluding the effect of money
supply, exports, etc.
Generally, a large number of factors simultaneously influence all social and natural
phenomena. Correlation and regression studies aim at studying the effects of a large
number of factors on one another.
In simple correlation, we measure the strength of the linear relationship between two
variables, without taking into consideration the fact that both these variables may be
influenced by a third variable.
For example, when we study the correlation between price (dependent variable) and
demand (independent variable), we completely ignore the effect of other factors like
money supply, import and exports etc. which definitely have a bearing on the price.
RANGE
The correlation co-efficient between two variables X1 and X2, studied partially after
eliminating the influence of the third variable X3 from both of them, is the partial
correlation co-efficient r12.3.
Simple correlation between two variables is called the zero order co-efficient since in
simple correlation, no factor is held constant. The partial correlation studied between
two variables by keeping the third variable constant is called a first order co-efficient,
as one variable is kept constant. Similarly, we can define a second order co-efficient
and so on. The partial correlation co-efficient varies between -1 and +1. Its
calculation is based on the simple correlation co-efficient.
The partial correlation analysis assumes great significance in cases where the
phenomena under consideration have multiple factors influencing them, especially in
physical and experimental sciences, where it is possible to control the variables and
the effect of each variable can be studied separately. This technique is of great use in
various experimental designs where various interrelated phenomena are to be
studied.
LIMITATIONS
However, this technique suffers from some limitations, some of which are stated
below.
The calculation of the partial correlation co-efficient is based on the simple correlation
co-efficient. However, the simple correlation co-efficient assumes a linear relationship.
Generally this assumption is not valid, especially in the social sciences, as linear
relationships rarely exist in such phenomena.
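The first-order coefficient r12.3 described above is computed from the simple correlations as r12.3 = (r12 - r13 r23) / sqrt((1 - r13²)(1 - r23²)). The sketch below applies this formula to simulated price, demand, and money-supply data; the variable names and data-generating equations are purely illustrative assumptions.

```python
import numpy as np

def partial_corr(x1, x2, x3):
    """First-order partial correlation r12.3: correlation of x1 and x2
    with the linear influence of x3 removed from both."""
    r12 = np.corrcoef(x1, x2)[0, 1]
    r13 = np.corrcoef(x1, x3)[0, 1]
    r23 = np.corrcoef(x2, x3)[0, 1]
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

# Illustrative data: money supply influences both price and demand.
rng = np.random.default_rng(0)
m = rng.normal(size=200)                          # money supply
price = 2 * m + rng.normal(size=200)              # price partly driven by m
demand = -1.5 * price + m + rng.normal(size=200)  # demand depends on both
print(partial_corr(price, demand, m))
```

Here the partial correlation isolates the negative price-demand relationship that remains after the common influence of money supply is excluded.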
As the order of the partial correlation co-efficient goes up, its reliability goes down.
MULTIPLE CORRELATION
For the two-predictor problem, let ry1 and ry2 denote the simple correlations of the
criterion Y with the predictors X1 and X2, and let r12 denote the correlation between
the two predictors. The standardized regression (beta) weights are
b1 = (ry1 - ry2 r12) / (1 - r12²)   (1)
b2 = (ry2 - ry1 r12) / (1 - r12²)   (2)
B. Multiple R can be computed several ways. From the simple
correlations, as
R² = (ry1² + ry2² - 2 ry1 ry2 r12) / (1 - r12²)   (3)
or, equivalently, from the beta weights, as
R² = b1 ry1 + b2 ry2   (4)
The semipartial correlation of Y with X1 (the correlation of Y with the part of X1
that is independent of X2) is
sr1 = (ry1 - ry2 r12) / sqrt(1 - r12²)   (5)
So the relationship between the semipartial correlation and the beta
weight from Equations 1 and 5 is
sr1 = b1 sqrt(1 - r12²)   (6)
The partial correlation of Y with X1, with X2 partialled out of both, is
pr1 = (ry1 - ry2 r12) / sqrt((1 - ry2²)(1 - r12²))   (7)
The relation between partial correlations and beta weights for the two
predictor problem turns out to be
pr1 = b1 sqrt((1 - r12²) / (1 - ry2²))   (8)
E. Following Cohen and Cohen (1975, p. 80), we can think of all these in
terms of what they call Ballentines (we can call Mickeys): overlapping circles
representing the variances of Y, X1 and X2. The squared semipartial correlation is
the area of Y overlapped uniquely by X1; the squared partial correlation is that same
unique area taken as a proportion of the Y variance not accounted for by X2.
This should remind the reader of stepwise multiple regression, where each new
variable is entered while controlling for the variance explained by earlier entered
variables. Therefore, if we could compute the higher order partial correlations,
we could do multiple regression by hand. A recurrence relationship allows us
to do just that:
r12.3 = (r12 - r13 r23) / sqrt((1 - r13²)(1 - r23²))   (9)
Consider for now a rather abstract model where μi = xi′β for some predictors xi. How do
we estimate the parameters β and σ²?
2.2.1 Estimation of β
The likelihood principle instructs us to pick the values of the parameters that maximize
the likelihood, or equivalently, the logarithm of the likelihood function. If the
observations are independent, then the likelihood function is a product of normal
densities of the form given in Equation 2.1. Taking logarithms we obtain the normal log-
likelihood
log L(β, σ²) = -(n/2) log(2πσ²) - (1/2) Σ (yi - μi)² / σ²
(2.5)
where μi = xi′β. The most important thing to notice about this expression is that
maximizing the log-likelihood with respect to the linear parameters β for a fixed value of
σ² is exactly equivalent to minimizing the sum of squared differences between observed
and expected values, or residual sum of squares
RSS(β) = Σ (yi - μi)² = (y - Xβ)′(y - Xβ)
(2.6)
In other words, we need to pick values of β that make the fitted values μ̂i = xi′β̂ as close
as possible to the observed values yi.
Taking derivatives of the residual sum of squares with respect to β and setting the
derivative equal to zero leads to the so-called normal equations for the maximum-
likelihood estimator β̂:
X′X β̂ = X′y.
If the model matrix X is of full column rank, so that no column is an exact linear
combination of the others, then the matrix of cross-products X′X is of full rank and can
be inverted to solve the normal equations. This gives an explicit formula for the ordinary
least squares (OLS) or maximum likelihood estimator of the linear parameters:
β̂ = (X′X)⁻¹ X′y.
(2.7)
If X is not of full column rank one can use generalized inverses, but interpretation of the
results is much more straightforward if one simply eliminates redundant columns. Most
current statistical packages are smart enough to detect and omit redundancies
automatically.
There are several numerical methods for solving the normal equations, including
methods that operate on XX, such as Gaussian elimination or the Choleski
decomposition, and methods that attempt to simplify the calculations by factoring the
model matrix X, including Householder reflections, Givens rotations and the Gram-
Schmidt orthogonalization. We will not discuss these methods here, assuming that you
will trust the calculations to a reliable statistical package. For further details see
McCullagh and Nelder (1989, Section 3.8) and the references therein.
The foregoing results were obtained by maximizing the log-likelihood with respect to β
for a fixed value of σ². The result obtained in Equation 2.7 does not depend on σ², and is
therefore a global maximum.
For the null model X is a vector of ones, X′X = n and X′y = Σ yi are scalars, and β̂ =
ȳ, the sample mean. For our sample data ȳ = 14.3. Thus, the calculation of a
sample mean can be viewed as the simplest case of maximum likelihood estimation in a
linear model.
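A quick numerical check of Equation 2.7 is easy to sketch. The simulated data below are an illustrative assumption, not the text's sample; the check also confirms that the null model reproduces the sample mean.

```python
import numpy as np

# Simulate y = 3 + 2x + noise (coefficients chosen for illustration).
rng = np.random.default_rng(42)
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

# Model matrix with a constant column; solve the normal equations X'X b = X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Null model: X is a column of ones, so beta_hat reduces to the sample mean.
X0 = np.ones((n, 1))
beta_null = np.linalg.solve(X0.T @ X0, X0.T @ y)
print(beta_hat, beta_null[0], y.mean())
```

Solving the normal equations directly (rather than forming (X′X)⁻¹ explicitly) is the numerically preferable route in practice.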
The OLS estimator is unbiased:
E(β̂) = β.
(2.8)
It can also be shown that if the observations are uncorrelated and have constant
variance σ², then the variance-covariance matrix of the OLS estimator is
var(β̂) = (X′X)⁻¹ σ².
(2.9)
This result follows immediately from the fact that β̂ is a linear function of the data y
(see Equation 2.7), and the assumption that the variance-covariance matrix of the data
is var(Y) = σ² I, where I is the identity matrix.
A further property of the estimator is that it has minimum variance among all unbiased
estimators that are linear functions of the data, i.e. it is the best linear unbiased
estimator (BLUE). Since no other unbiased estimator can have lower variance for a fixed
sample size, we say that OLS estimators are fully efficient.
Finally, it can be shown that the sampling distribution of the OLS estimator β̂ in
large samples is approximately multivariate normal with the mean and variance given
above, i.e.
β̂ ~ Np(β, (X′X)⁻¹ σ²).
Applying these results to the null model we see that the sample mean ȳ is an
unbiased estimator of μ, has variance σ²/n, and is approximately normally distributed in
large samples.
All of these results depend only on second-order assumptions concerning the mean,
variance and covariance of the observations, namely the assumption that E(Y) = Xβ and
var(Y) = σ² I.
2.2.3 Estimation of σ²
Substituting the OLS estimator β̂ into the log-likelihood in Equation 2.5 gives a profile
likelihood for σ²
log L(σ²) = -(n/2) log(2πσ²) - RSS(β̂) / (2σ²).
Differentiating this expression with respect to σ² (not σ) and setting the derivative to zero
leads to the maximum likelihood estimator
σ̂² = RSS(β̂)/n.
This estimator happens to be biased, but the bias is easily corrected by dividing by n - p
instead of n, where p is the number of linear parameters. The situation is exactly
analogous to the use of n - 1 instead of n when estimating a variance. In fact, the
estimator of σ² for the null model is the sample variance, since β̂ = ȳ and the
residual sum of squares is RSS = Σ (yi - ȳ)².
Under the assumption of normality, the ratio RSS(β̂)/σ² of the residual sum of squares to the
true parameter value has a chi-squared distribution with n - p degrees of freedom and is
independent of the estimator of the linear parameters. You might be interested to know
that using the chi-squared distribution as a likelihood to estimate σ² (instead of the
normal likelihood to estimate both β and σ²) leads to the unbiased estimator.
For the sample data the RSS for the null model is 2650.2 on 19 d.f. and therefore σ̂
= 11.81, the sample standard deviation.
Maximum Likelihood
Maximum likelihood, also called the maximum likelihood method, is the procedure of
finding the value of one or more parameters for a given statistic which makes the known
likelihood distribution a maximum. The maximum likelihood estimate for a parameter μ is
denoted μ̂.
For a normal distribution with mean μ and known standard deviation σ, the likelihood of
the observations x1, ..., xn is
L(μ) = Π (2πσ²)^(-1/2) exp[-(xi - μ)² / (2σ²)],   (1)
so maximum likelihood occurs for μ̂ = x̄, the sample mean. If σ is not known ahead of
time, the likelihood function is
L(μ, σ) = (2πσ²)^(-n/2) exp[-Σ (xi - μ)² / (2σ²)],   (2)
with log-likelihood
ln L = -(n/2) ln(2πσ²) - Σ (xi - μ)² / (2σ²).   (3)
Setting the partial derivative with respect to μ to zero gives
∂ ln L/∂μ = Σ (xi - μ) / σ² = 0.   (4)
Rearranging gives
Σ xi = nμ,   (5)
so
μ̂ = (1/n) Σ xi.   (6)
Similarly, setting the partial derivative with respect to σ to zero,
∂ ln L/∂σ = -n/σ + Σ (xi - μ)² / σ³ = 0,   (7)
gives
σ̂² = (1/n) Σ (xi - μ̂)².   (8)
Note that in this case, the maximum likelihood standard deviation is the sample
standard deviation (with divisor n rather than n - 1), which is a biased estimator for the
population standard deviation.
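To make the derivation concrete, here is a small numerical check that the closed-form estimates μ̂ = x̄ and σ̂² = (1/n) Σ (xi - x̄)² do maximize the normal log-likelihood. The simulated data and true parameter values are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
n = len(x)

def log_likelihood(mu, sigma2):
    # Normal log-likelihood: -(n/2) ln(2*pi*sigma^2) - sum((x-mu)^2)/(2*sigma^2)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Closed-form maximum likelihood estimates derived above.
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)   # divisor n, hence biased

# The closed form should beat nearby parameter values in likelihood.
best = log_likelihood(mu_hat, sigma2_hat)
for dmu in (-0.1, 0.1):
    for ds in (-0.1, 0.1):
        assert log_likelihood(mu_hat + dmu, sigma2_hat + ds) < best
print(mu_hat, sigma2_hat)
```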
Method of Moments
The method of moments equates sample moments to parameter estimates. When
moment methods are available, they have the advantage of simplicity. The disadvantage
is that they are often not available and they do not have the desirable optimality
properties of maximum likelihood and least squares estimators.
The primary use of moment estimates is as starting values for the more precise
maximum likelihood and least squares estimates.
The Method
Suppose that we have a basic random experiment with an observable, real-valued
random variable X. The distribution of X has k unknown parameters, or equivalently, a
parameter vector a = (a1, a2, ..., ak)
taking values in a parameter space A ⊆ Rk. As usual, we repeat the experiment n times
to generate a random sample of size n from the distribution of X.
Thus, X1, X2, ..., Xn are independent random variables, each with the distribution of X.
Let
μi(a) = E(X^i | a)
denote the i'th moment of X about 0. Note that we are emphasizing the dependence of
these moments on the vector of parameters a. Note also that μ1(a) is just the mean of
X, which we usually denote by μ. Next, let
Mi(X) = (1/n) Σj=1,...,n Xj^i
denote the i'th sample moment. Note that we are emphasizing the dependence of the
sample moments on the sample X. Note also that M1(X) is just the ordinary sample
mean, which we usually just denote by Mn.
To construct estimators W1, W2, ..., Wk for our unknown parameters a1, a2, ..., ak,
respectively, we attempt to solve the set of simultaneous equations
μi(W1, W2, ..., Wk) = Mi(X),  i = 1, 2, ..., k,
for W1, W2, ..., Wk in terms of X1, X2, ..., Xn. Note that we have k equations in k
unknowns, so there is hope that the equations can be solved.
Estimates for the Mean and Variance
1. Suppose that (X1, X2, ..., Xn) is a random sample of size n from a distribution with
unknown mean μ and variance σ². Show that the method of moments estimators for μ
and σ² are, respectively,
a. Mn = (1/n) Σj=1,...,n Xj.
b. Tn² = (1/n) Σj=1,...,n (Xj - Mn)².
Note that Mn is just the ordinary sample mean, but Tn² = [(n - 1)/n] Sn², where Sn² is the
usual sample variance. In the remainder of this subsection, we will compare the
estimators Sn² and Tn².
4. Show that E(Tn²) = [(n - 1)/n] σ², so that Tn² is biased downward.
6. Suppose that the sampling distribution is normal. Show that in this case
var(Sn²) = 2σ⁴/(n - 1) while MSE(Tn²) = (2n - 1)σ⁴/n².
Thus, Sn² and Tn² are multiples of one another; Sn² is unbiased but Tn² has smaller mean
square error.
7. Run the normal estimation experiment 1000 times, updating every 10 runs, for
several values of the parameters. Compare the empirical bias and mean square error
of Sn² and of Tn² to their theoretical values. Which estimator is better in terms of bias?
Which estimator is better in terms of mean square error?
There are several important one-parameter families of distributions for which the
parameter is the mean, including the Bernoulli distribution with parameter p and the
Poisson distribution with parameter λ. For these families, the method of moments
estimator of the parameter is Mn, the sample mean. Similarly, the parameters of the
normal distribution are μ and σ², so the method of moments estimators are Mn and Tn².
Additional Exercises
8. Suppose that (X1, X2, ..., Xn) is a random sample from the gamma distribution with
shape parameter k and scale parameter b. Show that the method of moments estimators
of k and b are respectively
a. U = Mn² / Tn².
b. V = Tn² / Mn.
9. Run the gamma estimation experiment 1000 times, updating every 10 runs for
several different values of the shape and scale parameter. Record the empirical bias and
mean square error in each case.
10. Suppose that (X1, X2, ..., Xn) is a random sample from the beta distribution with
parameters a and 1. Show that the method of moments estimator of a is Un = Mn / (1 -
Mn).
11. Run the beta estimation experiment 1000 times, updating every 10 runs, for several
different values of a. Record the empirical bias and mean square error in each case.
Draw graphs of the empirical bias and mean square error as a function of a.
12. Suppose that (X1, X2, ..., Xn) is a random sample from the Pareto distribution with
shape parameter a > 1. Show that the method of moments estimator of a is Un = Mn /
(Mn - 1).
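As a check on Exercise 8, the sketch below simulates gamma data and applies the moment estimators U = Mn²/Tn² and V = Tn²/Mn. The true parameter values and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
k_true, b_true = 3.0, 2.0                         # shape k, scale b
x = rng.gamma(shape=k_true, scale=b_true, size=5000)

Mn = x.mean()
Tn2 = np.mean((x - Mn) ** 2)                      # divisor-n sample variance

# Method of moments estimators: gamma has mean k*b and variance k*b^2,
# so matching moments gives U = Mn^2/Tn2 for k and V = Tn2/Mn for b.
U = Mn**2 / Tn2
V = Tn2 / Mn
print(U, V)
```

With 5000 observations the estimates land close to the true (k, b) = (3, 2), though moment estimates are typically less efficient than maximum likelihood.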
UNIT-5
TIME SERIES
Moving average is an indicator used in technical analysis that shows a stock's average
price over a certain period of time. It is useful for showing a stock's "momentum" and its
propensity to move above or below a given level. Generally the moving average is
plotted on a graph alongside the stock's price (for example, Dell's 50-day moving
average plotted along with its price).
There are different types of moving average, and therefore different formulas to
calculate it. However, for any given moment in time, the moving average is the average
of the stock prices over the past x days, where x is the period that you are measuring.
For example, if the stock price on Monday was $3, the price on Tuesday was $5, and the
price on Wednesday was $7, the three-day moving average on Thursday would be $5, or
the average of the past three days.
Why is it important?
The moving average can be a better indicator of a stock's typical price over time than the
raw price itself. The moving average curve is a much smoother version of the price curve
because it smooths out the sharp bumps caused by short-term deviations.
Two of the most important types of moving average are the linear moving average (this is
generally just called the "moving average") and the exponential moving average. The linear
moving average is the simpler one: it is just the average obtained by summing the stock
prices over the period and dividing by the number of prices. In an exponential
moving average (EMA), more recent days are given exponentially more weight, and a
more complicated formula is used to find the average. There is no single formula because
the days can be weighted differently. For example, in a ten-day EMA, you could give the
last day a weight of 20%, the second-to-last day a weight of 14%, and so on. Some common
time frames for both moving averages are 5, 10, 20, 50, 100, and 200 days.
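The two averages described above can be sketched in a few lines. The price series is made up, and the 2/(span+1) weighting in the EMA is one common convention among the many possible weightings mentioned in the text.

```python
import numpy as np

prices = np.array([3.0, 5.0, 7.0, 6.0, 8.0, 9.0, 7.0, 8.0])  # hypothetical closes

def simple_ma(p, window):
    """Linear (simple) moving average: unweighted mean of the last `window` prices."""
    return np.convolve(p, np.ones(window) / window, mode="valid")

def ema(p, span):
    """Exponential moving average using the common smoothing factor 2/(span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [p[0]]                                   # seed with the first price
    for price in p[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return np.array(out)

print(simple_ma(prices, 3))
print(ema(prices, 3))
```

The first three-day average here is (3 + 5 + 7)/3 = 5, matching the Monday-to-Wednesday example in the text.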
This is a very popular scheme for producing a smoothed time series. Whereas in a single
moving average the past observations are weighted equally, exponential smoothing
assigns exponentially decreasing weights as the observations get older.
In other words, recent observations are given relatively more weight in forecasting than
the older observations.
In the case of moving averages, the weights assigned to the observations are the same
and are equal to 1/N. In exponential smoothing, however, there are one or more
smoothing parameters to be determined (or estimated) and these choices determine the
weights assigned to the observations.
Exponential smoothing
Let's now dig into the most popular forecasting methods: the simple exponential
smoothing model, which forecasts the next period for a time series without trend, and the
double exponential smoothing model, which takes a trend effect into account.
The single exponential smoothing is somewhat similar to the moving average methods
except that there is a single weighting factor called alpha, which can take a value
between 0 and 1.
The next period forecast (Fn+1) is then calculated as follows:
Fn+1 = alpha × Dn + (1 - alpha) × Fn
Where: Dn is the actual value (e.g. sales) observed in period n, and Fn is the forecast
that was made for period n.
If alpha is near 0, the next period forecast is essentially equal to the previous forecast,
which means that the model is less reactive.
Conversely, if alpha is near 1, the next period forecast is essentially equal to the previous
period's actual sales, meaning that the model is highly reactive.
In summary, the reactivity of the model depends on alpha: when close to 1, the model is
very reactive and the latest periods are more important and when alpha is close to 0,
the model is less reactive so former periods are more important.
Tip: When alpha = 2 / (N + 1), where N is the number of periods, the exponential
smoothing model behaves comparably to the straight N-period moving average.
You can determine alpha either empirically or scientifically; its value lets you fine-tune
the model's sensitivity.
The empirical way is basically you in front of your computer, playing with the model until it
fits the actual sales based on historical data. This way you get your golden number for
alpha.
The scientific way uses the standard error method for estimating alpha.
Let's take the Dow Jones index to see the results with two alpha values.
We can see that the green curve with alpha equal to 0.2 is certainly not precise, while
the purple curve with alpha equal to 0.9 looks much better.
Applying the standard error method confirms that 0.9 is the best value of alpha for
fitting this very unstable data.
You can also see that the forecast data lag the actual data in the model, whatever the
value of alpha.
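The recursion Fn+1 = alpha·Dn + (1 - alpha)·Fn can be sketched as follows. The sales figures are made up, and seeding the first forecast with the first actual value is one common initialization convention among several.

```python
def ses(demand, alpha):
    """Simple exponential smoothing: F(t+1) = alpha*D(t) + (1 - alpha)*F(t),
    with the initial forecast seeded by the first observation."""
    forecast = demand[0]
    history = []
    for d in demand:
        forecast = alpha * d + (1 - alpha) * forecast
        history.append(forecast)
    return history          # history[-1] is the forecast for the next period

sales = [100, 110, 105, 120, 118]     # hypothetical actuals
print(ses(sales, alpha=0.2))          # sluggish: less reactive
print(ses(sales, alpha=0.9))          # tracks the latest actuals closely
```

Comparing the two printed series shows exactly the behavior described above: with alpha = 0.2 the forecasts drift slowly toward the actuals, while with alpha = 0.9 they chase the most recent observation.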
Autoregressive Process
The solution is
(8.45)
h(u)
(8.46)
The effect is the same as applying a filter to the input signal Z(t), and the
variance spectrum of X(t) can then be determined as in the preceding example.
In the case where Z(t) is a random variable or "white noise" source, the variance
spectrum from this source would be constant with frequency:
(8.47)
(8.49)
And therefore the autocorrelation function, obtained from the Fourier transform
of (8.49), is a decaying exponential:
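The decaying-exponential autocorrelation is easy to verify by simulation for the simplest case. The sketch below assumes a first-order autoregressive model X(t) = a·X(t-1) + Z(t) driven by white noise; this particular model form and coefficient are illustrative assumptions, since the chapter's exact equations are not reproduced here. For that model the theoretical autocorrelation at lag k is a^k.

```python
import numpy as np

rng = np.random.default_rng(3)
a = 0.8                         # AR(1) coefficient (assumed model X_t = a*X_{t-1} + Z_t)
n = 20000
z = rng.normal(size=n)          # white-noise input Z(t)

x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + z[t]

def autocorr(series, lag):
    """Sample autocorrelation at the given lag."""
    s = series - series.mean()
    return np.dot(s[:-lag], s[lag:]) / np.dot(s, s)

# Theory for this assumed model: rho(k) = a**k, a decaying exponential.
for k in (1, 2, 3):
    print(k, autocorr(x, k), a**k)
```

The sample autocorrelations at lags 1, 2, 3 track 0.8, 0.64, 0.512 closely, illustrating the exponential decay that the text derives via the Fourier transform of the variance spectrum.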