7. Forecasting or prediction
8. Using the model for control or policy purposes.
Answer 2.
In regression analysis we are concerned with what is known as the statistical, not
functional or deterministic, dependence among variables, such as those of classical
physics.
In statistical relationships among variables we essentially deal with random or
stochastic variables, that is, variables that have probability distributions. In functional
or deterministic dependency, on the other hand, we also deal with variables, but these
variables are not random or stochastic. The dependence of crop yield on temperature,
rainfall, sunshine, and fertilizer, for example, is statistical in nature in the sense that
the explanatory variables, although certainly important, will not enable the agronomist
to predict crop yield exactly because of errors involved in measuring these variables as
well as a host of other factors (variables) that collectively affect the yield but may be
difficult to identify individually. Thus, there is bound to be some intrinsic or random
variability in the dependent-variable crop yield that cannot be fully explained no
matter how many explanatory variables we consider.
In deterministic phenomena, on the other hand, we deal with relationships of the type,
say, exhibited by Newton's law of gravity, which states: Every particle in the universe
attracts every other particle with a force directly proportional to the product of their
masses and inversely proportional to the square of the distance between them.
Symbolically, F = k(m1m2/r²), where F = force, m1 and m2 are the masses of the two
particles, r = distance, and k = constant of proportionality. Another example is Ohm's
law, which states: For metallic conductors over a limited range of temperature the
current C is proportional to the voltage V; that is, C = (1/k)V, where 1/k is the
constant of proportionality. Other examples of such deterministic relationships are
Boyle's gas law, Kirchhoff's law of electricity, and Newton's laws of motion. In this
text we are not concerned with such deterministic relationships. Of course, if there are
errors of measurement, say, in the k of Newton's law of gravity, the otherwise
deterministic relationship becomes a statistical relationship. In this situation, force can
be predicted only approximately from the given value of k (and m1, m2, and r), which
contains errors. The variable F in this case becomes a random variable.
Answer 3.
Endogenous variable: A factor in a causal model or causal system whose value is
determined by the states of other variables in the system; contrasted with
an exogenous variable. Related but non-equivalent distinctions are those between
dependent and independent variables and between explanandum and explanans. A
factor can be classified as endogenous or exogenous only relative to a specification of
a model representing the causal relationships producing the outcome y among a set of
causal factors X = (x1, x2, …, xk), with y = M(X). A variable xj is said to be endogenous
within the causal model M if its value is determined or influenced by one or more of the
other variables in X (excluding itself). A purely endogenous variable is a factor
that is entirely determined by the states of other variables in the system. (If a factor is
purely endogenous, then in theory we could replace the occurrence of this factor with
the functional form representing the composition of xj as a function of X.) In real
causal systems, however, there can be a range of endogeneity. Some factors are
causally influenced by factors within the system but also by factors not included in the
model. So a given factor may be partially endogenous and partially exogenous
partially but not wholly determined by the values of other variables in the model.
Consider a simple causal system: farming. The outcome we are interested in
explaining (the dependent variable or the explanandum) is crop output. Many factors
(independent variables, explanans) influence crop output: labor, farmer skill,
availability of seed varieties, availability of credit, climate, weather, soil quality and
type, irrigation, pests, temperature, pesticides and fertilizers, animal practices, and
availability of traction. These variables are all causally relevant to crop yield, in a
specifiable sense: if we alter the levels of these variables over a series of tests, the
level of crop yield will vary as well (up or down). These factors have real causal
influence on crop yield, and it is a reasonable scientific problem to attempt to assess
the nature and weight of the various factors. We can also notice, however, that there
are causal relations among some but not all of these factors. For example, the level of
pest infestation is influenced by rainfall and fertilizer (positively) and pesticide, labor,
and skill (negatively). So pest infestation is partially endogenous within this system
and partially exogenous, in that it is also influenced by factors that are external to this
system (average temperature, presence of pest vectors, decline of predators, etc.).
The concept of endogeneity is particularly relevant in the context of time series
analysis of causal processes. It is common for some factors within a causal system to
be dependent for their value in period n on the values of other factors in the causal
system in period n-1. Suppose that the level of pest infestation is independent of all
other factors within a given period, but is influenced by the level of rainfall and
fertilizer in the preceding period. In this instance it would be correct to say that
infestation is exogenous within the period, but endogenous over time.
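The within-period/over-time distinction can be sketched in a small simulation. All coefficients and noise levels below are hypothetical, chosen only to mimic a variable (infestation) driven by the previous period's rainfall and fertilizer plus influences outside the model.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# Hypothetical coefficients: infestation in period t responds to rainfall
# and fertilizer applied in period t-1, plus influences outside the model.
rainfall = rng.uniform(0, 10, T)
fertilizer = rng.uniform(0, 5, T)
infestation = np.zeros(T)
for t in range(1, T):
    infestation[t] = (0.4 * rainfall[t - 1]
                      + 0.3 * fertilizer[t - 1]
                      + rng.normal(0, 0.5))  # exogenous influences

# Exogenous within the period, endogenous over time:
print(np.corrcoef(infestation[1:], rainfall[1:])[0, 1])   # same period: near 0
print(np.corrcoef(infestation[1:], rainfall[:-1])[0, 1])  # lagged: strongly positive
```

Same-period rainfall carries no information about infestation, while lagged rainfall does, matching the text's description of a factor that is exogenous within the period but endogenous over time.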
(Note that tastes and preferences are abstract variables incapable of quantitative
measurement. For the sake of simplicity, we ignore them here, or assume them to be
unchanged. Otherwise, we would have to introduce dummy variables for them, which
we will take up later.)
3. Specification of the model
This involves formulating a mathematical relationship among the variables, with the
signs (negative/positive) and values of the coefficients of the variables. The sign
depends on how Qd varies with each regressor: positive if the variation is direct and
negative if it is inverse. We can express the hypothesis as
Qd = a + bP + cPr + dY + u
where a, b, c, d are coefficients whose values and signs we determine through the
collection of data, and u is a random variable representing the error term.
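As a sketch of what estimating this hypothesis looks like, the code below simulates data for the demand model and recovers the coefficients by least squares. The "true" values a = 50, b = -3.0, c = 1.5, d = 0.2 are hypothetical, chosen only so that the signs match the theory (own price negative; related-good price and income positive).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated data for Qd = a + b*P + c*Pr + d*Y + u. The "true" values
# a=50, b=-3.0, c=1.5, d=0.2 are hypothetical, chosen only so that the
# signs match the theory: b negative (own price), c and d positive.
P = rng.uniform(1, 10, n)    # own price
Pr = rng.uniform(1, 10, n)   # price of a related good
Y = rng.uniform(20, 100, n)  # income
u = rng.normal(0, 1, n)      # random error term
Qd = 50 - 3.0 * P + 1.5 * Pr + 0.2 * Y + u

# Least squares: prepend a column of ones for the intercept a.
X = np.column_stack([np.ones(n), P, Pr, Y])
coef, *_ = np.linalg.lstsq(X, Qd, rcond=None)
a, b, c, d = coef
print(a, b, c, d)  # estimates land close to 50, -3.0, 1.5, 0.2
```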
4. Estimation of the model
This is done through collection of statistical data for regressands and regressors.
5. Choice of appropriate econometric technique
The techniques are classified into two groups:
a. Group A: Single-equation techniques, applied when the model consists of a single
equation, as in the specification above.
b. Group B: Simultaneous-equation techniques, applied when the model consists of
more than one equation, for example a demand and a supply equation linked by the
equilibrium condition Qd = Qs (market equilibrium).
Whichever the form of the equation, the choice of technique in each group depends on
the properties of the estimates of the coefficients. These are:
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency
Whichever technique possesses most of the above properties is considered the best
technique.
Unbiasedness. An estimator β̂ is unbiased if its expected value equals the true
parameter, E(β̂) = β. In practice, however, we have only one sample (i.e., one
realization of the random variable), so we cannot say anything about the distance
between β̂ and β. This fact leads us to employ the concept of variance, or the
variance-covariance matrix if we have a vector of estimates. This concept measures
the average distance between the estimated value obtained from the only sample we
have and its expected value. From the previous argument we can deduce that,
although the unbiasedness property is not sufficient in itself, it is the minimum
requirement to be satisfied by an estimator.
Efficiency. An estimator is efficient if it is the minimum variance unbiased
estimator. The Cramer-Rao inequality provides verification of efficiency, since it
establishes the lower bound for the variance-covariance matrix of any unbiased
estimator.
A property which is less strict than efficiency is the so-called best linear unbiased
estimator (BLUE) property, which also uses the variance of the estimators.
BLUE. A vector of estimators is BLUE if it is the minimum variance linear
unbiased estimator. To show this property, we use the Gauss-Markov theorem.
In the MLRM framework, this theorem provides a general expression for the
variance-covariance matrix of any linear unbiased vector of estimators. The
comparison of this matrix with the variance-covariance matrix of β̂ then allows us
to conclude that β̂ is BLUE.
Consider now the two estimators of the disturbance variance: the unbiased
estimator σ̂² and the ML estimator σ̃².
Unbiasedness. Given that σ̃² is biased, this estimator cannot be efficient, so we
focus on the study of such a property for σ̂². With respect to the BLUE property,
neither σ̂² nor σ̃² is linear, so they cannot be BLUE.
Efficiency. The comparison of the variance of σ̂² with the corresponding element
of the Cramer-Rao matrix (expression (2.63)) allows us to deduce that this estimator
does not attain the Cramer-Rao lower bound, given that var(σ̂²) = 2σ⁴/(n − k) >
2σ⁴/n. Nevertheless, as Schmidt (1976) shows, there is no unbiased estimator of σ²
with a smaller variance, so it can be said that σ̂² is an efficient estimator.
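The BLUE property can be illustrated with a small Monte Carlo sketch (all numbers below are made up for illustration). OLS is pitted against another linear unbiased estimator of the slope, the line joining the first and last observations; both are centred on the true slope, but OLS shows the smaller sampling variance the Gauss-Markov theorem promises.

```python
import numpy as np

rng = np.random.default_rng(8)
x = np.linspace(0, 10, 30)   # fixed regressors across repeated samples
beta1, beta2 = 1.0, 2.0      # hypothetical true coefficients
xd = x - x.mean()

ols, alt = [], []
for _ in range(5_000):
    y = beta1 + beta2 * x + rng.normal(0, 1, len(x))
    ols.append((xd * (y - y.mean())).sum() / (xd * xd).sum())
    # A rival linear unbiased estimator of the slope: the line joining
    # the first and last observations.
    alt.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(np.mean(ols), np.mean(alt))  # both near 2.0: both unbiased
print(np.var(ols) < np.var(alt))   # OLS has the smaller variance: True
```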
Note that the second part of (2.96) also means that the possible bias of an estimator
disappears as the sample size n increases, so we can deduce that an unbiased
estimator is also an asymptotically unbiased estimator.
Consistency. An estimator θ̂ is said to be consistent if it converges in
probability to the unknown parameter, that is to say:
plim (n→∞) θ̂ = θ (2.99)
This means that a consistent estimator satisfies convergence in probability to a
constant, with the unknown parameter being that constant.
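Consistency can be seen directly by simulation. The sketch below uses a hypothetical true slope of 2.0: as the sample size grows, a single sample's OLS slope estimate gets pinned ever more tightly to the parameter.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 2.0  # hypothetical true slope

# A consistent estimator converges in probability to the parameter:
# as n grows, the OLS slope from a single sample gets pinned to beta.
for n in (50, 5_000, 500_000):
    x = rng.normal(0, 1, n)
    y = beta * x + rng.normal(0, 1, n)
    b_hat = (x * y).sum() / (x * x).sum()  # OLS slope through the origin
    print(n, b_hat)  # the spread around 2.0 shrinks roughly like 1/sqrt(n)
```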
Asymptotic efficiency. A sufficient condition for a consistent, asymptotically
normal estimator vector to be asymptotically efficient is that its asymptotic
variance-covariance matrix equals the asymptotic Cramer-Rao lower bound
(see Theil (1971)), which can be expressed as:
[I_a(θ)]⁻¹ = [lim (n→∞) (1/n) I_n(θ)]⁻¹ (2.108)
where I_a(θ) denotes the so-called asymptotic information matrix, while I_n(θ) is the
previously described sample information matrix (or simply, information matrix).
Asymptotic Properties of the OLS and ML Estimators of σ²
Asymptotic unbiasedness. The OLS estimator of σ² satisfies the finite sample
unbiasedness property, according to result (2.86), so we deduce that it is
asymptotically unbiased.
With respect to the ML estimator of σ², which does not satisfy finite sample
unbiasedness (result (2.87)), we must calculate its asymptotic expectation. On
the basis of the first definition of asymptotic unbiasedness, presented in (2.96),
we have:
lim (n→∞) E(σ̃²) = lim (n→∞) ((n − k)/n) σ² = σ² (2.111)
so we conclude that σ̃² is asymptotically unbiased.
let β̂1 = 1.572 and β̂2 = 1.357 (let us not worry right now about how we got these
values; say, it is just a guess). Using these values and the X values given in
column (2) of Table 3.1, we can easily compute the estimated Yi given in column (3)
of the table as Ŷ1i (the subscript 1 denoting the first experiment). Now let us
conduct another experiment, but this time using the values β̂1 = 3 and β̂2 = 1.
The estimated values of Yi from this experiment are given as Ŷ2i in column (6) of
Table 3.1. Since the β̂ values in the two experiments are different, we get different
values for the estimated residuals, as shown in the table; û1i are the residuals from the
first experiment and û2i from the second experiment. The squares of these residuals
are given in columns (5) and (8). Obviously, as expected from (3.1.3), these residual
sums of squares are different since they are based on different sets of β̂ values.
Now which set of β̂ values should we choose? Since the β̂ values of the first
experiment give us a lower Σûᵢ² (= 12.214) than that obtained from the β̂ values
of the second experiment (= 14), we might say that the β̂'s of the first experiment are
the best values. But how do we know? For, if we had infinite time and infinite
patience, we could have conducted many more such experiments, choosing different
sets of β̂'s each time, comparing the resulting Σûᵢ², and then choosing that set
of β̂ values that gives us the least possible value of Σûᵢ², assuming of course that
we have considered all conceivable values of β1 and β2. But since time, and
certainly patience, are generally in short supply, we need to consider some shortcuts to
this trial-and-error process. Fortunately, the method of least squares provides us such a
shortcut. The principle or the method of least squares chooses β̂1 and β̂2 in such a
manner that, for a given sample or set of data, Σûᵢ² is as small as possible. In other
words, for a given sample, the method of least squares
provides us with unique estimates of β1 and β2 that give the smallest possible value of
Σûᵢ². How is this accomplished? This is a straightforward exercise in differential
calculus. As shown in Appendix 3A, Section 3A.1, the process of differentiation yields
the following equations for estimating β1 and β2:
ΣYi = nβ̂1 + β̂2 ΣXi
ΣXiYi = β̂1 ΣXi + β̂2 ΣXi²
where n is the sample size. These simultaneous equations are known as the normal
equations.
Solving the normal equations simultaneously, we obtain
β̂2 = Σxiyi / Σxi² (3.1.6)
β̂1 = Ȳ − β̂2 X̄ (3.1.7)
where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄)
and yi = (Yi − Ȳ). Henceforth we adopt the convention of letting the lowercase letters
denote deviations from mean values.
The last step in (3.1.7) can be obtained directly from (3.1.4) by simple algebraic
manipulations.
Incidentally, note that, by making use of simple algebraic identities, formula (3.1.6)
for estimating β2 can be alternatively expressed as
β̂2 = Σxi Yi / Σxi², since Σxi = 0.
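The two routes, solving the normal equations directly and using the deviation-form shortcut, can be checked against each other in a few lines. The sample below is made up (Table 3.1 itself is not reproduced here).

```python
import numpy as np

# A small made-up sample (Table 3.1 itself is not reproduced here).
X = np.array([1.0, 4.0, 5.0, 8.0, 12.0])
Y = np.array([3.0, 7.0, 8.0, 12.0, 17.0])
n = len(X)

# The two normal equations in matrix form:
#   Sum Y  = n*b1     + b2*Sum X
#   Sum XY = b1*Sum X + b2*Sum X^2
A = np.array([[n, X.sum()], [X.sum(), (X**2).sum()]])
rhs = np.array([Y.sum(), (X * Y).sum()])
b1, b2 = np.linalg.solve(A, rhs)

# Deviation-form shortcut: b2 = Sum x_i y_i / Sum x_i^2, b1 = Ybar - b2*Xbar.
x, y = X - X.mean(), Y - Y.mean()
b2_dev = (x * y).sum() / (x * x).sum()
b1_dev = Y.mean() - b2_dev * X.mean()

print(b1, b2)          # both routes give the same estimates
print(b1_dev, b2_dev)
```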
The estimators obtained previously are known as the least-squares estimators, for
they are derived from the least-squares principle. Note the following numerical
properties of estimators obtained by the method of OLS: Numerical properties are
those that hold as a consequence of the use of ordinary least squares, regardless of
how the data were generated. Shortly, we will also consider the statistical properties
of OLS estimators, that is, properties that hold only under certain assumptions about
the way the data were generated. (See the classical linear regression model in
Section 3.2.)
I. The OLS estimators are expressed solely in terms of the observable (i.e., sample)
quantities (i.e., X and Y). Therefore, they can be easily computed.
II. They are point estimators; that is, given the sample, each estimator will provide
only a single (point) value of the relevant population parameter. (In Chapter 5 we will
consider the so-called interval estimators, which provide a range of possible values
for the unknown population
parameters.)
III. Once the OLS estimates are obtained from the sample data, the sample regression
line (Figure 3.1) can be easily obtained. The regression line thus obtained has the
following properties:
1. It passes through the sample means of Y and X. This fact is obvious from (3.1.7), for
the latter can be written as Ȳ = β̂1 + β̂2 X̄, which is shown diagrammatically in
Figure 3.2.
2. The mean value of the estimated Y (= Ŷi) is equal to the mean value of the actual Y:
mean(Ŷi) = Ȳ
(3.1.10)
where use is made of the fact that Σ(Xi − X̄) = 0. (Why?)
3. The mean value of the residuals ûi is zero. From Appendix 3A,
Section 3A.1, the first equation is
−2Σ(Yi − β̂1 − β̂2Xi) = 0
But since ûi = Yi − β̂1 − β̂2Xi, the preceding equation reduces to
−2Σûi = 0, whence the mean residual ū̂ = 0.
As a result of the preceding property, the sample regression
Yi = β̂1 + β̂2Xi + ûi
(2.6.2)
can be expressed in an alternative form where both Y and X are expressed as
deviations from their mean values.
4. The residuals ûi are uncorrelated with Xi; that is, Σûi Xi = 0.
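These numerical properties hold for any data once OLS is applied, which makes them easy to verify by simulation. The data below are made up purely for the check.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, 100)                 # made-up regressor values
Y = 1.0 + 2.0 * X + rng.normal(0, 1, 100)   # made-up data

# Fit by OLS using the deviation formulas, then check the properties.
x, y = X - X.mean(), Y - Y.mean()
b2 = (x * y).sum() / (x * x).sum()
b1 = Y.mean() - b2 * X.mean()
u_hat = Y - b1 - b2 * X   # residuals
Y_hat = b1 + b2 * X       # fitted values

print(abs(u_hat.sum()) < 1e-8)              # residuals sum to zero
print(abs((u_hat * X).sum()) < 1e-8)        # residuals uncorrelated with X
print(abs(Y_hat.mean() - Y.mean()) < 1e-8)  # line passes through the means
```

All three checks print True: they are algebraic consequences of the normal equations, not statistical assumptions about how the data were generated.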
Answer 7
Assumption 1: Linear regression model. The regression model is linear in the
parameters, as shown in (2.4.2)
Yi = β1 + β2Xi + ui (2.4.2)
Assumption 2: X values are fixed in repeated sampling. Values taken by the
regressor X are considered fixed in repeated samples. More technically, X is assumed
to be nonstochastic
Assumption 3: Zero mean value of disturbance ui. Given the value of X, the mean,
or expected, value of the random disturbance term ui is zero. Technically, the
conditional mean value of ui is zero. Symbolically, we have
E(ui |Xi) = 0
Assumption 4: Homoscedasticity or equal variance of ui. Given the value of X, the
variance of ui is the same for all observations. That is, the conditional variances of ui
are identical.
Symbolically, we have
var (ui | Xi) = E[ui − E(ui | Xi)]²
= E(ui² | Xi) because of Assumption 3
= σ²
where var stands for variance.
Assumption 5: No autocorrelation between the disturbances. Given any two X
values, Xi and Xj (i ≠ j), the correlation between any two ui and uj (i ≠ j) is zero.
Symbolically,
cov (ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
= E(ui | Xi) E(uj | Xj) (why?)
= 0
where i and j are two different observations and where cov means covariance.
Assumption 6: Zero covariance between ui and Xi, or E(uiXi) = 0. Formally,
cov (ui, Xi) = E[ui − E(ui)][Xi − E(Xi)]
= E[ui (Xi − E(Xi))] since E(ui) = 0
= E(uiXi) − E(Xi)E(ui) since E(Xi) is nonstochastic (3.2.6)
= E(uiXi) since E(ui) = 0
= 0 by assumption
Assumption 7: The number of observations n must be greater than the number of
parameters to be estimated. Alternatively, the number of observations n must be
greater than the number of explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not all
be the
same. Technically, var (X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there is
no
specification bias or error in the model used in empirical analysis.
Assumption 10: There is no perfect multicollinearity. That is, there are no perfect
linear
relationships among the explanatory variables.
Notable among the proponents of the irrelevance-of-assumptions thesis is Milton
Friedman. To him, unreality of assumptions is a positive advantage: "to be
important . . . a hypothesis must be descriptively false in its assumptions." One may
not subscribe to this viewpoint fully, but recall that in any scientific study we make
certain assumptions because they facilitate the development of the subject matter in
gradual steps, not because they are necessarily realistic in the sense that they replicate
reality exactly. As one author notes, ". . . if simplicity is a desirable criterion of good
theory, all good theories idealize and oversimplify outrageously."
Answer 8
Homoscedasticity
The first two Gauss-Markov conditions state that the disturbance terms u1, u2, ..., un
in the n observations potentially come from probability distributions that have 0 mean
and the same variance. Their actual values in the sample will sometimes be positive,
sometimes negative, sometimes relatively far from 0, sometimes relatively close, but
there will be no a priori reason to anticipate a particularly erratic value in any given
observation. To put it another way, the probability of u reaching a given positive (or
negative) value will be the same in all observations. This condition is known as
homoscedasticity, which means "same dispersion".
y = β1 + β2x + u,
Heteroscedasticity
Here the variance of the potential distribution of the disturbance term is increasing as
x increases. This does not mean that the disturbance term will necessarily have a
particularly large (positive or negative) value in an observation where x is large, but it
does mean that the a priori probability of having an erratic value will be relatively
high. This is an example of heteroscedasticity, which means "differing dispersion".
If heteroscedasticity is present, the OLS estimators are inefficient because you could,
at least in principle, find other estimators that have smaller variances and are still
unbiased.
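The last claim can be sketched by simulation. Everything below is made up for illustration: the disturbance spread is set to grow with x, and OLS is compared with a weighted least squares estimator that reweights each observation by the (here, known) disturbance spread. Both are unbiased, but OLS has the larger sampling variance, i.e., it is inefficient.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2_000
beta = 1.0  # hypothetical true slope

x = np.linspace(1, 10, n)
sd = 0.5 * x  # disturbance dispersion grows with x: heteroscedasticity

ols, wls = [], []
for _ in range(reps):
    y = beta * x + rng.normal(0, sd)  # scale varies observation by observation
    xd = x - x.mean()
    ols.append((xd * (y - y.mean())).sum() / (xd * xd).sum())
    # Weighted least squares: dividing through by sd makes the transformed
    # disturbance homoscedastic, which restores efficiency here.
    xs, ys = x / sd, y / sd
    wls.append((xs * ys).sum() / (xs * xs).sum())

print(np.mean(ols), np.mean(wls))  # both close to 1.0: still unbiased
print(np.std(ols) > np.std(wls))   # OLS has the larger spread: True
```

In practice the disturbance variances are unknown and must themselves be modelled; the point here is only that, under heteroscedasticity, OLS is no longer the minimum variance estimator.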
2nd Part:
Answer 17
Macroeconomic data:
1. They suffer from the small-sample problem.
2. Inaccuracy is brought into the available data by frequent revisions, necessitated
because estimated data differ from the actual ones.
3. The available data have low frequency.
4. The data are assumed to follow the normal distribution.
5. Seasonality of the data is not prominent.
Answer 18
Financial econometric data:
1. They do not suffer from the small-sample problem.
2. The revision problem does not exist in financial data.
1. Autocorrelation
Let X be a stochastic process, and i be some point in time after the start of that
process. (i may be an integer for a discrete-time process or a real number for a
continuous-time process.) Then Xi is the value (or realization) produced by a given
run of the process at time i. Suppose that the process is further known to have defined
values for mean μi and variance σi² for all times i. Then the definition of the
autocorrelation between times s and t is
R(s, t) = E[(Xt − μt)(Xs − μs)] / (σt σs)
where "E" is the expected value operator. Note that this expression is not well-defined
for all time series or processes, because the variance may be zero (for a constant
process) or infinite. If the function R is well-defined, its value must lie in the range
[−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.
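For a weakly stationary series, the sample analogue of R(s, t) depends only on the lag t − s and is easy to compute. The sketch below contrasts made-up white noise (no autocorrelation) with a persistent series built from the same shocks (the 0.8 persistence parameter is hypothetical).

```python
import numpy as np

def autocorr(x, lag):
    """Sample analogue of R(s, t) for a weakly stationary series:
    the correlation between x_t and x_{t-lag}."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    return (xd[lag:] * xd[:-lag]).sum() / (xd * xd).sum()

rng = np.random.default_rng(6)
e = rng.normal(0, 1, 5_000)   # white noise

# A persistent series built from the same shocks (hypothetical 0.8 parameter).
y = np.zeros(5_000)
for t in range(1, 5_000):
    y[t] = 0.8 * y[t - 1] + e[t]

print(autocorr(e, 1))  # near 0: white noise is uncorrelated across time
print(autocorr(y, 1))  # near 0.8: strong first-order autocorrelation
```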
2. Autoregression
yt = c + φ1·y(t−1) + φ2·y(t−2) + ⋯ + φp·y(t−p) + et,
where c is a constant and et is white noise. This is like a multiple regression but
with lagged values of yt as predictors. We refer to this as an AR(p) model.
Autoregressive models are remarkably flexible at handling a wide range of different time
series patterns. The two series in Figure 8.5 show series from an AR(1) model and an AR(2)
model. Changing the parameters φ1, …, φp results in different time series patterns. The
variance of the error term et will only change the scale of the series, not the patterns.
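The "multiple regression with lagged values" reading can be made literal: simulate an AR(2) process and regress yt on a constant, y(t−1), and y(t−2). The parameter values c = 0.5, φ1 = 0.6, φ2 = 0.2 are hypothetical, chosen to satisfy stationarity.

```python
import numpy as np

rng = np.random.default_rng(7)
c, phi1, phi2 = 0.5, 0.6, 0.2  # hypothetical stationary AR(2) parameters
T = 10_000

y = np.zeros(T)
for t in range(2, T):
    y[t] = c + phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()

# "Like a multiple regression with lagged values of y as predictors":
# regress y_t on a constant, y_{t-1} and y_{t-2} to recover the parameters.
Y = y[2:]
X = np.column_stack([np.ones(T - 2), y[1:-1], y[:-2]])
est, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(est)  # approximately [0.5, 0.6, 0.2]
```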
3. Partial Autocorrelation
For a given stochastic process one is often interested in the connection between two
random variables of a process at different points in time. One way to measure a linear
relationship is with the ACF, i.e., the correlation between these two variables. Another
way to measure the connection between Xt and Xt−k is to filter out of Xt and Xt−k the
linear influence of the random variables that lie in between, Xt−1, …, Xt−k+1, and then
calculate the correlation of the transformed random variables. This is called the partial
autocorrelation. The partial autocorrelation of kth order is defined as the correlation
between these two adjusted variables.