Panel Data

Three types of data are generally available for empirical analysis:
time series, cross section, and panel.

In time series data we observe the values of one or more variables
over a period of time (e.g., GDP for several quarters or years).
In cross-section data, values of one or more variables are collected for
several sample units, or entities, at the same point in time.
In panel data the same cross-sectional unit (say a family or a firm or a
state) is surveyed over time. In short, panel data have space as well
as time dimensions.
There are other names for panel data, such as pooled data (pooling of time
series and cross-sectional observations), combination of time series and
cross-section data, micropanel data, longitudinal data (a study over
time of a variable or group of subjects), event history analysis (e.g.,
studying the movement over time of subjects through successive states or
conditions), and cohort analysis.
Panel data have the following advantages:
1. Since panel data relate to individuals, firms, states, countries, etc.,
over time, there is bound to be heterogeneity in these units. The
techniques of panel data estimation can take such heterogeneity
explicitly into account by allowing for individual-specific variables,
where the "individual" may be a person, a firm, a state, or a country.
2. By combining time series of cross-section observations, panel data
give more informative data, more variability, less collinearity among
variables, more degrees of freedom and more efficiency.
3. By studying the repeated cross section of observations, panel data are
better suited to study the dynamics of change. Spells of
unemployment, job turnover, and labor mobility are better studied with
panel data.
4. Panel data can better detect and measure effects that simply cannot
be observed in pure cross-section or pure time series data. For
example, the effects of minimum wage laws on employment and
earnings can be better studied if we include successive waves of
minimum wage increases in the federal and/or state minimum wages.
5. Panel data enable us to study more complicated behavioral models.
For example, phenomena such as economies of scale and technological
change can be better handled by panel data than by pure cross-section
or pure time series data.
6. By making data available for several thousand units, panel data can
minimize the bias that might result if we aggregate individuals or firms
into broad aggregates.
BALANCED PANEL: If each cross-sectional unit has the same number of
time series observations, then such a panel (data) is called a balanced
panel.
If the number of observations differs among panel members, we call such a
panel an unbalanced panel.
In a short panel the number of cross-sectional subjects, N, is greater than the
number of time periods, T.
In a long panel, it is T that is greater than N.
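As a rough illustration of these definitions, the sketch below classifies a panel from a list of (unit, period) pairs. The function name `describe_panel` and the sample rows are hypothetical, chosen only for this example.

```python
from collections import Counter

def describe_panel(observations):
    """Classify a panel given (unit, period) pairs, one pair per observed row."""
    counts = Counter(unit for unit, _ in observations)        # rows per unit
    n_units = len(counts)                                     # N
    n_periods = len({period for _, period in observations})   # T
    balanced = len(set(counts.values())) == 1                 # same count for every unit
    if n_units > n_periods:
        shape = "short"
    elif n_periods > n_units:
        shape = "long"
    else:
        shape = "neither"
    return {"N": n_units, "T": n_periods, "balanced": balanced, "shape": shape}

# Hypothetical data: three firms, two years, firm C observed only once.
rows = [("A", 2020), ("A", 2021), ("B", 2020), ("B", 2021), ("C", 2020)]
print(describe_panel(rows))
```

Here N = 3 exceeds T = 2, so the panel is short, and it is unbalanced because firm C has fewer observations than the others.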
There are four common ways to estimate a panel data regression:
1. Pooled OLS model. We simply pool all the observations and
estimate a grand regression, neglecting the cross-section and time
series nature of our data. This is the simplest approach. The problem,
however, is that pooled OLS ignores the heterogeneity, or
individuality, that exists among the cross-section units.
2. The fixed effects least squares dummy variable (LSDV) model.
Here we pool all the observations, but allow each cross-section
unit to have its own (intercept) dummy variable.
3. The fixed effects within-group model. Here also we pool all
the observations, but for each cross-section unit we express each
variable as a deviation from its mean value and then estimate an OLS
regression on such mean-corrected or de-meaned values.
4. The random effects model (REM). Unlike the LSDV model, in which
we allow each cross-section unit to have its own (fixed) intercept value,
we assume that the intercept values are a random drawing from a much
bigger population of such units.
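To make the contrast between pooled OLS (model 1) and the within-group estimator (model 3) concrete, here is a minimal sketch with a single regressor. The helper names and the tiny data set are assumptions made for illustration; the data are constructed so that both units share a slope of 2 but have different intercepts.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def within_slope(units, x, y):
    """Within-group slope: de-mean x and y unit by unit, then run OLS."""
    groups = {}
    for u, xi, yi in zip(units, x, y):
        groups.setdefault(u, []).append((xi, yi))
    x_dm, y_dm = [], []
    for obs in groups.values():
        xm = mean(p[0] for p in obs)
        ym = mean(p[1] for p in obs)
        x_dm += [p[0] - xm for p in obs]   # deviations from the unit mean
        y_dm += [p[1] - ym for p in obs]
    return ols_slope(x_dm, y_dm)

# Hypothetical panel: unit A has intercept 0, unit B has intercept 10.
units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 18.0, 20.0, 22.0]

pooled = ols_slope(x, y)            # ignores the unit-specific intercepts
within = within_slope(units, x, y)  # removes them by de-meaning
print(pooled, within)
```

In this constructed example the pooled slope is pulled away from 2 because the intercept difference between the units is attributed to x, while the within-group slope recovers the common slope.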
Stationary
A stochastic process is said to be stationary if its mean and variance are
constant over time and the value of the covariance between two time
periods depends only on the distance, or gap, or lag between the two time
periods and not on the actual time at which the covariance is computed.
Such a stochastic process is known as a weakly stationary, or covariance
stationary, or second-order stationary, or wide-sense stationary,
stochastic process.
In short, if a time series is stationary, its mean, variance, and autocovariance
(at various lags) remain the same no matter at what point we measure them;
that is, they are time invariant. Such a time series will tend to return to its
mean (called mean reversion) and fluctuations around this mean
(measured by its variance) will have a broadly constant amplitude.
If a time series is not stationary in the sense just defined, it is called a
nonstationary time series (keep in mind we are talking only about weak
stationarity). In other words, a nonstationary time series will have a
time-varying mean or a time-varying variance or both.
WHITE NOISE -- We call a stochastic process purely random, or white noise,
if it has zero mean, constant variance $\sigma^2$, and is serially
uncorrelated.
RANDOM WALK MODEL -- Although the discussion so far concerns stationary
time series, one often encounters nonstationary time series, the classic
example being the random walk model (RWM).
It is often said that asset prices, such as stock prices or exchange rates,
follow a random walk; that is, they are nonstationary.
There are two types of random walks: (1) random walk without drift (i.e.,
no constant or intercept term) and (2) random walk with drift (i.e., a
constant term is present).
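The two cases can be written out as follows. The notation is an assumption of this sketch: $u_t$ is a white noise error with mean zero and variance $\sigma^2$, $\delta$ is the drift parameter, and $Y_0$ is the starting value.

```latex
% (1) Random walk without drift
Y_t = Y_{t-1} + u_t
\quad\Longrightarrow\quad
Y_t = Y_0 + \sum_{i=1}^{t} u_i, \qquad
E(Y_t) = Y_0, \qquad \operatorname{var}(Y_t) = t\sigma^2

% (2) Random walk with drift
Y_t = \delta + Y_{t-1} + u_t
\quad\Longrightarrow\quad
Y_t = Y_0 + t\delta + \sum_{i=1}^{t} u_i, \qquad
E(Y_t) = Y_0 + t\delta, \qquad \operatorname{var}(Y_t) = t\sigma^2
```

In both cases the variance grows with t, which violates the constant-variance condition of weak stationarity; with drift the mean changes over time as well. The first difference, $Y_t - Y_{t-1}$, equals $u_t$ (plus $\delta$ with drift) and is stationary.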
UNIT ROOT TEST -- A unit root test checks whether a time series variable is
nonstationary using an autoregressive model. A well-known test that is valid
in large samples is the augmented Dickey-Fuller test. The optimal finite
sample tests for a unit root in autoregressive models were developed by
Denis Sargan and Alok Bhargava. Another test is the Phillips-Perron test.
These tests use the existence of a unit root as the null hypothesis.
1. Regression analysis based on time series data implicitly assumes that the underlying
time series are stationary. The classical t tests, F tests, etc. are based on this
assumption.
2. In practice most economic time series are nonstationary.
3. A stochastic process is said to be weakly stationary if its mean, variance, and
autocovariances are constant over time (i.e., they are time-invariant).
4. At the informal level, weak stationarity can be tested by the correlogram of a time
series, which is a graph of autocorrelation at various lags. For stationary time series,
the correlogram tapers off quickly, whereas for nonstationary time series it dies off
gradually. For a purely random series, the autocorrelations at all lags 1 and greater
are zero.
5. At the formal level, stationarity can be checked by finding out if the time series
contains a unit root. The Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF)
tests can be used for this purpose.
6. An economic time series can be trend stationary (TS) or difference stationary
(DS). A TS time series has a deterministic trend, whereas a DS time series has a
variable, or stochastic, trend. The common practice of including the time or trend
variable in a regression model to detrend the data is justifiable only for TS time
series. The DF and ADF tests can be applied to determine whether a time series is TS
or DS.
7. Regression of one time series variable on one or more time series variables often can
give nonsensical or spurious results. This phenomenon is known as spurious
regression. One way to guard against it is to find out if the time series are
cointegrated.
8. Cointegration means that despite being individually nonstationary, a linear combination
of two or more time series can be stationary. The Engle-Granger (EG), augmented
Engle-Granger (AEG), and cointegrating regression Durbin-Watson (CRDW) tests can be
used to find out if two or more time series are cointegrated.
9. Cointegration of two (or more) time series suggests that there is a long-run, or
equilibrium, relationship between them.
10. The error correction mechanism (ECM) developed by Engle and Granger is a means
of reconciling the short-run behavior of an economic variable with its long-run behavior.
11. The field of time series econometrics is evolving. The established results and tests are in
some cases tentative and a lot more work remains.
An important question that needs an answer is why some economic time series are
stationary and some are nonstationary.
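The informal correlogram check in point 4 can be sketched in a few lines. This is only an illustration on simulated data: the function `autocorr`, the seed, and the sample size are assumptions, and the white noise and random walk series are generated here rather than taken from any real data set.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation at a given lag (autocovariance over variance)."""
    n = len(series)
    m = sum(series) / n
    var = sum((v - m) ** 2 for v in series)
    cov = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return cov / var

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]   # purely random series

walk, level = [], 0.0
for u in noise:                                    # random walk without drift
    level += u
    walk.append(level)

# First-differencing the walk recovers the white noise shocks.
diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

for lag in (1, 5, 10):
    print(lag,
          round(autocorr(noise, lag), 2),
          round(autocorr(walk, lag), 2),
          round(autocorr(diff, lag), 2))
```

The autocorrelations of the white noise and of the differenced walk should hover near zero at every lag, while those of the random walk should start close to one and decline only slowly, which is the pattern the correlogram is meant to reveal.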
Forecasting
In general, forecasting is the act of predicting the future.
In econometrics, forecasting is the estimation of the expected value
of a dependent variable for observations that are not part of the
same data set.
In most forecasts, the values being predicted are for time periods in
the future, but cross-sectional predictions of values for countries
or people not in the sample are also common.
To simplify terminology, the words prediction and forecast will be
used interchangeably in this chapter.
Some authors limit the use of the word forecast to out-of-sample
prediction for a time series.
Econometric forecasting generally uses a single linear equation to
predict or forecast.
Our use of such an equation to make a forecast can be summarized
in two steps:
1. Specify and estimate an equation that has as its dependent
variable the item that we wish to forecast:
$$\hat{Y}_t = \hat{\beta}_0 + \hat{\beta}_1 X_{1t} + \cdots + \hat{\beta}_K X_{Kt} \quad (t = 1, \ldots, T)$$
2. Obtain values for each of the independent variables for the
observations for which we want a forecast and substitute them into our
forecasting equation:
$$\hat{Y}_{T+1} = \hat{\beta}_0 + \hat{\beta}_1 X_{1,T+1} + \cdots + \hat{\beta}_K X_{K,T+1}$$
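The two steps can be sketched for the simplest case of one independent variable. The function names and the five data points are hypothetical, and the out-of-sample X value is assumed to be known.

```python
from statistics import mean

def fit_line(x, y):
    """Step 1: estimate Y = b0 + b1 * X by OLS on the sample."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def forecast(b0, b1, x_new):
    """Step 2: substitute the out-of-sample X into the estimated equation."""
    return b0 + b1 * x_new

# Hypothetical sample of T = 5 observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(x, y)
y_hat = forecast(b0, b1, 6.0)   # forecast for period T + 1, with X assumed known
print(round(b0, 2), round(b1, 2), round(y_hat, 2))
```

With these numbers the estimated equation is roughly Y = 1.09 + 1.97 X, so the point forecast at X = 6 is about 12.91.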
The forecasts generated in the previous section are quite simple,
however, and most actual forecasting involves one or more
additional questions, for example:
1. Unknown Xs: It is unrealistic to expect to know the values for the
independent variables outside the sample.
What happens when we don't know the values of the
independent variables for the forecast period?
2. Serial Correlation: If there is serial correlation involved,
the forecasting equation may be estimated with GLS.
How should predictions be adjusted when forecasting
equations are estimated with GLS?
3. Confidence Intervals: All the previous forecasts were single
values, but such single values are almost never exactly right, so it
might be more helpful to forecast a confidence interval instead.
How can we develop these confidence intervals?
4. Simultaneous Equations Models: Many economic and business
equations are part of simultaneous models.
How can we use an independent variable to forecast a
dependent variable when we know that a change in value
of the dependent variable will change, in turn, the value of
the independent variable that we used to make the
forecast?
Conditional Forecasting (Unknown X Values for the Forecast Period)
Unconditional forecast: all values of the independent variables are
known with certainty.
This is rare in practice.
Conditional forecast: actual values of one or more of the
independent variables are not known.
This is the more common type of forecast.
The careful selection of independent variables can sometimes help
avoid the need for conditional forecasting.
This opportunity can arise when the dependent variable can be
expressed as a function of leading indicators.
A leading indicator is an independent variable the
movements of which anticipate movements in the dependent
variable.
The best known leading indicator, the Index of Leading
Economic Indicators, is produced each month.
The techniques we use to test hypotheses can also be adapted to
create forecasting confidence intervals.
Given a point forecast, $\hat{Y}_{T+1}$, all we need to generate a
confidence interval around that forecast are $t_c$, the critical t-value
(for the desired level of confidence), and $S_F$, the estimated standard
error of the forecast:
$$\text{confidence interval} = \hat{Y}_{T+1} \pm S_F \, t_c$$
The critical t-value, $t_c$, can be found in a statistical table of the
t-distribution (for a two-tailed test with T-K-1 degrees of freedom).
Lastly, the standard error of the forecast, $S_F$, for an equation with
just one independent variable, equals the square root of the forecast
error variance:
$$S_F = \sqrt{ s^2 \left[ 1 + \frac{1}{T} + \frac{(\hat{X}_{T+1} - \bar{X})^2}{\sum_{t=1}^{T} (X_t - \bar{X})^2} \right] }$$
where:
$s^2$ = the estimated variance of the error term
$T$ = the number of observations in the sample
$\hat{X}_{T+1}$ = the forecasted value of the single independent variable
$\bar{X}$ = the arithmetic mean of the observed Xs in the sample
ARIMA
ARIMA is a highly refined curve-fitting device that uses current
and past values of the dependent variable to produce often
accurate short-term forecasts of that variable.
Examples of such forecasts are stock market price predictions
created by brokerage analysts (called chartists or
technicians) based entirely on past patterns of movement of
the stock prices.
If ARIMA models thus essentially ignore economic theory (by
ignoring traditional explanatory variables), why use them?
The use of ARIMA is appropriate when:
1. little or nothing is known about the dependent variable being
forecasted,
2. the independent variables known to be important cannot be
forecasted effectively, or
3. all that is needed is a one- or two-period forecast.
The ARIMA approach combines two different specifications
(called processes) into one equation:
1. An autoregressive process (AR) expresses a dependent variable as a
function of past values of the dependent variable.
This is similar to the serial correlation error term function and
the dynamic model.
2. A moving average process (MA) expresses a dependent variable as a
function of past values of the error term.
Such a function is a moving average of past error term observations
that can be added to the mean of Y to obtain a moving average of
past values of Y.
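As a minimal sketch of the autoregressive part only, the code below simulates a first-order AR process, estimates it by regressing Y on its own lag, and produces one- and two-period forecasts. Everything here is an assumption made for illustration (the AR(1) form, the parameter values, the seed, and the function name `fit_ar1`); a full ARIMA fit would also handle differencing and moving average terms.

```python
import random

def fit_ar1(y):
    """Estimate Y_t = c + phi * Y_{t-1} + e_t by OLS of Y_t on Y_{t-1}."""
    prev, curr = y[:-1], y[1:]
    pm = sum(prev) / len(prev)
    cm = sum(curr) / len(curr)
    sxy = sum((p - pm) * (c - cm) for p, c in zip(prev, curr))
    sxx = sum((p - pm) ** 2 for p in prev)
    phi = sxy / sxx
    const = cm - phi * pm
    return const, phi

# Simulate a hypothetical AR(1) series with c = 1.0 and phi = 0.6.
random.seed(1)
true_const, true_phi = 1.0, 0.6
series = [true_const / (1 - true_phi)]        # start at the long-run mean
for _ in range(2000):
    series.append(true_const + true_phi * series[-1] + random.gauss(0, 1))

const, phi = fit_ar1(series)
one_step = const + phi * series[-1]           # forecast for T + 1
two_step = const + phi * one_step             # forecast for T + 2 uses the T + 1 forecast
print(round(const, 2), round(phi, 2), round(one_step, 2), round(two_step, 2))
```

Note that only past values of the series itself enter the forecast, which is the sense in which this approach sets aside traditional explanatory variables.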
An ARCH (AUTOREGRESSIVE CONDITIONALLY HETEROSCEDASTIC) model is a
model for the variance of a time series. ARCH models are used to describe a
changing, possibly volatile variance. Although an ARCH model could possibly
be used to describe a gradually increasing variance over time, most often it
is used in situations in which there may be short periods of increased
variation. (Gradually increasing variance connected to a gradually increasing
mean level might be better handled by transforming the variable.)
ARCH models were created in the context of econometric and finance
problems having to do with the amount that investments or stocks increase
(or decrease) per time period, so there's a tendency to describe them as
models for that type of variable.
An ARCH model could be used for any series that has periods of increased or
decreased variance.
A GARCH (GENERALIZED AUTOREGRESSIVE CONDITIONALLY
HETEROSCEDASTIC) model uses values of the past squared observations
and past variances to model the variance at time t. As an example, a
GARCH(1,1) is
$$\sigma^2_t = \alpha_0 + \alpha_1 y^2_{t-1} + \beta_1 \sigma^2_{t-1}$$
In the GARCH notation, the first subscript refers to the order of the $y^2$
terms on the right side, and the second subscript refers to the order of the
$\sigma^2$ terms.
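The GARCH(1,1) recursion can be traced by hand or with a short loop. The sketch below assumes the parameters are already known (in practice they would be estimated, typically by maximum likelihood); the function name, parameter values, starting variance, and observations are all hypothetical.

```python
def garch11_variance(y, alpha0, alpha1, beta1, sigma2_start):
    """Return conditional variances, where
    sigma2[t + 1] = alpha0 + alpha1 * y[t]**2 + beta1 * sigma2[t]
    and sigma2[0] is the supplied starting variance."""
    sigma2 = [sigma2_start]
    for y_prev in y:
        sigma2.append(alpha0 + alpha1 * y_prev ** 2 + beta1 * sigma2[-1])
    return sigma2

# Hypothetical observations: the large value -2.0 raises next period's variance.
y = [0.5, -2.0, 0.1]
path = garch11_variance(y, alpha0=0.1, alpha1=0.2, beta1=0.7, sigma2_start=1.0)
print([round(v, 4) for v in path])
```

With these inputs the variance moves from 1.0 to 0.85, jumps to 1.495 after the large observation, and then decays to 1.1485, which shows the short burst of increased variance that such models are designed to capture.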
VAR models (VECTOR AUTOREGRESSIVE MODELS) are used for multivariate
time series. The structure is that each variable is a linear function of past
lags of itself and past lags of the other variables.
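For instance, with two variables and one lag, a VAR(1) might be written as below. The symbols are assumptions of this sketch: $x_t$ and $y_t$ are the two series, the $a$ and $b$ terms are coefficients, and $e_{1t}$, $e_{2t}$ are error terms.

```latex
\begin{aligned}
x_t &= a_1 + b_{11}\, x_{t-1} + b_{12}\, y_{t-1} + e_{1t} \\
y_t &= a_2 + b_{21}\, x_{t-1} + b_{22}\, y_{t-1} + e_{2t}
\end{aligned}
```

Each equation has the same right-hand-side variables, namely one lag of every variable in the system; a higher-order VAR simply adds further lags of both series to each equation.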
Uses of Dummy Variable
