
International Journal of Forecasting 22 (2006) 43–56

www.elsevier.com/locate/ijforecast

Short-term prediction of wind energy production


Ismael Sanchez*
Universidad Carlos III de Madrid, Avd. de la Universidad 30, 28911, Leganes, Madrid, Spain

Abstract

This paper describes a statistical forecasting system for the short-term prediction (up to 48 h ahead) of the wind energy
production of a wind farm. The main feature of the proposed prediction system is its adaptability. The need for an adaptive
prediction system is twofold. First, it has to deal with highly nonlinear relationships between the variables involved. Second,
the prediction system would generate predictions for alternative wind farms, as it is made by the system operator for efficient
network integration. This flexibility is attained through (i) the use of alternative models based on different assumptions about
the variables involved; (ii) the adaptive estimation of their parameters using different recursive techniques; and (iii) using an
on-line adaptive forecast combination scheme to obtain the final prediction. The described procedure is currently implemented
in SIPREOLICO, a wind energy prediction tool that is part of the on-line management of the Spanish Peninsular
system operation.
© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Adaptive estimation; Dynamic models; Forecast combination; Kalman filter; Nonparametric regression; Recursive least squares;
Wind energy

* Fax: +34 916249430.
E-mail address: ismael@est-econ.uc3m.es.
0169-2070/$ - see front matter © 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.ijforecast.2005.05.003

1. Introduction

Wind energy has been, in the last decade, the fastest growing energy technology. In some countries, such as Germany, Denmark, and Spain, wind power is widely used. In the case of Spain, more than 4% of its electricity comes from this source of energy. However, in spite of the noticeable benefits of wind energy, this level of installed capacity can have unwanted consequences. The reason for this is that wind energy cannot be scheduled and, therefore, there will always be uncertainty about the final production. As a consequence, the uncertainty caused by the connection of many utilities to the grid can decrease the efficiency of the network system operation. Accordingly, both utilities and the system operator need accurate on-line forecasts of the wind power production. In a liberalized electricity market, such a forecasting ability will help enhance the position of wind energy compared to other forms of energy.

This article describes a statistical forecasting system for wind energy prediction based on the adaptive combination of alternative dynamic models. The main feature of the forecasting system is its flexibility. This flexibility is needed for two main reasons. First, it has to deal with highly nonlinear relationships between the variables involved. Second, the prediction system would generate predictions for alternative wind farms of different characteristics, as is done by the system operator for efficient network integration.

For a given wind farm, the input variables are the meteorological predictions of wind (velocity and direction) for the next 48 h and past values of output power. The forecasting system has then to supply, on an hourly basis, the predicted output power up to 48 h ahead. The prediction system needs to operate on-line. Therefore, once some preliminary off-line identification of the models is made using data from selected sites, the system needs to be flexible enough to adapt to (a) unforeseen changing relationships between the variables involved and (b) alternative wind farms with minimal or no calibration. This on-line flexibility also requires recursive estimation procedures that must be performed in reasonable time. For instance, the Spanish system operator needs to calculate hourly predictions for more than 200 wind farms. There is, therefore, little time for doing the calculations for each wind farm.

From a statistical point of view, wind energy data has some interesting features: (i) the relationship between the velocity of the wind and the generated power is highly nonlinear and, therefore, candidate predictors have the risk of only being reliable within certain ranges of data; and (ii) for a given velocity, this relationship is time-varying because it depends on other variables such as wind direction, local air density, local temperature variations, local effects of clouds and rain, and so forth. Since some of these variables are difficult to foresee or even measure, they might not be appropriately included in a model. Fig. 1 shows some typical situations with wind energy data that help illustrate these features. In this figure, both graphs (a) and (b) show 200 consecutive hourly data points of velocity of wind (hourly average) and generated power at a certain wind farm in Spain. The first 100 points are marked with a circle (o), whereas the last 100 points are marked with a plus symbol (+). It can be seen in these pictures that a model estimated by using the first 100 points (period 1) will produce a poor performance when applied to the next 100 points (period 2) if no adaptation is allowed.

Fig. 1. Hourly average wind speed and generated power in a wind farm in Spain. In each picture, periods 1 and 2 are consecutive. [Panels (a) and (b): output power (× 10^4) against velocity of wind; legend: Period 1, Period 2.]

In Fig. 1(a), due to stronger winds in period 2, the wind park has reached the rated capacity. In this situation, the output power of the rotor of the wind turbine is maintained at an approximately constant level, or should even be reduced in order to avoid damages. These changes in the relationship between speed and power are regulated by a control system that might not be the same across all windmills. Besides, this control technology is constantly evolving (see, for instance, Ackerman & Soder, 2002). Therefore, such behaviour should be learned from data. The changes observed in Fig. 1(b) can be produced by changes in the power control of the wind turbines, changes in wind direction or changes in other meteorological variables that affect the efficiency of the wind turbine. In both examples, an adaptive forecasting system is likely to yield a better performance than a predictor based on a single model with constant parameters.

Let $p_{t+h}$ denote the output power of a wind farm at time $t + h$ and $\hat{p}^{C}_{t+h|t}$ the generated prediction using information up to time $t$. The proposed methodology for obtaining $\hat{p}^{C}_{t+h|t}$ is based on the following key features: (a) the use of alternative nonlinear models for obtaining a set of individual predictions; (b) the use of a highly adaptive estimation of the parameters; and (c) the construction of the final prediction using an adaptive combination of the competing predictions. Then, the prediction can be written as the result of the following combination:

$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{1}$$

where $K$ is the total number of available predictions obtained from the alternative models; $\hat{p}^{(k)}_{t+h|t}$, $k = 1, \ldots, K$ are the individual predictions, and $\phi^{(h)}_{tk}$ is the time-varying weight used for model $k$. The alternative models used to obtain the individual predictions $\hat{p}^{(k)}_{t+h|t}$ will be described in Section 2. Then, Section 3 discusses the alternative recursive estimation procedures. In Section 4, the adaptive forecast combination scheme is presented. Section 5 illustrates the proposed procedures with real data. Section 6 concludes. A version of the methods described here is currently implemented in a prediction tool called SIPREOLICO (see Sanchez et al., 2002). SIPREOLICO is now operating at the control centre of Red Electrica de Espana (REE), the Spanish system operator. The predictions generated by SIPREOLICO are used for real time operation, to solve constraints in the daily and intra-daily market, to forecast the wind power for each distribution company, and to make wind power market simulations.

2. The competing dynamic models

Fig. 2. (a) Example of a typical machine power curve in a wind tunnel and (b) empirical values of velocity of wind and output power in a real wind farm (average hourly values). [Axes in both panels: power (kW) against wind speed (m/s).]

We know from physics that the theoretical relationship between the energy (per unit time) of wind that flows at speed $v$ (m/s) through an intercepting area $A$ (m$^2$) is

$$p = \frac{1}{2}\, \rho A v^{3}, \tag{2}$$

where $\rho$ is the air density (kg/m$^3$), which, in turn, depends on the air temperature and pressure, among other factors. The real relationship between the power generated by the whole wind farm and the velocity of the wind can, however, be more complex than just (2). Fig. 2 illustrates this point. Fig. 2(a) is the so-called machine power curve (deterministic output power as a function of the input velocity of wind) of a particular wind turbine inside a wind tunnel as it appears, for instance, in the information supplied by its manufacturer. From this machine power curve, we can see that below some minimum wind speed, called connection speed, the wind turbine does not produce power. After this connection speed, the power increases as the wind speed does. The profile of this growing part of the power curve follows a similar growing pattern to that shown in (2) and it also depends on the particular

technology of the wind turbine (see, e.g., Ackerman & Soder, 2002). When the speed increases and reaches the so-called nominal speed, the power reaches the rated capacity of the wind turbine. After the nominal speed, the output power is kept constant for some range of the wind speed. There are also alternative technical solutions to maintain this constant level as the wind speed exceeds the nominal speed. Finally, when the wind speed surpasses the so-called disconnection speed, the wind turbine is disconnected to prevent damages from excessive wind.

In spite of the deterministic relationship shown in Fig. 2(a), the empirical power curve obtained in the real operation of a wind farm is far from being deterministic. Fig. 2(b) displays the empirical power curve for a whole wind farm over several months. In this figure, each point represents the hourly average wind speed and the resulting average power. Fig. 2(b) reveals that the observed power varies and that the time-varying influence of some other variables can have a substantial effect that cannot be neglected. We can thus conclude that the relationship between the wind velocity and the output power should be treated as a nonlinear and stochastic time-varying function of wind speed. There are many reasons for the empirical power curve to be stochastic. Some of them are related to the technology of the wind turbines (Ackerman & Soder, 2002; Bianchi, Mantz, & Christiansen, 2004). For instance, the power control system of the wind turbine can cause the connection and nominal wind speeds to vary, and also to differ from the remaining wind turbines of the park. Besides, the behaviour of the wind turbines when the wind speed increases is different from the behaviour when the wind speed decreases. Also, some other meteorological variables such as the wind direction or the temperature can affect the efficiency of the wind turbine.

Some authors have proposed wind power forecasting systems based solely on the transformation of local wind predictions into power using the deterministic machine power curves. The most popular forecasting systems based on this deterministic approach are the Previento system (Beyer, Heinemann, Mellingho, Monnich, & Waldl, 1999; Focken, Lange, & Waldl, 2001) and the Prediktor system (Joensen, Giebel, Landberg, Madsen, & Nielsen, 1999; Landberg, 1994). A quick look at Fig. 2(b) can explain why these systems can be improved on by using a statistical forecasting system based on dynamic models. This statistical approach of forecasting wind power is the idea of the also popular Zephyr system (Landberg et al., 2002; see also Giebel, Landberg, Nielsen, & Madsen, 2001; Nielsen, Madsen, & Tofting, 1999). The Zephyr system uses a nonparametric model based on the local polynomial regression method of Cleveland and Devlin (1988) (see also Joensen, Madsen, Nielsen, & Nielsen, 1999; Nielsen, Joensen, Madsen, & Landberg, 2000). This nonparametric model uses the information of the observed power and the predictions of wind speed and direction.

In this article, a more sophisticated modelling approach is presented. It is based on the use of several alternative models, instead of a single one. It should be mentioned that some of the alternative models used in the proposed system are similar to the model used in the Zephyr system. The final prediction is made through an adaptive linear combination of the alternative predictors, where the weights given to each predictor are based on their actual forecasting performance. In order to avoid overfitting, the combination is performed using only the models with better recent performance. Therefore, since the combination is preceded by model evaluation and comparison, many alternative models could be proposed with the only restriction being the time needed for computation. This combination scheme can be interpreted as a model competition, where the winners are used to obtain the final prediction.

To form the group of competing models, two different sets of models are proposed here, without dismissing the possibility of proposing some others in the future. The first type of models are dynamic linear models where the relationship between power and wind is made using polynomials of different degree, from linear to cubic, and whose coefficients are estimated adaptively. The second type of models are nonparametric models based on local polynomial fitting in a similar fashion to the Zephyr system. Since the prediction tool will be used in different wind farms, an important argument in choosing a model is that it should be implemented without any previous calibration.

To ease adaptability, different models will be estimated for each prediction horizon; i.e., the h-step

prediction constants, $h > 1$ are obtained separately for each $h$ by minimising a relevant mean squared error of prediction for that horizon. We will call these models multi-period-ahead models. Several authors have argued that the use of these multi-period-ahead models can help improve the forecasting performance, especially in those cases where a "true" model can be regarded as implausible and where any model can only be seen as a useful approximation (see, e.g., Bhansali, 1996; Kang, 2003; and references therein). We now include a brief description of the alternative models, denoted as $M_1$ to $M_9$.

The model $M_1$ is a univariate multi-period-ahead autoregression of the form

$$p_{t+h} = a_{0,t} + P_t(k, c) + e^{(M_1)}_{t+h|t}, \tag{3}$$

where

$$P_t(k, c) = \sum_{i=1}^{k} a_{i,t}\, p_{t+1-i} + a_{k+1,t}\, p_{t+h-c}; \tag{4}$$

$e^{(M_1)}_{t+h|t}$ is the h-step ahead prediction error of this model $M_1$; and $a_{i,t}$, $i = 0, 1, \ldots, k$ are time-varying parameters. A precise notation would also use a forecast horizon index, but for the sake of simplicity it is omitted. The term $p_{t+h-c}$ in (4) intends to capture the daily cycle. If $h < 24 - k$, then $c = 24$, and thus $p_{t+h-24}$ is the generated power 24 h before the time of prediction. If $h \geq 24 - k$, then $c = h + k$, and the model is just a multi-period-ahead AR($k + 1$). The experience with Spanish data suggests $k = 3$. This simple model can be considered as an extension of the so-called persistence model ($\hat{p}_{t+h|t} = p_t$). This model is of special interest if, for some reason, meteorological information is not available or it is of low quality.

Models $M_2$, $M_3$, and $M_4$ use the same parameterisation as $M_1$ plus the information of the forecasted wind speed. These multi-period-ahead predictors have the form

$$p_{t+h} = a_{0,t} + P_t(k, c) + W_{t+h|t}(q) + e^{(M_m)}_{t+h|t}, \quad m = 2, 3, 4, \tag{5}$$

with

$$W_{t+h|t}(q) = \sum_{i=1}^{q} b_{i,t}\, \hat{v}^{\,i}_{t+h|t}, \tag{6}$$

where $\hat{v}_{t+h|t}$ is the forecasted speed of wind made at period $t$ for period $t + h$ and where $b_{i,t}$, $i = 1, \ldots, q$, are time-varying parameters. In $M_2$, we use $q = 1$; in $M_3$, $q = 2$; and in $M_4$, $q = 3$.

Models $M_5$, $M_6$, and $M_7$ use the same parameterisation as $M_2$, $M_3$, and $M_4$, respectively, plus the information of the forecasted wind direction. The wind direction has valuable information. First, wind farms usually have some dominant directions where velocity often moves in a narrow range. Then, the direction of the wind can be a predictor of the speed. Second, the performance of a wind turbine depends on the direction. This direction dependence is due to the effect of the terrain and also to the so-called shadow effect of surrounding wind turbines. These multi-period-ahead models are

$$p_{t+h} = a_{0,t} + P_t(k, c) + W_{t+h|t}(q) + D_{t+h|t} + e^{(M_m)}_{t+h|t}, \quad m = 5, 6, 7, \tag{7}$$

with

$$D_{t+h|t} = c_{1,t} \sin\!\left(\frac{2\pi \hat{\phi}_{t+h|t}}{360}\right) + c_{2,t} \cos\!\left(\frac{2\pi \hat{\phi}_{t+h|t}}{360}\right), \tag{8}$$

where $\hat{\phi}_{t+h|t}$ is the forecasted wind direction and $c_{1,t}$ and $c_{2,t}$ are time-varying parameters. As in (6), the factor $W_{t+h|t}(q)$ uses $q = 1$ in model $M_5$, $q = 2$ in $M_6$ and $q = 3$ in $M_7$. Models $M_5$, $M_6$, and $M_7$ will have a competitive performance when the available prediction of the speed of the wind is of low quality.

Finally, models $M_8$ and $M_9$ use the same parameterisation as the autoregression $M_1$ plus a nonparametric predictor based on the daily cycle, and another nonparametric predictor based on wind prediction. The idea is similar to the conditional parametric ARX models of Nielsen et al. (2000). The functional form of $M_8$ is

$$p_{t+h} = a_{0,t} + P_t(k, c) + F^{v}_{t+h|t} + F^{H}_{t+h|t} + e^{(M_8)}_{t+h|t}, \tag{9}$$

with

$$F^{v}_{t+h|t} = \sum_{i=1}^{I} a_{i,t}\, \theta_{t+h+1-i}(v), \tag{10}$$

and

$$F^{H}_{t+h|t} = \sum_{j=1}^{J} b_{j,t}\, \theta_{t+h+1-j}(H), \tag{11}$$

where $\theta_{t+h}(\cdot)$ are nonparametric functions based on local polynomials as proposed by Cleveland and Devlin (1988) and Fan and Gijbels (1996). The local polynomial fitting is implemented recursively as in Vilar-Fernandez and Vilar-Fernandez (1998). The parameters $a_{i,t}$ and $b_{j,t}$ are time-varying parameters. In (10), $\theta_{t+h}(v)$ is the result of a nonparametric local linear regression where the output power $p_{t+h}$ is explained as a linear function of wind speed $\hat{v}_{t+h|t}$, whereas in (11) the regressor $\theta_{t+h}(H)$ is the nonparametric prediction of $p_{t+h}$ using local linear regressions with the hour at time $t + h$ as the only regressor. The functional form of $M_9$ is

$$p_{t+h} = a_{0,t} + P_t(k, c) + F^{v\phi}_{t+h|t} + F^{H}_{t+h|t} + e^{(M_9)}_{t+h|t}, \tag{12}$$

where

$$F^{v\phi}_{t+h|t} = \sum_{i=1}^{I} d_{i,t}\, \theta_{t+h+1-i}(v, \phi), \tag{13}$$

with $\theta_{t+h}(v, \phi)$ being a nonparametric function similar to (10) based on local linear regressions using the wind speed and wind direction as regressors, and $d_{i,t}$ being time-varying parameters. Experiments with Spanish data recommend $I = 3$ and $J = 2$. The computation of the nonparametric functions $\theta_{t+h}(v)$, $\theta_{t+h}(H)$, and $\theta_{t+h}(v, \phi)$ is very time consuming. Therefore, in order to do a feasible implementation, they are only evaluated at some specified grid of fitting points and then interpolated for the remaining points. This specification, together with some other estimation aspects, such as bandwidths and some smoothing factors, make this method site-dependent. To solve this dependency, models $M_8$ and $M_9$ can be calibrated from the analysis of some selected sites, but their optimal implementation would need a specific analysis at each site using older data.

As mentioned above, one of the motivations for the previous models is to build a flexible prediction system to be used in different locations. If the goal is to build a prediction tool to be used only in a single wind farm, more site-dependent models like neural networks could also be proposed (Dutton, Kariniotakis, Halliday, & Nogaret, 1999; Kariniotakis, Stavrakakis, & Nogaret, 1996). Neural networks can be an efficient procedure for dealing with nonlinearities. These procedures, however, would need long periods of time to tune the algorithms to specific local conditions.

3. The recursive adaptive estimation

All the parameters involved in the above nine proposed models need to be estimated recursively. Such an estimation procedure is discussed in this section. There are two main reasons for using adaptive estimation for wind power forecasting. First, the nonlinear behaviour can cause any proposed model to be a valid approximation only in a certain span of data as illustrated in Fig. 1(a). Second, the power curves are changing through time in response to, for instance, meteorological changes, as seen in Fig. 1(b). Thus, the parameters cannot be regarded as constants and should be adapted as new information is available, as proposed by Grillenzoni (1994) and Sanchez (in press). In the proposed forecasting system, and in order to adapt to diverse situations in a better way, the recursive estimation is performed using two alternative procedures: recursive least squares (RLS) and the Kalman filter (KF). RLS is performed in such a way that the adaptability of the estimates is larger when the system is changing quickly and is smaller when the parameters are changing slowly or remain constant. On the other hand, KF is performed assuming that parameters are always evolving very slowly. Hence, at the end of the estimation process we will have doubled the number of competing predictors. A generic time-varying model for the dependent variable $p_t$ can be written as

$$p_t = z_t' b_t + a_t, \tag{14}$$

with $b_t$ a $k \times 1$ vector of time-varying parameters and $z_t$ a $k \times 1$ vector of input variables that can be either stochastic or deterministic. The RLS estimator is

$$\hat{b}^{(RLS)}_{t} = \hat{b}^{(RLS)}_{t-1} + \Sigma^{(RLS)}_{t} z_t\, \hat{a}_t, \tag{15}$$

with $\hat{a}_t = p_t - z_t' \hat{b}^{(RLS)}_{t-1}$ being the prediction error. The matrix $\Sigma^{(RLS)}_{t}$ is the so-called gain matrix or weighted covariance matrix, and is a measure of the dispersion of the estimate $\hat{b}^{(RLS)}_{t}$. This matrix can be obtained recursively using the well-known result (see, e.g., Grillenzoni, 1994)

$$\Sigma^{(RLS)}_{t} = \frac{1}{\lambda_t} \left( \Sigma^{(RLS)}_{t-1} - \frac{\Sigma^{(RLS)}_{t-1} z_t z_t' \Sigma^{(RLS)}_{t-1}}{\lambda_t + z_t' \Sigma^{(RLS)}_{t-1} z_t} \right). \tag{16}$$
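The recursion in (15) and (16) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SIPREOLICO implementation: the function names `rls_step` and `run_rls`, the diffuse starting value `s0` for the gain matrix, and the constant forgetting factor used by `run_rls` are all choices made for the example, and the gain matrix is assumed to be initialised symmetric.

```python
import numpy as np

def rls_step(b_prev, S_prev, z, p, lam):
    """One recursive least squares update in the spirit of (15)-(16).

    b_prev : previous parameter estimate (length-k vector)
    S_prev : previous gain matrix (k x k, assumed symmetric)
    z      : input vector at time t
    p      : observed output at time t
    lam    : forgetting factor at time t, 0 < lam <= 1
    """
    z = np.asarray(z, dtype=float)
    a_hat = p - z @ b_prev                 # prediction error
    Sz = S_prev @ z
    # Gain matrix update (16); outer(Sz, Sz) equals S z z' S for symmetric S.
    S = (S_prev - np.outer(Sz, Sz) / (lam + z @ Sz)) / lam
    b = b_prev + S @ z * a_hat             # parameter update (15)
    return b, S, a_hat

def run_rls(Z, p, lam=0.98, s0=1e4):
    """Run the recursion over all rows of Z with a constant forgetting factor.

    s0 is a hypothetical large initial dispersion (a diffuse start).
    """
    k = Z.shape[1]
    b = np.zeros(k)
    S = s0 * np.eye(k)
    for z_t, p_t in zip(Z, p):
        b, S, _ = rls_step(b, S, z_t, p_t, lam)
    return b

# Synthetic, noise-free illustration: power is linear in wind speed.
rng = np.random.default_rng(0)
v = rng.uniform(3.0, 15.0, size=200)
Z = np.column_stack([np.ones_like(v), v])   # intercept and speed
p = 2.0 + 0.5 * v
b_hat = run_rls(Z, p)
print(b_hat)
```

Passing a different `lam` at each call to `rls_step` gives the time-varying case; values closer to 1 weight past data more heavily and adapt more slowly.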

The parameter $\lambda_t$ is the so-called forgetting factor and holds $0 < \lambda_t \leq 1$. The above RLS algorithm minimizes the weighted criterion $S^2_t(b) = (p_t - b' z_t)^2 + \lambda_t S^2_{t-1}(b)$, where it can be seen that the sequence of forgetting factors, $\lambda_t$, is the key feature of this adaptive procedure. From this objective function we can observe that the smaller the value of $\lambda_t$, the lower the influence of past data in the estimation. Typically, the choice of the forgetting factor is a compromise between the ability to track changes in the parameters and the need to reduce the variance of the prediction error. The choice of the forgetting factor is very important since it has a substantial effect on the efficiency of the predictions. Most applications use a constant forgetting factor, typically inside the range $0.950 \leq \lambda \leq 0.999$. Here we will use an adaptive forgetting factor where the speed of adaptation of the estimates is related to the characteristics of the data. The adaptive forgetting factor will be the so-called $\lambda^{Cook}_t$ proposed by Sanchez (in press), who proved that it provides better adaptation features than some other popular alternatives. In particular, it is able to adapt to common situations occurring with wind energy data, such as those described in Fig. 1(a) and (b). The proposed $\lambda^{Cook}_t$ is based on Cook's distance to measure the influence of the new data and translate such influence into an adaptive forgetting factor. From Sanchez (in press), we have that Cook's distance for the time-varying model (14) can be written as

$$C_t = \frac{z_t' \Sigma^{(RLS)}_{t-1} z_t \left( p_t - z_t' \hat{b}^{(RLS)}_{t-1} \right)^2}{\hat{\sigma}^2_{t-1} \left( 1 + z_t' \Sigma^{(RLS)}_{t-1} z_t \right)}, \tag{17}$$

where $\hat{\sigma}^2_{t-1}$ is a consistent estimator of $E(a_t^2)$, for instance

$$\hat{\sigma}^2_{t-1} = \frac{\sum_{i=1}^{t-1} \left( p_i - z_t' \hat{b}^{(RLS)}_{i} \right)^2}{t - 1}.$$

Then, to evaluate the value of $C_t$ in (17), it can be compared with the $\chi^2$ distribution with $m$ degrees of freedom. Let us denote the survivor function of the $\chi^2_m$ distribution as $S_t \equiv S_t(C_t)$, that is $S_t(C_t) = P(\chi^2_m > C_t)$. Several adaptive forgetting factors can be proposed using this survivor function. Based on the empirical performance reported in Sanchez (in press), the RLS estimation of our forecasting system will be performed with the following adaptive forgetting factor:

$$\lambda^{Cook}_t = \min\{\max(0.7, S_t),\ 0.999\}.$$

For the second estimation procedure, based on KF, we will assume that the parameter vector $b_t$ evolves slowly, following a random walk with small variance; that is

$$b_t = b_{t-1} + \varepsilon_t,$$

with $E(\varepsilon_t \varepsilon_t') = \sigma^2_{\varepsilon} I_m$, where $I_m$ is the identity matrix and $\sigma^2_{\varepsilon}$ a small positive constant. With this assumption, it is known that KF gives a linear, unbiased, and minimum error variance recursive algorithm to optimally predict the new parameter value. This algorithm can be written as

$$\hat{b}^{(KF)}_{t} = \hat{b}^{(KF)}_{t-1} + \kappa_t \left( p_t - z_t' \hat{b}^{(KF)}_{t-1} \right),$$

$$\kappa_t = \Sigma^{(KF)}_{t-1} z_t \left( z_t' \Sigma^{(KF)}_{t-1} z_t + 1 \right)^{-1},$$

$$\Sigma^{(KF)}_{t} = \Sigma^{(KF)}_{t-1} - \frac{\Sigma^{(KF)}_{t-1} z_t z_t' \Sigma^{(KF)}_{t-1}}{1 + z_t' \Sigma^{(KF)}_{t-1} z_t} + \sigma^2_{\varepsilon} I_m.$$

Experience with Spanish data suggests $\sigma^2_{\varepsilon} = 10^{-20}$. The proposed RLS is more adaptive than this KF model, since it does not assume any structure for the evolution of the parameters. Besides, the speed of adaptation in the RLS is also time-varying according to the information in the data. However, it is sensible to also justify the use of the proposed KF for wind power forecast. The reason is that it is not uncommon for meteorological variables to be quite stable in some regions and for some periods. For instance, low-pressure systems, which are usually associated with high winds, can affect a region from 2 to 3 days. On the other hand, high-pressure systems, usually associated with lighter winds, can last even longer. Meteorological changes are thus so slow that 1 h can be a very small measurement unit. In these cases, a model based on hourly data that assumes an evolution in the form of a random walk with small variance can be a parsimonious

and efficient alternative to models that are prepared for any kind of contingency. Each of the above-mentioned models $M_1$ to $M_9$ is estimated, for each horizon, using both procedures. Therefore, we are using 18 alternative predictors for each horizon. The final prediction will be made through some linear combination of these competing predictions. This combination is discussed in the next section.

4. Adaptive forecast combination and the final prediction

When several candidate models are available to forecast a single variable, we can either select the best model or combine them. Regarding model selection, alternative selection procedures have been proposed in the literature, both based on selection criteria, like the popular AIC (Akaike, 1974) or BIC (Schwartz, 1978), or on testing procedures (see, e.g., Chen & Yang, 2002 and references therein). On the other hand, forecast combination is also a popular and important tool in forecasting time series analysis, and there is a vast body of literature that demonstrates its usefulness (see, e.g., Clemen, 1989; Yang, 2004). Forecast combination is especially advised when there are doubts about the existence of a "best model".

In this section, we will use the theory of forecast combination to produce a combination of our $K = 18$ alternative predictions to obtain the final prediction of $p_{t+h}$. This final prediction can be written as

$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{18}$$

where $\phi^{(h)}_{tk}$ is the time-varying weight given to model $k$ and $K$ is the total number of competing predictors. Our forecast combination will be adaptive; i.e., the weight given to each model will evolve through time. Note that, through the combination of the competing models, we not only look for a dynamic adaptation, but also ease the adaptation of the prediction tool to alternative wind farms, since the relative performance of the competing models can depend on the location.

Since the number of alternative predictors is large and the relative performance of them can be very different, we will only combine a subset of them. There is an agreement between practitioners that poor forecasts should not be included in the combination (see, e.g., Bunn, 1985). The intuitive reason is that any predictor can have some good performance at some time just by chance. Thus, even the poorest predictor will have a non-zero weight in the combination, to the detriment of better predictors, and causing a loss of efficiency. Therefore, by combining only the important procedures, we can reduce the variability of the combined forecast, leading to a much better performance (Yang, 2004). The subset of selected predictors will be chosen according to their recorded recent performance. We will only combine the $d$ best predictors, where $d$ can be time-varying (see Swanson & Zeng, 2001, for alternative procedures for doing this selection when the number of predictors is small and $d$ fixed). Then, some of the $\phi^{(h)}_{tk}$ in (18) will be zero. If we always combine the same subset of predictors, we could build a recursive combination scheme using a regression with time-varying coefficients, as described in Section 3 (see also Diebold & Pauly, 1987; Sessions & Chaterjee, 1989; and Terui & van Dijk, 2002, for alternative recursive combinations with $d$ fixed). The idea of using as many predictors as possible and then selecting which of them will enter into the final combination is similar to the nonparametric model proposed by Kohn et al. (2001). These authors consider building a nonparametric regression using linear combinations of basis functions (polynomials, splines, etc.). To ensure flexible estimates, the regression should include a large number of basis functions. Then, to avoid overfitting, they select the functions that will have a non-zero weight in the regression. The Bayesian hierarchical method proposed in Kohn et al. (2001) to solve the problem is, however, computationally expensive for on-line operation, since it needs to be solved by Monte Carlo simulation.

In order to propose a feasible adaptive combination procedure with a time-varying $d$, we will first define the recursive on-line measurement of forecasting performance. Once we have access to the new wind power production $p_t$, we can compute the prediction errors $\hat{e}^{(i)}_{t|t-h} = p_t - \hat{p}^{(i)}_{t|t-h}$ of predicting $p_t$ from period $t - h$ using the model $i = 1,$

$2, \ldots, K$. Then, we will use the following weighted sum of products of prediction errors:

$$S^{(i,j)}_{t|t-h} = e^{(i)}_{t|t-h}\, e^{(j)}_{t|t-h} + \lambda S^{(i,j)}_{t-1|t-h-1} = \sum_{s=1}^{t} \lambda^{t-s}\, e^{(i)}_{s|s-h}\, e^{(j)}_{s|s-h}, \quad i, j = 1, \ldots, K, \qquad (19)$$

where $0 < \lambda < 1$ has the same interpretation as the above-mentioned forgetting factor. We could even use an adaptive forgetting factor, $\lambda = \lambda_t$. We can then estimate the matrix of mean squared prediction errors (MSPE) using an exponentially weighted moving average MSPE (EWMSPE). Let us denote this EWMSPE matrix as $\Omega_{t|t-h} \equiv \Omega_{t|t-h}(\lambda)$, where the $(i, j)$ element is

$$\left[\Omega_{t|t-h}\right]_{(i,j)} = S^{(i,j)}_{t|t-h} \left( \sum_{s=1}^{t} \lambda^{t-s} \right)^{-1}, \quad i, j = 1, \ldots, K. \qquad (20)$$

This matrix $\Omega_{t|t-h}$ is similar to the covariance matrix proposed in Granger and Newbold (1986, p. 274). Let us denote by $V_{t|t-h}$ the EWMSPE matrix with the same information as $\Omega_{t|t-h}$ but sorted by its diagonal elements in increasing order; i.e., the first row corresponds to the best predictor and so on. Then, the information of the best $d$ predictors is in $V^{[d]}_{t|t-h}$, which is the first $d \times d$ submatrix of $V_{t|t-h}$. Note that, if $d \equiv d(t)$, the size of $V^{[d]}_{t|t-h}$ and the predictors involved can vary through time. We will adopt here the classical approach for optimal forecast combination, using weights that sum to one, and which are built using the elements of $V^{[d]}_{t|t-h}$. There are typically two options for building the vector of combining coefficients: using the full matrix $V^{[d]}_{t|t-h}$ or using only the diagonal (ignoring the correlation between the predictors). In the first option, the vector $b^{[d]}_{t|t-h}$ of combining coefficients will be

$$b^{[d]}_{t|t-h} = \frac{\left(V^{[d]}_{t|t-h}\right)^{-1} c_d}{c_d' \left(V^{[d]}_{t|t-h}\right)^{-1} c_d}, \qquad (21)$$

where $c_d$ is a vector of ones of length $d$. In the second option, the $i$-th element, $i = 1, \ldots, d$, of the vector $b^{[d]}_{t|t-h}$ of combining coefficients is

$$b^{[d]}_{i,t|t-h} = \frac{\left(v^{(i)}_{t|t-h}\right)^{-1}}{\sum_{l=1}^{d} \left(v^{(l)}_{t|t-h}\right)^{-1}}, \qquad (22)$$

where $v^{(i)}_{t|t-h}$, $i = 1, \ldots, d$, are the diagonal elements of $V^{[d]}_{t|t-h}$; that is, the EWMSPE of each selected predictor. It should be noted that, in order to combine a set of predictors to predict $p_{t+h}$ from period $t$, the last available estimated EWMSPE is $\Omega_{t|t-h}$. Therefore, assuming that the best prediction of $\Omega_{t+h|t}$ is $\Omega_{t|t-h}$, which is equivalent to assuming a random walk evolution of such a random variable, the combination of the best $d$ predictors will be

$$p^{C[d]}_{t+h|t} = \sum_{i=1}^{d} b^{[d]}_{i,t|t-h}\, p^{[i]}_{t+h|t},$$

where $p^{[i]}_{t+h|t}$, $i = 1, \ldots, d$, is the predictor corresponding to row $i$ of the EWMSPE matrix $V^{[d]}_{t|t-h}$. We will denote as $d^{(h)}_t$ the optimal number of these predictors used in computing the final prediction $p^{C}_{t+h|t}$. This number $d^{(h)}_t$ will be estimated using the prediction performance of the different combining alternatives; that is, the performance of using $d = 1, \ldots, K$. In order to evaluate the performance of the $K$ different combinations, we will use the same definition of EWMSPE as in (20) but now the competing predictors are $p^{C[d]}_{t+h|t}$, $d = 1, \ldots, K$. The prediction errors of these $K$ combinations are denoted by $e^{C[d]}_{t|t-h} = p_t - p^{C[d]}_{t|t-h}$, $d = 1, \ldots, K$. Using a recursion similar to (19), we can compute the EWMSPE of these prediction errors as

$$\psi^{[d]}_{t|t-h} = S^{C[d]}_{t|t-h} \left( \sum_{s=1}^{t} \lambda^{t-s} \right)^{-1}, \quad i, j = 1, \ldots, K, \qquad (23)$$

where $S^{C[d]}_{t|t-h} = \left(e^{C[d]}_{t|t-h}\right)^2 + \lambda S^{C[d]}_{t-1|t-h-1}$. Hence, the estimated optimal combination will use the $d^{(h)}_t$ best predictors, where $d^{(h)}_t = \arg\min_d \left(\psi^{[d]}_{t|t-h}\right)$. The final prediction is then

$$p^{C}_{t+h|t} = \sum_{i=1}^{d^{(h)}_t} b^{[d]}_{i,t|t-h}\, p^{[i]}_{t+h|t}. \qquad (24)$$
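The recursion (19), the normalization (20), the diagonal weights (22) and the choice of the number of predictors by minimizing (23) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the SIPREOLICO implementation: all function names are hypothetical, the bookkeeping of the horizon $h$ is omitted (errors are assumed to be fed in as they become available), only the diagonal option (22) is implemented, and every EWMSPE value is assumed to be strictly positive.

```python
def update_error_products(S, errors, lam):
    """One step of recursion (19): S_ij <- e_i * e_j + lam * S_ij."""
    K = len(errors)
    return [[errors[i] * errors[j] + lam * S[i][j] for j in range(K)]
            for i in range(K)]


def ewmspe_diagonal(S, lam, t):
    """Diagonal of the EWMSPE matrix in (20) after t observations."""
    norm = sum(lam ** (t - s) for s in range(1, t + 1))
    return [S[i][i] / norm for i in range(len(S))]


def diagonal_weights(mspe, d):
    """Weights (22) for the d predictors with the smallest EWMSPE.

    Returns the indices of the selected predictors (best first) and
    their inverse-EWMSPE weights, which sum to one.
    Assumes every entry of mspe is strictly positive.
    """
    ranking = sorted(range(len(mspe)), key=lambda i: mspe[i])
    best = ranking[:d]
    inverse = [1.0 / mspe[i] for i in best]
    total = sum(inverse)
    return best, [w / total for w in inverse]


def combine(forecasts, best, weights):
    """Weighted combination of the selected forecasts."""
    return sum(w * forecasts[i] for i, w in zip(best, weights))


def select_d(comb_mspe):
    """Return the d minimizing (23); comb_mspe[d - 1] is the EWMSPE of
    the combination of the best d predictors."""
    return min(range(1, len(comb_mspe) + 1), key=lambda d: comb_mspe[d - 1])


# Hypothetical usage with K = 3 predictors and forgetting factor 0.985.
lam = 0.985
S = [[0.0] * 3 for _ in range(3)]
error_history = [[0.4, -0.1, 0.3], [0.5, 0.2, -0.2]]  # made-up errors
for errors in error_history:
    S = update_error_products(S, errors, lam)
mspe = ewmspe_diagonal(S, lam, len(error_history))
best, weights = diagonal_weights(mspe, d=2)
prediction = combine([10.0, 12.0, 11.0], best, weights)
```

In a full system, `select_d` would be fed the EWMSPE of each of the $K$ candidate combinations, tracked with the same recursion applied to their own prediction errors. Because the normalizing sum in (23) is the same for every $d$, minimizing the unnormalized sums gives the same choice.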
Note that, as with $\Omega_{t+h|t}$, we have used the random walk assumption for the evolution of $\left(e^{[d]}_{t+h|t}\right)^2$, and therefore the best estimation of $E\left[\left(e^{[d]}_{t+h|t}\right)^2\right]$ is $\psi^{[d]}_{t|t-h}$. As mentioned in Section 3, the random walk assumption for the evolution of our random variables has a physical justification, since atmospheric changes are very slow. From a practical point of view, this random walk assumption does not seem to be restrictive when estimating $d^{(h)}_t$ and the optimal weights (22), since data suggest that the position of the predictors in the sorted matrix $V^{[d]}_{t|t-h}$ also evolves very slowly. If the position of the predictors in the ranking of performance changes very quickly, then $\psi^{[d]}_{t|t-h}$ would be a poor predictor of the performance of the combination of the best $d^{(h)}_t$ predictors. On the other hand, if the position of the competing predictors in the ranking is stable, $\psi^{[d]}_{t|t-h}$ will be a good predictor of the performance of the combination, and the coefficients $b_{i,t|t-h}$ in (24) will lead to an efficient combination. Fig. 3 illustrates this point, showing a high stability across time in the ranking of predictors. Fig. 3(a) shows the evolution of values of $d^{(h)}_t$ for $h = 6$ using data from a wind farm in Spain over time (the details of the estimation are described in the next section), whereas Fig. 3(b) shows the evolution, over the same period, of the EWMSPE of the alternative 18 predictors (dotted line) at $h = 6$. In order to ease visualization, the alternative EWMSPE have been divided by the EWMSPE of model $M_1$ estimated with RLS. The solid line in Fig. 3(b) is the EWMSPE of the final combination (24) using a similar estimation of MSPE as in (20). Fig. 3 shows that the position of the alternative predictors (dotted lines) in the ranking of the best $d^{(h)}_t$ predictors is very stable across time. Thus, when $d^{(h)}_t$ remains constant, it tends to be based on the same predictors. The relative performance of the competing models also has a slow evolution, leading to a high stability in $d^{(h)}_t$.

Fig. 3. (a) Evolution of the optimal numbers of predictors to combine at $h = 6$. (b) Evolution of the relative EWMSPE of the competing predictors (dotted lines) and the final combination (solid lines). The values of EWMSPE are relative to the EWMSPE of model $M_1$ estimated by RLS.

5. Some results

This section shows an application of the method using data from a wind farm in Spain. These results should only be considered as illustrative, since the performance can depend on many factors that here will be fixed: location of the wind farm, characteristics of the wind turbines, calibration of the nonparametric models ($M_8$ and $M_9$), and the quality of the wind predictions. The accuracy of the wind predictions is the most influential factor in wind power forecasting. We will therefore use both real wind measurements, obtained from the anemometer of the wind farm, and wind predictions supplied by the Spanish Meteorological Institute.

The data corresponds to hourly average wind speed and direction, and average hourly power measured from January to April 2002. There is a total of 2800 data points when anemometer measurements are used
and 2052 when wind predictions are used. The first 100 observations were used to obtain initial estimates in order to further apply RLS and KF. The on-line adaptive combination of the 18 alternative predictors was made by evaluating the EWMSPE matrix (20) and (23) with $\lambda = 0.985$, which is equivalent to an asymptotic memory length of 24 h. The combination coefficients have been obtained using (22).

Fig. 4 shows the empirical MSPE of the proposed combination procedure at each horizon. For comparison, this figure also displays the MSPE of models $M_7$ and $M_9$ estimated by RLS (RLS and KF have very similar performance in this data set), which are the models that encompass all the characteristics of the proposed models. Models $M_7$ and $M_9$ can then be seen as the alternatives to a combination strategy. It can be seen in both Fig. 4(a), based on real wind measurements, and Fig. 4(b), based on wind predictions, that the final predictions have better overall performance than those individual models. Fig. 3 shown above is based on real wind measurements. This figure illustrates how a different number of predictors is combined as the relative accuracy of the competing predictors changes across time.

Fig. 4. MSPE of the optimal combination and models $M_7$ and $M_9$ estimated with RLS using (a) real wind data and (b) wind predictions.

In order to understand the role of the alternative models in the final combination, Fig. 5 shows the average combination coefficients of each model when real wind measurements are used. Since we are using real wind data, the relative performance of model $M_1$ is, as expected, very poor and, consequently, its average combination coefficient is close to zero. Therefore, we do not show its results here. Fig. 5(a) displays the average coefficients of the models that use the information of the wind ($M_2$ to $M_9$). In this figure, we have merged the coefficients corresponding to both estimation procedures, RLS and KF. We have also merged the coefficients of the models in which the use of the wind direction is the only difference. This means summing up the coefficients of $M_2$ and $M_5$ (models with a linear function of the velocity), $M_3$ and $M_6$ (models with a quadratic function of the velocity), $M_4$ and $M_7$ (models with a cubic function of the velocity), and $M_8$ and $M_9$ (nonparametric models). Fig. 5(a) shows that, out of the fully parametric models ($M_2$ to $M_7$), the models with a cubic power of the wind ($M_4 + M_7$) have the largest weight, which is equivalent to saying that they are better predictors. This result is in agreement with (2). It can also be seen that these models ($M_4 + M_7$) have performance comparable to the nonparametric models ($M_8$ and $M_9$, first solid line from the bottom). However, when we consider all the parametric models ($M_2$ to $M_7$, first solid line from the top), the aggregated combination coefficient surpasses that of the nonparametric models. This allows us to conclude that, although nonparametric models are especially suited to dealing with nonlinearities, the combination of different polynomials of the
velocity of wind is also an efficient way to model the nonlinearity of this system and can make a significant contribution to the final combination.

The combination coefficients for each model are time-varying, aimed at adapting to the data. The coefficients shown in Fig. 5(a) are just their average values along the observed data points. Fig. 5(b) illustrates the evolution of one of these combination coefficients. This figure shows a portion of the evolution of the combination coefficient of model $M_9$ when used to predict at horizon $h = 12$ using RLS as the estimation method. It can then be seen that the combination coefficient is only null for some short periods, and for most of the time the coefficient varies between 0.10 and 0.15. Since we are dealing with 18 alternative predictors, a weight as large as 0.15 shows a significant contribution to the final combination.

Fig. 5. (a) Average combination coefficient of competing models. The dotted lines correspond to the models based on a polynomial of the velocity of wind with different orders. The solid lines are the aggregation of all the parametric models $M_2$ to $M_7$ and the nonparametric models $M_8$ and $M_9$, respectively. (b) Subsample of the evolution of the combination coefficient of model $M_9$ at $h = 12$.

6. Concluding remarks

Through the design of a very flexible recursive forecasting system, we have obtained a useful short-term prediction tool for wind power production (and perhaps with some other applications). The system is based on an on-line time-varying forecast combination, where both the number of predictors and their weights are time-varying. Both the competing models and the estimation procedures have been selected in such a way that a wide range of real situations can be covered.

This statistical forecasting tool can also be accompanied by a set of numerical rules that introduce non-random information about the wind park which can affect predictions. An example of this kind of information is the disconnection speed, beyond which the wind turbines are disconnected for safety reasons. Another example of non-random changes that can affect predictions is a change in the nominal power of the wind park, due to changes in the number of wind turbines, maintenance, and so on.

The described system has a modular framework (competing models, estimation procedures, final combination) that allows further independent research to be made in every specific part of the system and that will undoubtedly improve its performance.

Acknowledgments

The author is grateful to the referees for their useful comments. The author is also grateful to Carlos Velasco for his computational assistance with the nonparametric models. Some parts of this research have been presented in the following seminars: 2002-IEA Symposium on Wind Forecasting Techniques (Norrköping), the World Wind Energy Conference and Exhibition (Berlin), the 2002 European Wind Energy Conference (Paris), the 17th International Workshop on Statistical Modelling (Chania), and the XXI SEIO Meeting (Baeza). The author is grateful to the attendees of the above-mentioned seminars for their useful comments. This research has been partly supported by Red Eléctrica de España and the ANEMOS project (ENK5-CT-2002-00665), funded by the European Commission and grant SE 2004-03303 from Ministerio de Educación y Ciencia. Any remaining error is the author's responsibility.

References

Ackermann, T., & Söder, L. (2002). An overview of wind energy status 2002. Renewable and Sustainable Energy Reviews, 6, 67-128.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 716-723.
Beyer, H. G., Heinemann, D., Mellinghoff, H., Mönnich, K., & Waldl, H. P. (1999). Forecast of regional power output of wind turbines. Proceedings of the EWEC 1999 (pp. 1070-1073).
Bhansali, R. J. (1996). Asymptotically efficient autoregressive model selection for multistep prediction. Annals of the Institute of Statistical Mathematics, 48, 577-602.
Bianchi, F. D., Mantz, R. J., & Christiansen, C. F. (2004). Power regulation in pitch-controlled variable-speed WECS above rated wind speed. Renewable Energy, 29, 1911-1922.
Bunn, D. W. (1985). Statistical efficiency in the linear combination of forecasts. International Journal of Forecasting, 1, 151-163.
Chen, Z., & Yang, Y. (2002). Time series models for forecasting: Testing or combining? Manuscript, Iowa State University.
Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559-583.
Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596-610.
Diebold, F. X., & Pauly, P. (1987). Structural change and the combination of forecasts. Journal of Forecasting, 6, 21-40.
Dutton, A. G., Kariniotakis, G., Halliday, J. A., & Nogaret, E. (1999). Load and wind power forecasting methods for the optimal management of isolated power systems with high wind penetration. Wind Engineering, 23, 69-87.
Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications. London: Chapman & Hall.
Focken, U., Lange, M., & Waldl, H. P. (2001). Previento: a wind power prediction system with an innovative upscaling algorithm. Proceedings of the EWEC 2001 (pp. 826-829).
Giebel, G., Landberg, L., Nielsen, T. S., & Madsen, H. (2001). The Zephyr project. The next generation prediction system. Proceedings of the EWEC 2001 (pp. 777-780).
Granger, C. W. J., & Newbold, P. (1986). Forecasting economic time series. San Diego: Academic Press.
Grillenzoni, C. (1994). Optimal recursive estimation of dynamic models. Journal of the American Statistical Association, 89, 777-787.
Joensen, A., Giebel, G., Landberg, L., Madsen, H., & Nielsen, A. (1999). Model output statistics applied to wind power prediction. Proceedings of the EWEC 1999 (pp. 1177-1180).
Joensen, A., Madsen, H., Nielsen, H. A., & Nielsen, T. S. (1999). Tracking time-varying parameters using local regressions. Automatica, 36, 1199-1204.
Kang, I.-B. (2003). Multi-period forecasting using different models for different horizons: An application to U.S. economic time series data. International Journal of Forecasting, 19, 387-400.
Kariniotakis, G. N., Stavrakakis, G. S., & Nogaret, E. F. (1996). Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11, 762-767.
Kohn, R., Michael, S., & Chan, D. (2001). Nonparametric regression using linear combinations of basis functions. Statistics and Computing, 11, 313-322.
Landberg, L. (1994). Short-term prediction of local wind conditions. PhD thesis, Risø National Laboratory, Roskilde, Denmark.
Landberg, L., Giebel, G., Madsen, H., Nielsen, T. S., Jørgensen, J. U., Laursen, L., et al. (2002). Wind farm production prediction: the Zephyr model. Technical report. Roskilde, Denmark: Risø National Laboratory.
Nielsen, T. S., Madsen, H., & Tofting, J. (1999). Experiences with statistical methods for wind power prediction. Proceedings of the EWEC 1999 (pp. 1066-1069).
Nielsen, T. S., Joensen, A., Madsen, H., & Landberg, L. (2000). Tracking time-varying coefficient functions. International Journal of Adaptive Control and Signal Processing, 14, 813-828.
Sanchez, I. (in press). Recursive estimation of dynamic models using Cook's distance, with application to wind energy forecast. Technometrics.
Sanchez, I., Usaola, J., Ravelo, O., Velasco, C., Domínguez, J., Lobo, M., et al. (2002). SIPREOLICO: a wind power prediction system based on flexible combination of dynamic models. Application to the Spanish power system. Proceedings of the World Wind Energy Conference and Exhibition 2002.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Sessions, D. N., & Chatterjee, S. (1989). The combining of forecasts using recursive techniques with nonstationary weights. Journal of Forecasting, 8, 239-251.
Swanson, N. R., & Zeng, T. (2001). Choosing among competing econometric forecasts: Regression-based forecast combination using model selection. Journal of Forecasting, 20, 425-440.
Terui, N., & van Dijk, H. K. (2002). Combined forecasts from linear and nonlinear time series models. International Journal of Forecasting, 18, 421-438.
Vilar-Fernandez, J. A., & Vilar-Fernandez, J. M. (1998). Recursive estimation of regression functions by local polynomial fitting. Annals of the Institute of Statistical Mathematics, 50, 729-754.
Yang, Y. (2004). Combining forecasting procedures: Some theoretical results. Econometric Theory, 20, 176-222.

Ismael Sanchez is Associate Professor of Statistics in the Polytechnic School at Universidad Carlos III de Madrid. His main research interests are time series analysis, forecasting, and statistical process control. He has published in several leading journals, including the Journal of the American Statistical Association, Technometrics, and the Journal of Forecasting. He actively participates in a multidisciplinary team concerning the real-time prediction of wind energy production for the Spanish Peninsular system.
