
B.TECH PROJECT

UNDERGRADUATE THESIS

IIT GANDHINAGAR

B.Tech Project Report: 2011 Indian Institute of Technology, Gandhinagar (IIT-GN)

SHORT TERM WIND SPEED FORECASTING USING TIME SERIES MODEL

Project Guide Prof. Naran.M.Pindoriya Department of Electrical Engineering

Submitted by Gurrapu Naveen (0800233) Department of Electrical Engineering

PAGE

01-12-2011


CERTIFICATE
This is to certify that the work and report titled "Short term wind speed forecasting using time series model" was carried out by Mr. Gurrapu Naveen under the guidance and supervision of Prof. Naran.M.Pindoriya. This work has not been published anywhere.

Project Guide Prof. Naran.M.Pindoriya Department of Electrical Engineering IIT-Gandhinagar


ACKNOWLEDGEMENT

This work would not have been possible without the help of my project guide, Prof. Naran.M.Pindoriya of the Indian Institute of Technology, Gandhinagar. I would like to express my sincere gratitude to him for giving me the opportunity to work on this project, and for his continuous support, patience, motivation, enthusiasm and immense knowledge of forecasting models. His constant guidance and his time have had a great impact on this project and on me. I could not imagine this work and report without his contribution. This work has not been published anywhere, and it has been done solely by me under the guidance of Prof. Naran.M.Pindoriya.


CONTENTS

1. Certificate
2. Acknowledgement
3. Acronyms
4. List of tables
5. List of Figures
6. Abstract
7. Introduction
8. Overview of different forecasting methodologies
9. Different time horizons and views on universal model
10. Plot of data set
11. ACF
12. PACF
13. Simulation of time series models
14. Best model chosen
15. Assumptions
16. Conclusion
17. References


ACRONYMS

AR: Auto Regressive
MA: Moving Average
ARMA: Auto Regressive Moving Average
ARIMA: Auto Regressive Integrated Moving Average
CARIMA: Controlled Auto Regressive Integrated Moving Average
ACF: Auto Correlation Function
PACF: Partial Auto Correlation Function
NWP: Numerical Weather Prediction
NN: Neural Networks
MAPE: Mean Absolute Percentage Error
MW: Megawatt
KW: Kilowatt
GW: Gigawatt
GWN: Gaussian White Noise


LIST OF TABLES
1. Difference in time series model
2. Different threshold values of AR and corresponding AR orders
3. AR (20), AR (13), and AR (4) equations
4. Different threshold values of ARMA and corresponding ARMA orders
5. ARMA (20, 1), ARMA (13, 1), ARMA (4, 1) equations

LIST OF FIGURES
1. Plot of actual wind speed data for 15 days
2. Plot of ACF function for 15 days with 72 lags
3. Plot of PACF function for 15 days with 72 lags
4. Plot of different AR forecasting models for 24 hours
5. Plot of different ARMA forecasting models for 24 hours


ABSTRACT:

The world faces a major energy security problem: demand for energy is projected to triple by 2050, while restrictions on fossil fuels are increasing in order to limit climate change. Both factors are driving the use of renewable resources for electricity generation. Wind energy is the most fluctuating of the renewable resources because of its dynamic nature, its uncertainty and the intermittent power it generates, which creates issues of storage and integration into the grid. Wind energy is one of the most rapidly growing sectors in the power industry, with world-wide capacity reaching 196,630 MW in 2011 and China leading in installed capacity [1]. As wind capacity increases, higher penetration of wind power into the grid becomes essential to support a country's GDP and protect its energy security. The past decade has seen intensive research and development in wind forecasting, with many new and more sophisticated models emerging with higher accuracy. This report gives an insight into the different methodologies in time series forecasting and compares the models for the short-term period in terms of accuracy, MAPE and variance.


1. INTRODUCTION

Wind prediction means predicting the amount of wind power or wind speed that will be available at the next instant or over the next duration or horizon of days. Forecasting is used in many fields, such as stock markets in business, weather prediction, meteorology and earthquake prediction. In the power industry, forecasting is mainly required to gain an edge in day-ahead scheduling for energy market trading, to maintain reserve requirements and storage capacity, and to make unit commitment and dispatch decisions. Precise forecasting is required to reduce penalties and imbalance charges. Wind power at a site depends on various factors such as temperature, pressure at the wind site, speed of rotation of the wind turbine, direction of rotation, humidity at the site, density of air, position of terrain and time of usage. The relation between wind speed and wind power is given by the non-linear equation

$$P = \frac{1}{2} \rho A v^{3} \qquad (1)$$

where $P$ is the wind power, $v$ is the wind speed, $\rho$ is the density of air and $A$ is the surface area of the wind turbine blade.

The uncertainties in wind speed forecasts are mostly due to severe temperature, high precipitation, very high wind speeds, etc. Wind speed is predicted rather than wind power because the errors in wind speed forecasting can be identified easily and resolved to a large extent, whereas the sources of error in wind power forecasting cannot be identified easily, which makes analysing and correcting them difficult. Moreover, once the wind speed is known, the wind power can be calculated from equation (1). The power output can also be calculated from the wind power curve. Analysing equation (1), we can derive that

$$\frac{\Delta P}{P} \approx 3\,\frac{\Delta v}{v} \qquad (2)$$

assuming all the other parameters in equation (1) are constant. So, a change or error of 2% in wind speed produces a change or error of about 6% in wind power. For most practical cases in power systems, the wind power error should be less than or equal to 20% [6]. This report deals with time-series forecasting, which is a subclass of the statistical approach. Statistical models establish a relationship between the inputs and the output of the model (here, the number of inputs to


the model can be more than one) by estimating the parameters statistically in order to predict future events. A time series is a time-ordered sequence of observation values of a physical or financial variable made at equally spaced time intervals $\Delta t$, represented as a set of discrete values $x_1, x_2, x_3, \dots$ etc. Traditionally, time series analysis is one of the statistical approaches and also a branch of statistics that deals with the structural dependencies between the observation data of random phenomena and the related parameters [3]. Time series analysis is further classified into two approaches: the time-domain approach and the frequency-domain approach. The time-domain approach is widely used because of the Box-Jenkins approach. Seasonality, linearity, trend and stationarity are the characteristics of time series models.
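The cubic dependence of wind power on wind speed described in this introduction can be checked numerically. The following is a minimal sketch in Python, not part of the original project; the function names and the default air density of 1.225 kg/m^3 and unit area are assumptions chosen only for illustration.

```python
def wind_power(speed_m_s, air_density=1.225, area_m2=1.0):
    """Wind power in watts from P = 0.5 * rho * A * v^3.

    air_density (kg/m^3) and area_m2 are illustrative defaults,
    not values taken from the report.
    """
    return 0.5 * air_density * area_m2 * speed_m_s ** 3


def relative_power_error(relative_speed_error):
    """Exact relative change in power caused by a relative change in speed."""
    return (1.0 + relative_speed_error) ** 3 - 1.0


if __name__ == "__main__":
    exact = relative_power_error(0.02)   # speed over-estimated by 2%
    first_order = 3 * 0.02               # linearised estimate, 3 * (dv / v)
    print(f"exact power change:   {100 * exact:.2f}%")
    print(f"first-order estimate: {100 * first_order:.2f}%")
```

The exact change is slightly larger than three times the speed error because the higher-order terms of the cube are not negligible, but for small errors the factor of three is a good rule of thumb.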

2. OVERVIEW OF DIFFERENT FORECASTING METHODOLOGIES:

There are already many existing models for a given forecasting task and time horizon, and every model has its own limitations and implications for its use.

2.1. Persistence model: This is one of the simplest forecasting models. It is very effective for very-short term and short term forecasting, and it is treated as the benchmark model for those horizons. The model is based on the principle that, under similar conditions, the next forecast data point will be almost identical to the present data point. It is popular not only because of its simplicity but also because it is cost-effective to implement and has low complexity and a short response time. Its drawback is that the error increases as the time horizon of the prediction increases.
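The persistence principle can be written in a few lines. The sketch below is only an illustration of the idea described above; the function name and the sample wind speeds are hypothetical.

```python
def persistence_forecast(history, horizon):
    """Forecast every future step as the last observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


if __name__ == "__main__":
    observed = [7.9, 8.4, 8.1, 8.6]          # hypothetical hourly wind speeds in m/s
    print(persistence_forecast(observed, 3))  # three hours ahead
```

Because every future step repeats the last observation, the forecast cannot follow any change in the wind, which is why the error grows with the horizon.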

2.2. NWP: This is one of the physical models. It uses weather parameters such as height of land above sea level, temperature, topography, wind speed and precipitation as inputs to predict the wind power output. This model is mostly used for long term forecasting. Its main disadvantages are the cost of implementation, the complexity involved, and the long time needed for processing (sometimes days) and for training the model.


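2.3.1 Time series models: This is a sub-class of the statistical method; the definition was discussed earlier. The most important models in this class are AR, MA, ARMA, ARIMA and CARIMA [3]. Of these, CARIMA will not be discussed in this report.

2.3.1.1. Auto Regressive (AR):

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + a_t \qquad (3)$$

where $\phi_1, \phi_2, \dots, \phi_p$ are the auto-regressive parameters, $a_t$ is white noise and $p$ is the order of the model. This model is stable only for parameter values within a certain range. The model is valid only when the series has been made stationary. In compact form, it is written as

$$\phi(B)\, x_t = a_t \qquad (4)$$

where $a_t$ is white noise, $B$ is the backward shift operator ($B x_t = x_{t-1}$) and

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^{2} - \dots - \phi_p B^{p} \qquad (5)$$

2.3.1.2. MA (Moving Average): In this model, the series is written as a weighted linear sum of the current and past error terms. Its equation is

$$x_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q} \qquad (6)$$

where $a_t$ is white noise, $\theta_1, \theta_2, \dots, \theta_q$ are the moving average parameters and $q$ is the order of the MA model. Its compact form is

$$x_t = \theta(B)\, a_t \qquad (7)$$

where $\theta(B) = 1 - \theta_1 B - \theta_2 B^{2} - \dots - \theta_q B^{q}$. This is valid for a univariate time series. This model is stable only for parameters within a certain range.

2.3.1.3. ARMA: The combination of the AR and MA models makes up the ARMA model. The order of ARMA is represented as (p, q). Its compact form is written as

$$\phi(B)\, x_t = \theta(B)\, a_t \qquad (8)$$

which in expanded form reads

$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q} \qquad (9)$$

with

$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p} \qquad (10)$$

$$\theta(B) = 1 - \theta_1 B - \dots - \theta_q B^{q} \qquad (11)$$

The presence of both AR and MA terms in the ARMA model enables the representation of complex time series with fewer parameters than would be needed using a corresponding AR model.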
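To make the AR, MA and ARMA recursions concrete, the sketch below generates a series from given parameters and a given noise sequence. It assumes the common Box-Jenkins sign convention in which the MA terms are subtracted, and it treats all values before the start of the series as zero; both choices, and the function name `simulate_arma`, are assumptions made for illustration.

```python
import random


def simulate_arma(phi, theta, noise):
    """Generate x_t = sum_i phi_i * x_{t-i} + a_t - sum_j theta_j * a_{t-j}.

    phi   : list of AR parameters (empty list gives a pure MA model)
    theta : list of MA parameters (empty list gives a pure AR model)
    noise : the white-noise sequence a_t, one value per time step
    Values before the start of the series are taken as zero.
    """
    x = []
    for t, a_t in enumerate(noise):
        ar_part = sum(phi[i] * x[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * noise[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar_part + a_t - ma_part)
    return x


if __name__ == "__main__":
    rng = random.Random(0)                       # fixed seed for repeatability
    a = [rng.gauss(0.0, 1.0) for _ in range(10)]
    print(simulate_arma([0.6], [0.3], a))        # an ARMA(1, 1) example
```

Feeding a single unit impulse as the noise shows the difference between the models: a pure AR model responds with a decaying sequence, while a pure MA model of order q responds for only q further steps.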


2.3.1.4. ARIMA: Its structure is (p, d, q), where p = order of AR, d = number of times the time series is differenced and q = order of MA. The differencing is done to make the series stationary. This method is used when the series shows strong fluctuation, and d is generally not greater than 2 for simplicity. Here $x_t$ is changed to

$$\nabla^{d} x_t = (1 - B)^{d} x_t \qquad (12)$$

where $B$ is the backward shift operator, $B x_t = x_{t-1}$.
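The differencing step used by ARIMA can be illustrated with a short sketch. The helper names below are hypothetical, and the sample series is made up so that the effect of each differencing pass is easy to see.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y_t = x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def undifference_once(first_value, diffs):
    """Invert one differencing pass, given the first value of the original series."""
    out = [first_value]
    for step in diffs:
        out.append(out[-1] + step)
    return out


if __name__ == "__main__":
    x = [1, 4, 9, 16, 25]                      # a series with a curved trend
    print(difference(x, 1))                    # the trend becomes linear
    print(difference(x, 2))                    # the trend is removed
    print(undifference_once(x[0], difference(x, 1)))
```

Each pass shortens the series by one value, and a forecast made on the differenced series has to be summed back, as in `undifference_once`, to return to the original units.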

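| S.NO | DESCRIPTION | AR | MA | ARMA |
|---|---|---|---|---|
| 1. | Model in terms of $x$ | $\phi(B) x_t = a_t$ | $\theta^{-1}(B) x_t = a_t$ | $\theta^{-1}(B) \phi(B) x_t = a_t$ |
| 2. | Model in terms of $a$ | $x_t = \phi^{-1}(B) a_t$ | $x_t = \theta(B) a_t$ | $x_t = \phi^{-1}(B) \theta(B) a_t$ |
| 3. | $\psi$ weights | Infinite series | Finite series | Infinite series |
| 4. | Stationary condition | Roots of $\phi(B) = 0$ lie outside the unit circle | Always stationary | Roots of $\phi(B) = 0$ lie outside the unit circle |

Table 1 (above): shows the difference in the time series models [3]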

2.4. Time horizon: This is the time range ahead over which the forecast is made. Classifying wind forecasting methods by time scale is vague. The horizons are classified into very-short term, short term, medium term and long term forecasting [7]:

Very-short term: a few seconds to 30 min ahead. It is used in electricity market clearing.
Short-term: 30 min to 6 hours ahead. It is used in economic load dispatch planning.
Medium-term: 6 hours to 1 day ahead. It is used in generator online/offline decisions.
Long-term: 1 day to 1 week ahead. It is used for unit commitment decisions and reserve requirement decisions.

The above time horizons are not strictly limited, and some relaxation is possible based on the application of the forecasting model. Different governing bodies and countries have different standards and protocols, so there is no strict limit on the range of each horizon. There is no universal forecasting model applicable to all the time horizons mentioned above, because each model, whether statistical, neural network or any other, has its own restrictions and implications for each time horizon. The restrictions can be accuracy, complexity, cost, processing time, number of input variables,


and training on the data. All of the above cannot be satisfied for every time horizon in a feasible and practical way. In this report, short term forecasting means forecasting values for the next 24 hours.
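The horizon ranges listed in Section 2.4 can be expressed as a simple lookup. The sketch below is an illustration only: the function name is hypothetical, and assigning each boundary value (30 min, 6 hours, 1 day, 1 week) to the shorter category is an assumption, since the ranges as stated share their end points.

```python
def classify_horizon(minutes_ahead):
    """Map a forecast lead time in minutes to the horizon names of Section 2.4."""
    if minutes_ahead <= 0:
        raise ValueError("lead time must be positive")
    if minutes_ahead <= 30:
        return "very-short term"
    if minutes_ahead <= 6 * 60:
        return "short term"
    if minutes_ahead <= 24 * 60:
        return "medium term"
    if minutes_ahead <= 7 * 24 * 60:
        return "long term"
    return "beyond the listed horizons"


if __name__ == "__main__":
    for lead in (10, 120, 24 * 60, 3 * 24 * 60):
        print(lead, "min ->", classify_horizon(lead))
```

Read strictly, a 24-hour lead time falls in the medium-term range, whereas this report calls its 24-hour forecasts short term. That is in line with the remark that some relaxation of these ranges is possible depending on the application.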

3. SIMULATION AND RESULTS

3.1 Wind data set: The data for the analysis come from CWET (Centre for Wind Energy Technology) for an unknown location in India, measured at a turbine height of 50 m over an unknown period of the year.

Fig.1 (above): shows the plot of the dataset for 360 hours (15 days)

A trend can be seen in fig.1: the graph fluctuates rapidly, first decreasing and then increasing. The wind speed changes swiftly with time because of the dynamic nature of wind. There is high volatility in fig.1, following a certain trend that can also be observed in fig.2. This trend may be due to a seasonality effect. However, the trend can be removed by differencing the actual data, which makes the series stationary. The mean of the data set is 8.2 m/s.

3.2 ACF for the input dataset: The ACF describes how well a signal correlates with itself when the signal is displaced with respect to itself in time. It is also known as the cross-correlation of the signal with itself. The ACF is the similarity between observations as a function of the time separation between them.

$$\rho_k = \frac{\sum_{t=1}^{N-k} (x_t - \mu)(x_{t+k} - \mu)}{\sum_{t=1}^{N} (x_t - \mu)^{2}} \qquad (13)$$

where $\rho_k$, the autocorrelation at lag $k$, is a scalar function of the lag, $\mu$ is the mean of the series and $N$ is the number of observations. The ACF ranges between +1 and -1. The higher the magnitude of the ACF, the stronger the correlation; values close to +1 or -1 indicate high or strong correlation. In the case of my models, if the ACF went outside the range +1 to -1, the model using this ACF would not be stable and would lose its characteristics. If there are sharp peaks at the initial lags (within the range +1 to -1) and the remaining ACF values are small compared to those peaks, then the rest of the lags can be expressed in terms of the initial lags with high peaks. The order of the model using the ACF is then the number of initial high peaks above the threshold. Since there is an obvious trend, it can be removed by differencing the data, which makes the series stationary. Since all the values of the ACF in the graph below lie within +1 and -1, the model using the ACF is stable.

Fig.2 (above): shows ACF for the data set upto 72 lags

From the graph, it can be observed that the ACF generally decreases as the lag increases. This is due to the decreasing correlation between data points as the lag between them increases. There is high correlation at the start of the ACF for lags (1, 2, 3, 4), and there is also significant correlation at lags (22, 23, 24), (46, 47, 48) and (72, 73). The significant correlation is observed at the end of each 24-lag period. The graph does not decay rapidly, and there are no cut-off points for lags up to 72, because of the presence of a trend in the ACF. We neglect the points below y = 0 and those below the threshold (which differs for a particular model) so as to make the calculation easier and reduce the dynamics of the model. In this work, the ACF is used to calculate the order of the AR model and its coefficients.
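The ACF calculation and the threshold-based choice of lags described above can be sketched as follows. This is an illustration under stated assumptions: the sample ACF estimator is the usual textbook form, the helper names are hypothetical, and the demonstration series is synthetic (a 24-hour cycle plus noise), not the CWET data.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant, ACF is undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def lags_above_threshold(acf, threshold):
    """Keep lags whose ACF is positive and above the threshold; lag 0 is skipped."""
    return [k for k in range(1, len(acf)) if acf[k] > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for 15 days of hourly wind speed with a daily cycle.
    data = [8.2 + 2.0 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.5)
            for t in range(360)]
    acf = autocorrelation(data, 72)
    for threshold in (0.25, 0.3, 0.4):
        lags = lags_above_threshold(acf, threshold)
        print(f"threshold {threshold}: {len(lags)} lags kept -> {lags}")
```

Raising the threshold keeps fewer lags and therefore gives a lower model order, which is the mechanism behind the different AR orders compared later in the report.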

3.3. PACF for input dataset: The PACF is the conditional correlation between values of the same variable when the effects of one or more related variables are removed. The partial autocorrelation at lag k is the autocorrelation between $x_t$ and $x_{t-k}$ that is not accounted for by lags 1 through k-1. For the time series models, the PACF ranges between +1 and -1. Values outside this range make the model unstable.

Fig.3 (above): shows PACF for the dataset upto lag 72

We can see that there is a high correlation at lag 1, and that all other lags from 2 to 72 are within the bounds of +0.2 to -0.2. The value at lag 1 is 0.8867. There is no trend in the above plot. The graph also shows that lag 1 is sufficient to explain the rest of the lags up to 72. In this work, the PACF is used to calculate the order of the MA model and its coefficient.

3.4. MAPE: This is a measure of the accuracy of fitted time series values. The absolute percentage errors are summed and their average is computed. It expresses accuracy as a percentage, so it does not depend on the units of the original data. It can take values from zero to infinity.

$$\text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \qquad (14)$$

where $A_t$ is the actual value, $F_t$ is the forecast value and $n$ is the total number of observations.


3.5. Variance: This is the average of the squared differences from the mean, and the square root of the variance is the standard deviation. It is also known as a measure of variability. It has the units of the square of the data.

$$\sigma^{2} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^{2} \qquad (15)$$

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (16)$$

where $\sigma^{2}$ is the variance, $\mu$ is the mean, $x_i$ is the predicted value and $N$ is the number of observations.
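The two accuracy measures defined in Sections 3.4 and 3.5 can be computed as in the sketch below. The function names and the sample numbers are illustrative only; the variance is the population form (division by N), matching the verbal definition of the average of squared differences from the mean.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def variance(values):
    """Average of the squared differences from the mean (population variance)."""
    if not values:
        raise ValueError("values must be non-empty")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    actual = [8.0, 9.0, 7.5, 8.5]      # hypothetical measured speeds in m/s
    forecast = [7.6, 9.3, 7.9, 8.1]    # hypothetical forecasts in m/s
    print(f"MAPE     = {mape(actual, forecast):.2f} %")
    print(f"variance = {variance(forecast):.4f} (m/s)^2")
```

MAPE is undefined when an actual value is zero, which matters for wind data that contains calm periods; such points would have to be handled separately.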

3.6. Simulation for Auto-regressive model: The AR model for forecasting wind speed is calculated using equation (3). The AR forecast does not include the error term for each data point. The AR coefficients are calculated from the plot of ACF versus lags (shown above), and the order of the AR model is decided by the threshold.

| S.NO | THRESHOLD | LAGS | ORDER | AR MODEL | MAPE | VARIANCE |
|---|---|---|---|---|---|---|
| 1 | 0.25 | 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25, 26, 27, 46, 47, 48, 71, 72, 73, 74 | 20 | AR(20) | 16.05 | 0.0186 |
| 2 | 0.3 | 1, 2, 3, 4, 5, 21, 22, 23, 24, 25, 26, 72, 73 | 13 | AR(13) | 16.21 | 0.0203 |
| 3 | 0.4 | 1, 2, 3, 4 | 4 | AR(4) | 16.40 | 0.0285 |

Table 2 (above): shows the different threshold values of AR and corresponding AR orders

Using equation (4), and also using equation (5), the AR equations are written as:

| S.NO | THRESHOLD | EQUATION | ORDER |
|---|---|---|---|
| 1 | 0.25 | $x_t = \sum_{k} \phi_k x_{t-k} + a_t$, with $k$ running over the 20 lags of Table 2 | AR(20) |
| 2 | 0.3 | $x_t = \sum_{k} \phi_k x_{t-k} + a_t$, with $k$ running over the 13 lags of Table 2 | AR(13) |
| 3 | 0.4 | $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 x_{t-3} + \phi_4 x_{t-4} + a_t$ | AR(4) |

Table 3 (above): shows AR (20), AR (13), AR (4) equations

From the above tables, we can infer that the number of lags (or coefficients) is equal to the order of the model. As the order of the model increases, the MAPE decreases, but the decrease is small. The correlation in fig.2 is higher for the initial lags and decreases as the lags increase, so the initial lags contribute most to the forecast output. AR (20) has the least MAPE and the least variance, as its 20 coefficients lead to higher accuracy. AR (4) has only 4 coefficients, with MAPE and variance slightly higher than AR (20), because these four coefficients contribute most to the predicted output; the remaining 16 coefficients in AR (20) decrease MAPE and variance by only 0.35 units and 0.0099 units respectively. Similarly, AR (13) has lower MAPE and variance than AR (4), but since the nine additional coefficients contribute very little to the predicted output, the decrease in MAPE and variance is a meagre 0.19 units and 0.0082 units respectively. So, an order higher than 20 will have very little effect on the MAPE and variance. Also, for a higher order, problems of complexity and model fitting arise. From fig.4, none of the predicted models follows the 24-hour trend of the actual data.
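One common way to obtain AR coefficients from the ACF is to solve the Yule-Walker equations, for example with the Levinson-Durbin recursion. The sketch below shows that route under several assumptions: it uses all lags 1..p rather than the thresholded lag sets of Table 2, it works on the mean-removed series, it sets future noise terms to zero when forecasting, and it runs on synthetic data. It is not the procedure or the data used to produce the results above.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def yule_walker(acf, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    acf[k] is the autocorrelation at lag k with acf[0] == 1.
    Returns the AR parameters [phi_1, ..., phi_order].
    """
    phi = []
    err = 1.0  # normalised prediction-error variance
    for k in range(1, order + 1):
        acc = acf[k] - sum(phi[j] * acf[k - 1 - j] for j in range(k - 1))
        kappa = acc / err  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi


def ar_forecast(history, phi, steps):
    """Iterate the AR recursion on the mean-removed series with future noise = 0."""
    mean = sum(history) / len(history)
    window = [x - mean for x in history]
    out = []
    for _ in range(steps):
        nxt = sum(phi[j] * window[-1 - j] for j in range(len(phi)))
        window.append(nxt)
        out.append(nxt + mean)
    return out


def mape(actual, forecast):
    """Mean absolute percentage error."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for the record: 336 training hours plus 24 test hours.
    data = [8.2 + 2.0 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.5)
            for t in range(360)]
    train, test = data[:336], data[336:]
    acf = autocorrelation(train, 20)
    for order in (4, 13, 20):
        phi = yule_walker(acf, order)
        forecast = ar_forecast(train, phi, len(test))
        print(f"AR({order}): MAPE = {mape(test, forecast):.2f}")
```

The reflection coefficient `kappa` computed at step k is also the partial autocorrelation at lag k, so the same recursion yields the PACF as a by-product.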


Fig.4: Shows the plot of AR models with different orders

So, from the tables above and fig.4, we can choose AR(4) among the AR models because it has the minimum order, with MAPE and variance only slightly higher than AR(13) and AR(20) (refer Table 2).

3.7. Simulation for ARMA: ARMA model forecasting is done using equation (9), which has the regression part from the AR model and the error part from the MA model. The order of ARMA is decided from the ACF and PACF. The order of AR is p and the order of MA is q, so the order of ARMA is (p, q).

| S.NO | THRESHOLD | LAGS (AR) | ORDER | ARMA MODEL | MAPE | VARIANCE |
|---|---|---|---|---|---|---|
| 1 | 0.25 | 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25, 26, 27, 46, 47, 48, 71, 72, 73, 74 | (20,1) | ARMA(20,1) | 16.87 | 0.0216 |
| 2 | 0.3 | 1, 2, 3, 4, 5, 21, 22, 23, 24, 25, 26, 72, 73 | (13,1) | ARMA(13,1) | 17.94 | 0.0614 |
| 3 | 0.4 | 1, 2, 3, 4 | (4,1) | ARMA(4,1) | 18.35 | 0.0409 |

Table 4 (above): ARMA order with different threshold values

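| S.NO | ARMA THRESHOLD | EQUATION | ORDER |
|---|---|---|---|
| 1 | 0.25 | $x_t = \sum_{k} \phi_k x_{t-k} + a_t - \theta_1 a_{t-1}$, with $k$ running over the 20 lags of Table 4 | ARMA(20,1) |
| 2 | 0.3 | $x_t = \sum_{k} \phi_k x_{t-k} + a_t - \theta_1 a_{t-1}$, with $k$ running over the 13 lags of Table 4 | ARMA(13,1) |
| 3 | 0.4 | $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 x_{t-3} + \phi_4 x_{t-4} + a_t - \theta_1 a_{t-1}$ | ARMA(4,1) |

Table 5 (above): shows ARMA (20, 1), ARMA (13, 1), ARMA (4, 1) equations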

The order of MA is found from the PACF in fig.3, which has a coefficient at lag 1 with value 0.8667. From Table 4, MAPE decreases as the order increases. The variance is lowest for ARMA (20,1) and highest for ARMA (13,1). The difference in the MAPEs of ARMA (4,1) and ARMA (20,1) is 1.48 units, and the corresponding variance difference is 0.0193 units. The change of 1.48 in MAPE is due to the additional terms in ARMA (20,1), which contribute to the predicted output. The rise in variance of ARMA (13,1) is because its data points are further away from the mean. So, out of ARMA (20,1), ARMA (13,1) and ARMA (4,1), ARMA (20,1) is chosen because of its lower variance and MAPE.


Fig.5: shows the plot of ARMA models with different order.

3.8. RESULTANT BEST MODEL APPLICABLE: The AR model is chosen over the ARMA model because of its lower MAPE and variance. The AR models have lower MAPE than the ARMA models (Table 2 and Table 4), and in general lower variance as well. Within the AR models, there is not much change in MAPE among AR(4), AR(13) and AR(20); all lie above 16 but below 17. There is, however, a change in variance: the variance of AR(4) is about 1.5 times that of AR(20). Although AR(4) has more variance than ARMA(20,1), the increase is very small (0.0069). So, considering MAPE and variance, AR(4) has marginally higher MAPE and marginally higher variance than the rest of the AR models. AR(4) therefore gives almost the same accuracy with less complexity compared to the other AR and ARMA models.
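The selection reasoning of this section, accepting a small loss in accuracy in exchange for a simpler model, can be written as an explicit rule. In the sketch below the MAPE and variance values are those of Table 2 and Table 4; the rule itself, the function name, the parameter counts and the tolerance of 0.5 MAPE units are assumptions introduced only to illustrate the trade-off.

```python
# (MAPE, variance) as reported in Table 2 and Table 4.
RESULTS = {
    "AR(20)": (16.05, 0.0186),
    "AR(13)": (16.21, 0.0203),
    "AR(4)": (16.40, 0.0285),
    "ARMA(20,1)": (16.87, 0.0216),
    "ARMA(13,1)": (17.94, 0.0614),
    "ARMA(4,1)": (18.35, 0.0409),
}

# Number of fitted coefficients per model (p for AR, p + q for ARMA).
N_PARAMS = {
    "AR(20)": 20, "AR(13)": 13, "AR(4)": 4,
    "ARMA(20,1)": 21, "ARMA(13,1)": 14, "ARMA(4,1)": 5,
}


def simplest_within_tolerance(results, n_params, mape_tolerance):
    """Return the model with the fewest parameters whose MAPE is within
    mape_tolerance of the best MAPE in the table."""
    best_mape = min(m for m, _ in results.values())
    candidates = [name for name, (m, _) in results.items()
                  if m - best_mape <= mape_tolerance]
    return min(candidates, key=lambda name: n_params[name])


if __name__ == "__main__":
    # The 0.5-unit tolerance is a hypothetical choice, not a value from the report.
    print(simplest_within_tolerance(RESULTS, N_PARAMS, mape_tolerance=0.5))
    print(simplest_within_tolerance(RESULTS, N_PARAMS, mape_tolerance=0.0))
```

With a tolerance of 0.5 units only the three AR models qualify and the rule returns AR(4); with zero tolerance it returns the most accurate model, AR(20).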

ASSUMPTIONS:
The assumptions taken in the forecasting models are as follows:
a) Only wind speed is taken as input, but in practice other variables affecting wind speed should also be taken as inputs.
b) Seasonality effects have not been considered. The season in which the data was taken is not known, nor is the location of the turbine in India or the height of the turbine above sea level.
c) The PACF plot of the data shows unexpected behaviour after 180 lags. The actual reason is not known; if it were, the errors could be reduced.
d) Having only 336 hours of hourly data to predict the next 24 hours is not enough because of the dynamic nature of wind.

CONCLUSION:
From the above results, we can see that the time series models give a MAPE of around 16. This MAPE is mostly due to rapid spikes in the input over a short duration, and also to the small number of data points (336 hours) available to forecast the next day. The time series model is one of the basic forecasting methods. By using neural networks or hybrid models, the MAPE can be reduced drastically, as there are better forecasting models for short-term wind forecasting. We also know that there is no universal model for all time horizons with reasonable accuracy, because of the limitations and range of validity of each model. Global climate change will have a huge impact on wind speed, as there will be rapid fluctuations in climate, with extreme conditions during a season. That will lead to unpredictable wind speed with huge variations.


REFERENCES:

1. Lange, M., Focken, U., "New developments in wind energy forecasting," Power and Energy Society General Meeting - Conversion and Delivery of Electrical Energy in the 21st Century, 2008 IEEE, pp. 1-8, 20-24 July 2008.
2. Rajagopalan, S., Santoso, S., "Wind power forecasting and error analysis using the Autoregressive moving average modeling," Power & Energy Society General Meeting, 2009. PES '09. IEEE, pp. 1-6, 26-30 July 2009.
3. Palit, A.K., Popovic, D., Computational Intelligence in Time Series Forecasting (Theory and Engineering Applications).
4. Hill, D.C., McMillan, D., Bell, K.R., Infield, D., "Application of Auto-regressive Models to UK Wind Speed Data for Power System Impact Studies," Sustainable Energy, IEEE Transactions on, vol. PP, no. 99, pp. 1.
5. Campbell, P.R.J., "Short-Term Wind Energy Forecasting," Electrical Power Conference, 2007. EPC 2007. IEEE Canada, pp. 342-346, 25-26 Oct. 2007.
6. The State-Of-The-Art in Short-Term Prediction of Wind Power, project Anemos, 2003. Available online at: http://anemos.cma.fr/download/ANEMOS_D1.1_StateOfArt_v1.1.pdf
7. Soman, S.S., Zareipour, H., Malik, O., Mandal, P., "A review of wind power and wind speed forecasting methods with different time horizons," North American Power Symposium (NAPS), 2010, pp. 1-8, 26-28 Sept. 2010.
8. ACF, PACF and ARMA models, by Olivier Scaillet (University of Geneva and Swiss Finance Institute). Available online at http://www.hec.unige.ch/scaillet/coursstats/ACF.pdf
9. Identifying the numbers of AR or MA terms. Available online at Identifying%20the%20orders%20of%20AR%20or%20MA%20terms.htm
10. Energy Security in the 21st Century. Available online at www.americanprogress.org/kf/energy_security_report.pdf
