Access Details: [subscription number 792960683] Publisher Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
To cite this Article Charhate, S. B., Deo, M. C. and Londhe, S. N.(2009)'Genetic programming for real-time prediction of offshore
Downloaded By: [ABM Utvikling STM / SSH packages] At: 09:43 21 July 2009
Introduction
Knowledge of the magnitude and direction of wind plays a vital role in the planning, design and operation of coastal and ocean engineering facilities. Information on wind further enables one to derive the corresponding wind-generated waves and currents. Forecasting wind speed and direction in real time and over future time-steps helps in planning engineering works in the coastal and offshore region, scheduling aircraft operations, predicting the output of wind turbines and issuing warnings for fishing, recreational or similar activities in the ocean. For a large number of ocean locations around the world, data on wind speed and direction are routinely collected through floating wave-rider buoys, which are essentially designed to measure waves but simultaneously provide supplementary information such as wind and temperature parameters. If such wind observations, made at a sampling interval of 1 hr or 3 hr, are available in real time in the form of a time series, then the prediction of wind speed for future time-steps can be tackled as a univariate time-series forecasting problem. Investigators such as More and Deo (2002) and Cadenas and Rivera (2007) have previously shown that real-time prediction of wind speed observed by wave buoys can be done satisfactorily with the soft computing tool of artificial neural networks (ANN), and further that such predictions could be more advantageous than the traditional stochastic modeling methods of Auto Regressive Moving Average (ARMA) or Auto Regressive Integrated Moving Average (ARIMA). Zhang (2003) suggested that a combined ANN and ARIMA model would work marginally better for this purpose than an individual ANN or ARIMA approach. Investigators working in the field of hydraulic or hydrologic engineering (e.g. Drecourt, 1999; Muttil and Liong, 2004) have found that another soft approach, namely genetic programming (GP), worked either equally well or sometimes even better than the ANN. This motivated the authors to apply the technique of GP to the problem of online wind prediction. A specialty of this work is that it applies GP to the task of temporal regression, as against the earlier studies that dealt with the evaluation of causal or spatial relationships with the help of GP.
Genetic Programming and Past Applications
GP is similar in concept to genetic algorithms (GA), but whereas a GA yields a set of numbers, GP provides its solution in the form of a computer programme or an equation. In GP a random population of individuals (equations or computer programmes) is first created, the fitness of the individuals is evaluated and parents are then selected from among them. The parents are made to yield offspring through the processes of reproduction, mutation and cross-over. (See Appendix 1 for how these operations are carried out.) The creation of offspring continues iteratively until a specified number
ISSN: 1744-5302 print / 1754-212X online Copyright C 2009 Taylor & Francis DOI: 10.1080/17445300802492638 http://www.informaworld.com
Figure 1. Coastline of India showing wave buoy locations at stations DS1 and DS7.
of offspring in a generation is produced, and further until a specified number of generations is created. The best, or fittest, offspring at the end of this process, i.e. an equation or a computer programme, is the solution to the problem. GP thus transforms one population of individuals into another in an iterative manner by following natural genetic operations such as reproduction, mutation and cross-over. More details of the working of GP can be found in Koza (1992). Applications of GP in coastal engineering are rare, unlike in the field of water resources and hydraulics, where they started around 1999 (for example, see Drecourt 1999; Whigham and Crapper 2001; Muttil and Liong 2004). Some of these authors have also presented comparisons with other models. Drecourt (1999) reported that GP handles peak flows better while ANN takes care of noise efficiently. Muttil and Liong (2004) found the performance of GP marginally better than that of ANN. Most recently, Charhate et al. (2007) made a forecast of coastal currents in a tide-dominated area off the Gulf of Khambhat and found that GP predictions were more satisfactory than ordinary ANN and ARIMA schemes. Kalra and Deo (2008) reported the usefulness of GP in in-filling gaps in wave-height time series.
The Database and Model Calibration
The National Institute of Ocean Technology (NIOT) at Chennai, India has been collecting oceanographic data in recent years through a series of wave-rider buoys deployed along the coastal and offshore belt of India. These
floating buoys have wind anemometers fitted at the standard height of 3 m above sea level, through which wind-speed observations are routinely collected every 3 hr. The present study is based on such observations collected by NIOT at two deep-water locations, off Goa (station DS1) and off Minicoy (station DS7). Figure 1 shows the data-collection sites. Station DS7 is open to both the Arabian Sea and the Bay of Bengal, and large variations in the wind are expected at this place. At site DS1 the data were collected from April 2004 to July 2005 and from June to September 2006, while at station DS7 the data belonged to the three-year period of 2004 to 2006. This study involved time-series forecasting over lead-time steps of 3, 6, 9, 12 and 24 hr based on the previous segment of wind measurements. By trial it was noticed that a sequence of three preceding observations was sufficient for the GP model to recognise an unknown hidden pattern in them and use it to produce the forecasted value. Two different types of models were developed to see which of the two was more advantageous: prediction of wind speed alone, and prediction of speed and direction together. For station DS1 the measurements of the first 11 months were used to calibrate the GP model, while the remaining 4 months of observations were used for verifying or testing the developed models. The results obtained using these models at DS1 were further validated with the observations for the period of June 2006 to September 2006. For station DS7 the first 24 months of observations were used for
[Figure 4 scatter plot: DS1, 3 hr ahead GP prediction against observation, with exact-fit line; R = 0.95.]
training and the remaining 12 months were used to test the calibrated model. In applying GP, a typical choice of control parameters was as follows: population size = 2048; number of generations = 405; mutation frequency = 80%; cross-over frequency = 52%. The programmes were generated with the help of the Discipulus software (Francone, 1998) and were further processed for application to new data sets using TurboC in the C++ environment. The statistical measures of correlation coefficient (R), root mean square error (RMSE) and mean absolute error (MAE) were used to compare the GP predictions with actual observations and were evaluated using MATLAB, which also facilitated generation of the scatter plots between the target output and the one obtained through GP. The coefficient of correlation (R) measures the linear association of two given variables. However, it cannot detect any complex non-linear dependency between them, if present. It is also very sensitive to deviations at larger observations. The root mean square error indicates an overall agreement (without any upper bound) between the observed
Figure 4. Observed versus GP-predicted wind speed at station DS1 (lead-time: 3 hr; testing data).
and modeled datasets and is assumed to be good for predictions that are iteratively arrived at. But it gives only an overall picture of errors. The mean absolute error also gives only an overall agreement between the observed and modeled datasets, but it is useful for practical interpretations. It is not weighted towards high- or low-magnitude events, but instead evaluates all deviations from the observed values in an equal manner and regardless of sign. The MAE is non-negative, has no upper bound and is hence advantageous, but provides no information about under-estimation or over-estimation. It is comparable to the total sum of absolute residuals. It may thus be seen that a given measure of error statistics is associated with some advantages and some drawbacks and hence a combination of multiple-error
[Figure 3 time-series plot: observation and prediction; x-axis: time in hr, 03-04-2005 to 26-07-2005.]
Figure 3. Observed versus GP-predicted wind speed at station DS1 (lead-time: 3 hr; testing data).
[Figure 5 time-series plot: observation and prediction; y-axis: wind speed (m/s); x-axis: time (hr), 01-01-2006 to 26-12-2006.]
Figure 5. Observed versus GP-predicted wind speed at station DS7 (lead-time: 6 hr; testing data).
measures accompanied by a visual examination of scatter and time-series plots, as done in this study, can give a reasonable idea of the performance of the model fit. The same problem of predicting wind speed and direction was also solved by using a 3-layer feed-forward artificial neural network (Figure 2), trained using the most appropriate form of the error-back propagation algorithm. The same division of data into training and testing sets as for the GP was maintained. It is to be noted that applications of ANN to derive the direction of wind are sparse; notable among them are Thiria et al. (1993), who evaluated the wind direction using simulated data as well as employing a spatial
input context, and Cornford et al. (1999), who estimated wind vectors from scatterometer data. Attempts to forecast the wind speed and direction directly from their current measurements did not yield good results. Hence, a different training scheme was adopted. If $U$ is the resultant wind-speed vector and $\theta$ is the angle made by $U$ with the North direction, then the two orthogonal wind components along the North-South and East-West directions are given by $u$ and $v$, where $u = U \cos\theta$ and $v = U \sin\theta$. From the measured values of $U$ and $\theta$, the components $u$ and $v$ were obtained as above and were thereafter predicted separately using two separate GP models. These predictions were then combined using $U = \sqrt{u^2 + v^2}$ and $\theta = \tan^{-1}(v/u)$.
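The resolution of the wind vector into components and its recombination can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical, and `atan2` is used here in place of the plain inverse tangent so that the quadrant of the direction is resolved; that substitution is an assumption of this sketch and not something stated in the study.

```python
import math

def to_components(speed, direction_deg):
    """Resolve speed U and direction theta (degrees from North) into u and v."""
    theta = math.radians(direction_deg)
    u = speed * math.cos(theta)  # North-South component, u = U cos(theta)
    v = speed * math.sin(theta)  # East-West component,  v = U sin(theta)
    return u, v

def from_components(u, v):
    """Recombine u and v into speed and direction (degrees in [0, 360))."""
    speed = math.hypot(u, v)  # U = sqrt(u^2 + v^2)
    # atan2 is an assumption of this sketch: it keeps the correct quadrant,
    # which tan^-1(v/u) alone cannot do.
    direction_deg = math.degrees(math.atan2(v, u)) % 360.0
    return speed, direction_deg

# Made-up sample: 8 m/s at 225 degrees from North.
u, v = to_components(8.0, 225.0)
speed, direction = from_components(u, v)
print(u, v, speed, direction)
```

In the scheme described above, the two components would each be forecast by their own model before being recombined; the sketch only covers the conversion steps on either side of that forecast.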
Figure 6. Observed versus GP-predicted wind speed at station DS7 (lead-time: 6 hr; testing data).
Results
The best programme obtained at the end of the GP-calibration process, developed using the training portion of the sample, was tested on the observations not involved in the training exercise. The resulting predictions of speed and direction over the time-steps of 3, 6, 9, 12 and 24 hr were compared against their target values. A similar process was followed in the case of ANN as well. Figures 3 and 4 show typical comparisons between the GP-predicted 3-hr wind speeds and the corresponding actual observations at DS1 during testing, in the form of time-series and scatter plots. Figures 5 and 6 depict similar plots for 6-hr predictions at the other location, i.e. DS7. These figures pertain to the case when the predictions were not based on wind-vector resolution. For the other case (predictions based on vector resolution), the model-forecasted versus observed speeds are shown in Figures 7 to 10 for site DS1 and in Figures 11 to 14 for station DS7 as examples. These figures pertain to the cases of higher lead-time varying from 9 to 24 hr. The time-series plots of observed versus
Figure 7. Observed versus GP-predicted wind speed at station DS1 (predictions based on components; lead-time: 12 hr; testing data).
predicted directions for the 24-hr lead-time at DS1 and the 12-hr lead-time at DS7 are shown in Figures 15 and 16, respectively. It may be noticed that the technique of GP carried out the intended task well and that even higher levels of wind speed were well predicted by this method. The third column in Tables 1 and 2 shows the testing performance of GP for wind-speed predictions in terms of the error statistics when the observed total speed was not broken into components to build the model, along with a similar comparison based on the ANN model. The fourth and the fifth columns in Tables 1 and 2 show the performance of the GP and ANN models when the u and v components
[Figure 8 scatter plot: combined u-v components; R = 0.87.]
Figure 8. Observed versus GP-predicted wind speed at station DS1 (predictions based on components; lead-time: 12 hr; testing data).
were used to build the model and predict the resultant wind speed and direction. It appears that training the GP and ANN models on the basis of splitting the wind vector into two orthogonal components was useful, although a separate model for wind speed alone (the third column in Tables 1 and 2) would be preferable for better performance if only the wind speed is desired. The tables and figures referred to above indicate that GP is able to recursively recognize an unknown hidden pattern in the preceding sequence of wind measurements and make a satisfactory forecast of future values accordingly. The excellent performance of GP in this problem of wind-speed prediction based on temporal correlations with preceding observations is thus noteworthy. Although it is difficult to specify any cut-off level for a good model fit, R > 0.80 can normally be regarded as an indication of satisfactory model performance. The forecasting at station DS1 can therefore be seen to be satisfactory even over the longer lead-time of 24 hr, as reflected in high values of R and low values of RMSE and MAE. However, at station DS7 it was good only up to the lead-time of 9 hr. This is due to the highly open nature of this site, which is exposed to both the Arabian Sea and the Indian Ocean. Tables 1 and 2 also show how an equivalent ANN performed in comparison with the GP. These tables, along with Figures 3 to 16, indicate that the results of GP rival those of the much researched and established tool of ANN for all the lead-times. Although the relatively small differences in error statistics between the GP and ANN methods may not necessarily mean statistically significant deviations between them, it may be noted that GP marginally but consistently yielded more attractive statistics. This was true at the open location DS7, where wider variations in the wind speed and direction can be expected. The present study may prompt additional research to know if GP could
Table 2. Wind speed and direction forecasting at station DS7.

(1)        (2)     (3) Speed prediction             (4) Speed prediction             (5) Wind direction
                   without using components         using u and v components
Lead-time  Method  R     RMSE (m/s)  MAE (m/s)      R     RMSE (m/s)  MAE (m/s)      R     Mean deviation (degrees)
3 hr       GP      0.86  1.30        0.96           0.83  1.52        1.31           0.78  21
3 hr       ANN     0.87  1.39        1.04           0.81  1.58        1.39           0.77  23
6 hr       GP      0.81  1.42        1.10           0.78  1.75        1.41           0.72  23
6 hr       ANN     0.82  1.48        1.15           0.77  1.78        1.48           0.70  28
9 hr       GP      0.78  1.60        1.23           0.75  1.86        1.72           0.68  26
9 hr       ANN     0.75  1.66        1.29           0.73  1.93        1.84           0.65  30
12 hr      GP      0.75  1.62        1.21           0.74  1.95        1.69           0.65  29
12 hr      ANN     0.72  1.70        1.38           0.71  2.04        1.80           0.63  32
24 hr      GP      0.69  1.83        1.42           0.67  2.12        1.86           0.62  34
24 hr      ANN     0.67  1.88        1.48           0.66  2.21        1.93           0.60  38
Figure 9. Observed versus GP-predicted wind speed at station DS1 (predictions based on components; lead-time: 24 hr; testing data).
[Scatter-plot residue for Figures 10 and 12: R = 0.75 and R = 0.86.]
Figure 10. Observed versus GP-predicted wind speed at station DS1 (predictions based on components; lead-time: 24 hr; testing data).
Figure 12. Observed versus GP-predicted wind speed at station DS7 (predictions based on components; lead-time: 9 hr; testing data).
be more useful in situations where the input variations are large and random, or if GP can deal with input noise more efficiently than ANN. It may also inspire more work to understand whether the capability of GP to generate innumerable new values and assess their usefulness efficiently gives it an edge over the ANN. In general, the long-interval predictions were less accurate than the corresponding short-interval forecasts for both techniques. This could be attributed to the highly unpredictable dependency between values separated by longer intervals. The direction prediction using
both approaches was found to be quite good, with mean errors within 15° to 26° at DS1 and 21° to 34° at DS7. It has been widely reported in many water-related applications (e.g. Thirumalaiah and Deo, 1998; Hong and Rao, 2003; More and Deo, 2003) that soft tools like ANN are generally more beneficial than traditional statistical regressions, and hence such a comparison with regression methods was not repeated in this work. A forecasting model can be said to work well if it shows better performance than a persistence model, in which the current observation is given as the forecasted value. This can be verified from Tables 3 and 4, which show, as examples, error statistics during the testing of the GP and persistence models. The underlying forecasting is made over different
Figure 11. Observed versus GP-predicted wind speed at station DS7 (predictions based on components; lead-time: 9 hr; testing data).
Table 4. Wind speed forecasting at station DS7.

(1)        (2)                (3) Speed prediction without using components
Lead-time  Method             R     RMSE (m/s)  MAE (m/s)
3 hr       GP                 0.86  1.30        0.96
3 hr       Persistence model  0.73  1.62        1.28
6 hr       GP                 0.81  1.42        1.10
6 hr       Persistence model  0.70  1.81        1.42
9 hr       GP                 0.78  1.60        1.23
9 hr       Persistence model  0.65  1.98        1.68
12 hr      GP                 0.75  1.62        1.21
12 hr      Persistence model  0.59  2.10        1.79
24 hr      GP                 0.69  1.83        1.42
24 hr      Persistence model  0.55  2.31        1.82
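The persistence comparison reported in Table 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the short wind-speed series is made up and is not taken from the buoy data, and the formulas are the standard definitions of RMSE, MAE and the correlation coefficient rather than the MATLAB routines used in the study.

```python
import math

def persistence_pairs(series, lead_steps):
    """Pair each observation with the persistence forecast issued lead_steps earlier.

    lead_steps must be at least 1 (one step is 3 hr for 3-hourly data).
    """
    observed = series[lead_steps:]
    forecast = series[:-lead_steps]  # value at time t reused as forecast for t + lead
    return observed, forecast

def rmse(observed, forecast):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)

def mae(observed, forecast):
    """Mean absolute error: all deviations weighted equally, regardless of sign."""
    n = len(observed)
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / n

def correlation(observed, forecast):
    """Correlation coefficient R (linear association only)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov / math.sqrt(var_o * var_f)

# Made-up 3-hourly wind speeds in m/s, for illustration only.
wind_speed = [4.0, 5.0, 7.0, 6.0, 5.0, 8.0]
observed, forecast = persistence_pairs(wind_speed, lead_steps=1)
print(correlation(observed, forecast), rmse(observed, forecast), mae(observed, forecast))
```

The same three functions could be applied to the output of any forecasting model, so that its statistics can be set beside those of the persistence baseline for each lead-time.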
[Figure 13 time-series plot: observation and prediction; x-axis: time (hr), 01-05-2006 to 16-12-2006.]
Figure 13. Observed versus GP-predicted wind speed at station DS7 (predictions based on components; lead-time: 12 hr; testing data).
[Figure 14 scatter plot: R = 0.74.]
Figure 14. Observed versus GP-predicted wind speed at station DS7 (predictions based on components; lead-time: 12 hr; testing data).
lead-times and for stations DS1 and DS7, respectively. A significant difference in the relative-error statistics, especially at longer intervals, clearly reveals that the GP (and hence the ANN, which performed at a similar level to GP) worked better than the persistence model in this study.
Implementation of the Developed Models in the Field
The models described so far for the prediction of wind speed and direction based on GP are currently being implemented in the field through an integrated platform or a graphical user interface (GUI). The 3-hr measurements of wind speed and direction collected at a number of buoy locations in the Arabian Sea (Figure 17) are sent by telemetry to a server located at NIOT Chennai in India. Currently such observations are made available to registered clients of NIOT through a Web-based service. The GUI under consideration connects the models developed in this study to this Web server. An intelligent approach to generating the necessary input files is used, and this involves in-filling missing data by spatial correlation with neighbouring stations through the GP method. The user clicks on the station (Figure 17) for which predictions are wanted. This brings up a screen with a Load button. Clicking on this Load button brings in the appropriate input files, which in turn are linked to the executable wind-prediction programme developed in this study. The predictions of speed and direction are made after clicking on the Show forecasts button and are displayed as shown in Figure 18.
Conclusions
The preceding sections described the development of models based on genetic programming to obtain predictions of wind speed and its direction over lead-times varying from 3 hr to 24 hr. The performance of the GP models during testing was found to be satisfactory as judged by the error statistics of correlation coefficient, root mean square error and mean absolute error. The relatively new soft computing tool of GP was able to successfully recognize a hidden pattern in the preceding 3-hourly wind-speed observations and make predictions for future time steps accordingly in
[Wind-direction time-series plots: observed and predicted wind direction (degrees) against time interval (hr); R = 0.79 and R = 0.65.]
Figure 16. Observed versus GP-predicted wind direction for 12-hr ahead forecast at DS7.
a satisfactory manner. A comparison with corresponding predictions yielded by artificial neural networks showed that the GP-based predictions rivaled those of the more traditional and established ANNs. Further, for the open ocean location DS7, where wider variations in adjacent wind measurements can be expected, the GP showed the possibility of marginally better performance than the ANN, even for the 24-hr ahead prediction.
The preceding section also showed that training the GP and ANN models on the basis of splitting the wind vector into two orthogonal components was useful in predicting both wind speed and its direction simultaneously over different time intervals, although a separate model for wind speed alone would be preferable for better performance if only wind speed is desired. The success of GP noticed in the current study, which dealt with temporal regression, may inspire other applications of GP in coastal and ocean engineering such as spatial mapping or cause-effect modeling.
Appendix 1 Examples of genetic operations Generating population A programme [(q + ( )1/2 ) / 3 p] is given in Figure 19 in the form of a tree structure. A population of random trees representing the programmes is initially constructed and genetic operations are performed on these trees to generate individuals with the help of two distinct sets: the terminal set T and the function set F. For Figure 19, { + /} F and {, 3, p, q } T.
In order to generate a random tree one has to pick randomly from T ∪ F until all branches end up in terminals.
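The random picking described above can be sketched in Python. The function set, terminal set, tuple representation and depth limit below are all assumptions made for illustration; in particular, the depth limit is added only so that the growth is guaranteed to stop, and it is not part of the description in this appendix.

```python
import random

# Hypothetical sets for illustration: function symbol -> number of arguments.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}
TERMINALS = ["p", "q", "3"]

def random_tree(rng, max_depth):
    """Grow a tree by drawing from T union F until every branch ends in a terminal."""
    if max_depth == 0:
        # Depth cap (an added assumption): only terminals may be drawn here.
        return rng.choice(TERMINALS)
    symbol = rng.choice(list(FUNCTIONS) + TERMINALS)
    if symbol in FUNCTIONS:
        children = tuple(random_tree(rng, max_depth - 1)
                         for _ in range(FUNCTIONS[symbol]))
        return (symbol,) + children  # internal node: (function, child, child)
    return symbol  # a terminal closes the branch

def leaves(tree):
    """Yield the terminal symbols at the ends of all branches."""
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from leaves(child)
    else:
        yield tree

def depth(tree):
    """Number of function nodes on the longest branch."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 0

tree = random_tree(random.Random(0), max_depth=4)
print(tree)
```

An initial population would then be a list of such trees, each generated with its own random draws.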
Cross-over
Two random nodes are selected, one from inside each of two such programmes (the parents), and the resultant sub-trees are then swapped, generating two new programmes as in Figure 20.
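The sub-tree swap can be sketched in Python with programmes held as nested tuples. The representation, helper names and the two parent programmes are hypothetical, and this sketch allows the root to be selected as well, in which case the two parents are simply exchanged whole.

```python
import random

def paths(tree, prefix=()):
    """Yield the index path to every node of a nested-tuple tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_subtree(tree, path):
    """Return the sub-tree found by following the index path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def crossover(parent1, parent2, rng):
    """Pick one random node in each parent and swap the sub-trees rooted there."""
    path1 = rng.choice(list(paths(parent1)))
    path2 = rng.choice(list(paths(parent2)))
    sub1 = get_subtree(parent1, path1)
    sub2 = get_subtree(parent2, path2)
    child1 = replace_subtree(parent1, path1, sub2)
    child2 = replace_subtree(parent2, path2, sub1)
    return child1, child2

def size(tree):
    """Count the nodes in a tree."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1

# Hypothetical parents: (p + q) / 3 and p * (q - 5).
parent1 = ("/", ("+", "p", "q"), "3")
parent2 = ("*", "p", ("-", "q", "5"))
child1, child2 = crossover(parent1, parent2, random.Random(1))
print(child1, child2)
```

Because the parents are immutable tuples, they are left unchanged and the children are new trees, so the total number of nodes across the pair is the same before and after the swap.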
[Figure residue: tree diagrams illustrating cross-over (Parent 1, Parent 2, Child 1, Child 2) and mutation.]
Reproduction
This means an exact duplication of the programme if it is found to be acceptable by the fitness criteria.
References
Cadenas E, Rivera W. 2007. Wind speed forecasting in the South Coast of Oaxaca, Mexico. Journal of Renewable Energy. 32:2116-2128; Elsevier.