JANUARY 2012
DEDICATION
ACKNOWLEDGEMENT
Assalammualaikum w.b.t.
Alhamdulillah, all praise to Allah S.W.T for the gift of life and what I have achieved
today.
Appreciation goes to my family for their prayers and their moral and financial support. May
Allah reward you abundantly.
My sincere and deepest gratitude goes to my supervisor, Dr. Sobri Harun for his
guidance, encouragement and support in completing this master project.
My gratitude to Dr. Muhammad Askari for his invaluable suggestions, guidance, and
encouragement.
Last but not least, to all my lecturers, classmates and friends: your help and support are
really appreciated and will be remembered forever, InsyaALLAH. Thank you all.
ABSTRACT
Streamflow forecasting plays an important role in flood mitigation and in water
resources allocation and management. Inaccurate forecasts cause losses to water
resources managers and users. The suitability of a forecasting method depends on the
type and amount of available data. The objectives of this study are therefore to propose
streamflow forecasting methods using Markov and ARIMA models and to assess the
forecasting accuracy of both models. Streamflow data of Sungai Bernam, Selangor were
used. Minitab and Microsoft Excel were used to build the ARIMA and Markov models,
respectively. The performance criteria used to assess the forecasting accuracy of the
models were the Mean Absolute Percentage Error (MAPE), the Root Mean Squared Error
(RMSE) and the chi-square test of normality. The tentative ARIMA model that best fits
the criteria and meets the requirements is ARIMA (1,1,1)(0,1,1)12. Based on the
performance criteria, the ARIMA model forecasts better than the Markov model in this
study. Therefore, the ARIMA model is able to predict the future monthly streamflow of
Sungai Bernam accurately.
ABSTRAK
Peramalan aliran sungai memainkan peranan yang penting untuk kawalan banjir
dan pengurusan air. Peramalan yang tidak tepat akan menyebabkan kerugian kepada
pihak pengurusan sumber air dan juga kepada pengguna. Kesesuaian kaedah peramalan
bergantung kepada jenis dan jumlah data yang tersedia. Maka, objektif kajian ini adalah
untuk mencadangkan kaedah peramalan aliran sungai dengan menggunakan model
Markov dan ARIMA dan untuk memeriksa ketepatan model Markov dan ARIMA dalam
membuat peramalan. Data aliran sungai Sungai Bernam telah digunakan. Minitab
digunakan untuk memodelkan model ARIMA dan Microsoft Excel digunakan untuk
memodelkan model Markov. Prosedur penilaian prestasi kriteria yang digunakan dalam
kajian ini ialah Mean Absolute Percentage Error (MAPE), Root Mean Squared error
(RMSE) dan ujian Chi-Squared untuk memeriksa ketepatan peramalan model-model
yang berlainan. Tentatif model yang terbaik sesuai dengan kriteria dan memenuhi
kehendak untuk model ARIMA ialah ARIMA (1,1,1)(0,1,1)12. Dari prosedur penilaian
prestasi kriteria, model ARIMA mempunyai prestasi yang lebih baik dalam membuat
ramalan berbanding dengan model Markov. Justeru, model ARIMA mempunyai
keupayaan untuk meramalkan dengan tepat aliran sungai di masa hadapan untuk Sungai
Bernam.
TABLE OF CONTENTS

CHAPTER   TITLE

          DECLARATION
          DEDICATION
          ACKNOWLEDGEMENT
          ABSTRACT
          ABSTRAK
          TABLE OF CONTENTS
          LIST OF TABLES
          LIST OF FIGURES
          LIST OF APPENDICES
          LIST OF ABBREVIATIONS

1         INTRODUCTION
          1.1 Background of Study
          1.2 Problem Statement
          1.3
          1.4
          1.5 Scope of Study

2         LITERATURE REVIEW
          2.1 Introduction
          2.2
          2.3
          2.4
              2.4.1 Markov Model
              2.4.2 ARIMA Theory
              2.4.3 ARIMA Algorithms
                    2.4.3.1 AR Model
                    2.4.3.2 MA Model
                    2.4.3.3 ARMA Model
                    2.4.3.4 ARIMA Model
          2.5
          2.6
          2.7 Concluding Remarks

3         METHODOLOGY
          3.1 Introduction
          3.2 Markov Model
              3.2.1
              3.2.2 Identification of Distribution
              3.2.3
              3.2.4
          3.3 ARIMA Model
              3.3.1 Model Assumptions
                    3.3.1.1 Data Stationarity
                    3.3.1.2 Normal Distribution
                    3.3.1.3 Outlier
                    3.3.1.4 Missing Data
              3.3.2 Model Procedure
                    3.3.2.1 Model Identification
                    3.3.2.2 Parameter Estimation
                    3.3.2.3 Diagnostic Checking
              3.3.3 Minitab Procedure
          3.4

4
          4.1 Introduction
          4.2
          4.3 Markov Model
              4.3.1
              4.3.2 Identification of Distribution
              4.3.3
              4.3.4
              4.3.5
          4.4 ARIMA Model
              4.4.1 Model Identification
              4.4.2 Parameter Estimation
              4.4.3 Diagnostic Checking
              4.4.4
              4.4.5

5
          5.1 Conclusion
          5.2 Recommendations

          REFERENCES
          APPENDICES A-G
LIST OF FIGURES

FIGURE NO.   TITLE

2.1          Value of time series with forecast function at 50% probability limits
3.1          Flowchart of ARIMA modeling
4.17         Model Comparison
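LIST OF ABBREVIATIONS

ACF     - Autocorrelation Function
AD      - Anderson Darling
AR      - Autoregressive
ARIMA   - Autoregressive Integrated Moving Average
DF      - Degree of Freedom
K-S     - Kolmogorov-Smirnov
LSE
MA      - Moving Average
MAPE    - Mean Absolute Percentage Error
PACF    - Partial Autocorrelation Function
RMSE    - Root Mean Squared Error
R²      - Coefficient of Determination
σ       - Standard Deviation
SE      - Standard Error
Sg.     - Sungai
χ²      - Chi-square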
CHAPTER 1
INTRODUCTION
1.1
Background of Study
Most forecasting problems involve the use of time series data. In this study, time
series are used to prepare forecasts. A time series is formed from measurements of a
variable taken at regular intervals over time. It is a stochastic process, which amounts
to a sequence of random variables. Hydrologic streamflow data fall under the category
of time series (Gupta, 1989). Future values of a time series can be forecast from its
current and past values, and this approach can be used to forecast streamflow (Box and
Jenkins, 1976). Time series plots can reveal patterns such as randomness, trends, level
shifts, periods or cycles, unusual observations, or a combination of patterns.
Streamflow forecasting plays an important role in flood mitigation and in water
resources allocation and management. In water management, high-quality streamflow
forecasts and their efficient use can give considerable economic and social benefits.
Short-term forecasting, such as hourly and daily forecasting, is crucial for flood
warning and defense, while long-term forecasting, which is based on monthly, seasonal
or annual time series, is very useful for reservoir operation, irrigation management
decisions, drought mitigation and managing river treaties (Shalamu, 2009).
Recently, time series models have become quite popular in streamflow forecasting
(Wang, 2006). This is due to the increase in data availability from metering stations,
real-time data retrieval, and increasing computational capability together with the
development of more robust methods and computer techniques. Because of the importance
of hydrologic forecasting, a considerable number of forecasting models and methodologies
have been developed and applied to streamflow. In this study, Markov and ARIMA models
are used to model monthly streamflow processes.
The Markov process considers that the value of streamflow at one time is
correlated with the value of the streamflow at an earlier period (i.e. a serial or
autocorrelation exists in the time series). In a first-order Markov process, this correlation
exists between two successive values of the events (Gupta, 1989).
The first-order Markov model states that the value of a variable x in one time
period is dependent on the value of x in the preceding time period plus a random
component. Thus, the synthetic streamflow represents a sequence of numbers, each of
which consists of two parts: a deterministic part and a random part (Gupta, 1989).
1.2
Problem Statement
Many time series forecasting methods can be used to predict streamflow. However,
not all of these methods produce accurate forecasts. Inaccurate forecasting will cause
losses to water resources managers and users. The suitability of a forecasting method
depends on the type and amount of available data. The ARIMA and Markov models must be
examined to determine their ability to provide accurate and reasonable monthly
streamflow forecasts. The accuracy of both models in forecasting monthly streamflow is
tested and evaluated through statistical methods. The ARIMA and Markov modeling
approaches were applied to the data set to further investigate the behavioral change in
the streamflow. The results of the study can be used as a reference guideline for flood
control, as Markov and ARIMA models are best suited for short-term forecasting.
1.3
1.4
The aim of this paper is to forecast streamflow by using appropriate time series
modeling approach. To achieve this aim, the following objectives have been identified:
1.5
Scope of Study
In this study, two time series models, the Markov model and the ARIMA model, are
used to predict the behavior of streamflow. Streamflow data of Sungai Bernam, Selangor
for the period 1960 to 2010 were used for the application of the models. The study area,
located in southeast Perak and northeast Selangor, is a semi-developed area of 186 km².
Streamflow data were obtained from station Sg. Bernam at Tanjung Malim
(Station No. 3615412). The monthly streamflow data were collected from the
Department of Irrigation and Drainage, Kuala Lumpur. Minitab 15 is used for the ARIMA
model and Microsoft Excel is used for the Markov model.
CHAPTER 2
LITERATURE REVIEW
2.1
Introduction
Generally, surface water hydrology is the basis for engineering design and for the
assessment of water sources. High streamflow may cause disasters such as floods and
erosion, and short-term forecasting is needed to control them. Low streamflow, in turn,
can disrupt water supply to domestic and industrial users, hydroelectric power
generation and irrigation, and long-term forecasting is useful to prevent this problem.
Therefore, the ability to forecast streamflow accurately is valuable for water flow
management and flood control.
Modeling and forecasting of time series has long been practiced using different
statistical methods. Commonly used time series forecasting models are ARIMA, moving
average, exponential smoothing, regression analysis, and Fourier series analysis. In
this study, Markov and ARIMA models are used to predict monthly streamflow.
2.2
The analysis of a time series in the frequency domain is done by the spectral
density that identifies the cyclic nature or periodicity in the series. The density indicates
the cycle in the deterministic data. In a purely random process it oscillates randomly.
The purpose of streamflow synthesis, however is not to analyze a time series but to
generate the data based on the series. This does not require the decomposition of the
time series by the analysis above but an understanding of its statistical properties to
reproduce series of similar statistical characteristics (Gupta, 1989).
2.3
Most forecasting problems involve the use of time series data. Montgomery et al.
(2008) stated that forecasting problems are often classified as short-term, medium term,
and long-term. Short-term forecasting problems involve predicting events only a few
time periods (days, weeks, months) into the future. Medium-term forecasts extend from
one to two years into the future, and long-term forecasting problems can extend beyond
that by many years. Short-term and medium-term forecasts are used for operations
management and development of projects while long-term forecasts can be used for
strategic planning.
In this study, Markov and ARIMA models are applied to long-term forecasting, although
both are known to be best suited to short-term forecasting. Normally, short-term and
medium-term forecasts are based on identifying, modeling, and extrapolating the
patterns found in historical data. These historical data usually exhibit inertia and do
not change very drastically. Therefore, statistical methods are very useful for short-term
and medium-term forecasting (Montgomery et al., 2008).
The use at time t of available observations from a time series to forecast its
value at some future time can provide a basis for (1) economic and business planning,
(2) production planning, (3) inventory and production control, and (4) control and
optimization of industrial processes (Box et al., 1994). As originally described by Brown
(1962), forecasts are usually needed over a period known as the lead time, which varies
with each problem. Usually, forecasts are made at time t by taking the current month
$Y_t$ and previous months $Y_1, Y_2, \ldots, Y_{t-1}$ to forecast the future values
$F_{t+1}, F_{t+2}, \ldots, F_{t+m}$ from the Y values forward.
Figure 2.1: Value of time series with forecast function at 50% probability limits
(Source: Box et al., 1994)
2.4
Stochastic modeling of hydrologic time series has been widely used for planning
and management of water resources systems such as for reservoir sizing and forecasting
the occurrence of future hydrologic events. For example, stochastic models are used to
generate synthetic series of water supply that may occur in the future which are then
utilized for estimating the probability distribution of key decision parameters such as
reservoir storage size. Furthermore, stochastic models can be used for forecasting water
supplies and water demands in days, weeks, months and years in advance (Fortin et al.,
2004).
The previous rainfall and streamflow records can be utilized as model inputs for
forecasting the next time step ahead of the streamflow (Mohd Shafiek et al., 2005). This
study employs the previous streamflow records to forecast the streamflow discharge of
the following month.
Several stochastic models can be utilized for synthetic generation and forecasting
of hydrological processes. Hydrologic processes such as monthly streamflow may be well
represented by stationary linear models such as the Markov process or autoregressive
(AR) and autoregressive integrated moving average (ARIMA) models. These models are
usually capable of preserving the historical annual statistics, such as the mean,
variance, skewness and covariance (Fortin et al., 2004). In this study, Markov
and ARIMA models are used to predict future monthly streamflow.
The Markov process considers that the value of an event (i.e. streamflow) at one
time is correlated with the value of the event at an earlier period (i.e. a serial or
autocorrelation exists in the time series). In a first-order Markov process, this correlation
exists in two successive values of the events. The first order Markov model, which
constitutes the classic approach in synthetic hydrology, states that the value of a variable
x in one time period is dependent on the value of x in the preceding time period plus a
random component. Thus the synthetic flow for a stream represents a sequence of
numbers, each of which consists of two parts:

$$q_i = d_i + e_i \tag{2.1}$$

where $q_i$ is the flow at the ith time (ith number of a time series); $d_i$ is the
deterministic part at the ith time; and $e_i$ is the random part at the ith time. The
values of $e_i$ are tied to the historical data by ensuring that they belong to the same
frequency distribution and possess similar statistical properties (mean, deviation,
skewness) as the historical series (Gupta, 1989).
The various forms and combinations of the deterministic and random components are
recognized as different models. The single-season (annual) flow model of lag 1 is the
simplest model; it assumes that the magnitude of the current flow is significantly
correlated with the previous flow value only. On the other hand, multiple-season models
divide the yearly flow into seasons or months (Gupta, 1989).
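The lag-1 single-season model described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this study (which was carried out in Microsoft Excel). It assumes the commonly quoted form of the annual lag-1 model, in which the deterministic part is the mean plus the lag-1 correlation times the previous deviation, and the random part is a standard normal deviate scaled by S·sqrt(1 − r1²). The function name, the starting value and the seed are hypothetical choices.

```python
import math
import random


def generate_lag1_flows(mean, std, r1, n, seed=0):
    """Generate n synthetic flows with an assumed lag-1 Markov model.

    q_i = mean + r1 * (q_{i-1} - mean) + t_i * std * sqrt(1 - r1^2),
    where t_i is a standard normal random number.
    """
    rng = random.Random(seed)
    flows = []
    previous = mean  # assumption: start the chain at the historical mean
    for _ in range(n):
        deterministic = mean + r1 * (previous - mean)
        random_part = rng.gauss(0.0, 1.0) * std * math.sqrt(1.0 - r1 ** 2)
        previous = deterministic + random_part
        flows.append(previous)
    return flows
```

Because the random part is normal, generated flows can occasionally be negative when the standard deviation is large relative to the mean; how such values are handled is a modeling decision not covered here.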
The first-order Markov model has been successfully applied to many problems.
Examples include modeling sequential data using Markov chains, and solving control
problems posed in the Markov decision processes (MDP) framework. If the Markov
model parameters are estimated from data, the standard maximum likelihood estimates
consider the first-order (single-step) transitions only. For many problems, however, the
first-order conditional independence assumptions are not satisfied; as a result, the
higher-order transition probabilities can be poorly approximated by the learned model
(Joomizan, 2010).
The assumption of first-order Markovian processes for representing the inflow
process of a reservoir has generally been considered in the literature as adequate for
most purposes. The development of models incorporating other approaches results in
extremely complex transition probability matrices (Wurbs, 2005).
uncorrelated variable is called a white noise series, from which many useful models can
be constructed.
The ARIMA modeling is essentially an exploratory data-oriented approach that
has the flexibility of fitting an appropriate model which is adapted from the structure of
the data itself. The stochastic nature of the time series can be approximately modeled
with the aid of autocorrelation function and partial autocorrelation function; from which
information such as trend, random variables, periodic components, cyclic patterns and
serial correlation can be discovered. As a result, forecasts of the future values of the
series, with some degree of accuracy can be readily obtained (Ho and Xie, 1998).
2.4.3.1 AR Model
An AR(p) model expresses the current value of a time series as a linear combination
of p previous values and a white noise term (random shock). Bell (1984) expressed the
current value of the time series of an AR(p) model as:

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t \tag{2.2}$$

Or

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = a_t \tag{2.3}$$

where $\phi_1, \ldots, \phi_p$ are the AR(p) parameters, $a_t$ is the random shock at
time t, normally distributed with zero mean and variance $\sigma_a^2$, p is the order of
AR(p), and B is the backshift operator ($B Y_t = Y_{t-1}$).
2.4.3.2 MA Model
An MA(q) model expresses the current value of a time series as a linear combination
of the current and q previous values of a white noise process. The (purely) moving
average (MA) model is (Bell, 1984):

$$Y_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.4}$$

Or

$$Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.5}$$

Or

$$Y_t = \theta(B)\, a_t$$

where q is the order of MA(q), and the coefficients $\theta_1, \ldots, \theta_q$ are the
MA(q) model parameters.
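2.4.3.3 ARMA Model
To increase flexibility when fitting actual time series, both autoregressive and
moving average operators are combined to give the ARMA (p,q) model (Bell, 1984):

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.6}$$

Or

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.7}$$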
The mixed type of series which are explained both by its own lagged values and
by lagged noise terms is called Autoregressive Moving-Average models of order (p,q).
This systematic class of stationary time series models carries great importance and
usefulness especially in real-life situations. If the process is stationary, a suitable ARMA
model can be used to represent the data. If it is nonstationary, differencing is applied to
make the model become stationary and this leads to ARIMA model (Akgun, 2003).
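To make the ARMA recursion concrete, the following Python sketch simulates a series directly from the ARMA (p,q) definition, with the moving average terms subtracted as in the sign convention of Bell (1984). It is a minimal illustration under stated assumptions: the function name, the Gaussian shocks, the zero pre-sample values and the seed are illustrative choices, and it is not the Minitab procedure used in this study.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate n values of an ARMA(p, q) series.

    Y_t = sum_i phi[i] * Y_{t-1-i} + a_t - sum_j theta[j] * a_{t-1-j}
    Pre-sample values of Y and a are taken as zero (an assumption).
    Returns the series and the random shocks that generated it.
    """
    rng = random.Random(seed)
    y, a = [], []
    for t in range(n):
        shock = rng.gauss(0.0, sigma)
        ar_part = sum(phi[i] * y[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * a[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(ar_part + shock - ma_part)
        a.append(shock)
    return y, a
```

With empty coefficient lists the series reduces to white noise, with `theta=[]` it is a pure AR(p) process, and with `phi=[]` it is a pure MA(q) process.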
2.4.3.4 ARIMA model
The first of these conditions implies that the series $Y_t$ following (2.6) is
stationary. In practice $Y_t$ may well be nonstationary, but with stationary first
difference,

$$Y_t - Y_{t-1} = (1 - B)\, Y_t.$$

If $(1 - B)\, Y_t$ is nonstationary, we may need to take the second difference,

$$Y_t - 2Y_{t-1} + Y_{t-2} = (1 - B)\,[(1 - B)\, Y_t] = (1 - B)^2\, Y_t.$$

In general, we may need to take the dth difference $(1 - B)^d\, Y_t$ (although rarely is
d larger than 2). Substituting $(1 - B)^d\, Y_t$ for $Y_t$ in (2.7) yields the
ARIMA (p,d,q) model (Bell, 1984):

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.8}$$

Or

$$\phi(B)(1 - B)^d\, Y_t = \theta(B)\, a_t \tag{2.9}$$
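The differencing operator can be illustrated with a short Python sketch. The helper below is hypothetical (its name and arguments are not from the source); it applies $(1 - B^{lag})$ to a series d times, so `lag=1` gives the ordinary differences shown above and `lag=12` would give a seasonal difference for monthly data.

```python
def difference(series, d=1, lag=1):
    """Apply the operator (1 - B^lag) to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out


# The second difference equals Y_t - 2*Y_{t-1} + Y_{t-2}.
squares = [1, 4, 9, 16, 25]
first = difference(squares, d=1)    # [3, 5, 7, 9]
second = difference(squares, d=2)   # [2, 2, 2]
```

Each application of the operator shortens the series by `lag` values, which is worth keeping in mind when only a short record is available.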
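$$\phi(B)\,\Phi(B^s)\,(1 - B^s)^D\, Y_t = \theta(B)\,\Theta(B^s)\, a_t$$

where D is the order of seasonal differencing, and $\Phi(B^s)$ and $\Theta(B^s)$ are the
seasonal AR(P) and MA(Q) operators respectively, which are defined as:

$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$$

$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}$$

where $\Phi_1, \ldots, \Phi_P$ are the seasonal AR(P) parameters and
$\Theta_1, \ldots, \Theta_Q$ are the seasonal MA(Q) parameters.
To illustrate forecasting with ARIMA models, we shall use (2.9) written as:

$$Y_{n+l} = \varphi_1 Y_{n+l-1} + \cdots + \varphi_{p+d} Y_{n+l-p-d} + a_{n+l} - \theta_1 a_{n+l-1} - \cdots - \theta_q a_{n+l-q} \tag{2.10}$$

for t = n + l, with $\varphi_1, \ldots, \varphi_{p+d}$ the coefficients of the product
$\phi(B)(1 - B)^d$. We shall assume we want to forecast $Y_{n+l}$ for l = 1, 2, ...
using data $Y_n, Y_{n-1}, \ldots$ For simplicity, we are assuming for now that the data
set is long enough so that we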
2.5
Naadimuthu and Lee (1982) proposed first-order or lag-one serially correlated
inflow. This means that the inflow of each month is dependent only on the inflow of the
previous month, forming a Markov chain. The Markov chain method is a stochastic method
that can be used to produce new time series of inflow discharge based on the available
time series of data (Adib and Majd, 2009).
According to Heiko (2000), Markov chains are stochastic processes that can be
parameterized by empirically estimating transition probabilities between discrete states
in the observed systems. A Markov chain of the first order is one for which each next
state depends only on the immediately preceding one. Markov chains of second or higher
order are processes in which the next state depends on two or more preceding ones.
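As an illustration of estimating first-order transition probabilities empirically, the sketch below counts the transitions between consecutive discrete states and normalizes each row. It assumes the flows have already been classified into integer state indices (for example low, medium and high flow); that classification, the function name and the handling of unvisited states are hypothetical choices, not taken from the cited works.

```python
def transition_matrix(states, n_states):
    """Estimate first-order transition probabilities from a state sequence.

    states: sequence of integer state indices in 0..n_states-1.
    Row i of the result holds the estimated P(next = j | current = i).
    Rows for states that never occur as a current state are left as zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For the sequence `[0, 1, 0, 1, 1, 0]`, state 0 is always followed by state 1, while state 1 is followed by state 0 in two of its three transitions.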
2.6
Tang et al. (1991) stated that the ARIMA model is only good for short-term
forecasting since it builds its forecasts on previous observations. The ARIMA model
needs long memory series, that is, more input data, to provide more accurate forecasts.
For long memory series, more training patterns result in more accurate forecasts. This
Box-Jenkins model does not work well, or does not work at all, for short input series.
Ho and Xie (1998) showed that the ARIMA model is a viable alternative that gives
satisfactory results for repairable system reliability forecasting. Ayob and Amat (2004)
used ARIMA to represent water use behavior at Universiti Teknologi Malaysia. The ARIMA
modeling method can also be applied to analyse water quality and rainfall-runoff
data recorded over a long period for the Johor River (Hasmida, 2009).
Maia et al. (2008) demonstrated that ARIMA exhibited a satisfactory
performance in forecasting interval series with either linear or non-linear behavior and
is a useful forecasting alternative for interval-valued time series. However, the hybrid
model using ARIMA and an artificial neural network had better average performance.
2.7
Concluding Remarks
CHAPTER 3
METHODOLOGY
3.1
Introduction
Various stochastic processes are used for generating the hydrologic data of
streamflow. The models either developed or used in order to carry out this study are of
different types in terms of their purposes, capabilities, interfaces, inputs, and outputs.
These mainly include water balance model, reservoir simulation, and stochastic models.
3.2
Markov Model
Gupta (1989) stated that the general Markov procedure of data synthesis comprises:
Four parameters that are important in a synthetic study are the mean flow, standard
deviation, coefficient of skewness and correlation coefficient. The sample mean flow is
(Gupta, 1989):

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.1}$$

Where,
$\bar{x}$ = mean observed (historical) flow
$n$ = total numbers (values) of flow
$x_i$ = ith number of observed flow
The sample estimate of the variance or standard deviation, S, which is a measure
of the variability of the data is given by (Gupta, 1989):
(3.2)
(3.3)
(3.4)
The lag-one serial correlation coefficient, in which the current flow is affected only
by the previous flow, can be obtained by substituting K = 1. Additional lags should be
included only as long as they produce a model that explains more about the pattern of
flows than one with fewer lags does (Fiering and Jackson, 1971).
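The four parameters named above can be computed with a few lines of Python. The sketch below is hedged: the mean follows the usual sample definition, but the exact estimators of Gupta (1989) for the standard deviation, skewness and lag-K serial correlation are not reproduced in the text, so the n − 1 divisor, the bias-corrected skewness factor and the simple lag-K estimator used here are assumptions. All function names are hypothetical.

```python
import math


def sample_mean(x):
    return sum(x) / len(x)


def sample_std(x):
    """Sample standard deviation with an n - 1 divisor (assumed form)."""
    m = sample_mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def skewness(x):
    """Coefficient of skewness with the n / ((n-1)(n-2)) factor (assumed form)."""
    n, m, s = len(x), sample_mean(x), sample_std(x)
    return n * sum((v - m) ** 3 for v in x) / ((n - 1) * (n - 2) * s ** 3)


def serial_correlation(x, k=1):
    """Lag-k serial correlation coefficient (assumed simple estimator)."""
    m = sample_mean(x)
    numerator = sum((x[i] - m) * (x[i + k] - m) for i in range(len(x) - k))
    denominator = sum((v - m) ** 2 for v in x)
    return numerator / denominator
```

Calling `serial_correlation(flows, k=1)` gives the lag-one coefficient discussed above; larger values of `k` can be tried to see whether additional lags carry useful information.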
3.2.2
Identification of Distribution
Generally, the distributions used in streamflow generation are the normal, log-normal
and gamma families. The bell-shaped, or normal, distribution is most extensively
used in statistical applications because the sum of variables derived from any
distribution tends to be distributed normally according to the central limit theorem. To
test normality, the historical values of flow are plotted against the percentage of values
in the record that are equal to or greater than the plotted value. The flows are arranged in
descending order. For each value $x_i$, the percent is computed by 100(n − i + 1)/n, where
i is the rank of value $x_i$ and n is the number of historic values. If the plot is a
straight line, the distribution is normal. The coefficient of skewness should also be
close to zero, since the normal distribution has no skewness (Gupta, 1989).
3.2.3
Gupta (1989) stated that random numbers can be generated either by a computer-based
pseudorandom-number generator or from random number tables. The random numbers should
belong to the same distribution as the historical record so that the generated flows
have similar characteristics. Normal random numbers have zero mean and unit standard
deviation, while Log-Normal random numbers have both mean and standard deviation equal
to one.
3.2.4
(3.5)
where
A model on the same lines for monthly flows, developed by Thomas and Fiering
has the following form (Maass et al., 1962):
(3.6)
Where,
$q_{i,j}$ = flow in ith month from the beginning, for jth month of the year
$q_{i-1,j-1}$ =
$b_j$
$S_j$
$t_{i,j}$
3.3
ARIMA Model
ARIMA models have become common practice for the specification of stationary
time-dependent input processes since the work of Box and Jenkins (1970). ARIMA models
are usually used as discrete-time processes (Leemis, 1998) and hence the data from a
trace is interpreted as a count process for ARIMA fitting. Some assumptions are made
when applying an ARIMA model. In addition, specific procedures must be followed when
fitting ARIMA models to time series.
3.3.1
Model Assumptions
Before performing the ARIMA modelling, some assumptions were made such
that (Hasmida, 2009):
A transformation might be required for a nonstationary series, and this can be done
by differencing (Box et al., 1994; Shumway, 1988). This process is represented in the
ARIMA modelling approach by the I (Integrated) component, denoted d in ARIMA notation.
The level of differencing depends on how nonstationary the data are. The level of
differencing might be 0, 1, 2 or higher. Level 0 means that no differencing is applied
to the data, level 1 means that first differencing is needed, and level 2 means that
second differencing is needed. Higher levels of differencing might be applied to
nonstationary and complex data (Hasmida, 2009).
Normally distributed data follow a bell-shaped curve. The bell-shaped curve is
concentrated in the center and decreases on either side. This means that the data have
less of a tendency to produce unusually extreme values, compared to some other
distributions. In addition, the bell-shaped curve is symmetric, which means that the
probability of deviations from the mean is comparable in either direction (Hasmida, 2009).
3.3.1.3 Outlier
3.3.1.4 Missing Data
Yafee and McGee (2000) suggested that, if some values are missing from the data
series, they should be replaced using a theoretically defensible algorithm. A crude
missing data replacement method is to plug in the mean of the overall series. A
less crude algorithm is to use the mean of the period within the series in which the
observation is missing. Another algorithm is to take the mean of the adjacent
observations. In exponential smoothing, a missing value is often replaced by the
one-step-ahead forecast from the previous observation. Other forms of interpolation
employ linear splines, cubic splines, or step function estimation of the missing data.
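Two of the replacement methods listed above can be sketched in Python as follows. This is an illustrative example only, with missing values represented by `None` and hypothetical function names: the first function plugs in the mean of the overall series, and the second interpolates linearly between the nearest observed neighbours, which for a single missing value equals the mean of the adjacent observations.

```python
def fill_with_series_mean(series):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def fill_with_linear_interpolation(series):
    """Fill interior gaps by linear interpolation between observed neighbours.

    Gaps at the very start or end of the series are left as None,
    because there is no neighbour on one side to interpolate from.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled
```

For example, `fill_with_linear_interpolation([2.0, None, None, 8.0])` gives `[2.0, 4.0, 6.0, 8.0]`.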
In order to handle missing data for this study, linear regression between flow of
study area station and flow of adjacent station is used. If data still cannot be obtained,
regression between streamflow and rainfall for that station is used to get the missing
data.
3.3.2 Model Procedure
The ARIMA modeling procedure for fitting ARIMA models to time series,
which was developed by Box and Jenkins (1976), consists of three iterative steps: model
identification; parameter estimation; and diagnostic checking. Figure 3.1 depicts the
process of ARIMA modeling. The procedure is itemized as follows:
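Figure 3.1: Flowchart of ARIMA modeling (Lee and Ko, 2011). The flowchart runs from the
original streamflow series through model identification, parameter estimation and
diagnostic checking to a check of whether the model is adequate (no: repeat the steps;
yes: the model is used for streamflow forecasting).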
the pattern. Most series should not require more than two difference operations or
orders. Be careful not to overdifference. If spikes in the ACF die out rapidly, there is no
need for further differencing.
Next, examine the ACF and PACF of the stationary data in order to identify
which autoregressive or moving average model terms are suggested. Some general
guidelines (SPSS, 1993) based on the graphical method were applied in the identification
process:
i. Nonstationary series have an ACF that remains significant for half a dozen or
more lags, rather than quickly declining to 0. Such a series must be differenced
until it is stationary before it can be identified.
ii.
iii. Moving average processes have spikes in the first one or more lags of the ACF
and an exponentially declining PACF. The number of spikes indicates the order
of the moving average.
iv. Mixed (ARMA) processes typically show exponential declines in both the ACF
and the PACF.
At the identification stage, the sign of the ACF or PACF and the speed with which
an exponentially declining ACF or PACF approaches 0 depend upon the sign and
actual value of the AR and MA coefficients (SPSS, 1993).
3.3.2.2 Parameter Estimation
Once the tentative model is formulated, the related model parameters are
estimated using the least squares scheme. Parameters are estimated to have zero gradient
of forecasting errors to the historical load data. The primary objective of this parameter
estimation is to minimize the forecasting error and determine both the model and its
parameters (Lee and Ko, 2011). Each parameter of a tentative ARIMA model can be tested
using t-values and p-values. The t-value is calculated by dividing the coefficient by
its standard error.
3.3.2.3 Diagnostic Checking
Then, diagnostic tests are conducted to ensure that the essential modeling
assumptions are satisfied for a given model. When the parameters have been well
estimated, the accuracy of the tentative model is validated by examining the ACF and
PACF of the residuals. The residuals should resemble a white noise process.
Furthermore, the Q-statistics test is applied to confirm the tentative model
(O'Donovan, 1983). If the calculated value of Q exceeds the critical value of χ²
obtained from the chi-square tables, the tentative model is inadequate (Lee and Ko, 2011).
Furthermore, at this stage, the Ljung-Box test is used to test for white noise
residuals. The null hypothesis is that the residuals are white noise. In other words,
the residual series should be independent, homoscedastic (having constant variance), and
normally distributed. The null hypothesis is rejected if the p-value of the chi-square
statistic is less than the significance level α of 5%; a p-value greater than 5%
indicates that the residuals can be treated as white noise.
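The Q statistic can be illustrated with a short Python sketch. It uses the standard Ljung-Box form, Q = n(n + 2) Σ r_k² / (n − k) summed over lags k = 1, ..., m, where r_k is the lag-k autocorrelation of the residuals. This is an illustration rather than the Minitab output used in this study: the function names are hypothetical, and the critical value must be looked up in chi-square tables by the user (with degrees of freedom usually taken as m minus the number of estimated ARMA parameters).

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic of a residual series for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum((residuals[t] - mean) * (residuals[t + k] - mean)
                  for t in range(n - k)) / denominator
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def is_adequate(q, critical_value):
    """The tentative model is judged adequate if Q does not exceed the
    chi-square critical value taken from tables."""
    return q <= critical_value


# Strongly alternating residuals are clearly not white noise.
q = ljung_box_q([1, -1, 1, -1, 1, -1], max_lag=1)
adequate = is_adequate(q, critical_value=3.841)  # chi-square, 5%, 1 degree of freedom
```

In this small example Q is well above the tabulated critical value, so the hypothetical model that produced these residuals would be judged inadequate.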
These steps are repeated until an adequate model is identified. When the steps in
ARIMA modeling are completed, a specific ARIMA model is applied to predict the
future monthly streamflow for 1 year ahead.
The ARIMA modeling was carried out with the statistical software Minitab version 15.
Using Minitab, the ARIMA modeling steps can be summarized as follows:
If a seasonal pattern is still found in the ACF and PACF from step No. 6, then go to
step No. 5.
7. Apply the rest of the procedure, which are estimation, diagnostic checking and
forecasting, according to step No. 6 until the best forecasting pattern is obtained.
3.4
(3.7)
(3.8)
3. Chi-Squared Test:
(3.9)
where,
Yi = the observed flow
Fi = the forecasted flow
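The three performance criteria can be sketched in Python using the observed flows $Y_i$ and forecasted flows $F_i$ defined above. The sketch assumes the commonly used definitions, MAPE = (100/n) Σ |(Y_i − F_i)/Y_i|, RMSE = sqrt((1/n) Σ (Y_i − F_i)²) and a chi-square statistic of the form Σ (Y_i − F_i)²/F_i; these are assumptions, since they may differ in detail from equations (3.7) to (3.9). The function names and the sample values are hypothetical.

```python
import math


def mape(observed, forecast):
    """Mean Absolute Percentage Error, in percent (observed flows must be nonzero)."""
    n = len(observed)
    return 100.0 / n * sum(abs((y - f) / y) for y, f in zip(observed, forecast))


def rmse(observed, forecast):
    """Root Mean Squared Error, in the units of the flow."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


def chi_square(observed, forecast):
    """Chi-square statistic, assumed here as sum((Y - F)^2 / F)."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, forecast))


# Hypothetical observed and forecasted flows (m3/s), for illustration only.
observed = [10.0, 20.0]
forecast = [8.0, 25.0]
print(mape(observed, forecast), rmse(observed, forecast), chi_square(observed, forecast))
```

Lower values of all three criteria indicate forecasts that are closer to the observed flows, so the same functions can be applied to the Markov and ARIMA forecasts for a side-by-side comparison.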
CHAPTER 4
4.1
Introduction
This chapter gives a detailed description of the analysis of time series data using
both the Markov and ARIMA modeling methods for streamflow forecasting. Most of the
computation work for the ARIMA and Markov models is carried out using Minitab and
Microsoft Excel, respectively. Both methods are used to model the streamflow
of Sungai Bernam at Tanjung Malim, Selangor (Station No. 3615412). The models are
checked in order to obtain an adequate model for streamflow forecasting.
Data from January 1960 to December 2010 were used in deriving the stochastic and
forecasting models. The 552 months of data from January 1960 to December 2005 are used
as the calibration set for both models. The remaining 60 months of data, from January
2006 to December 2010, are used as the validation set.
4.2
Some data values are missing in the data series for Sungai Bernam
streamflow at Tanjung Malim (Station No. 3615412). To handle missing data in
this study, linear regression between the flow at the study station and the flow at an
adjacent station is used. The regression line is taken as the best way to predict y from x.
Where streamflow data for Sungai Bernam at Tanjung Malim are missing, streamflow
data of the adjacent station at Jam. Skc (Station No. 3813411) are used. For example, data
are missing for January 1962, February 1962 and March 1962. Streamflow observations
from adjacent months (the preceding and following months) at both stations are
used to fit the regression line that estimates the missing data. This is shown in Figure 4.1.
The missing monthly data of Station Tanjung Malim for January, February and March
1962 can be completed using the linear regression equation y = 0.126x + 2.513 with a
coefficient of determination, R², of 0.845, where y and x represent the flow at Station
Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively.
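The regression infill described above can be sketched as follows. This is a minimal ordinary least squares illustration, not the spreadsheet procedure actually used; the function names and the sample numbers in the usage lines are hypothetical, while the default slope and intercept are the values quoted for January to March 1962.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def estimate_tanjung_malim(flow_jam_skc, slope=0.126, intercept=2.513):
    """Estimate flow at Tanjung Malim (m3/s) from flow at Jam. Skc (m3/s)."""
    return slope * flow_jam_skc + intercept

# Hypothetical paired observations that lie exactly on a line.
print(fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
# Hypothetical Jam. Skc flow of 50 m3/s.
print(estimate_tanjung_malim(50.0))
```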
If the data still cannot be obtained, for instance because the adjacent streamflow station
also has missing data for that month, rainfall data from an adjacent station can be used to
obtain a regression equation that estimates the missing streamflow data. For example, data
are missing from February 1993 to May 1993 for both stations, Tg. Malim and
Jam. Skc. Rainfall observations from adjacent months (the preceding and following
months) at Station Ldg. Katoyang at Tg. Malim (Station No. 3714152) are used to fit a
regression equation against the flow data of Station Jam. Skc, as shown in Figure 4.2. The
linear regression equation was found to be y = 0.146x + 10.43 with a coefficient of
determination, R², of 0.603, where y represents the flow at Station Jam. Skc (m3/s) and x
represents the rainfall at Station Ldg. Katayong (mm).
Once the streamflow data for February 1993 to May 1993 at Station
Jam. Skc are known, they can be used to estimate the missing data of Station Tg. Malim
through the regression between the two streamflow series, using the linear regression
equation y = 0.112x + 3.673 with a coefficient of determination, R², of 0.892, where y and x
represent the flow at Station Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively.
Figure 4.3 shows the regression line for this equation.
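The two-step estimate for February to May 1993 chains the two regression equations quoted above: rainfall is first converted to flow at Jam. Skc, and that flow is then converted to flow at Tg. Malim. A minimal sketch, with hypothetical function names and a hypothetical rainfall value in the usage line:

```python
def jam_skc_flow_from_rainfall(rain_mm):
    """Flow at Jam. Skc (m3/s) from rainfall (mm): y = 0.146x + 10.43."""
    return 0.146 * rain_mm + 10.43

def tg_malim_flow_from_jam_skc(flow_jam_skc):
    """Flow at Tg. Malim (m3/s) from flow at Jam. Skc (m3/s): y = 0.112x + 3.673."""
    return 0.112 * flow_jam_skc + 3.673

def tg_malim_flow_from_rainfall(rain_mm):
    """Chain the two regressions to fill a month missing at both flow stations."""
    return tg_malim_flow_from_jam_skc(jam_skc_flow_from_rainfall(rain_mm))

# Hypothetical monthly rainfall of 100 mm.
print(tg_malim_flow_from_rainfall(100.0))
```

Because the first regression has the lower R² (0.603), estimates obtained through this chain carry more uncertainty than those obtained from the flow-to-flow regression alone.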
After all the missing data have been replaced with estimates from the
linear regression method, the streamflow data of Sungai Bernam are as shown in Appendix A.
4.3
Markov Model
4.3.1 Statistical Parameters of Historical Data
The sample mean flow for the 612 months of data is 9.75 m3/s. The sample
standard deviation, S, is 4.66, the skewness is 1.2, the standard error is 0.18863 and the
coefficient of variation is 0.47828. These statistical parameters can be calculated using
Microsoft Excel or obtained from the EasyFit software. The descriptive statistics
from EasyFit are shown in Figure 4.4.
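As a quick check on how two of these statistics relate to the mean and standard deviation, the sketch below assumes the usual definitions (coefficient of variation = S / mean, standard error = S / sqrt(n)). With the rounded inputs quoted above it gives values close to, but not identical to, the reported 0.47828 and 0.18863, presumably because the software works from unrounded figures.

```python
import math

def coefficient_of_variation(mean, sd):
    """Ratio of the standard deviation to the mean."""
    return sd / mean

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

print(coefficient_of_variation(9.75, 4.66))
print(standard_error(4.66, 612))
```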
Table 4.1: Parameters of Monthly Historical Data

Month  qj        Sj^2         Sj        Rj         Sj-1         bj     qj-1
Jan    0.049549  9.07979E-05  -         -          -            0.001  0.06
Feb    0.04537   8.89268E-05  0.00943   0.4901265  3.639813919  0.001  0.05
Mar    0.046522  9.69723E-05  -         -          -            0.002  0.05
Apr    0.05187   9.10128E-05  0.00954   0.408      3.69337796   0.001  0.05
May    0.054888  5.21161E-05  0.007219  0.303      3.822355866  0.001  0.05
Jun    0.0488    6.94571E-05  0.008334  0.515      2.990121105  0.001  0.05
Jul    0.046073  7.22414E-05  0.008499  0.541      3.349038581  0.001  0.05
Aug    0.047227  7.71759E-05  0.008785  0.585      3.27283605   0.002  0.05
Sep    0.053852  7.21758E-05  0.008496  0.406      3.447681936  0.001  0.05
Oct    0.059644  7.62886E-05  0.008734  0.369      3.761513315  0.001  0.05
Nov    0.065038  6.89806E-05  0.008305  0.294      4.175448792  0.001  0.06
Dec    0.059643  0.000101211  0.01006   0.3699155  4.738293291  0.001  0.07

4.3.2 Identification of Distribution
In this study, goodness-of-fit tests are used to identify the probability distribution
that best fits the data. The Kolmogorov-Smirnov (K-S) test, the Anderson-Darling (AD) test
and the Chi-squared test can all be used for this purpose. The K-S test is preferred here as
it is more powerful and robust. The best-fitting distribution can be found using the EasyFit
application. The K-S goodness-of-fit statistic is 0.13466 (rank 42) for the normal
distribution and 0.05954 (rank 2) for the lognormal distribution. The AD goodness-of-fit
statistic is 139.43 (rank 41) for the normal distribution and 34.169 (rank 6) for the
lognormal distribution. The best-fitting distribution for the streamflow data of Sungai
Bernam is therefore the lognormal distribution (Figure 4.5 and Figure 4.6).
[Figure: histogram; x-axis: Flow, q (m3/s)]
[Figure: sample distribution plot; x-axis: Flow, q (m3/s); y-axis from 0 to 1]
As the distribution is log-normal, the logarithms of the values are used and the results
are finally converted back to flows. As an example, the observed streamflow data in
logarithmic values for 1960 to 1970 are shown in Table 4.2, while the data for the other
years (1971-2005) can be found in Appendix B. These data act as the calibration set from
which the parameters of the historical data are obtained in order to model the future
streamflow.
Month  1960   1961   1962   1963   1964   1965   1966   1967   1968   1969   1970
Jan    0.056  0.052  0.059  0.050  0.056  0.046  0.058  0.065  0.040  0.054  0.054
Feb    0.051  0.044  0.046  0.045  0.047  0.044  0.048  0.054  0.037  0.047  0.034
Mar    0.058  0.045  0.056  0.045  0.050  0.050  0.052  0.047  0.031  0.040  0.036
Apr    0.064  0.051  0.057  0.044  0.052  0.066  0.058  0.060  0.041  0.046  0.045
May    0.055  0.055  0.055  0.044  0.056  0.068  0.045  0.059  0.059  0.060  0.052
Jun    0.046  0.051  0.046  0.046  0.045  0.050  0.052  0.043  0.050  0.050  0.038
Jul    0.057  0.046  0.045  0.045  0.057  0.039  0.053  0.043  0.043  0.037  0.043
Aug    0.049  0.051  0.049  0.053  0.048  0.043  0.054  0.044  0.043  0.044  0.045
Sep    0.058  0.058  0.056  0.060  0.060  0.053  0.057  0.055  0.055  0.042  0.054
Oct    0.057  0.056  0.069  0.070  0.058  0.069  0.072  0.060  0.057  0.059  0.055
Nov    0.063  0.060  0.075  0.079  0.065  0.067  0.077  0.076  0.058  0.056  0.058
Dec    0.065  0.066  0.058  0.066  0.064  0.072  0.072  0.055  0.060  0.053  0.061
4.3.3
(4.1)
(4.3)
Because the log-normal random numbers have both mean and standard deviation equal to
one, Equation 4.3 becomes:
(4.4)
(4.5)
Table 4.3: Generation of Random Number for Year 2006

Month      RAND( )   2 RAND( ) - 1  erf^-1     ti,j
January    0.699645   0.399289       0.370085   1.523379
February   0.45481   -0.090379      -0.08027    0.886483
March      0.063732  -0.872536      -1.0558    -0.49313
April      0.224711  -0.550577      -0.53482    0.243657
May        0.236038  -0.527923      -0.50847    0.280915
June       0.471912  -0.056176      -0.04983    0.929536
July       0.999341   0.998683       1.443813   3.041859
August     0.533139   0.066278       0.058805   1.083163
September  0.095672  -0.808656      -0.91763   -0.29772
October    0.044674  -0.910651      -1.15355   -0.63136
November   0.997494   0.994989       1.429319   3.021363
December   0.407816  -0.184368      -0.16487    0.766834
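The values in Table 4.3 are consistent with mapping a uniform random number u to t = 1 + sqrt(2) * erf^-1(2u - 1), that is, a normal deviate with mean one and standard deviation one. The sketch below reproduces that mapping with Python's standard library in place of the spreadsheet formulas; the function names are hypothetical. Values near the centre of the distribution (for example January and February) agree with the table, but values in the tails (for example July) may differ if the spreadsheet used an approximate inverse error function.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def t_from_uniform(u, mean=1.0, sd=1.0):
    """Map a uniform number u in (0, 1) to a normal deviate with the given mean and sd.

    NormalDist().inv_cdf(u) equals sqrt(2) * erfinv(2u - 1).
    """
    return mean + sd * _STD_NORMAL.inv_cdf(u)

def generate_year(seed=None):
    """Return 12 random numbers t, one per month (hypothetical helper)."""
    rng = random.Random(seed)
    values = []
    while len(values) < 12:
        u = rng.random()
        if u > 0.0:  # inv_cdf is undefined at exactly 0
            values.append(t_from_uniform(u))
    return values

# January 2006 in Table 4.3 uses RAND( ) = 0.699645.
print(t_from_uniform(0.699645))
```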
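The Markov model for monthly flows developed by Thomas and Fiering has
the following form (Maass et al., 1962):
\[ q_{i,j} = \bar{q}_j + b_j \left( q_{i-1,j-1} - \bar{q}_{j-1} \right) + t_{i,j} S_j \left( 1 - r_j^2 \right)^{1/2} \tag{4.6} \]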
Equation 4.6 is used to develop the Markov model for monthly flows. The flow in
the ith month from the beginning, for the jth month of the year, is modeled by adding a
deterministic component and a random component to the mean flow of the jth month of
the year (January to December).
                      Deterministic Component             Random Component              Model flow
Month      qi-1,j-1   qj + bj(qi-1,j-1 - qj-1)   ti,j       Sj ti,j (1 - rj^2)^1/2   qi,j (Log)  qi,j (m3/s)
January    0.049549   0.049541669                 1.523379   0.013                    0.063       13.533
February   0.063      0.045386033                 0.886483   0.007                    0.053       9.077
March      0.053      0.04653475                 -0.49313   -0.004                    0.043       5.641
April      0.043      0.051865643                 0.243657   0.002                    0.054       9.604
May        0.054      0.054889433                 0.280915   0.002                    0.057       10.807
June       0.057      0.048803168                 0.929536   0.007                    0.055       10.210
July       0.055      0.046082272                 3.041859   0.022                    0.068       16.422
August     0.068      0.04726108                  1.083163   0.008                    0.055       10.014
September  0.055      0.053859993                -0.29772   -0.002                    0.052       8.642
October    0.052      0.059642058                -0.63136   -0.005                    0.055       9.821
November   0.055      0.065034911                 3.021363   0.024                    0.089       32.326
December   0.089      0.059661808                 0.766834   0.007                    0.067       15.849
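One step of the Thomas-Fiering recursion can be sketched as below. This is an illustration in Python rather than the Excel worksheet used in the study; the function and argument names are hypothetical, and the random component is assumed to carry the square root of (1 - rj^2), which is what reproduces the tabulated values. The usage line applies the February parameters from Table 4.1 and the February random number from Table 4.3.

```python
import math

def thomas_fiering_step(mean_j, mean_prev, b_j, s_j, r_j, q_prev, t):
    """Model flow (log scale) for month j given the previous month's model flow.

    mean_j, mean_prev: mean flows of month j and month j-1
    b_j: regression coefficient between months j and j-1
    s_j: standard deviation of month j
    r_j: correlation coefficient between months j and j-1
    q_prev: model flow of the previous month
    t: random number for this month
    """
    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random_part = t * s_j * math.sqrt(1.0 - r_j ** 2)
    return deterministic + random_part

# February 2006: previous (January) model flow is 0.063 on the log scale.
q_feb = thomas_fiering_step(
    mean_j=0.04537, mean_prev=0.049549, b_j=0.001,
    s_j=0.00943, r_j=0.4901265, q_prev=0.063, t=0.886483,
)
print(round(q_feb, 3))
```

The result agrees with the February model flow of 0.053 (log scale) to three decimal places; small differences in later digits are expected because the tabulated parameters are rounded.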
The streamflow generated by the Markov model is compared with the observed
streamflow in the validation set, which covers the 60 months from January 2006 to
December 2010. Graphically, Figure 4.8 suggests that the Markov model does not
work well for streamflow forecasting for Sungai Bernam because it does not match the
actual streamflow well.
                 Markov model
MAPE             53.66
RMSE             7.29
Chi-square test  250.99
4.4
ARIMA Model
In this study, an appropriate tentative ARIMA model for Sg. Bernam streamflow
is investigated. Examination of the autocorrelation function (ACF) and partial
autocorrelation function (PACF) provides a thorough basis for analyzing the time
dependence of the system behavior, and suggests the appropriate parameters to
include in the model.
The tentative models are checked and the best tentative model is selected
for ARIMA streamflow forecasting. As mentioned in the previous chapter,
ARIMA modeling follows three important stages, which are summarized in the flow
diagram of the Box-Jenkins methodology (Figure 4.9).
1. Tentative Identification
2. Parameter Estimation
   - Testing parameters
3. Diagnostic Checking
   [Is the model adequate?] No: return to step 1. Yes: continue to step 4.
4. Forecasting
   - Forecast calculation
4.4.1 Model Identification
The most common method of checking stationarity is to examine the time
series plot of the data. Stationarity means that the data fluctuate around a constant mean.
If the time series plot is found to be non-stationary, differencing needs to be applied.
Figure 4.10 shows that the data are non-stationary, so a non-seasonal difference
(d = 1, lag k = 1) needs to be applied. Based on graphical examination, Figure 4.11
shows that the data are stationary in level after the non-seasonal difference has been
applied.
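The differencing operations referred to here (non-seasonal, lag 1, and the seasonal difference at lag 12 applied later in this section) can be sketched as follows. The function name is illustrative; Minitab performs the same operation internally.

```python
def difference(series, lag=1):
    """Return the lag-k difference: z[t] = y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical short series.
flows = [1, 3, 6, 10]
print(difference(flows))  # non-seasonal difference, d = 1

# Applying d = 1 and then D = 1 at lag 12 shortens the series by 13 values.
monthly = list(range(30))
print(len(difference(difference(monthly, 1), 12)))
```

The loss of 13 observations is why the first 13 residuals and fitted values of a model with d = 1 and D = 1 (S = 12) are undefined.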
[Figure 4.10: time series plot of monthly streamflow, Yt (m3/s); x-axis: month and year from Jan 1960]
[Figure 4.11: time series plot of the differenced series; x-axis: month and year from Jan 1960]
The next step is to identify the values of p and q, which are the orders of the AR (p) and
MA (q) components, for both the seasonal and non-seasonal series. For this purpose, the
ACF and PACF coefficients are computed. Table 4.6 gives the general theoretical patterns
used to identify the likely model:
Model                              ACF                    PACF
MA(q)                              Cuts off after lag q   Dies down
AR(p)                              Dies down              Cuts off after lag p
Mixed AR(p) and MA(q)              Dies down              Dies down
No order AR or MA                  No spike               No spike
(White Noise or Random process)
[Figure 4.12: autocorrelation (ACF) plotted against lag]
[Figure 4.13: partial autocorrelation (PACF) plotted against lag]
As Figures 4.12 and 4.13 show, the ACF and PACF die down
gradually. Based on this pattern, the non-seasonal orders p, d, q were determined as
ARIMA (1, 1, 1). The ACF correlogram also reveals a seasonal pattern in the data.
Because the ACF indicates a seasonal pattern, a seasonal difference (D = 1, lag k =
12) needs to be applied.
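The sample autocorrelations plotted in a correlogram can be computed as sketched below, assuming the usual estimator (lagged products of deviations from the mean, divided by the total sum of squared deviations). The function name is hypothetical, and the example series is not data from this study.

```python
def acf(series, max_lag):
    """Sample autocorrelation coefficients r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

print(acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

A seasonal monthly series shows up in this output as recurring large coefficients at lags 12, 24, 36 and so on.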
[Figure 4.14: autocorrelation (ACF) plotted against lag, after seasonal differencing]
[Figure 4.15: partial autocorrelation (PACF) plotted against lag, after seasonal differencing]
After the seasonal difference is applied, Figure 4.14 shows that the ACF cuts
off after lag 12, while Figure 4.15 shows that the PACF dies down. For seasonal ARIMA, the
general notation is ARIMA (p, d, q)(P, D, Q)S. Based on this pattern, the seasonal orders
P, D, Q were determined as ARIMA (0, 1, 1)12. However, to make sure
that the right model has been identified, we suggest another tentative model, which is
ARIMA (1, 1, 1)12.
Each parameter of a tentative ARIMA model can be tested using t-values and p-values.
A t-value is calculated by dividing the coefficient by its standard error. The standard
error (SE) of a coefficient is the standard deviation of the estimate of a regression
coefficient. It measures how precisely the data can estimate the coefficient's unknown
value. Its value is always positive, and smaller values indicate a more precise estimate.
The standard error of a coefficient helps determine whether the value of the coefficient
is significantly different from zero. If the p-value associated with the t-statistic is less
than the alpha level, we can conclude that the coefficient is significantly different from zero.
From Table 4.7, the standard error of the SAR 12 coefficient is large relative to the
value of the coefficient itself, so the t-value of 1.26 is too small to declare statistical
significance. We reject the null hypothesis if |t| > t(α/2, df), where df = n - np. Here
|tcalc| (= 1.26) < ttable (= 2.25). The resulting p-value is also much greater than the
common alpha level. Therefore, the null hypothesis cannot be rejected, and we conclude
that this coefficient does not differ from zero. In Table 4.8, which gives the parameter
estimates for ARIMA (1,1,1)(0,1,1)12, every coefficient has |tcalc| > ttable (= 2.25) and a
p-value less than the alpha level. Hence, the null hypothesis can
be rejected, and we conclude that each coefficient is significantly different from zero.
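Table 4.7: Parameter estimates for ARIMA (1,1,1)(1,1,1)12

Type     Coefficient  SE Coefficient  t-value  p-value
AR 1     0.2782       0.0520          5.35     0.000
SAR 12   0.0589       0.0467          1.26     0.208
MA 1     0.8765       0.0256          34.24    0.000
SMA 12   0.9537       0.0206          46.25    0.000

Table 4.8: Parameter estimates for ARIMA (1,1,1)(0,1,1)12

Type     Coefficient  SE Coefficient  t-value  p-value
AR 1     0.2894       0.0516          5.61     0.000
MA 1     0.8788       0.0248          35.41    0.000
SMA 12   0.9553       0.0184          51.98    0.000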
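The significance check on each coefficient can be illustrated with a short sketch. The function names are hypothetical; the critical value of 2.25 is the ttable value quoted in the text, and the coefficients and standard errors are those reported by Minitab in the tables above.

```python
def t_value(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error

def is_significant(coefficient, standard_error, t_critical=2.25):
    """True if |t| exceeds the critical value, so the null hypothesis is rejected."""
    return abs(t_value(coefficient, standard_error)) > t_critical

# SAR 12 in ARIMA (1,1,1)(1,1,1)12: not significant.
print(round(t_value(0.0589, 0.0467), 2), is_significant(0.0589, 0.0467))
# AR 1 in ARIMA (1,1,1)(0,1,1)12: significant.
print(round(t_value(0.2894, 0.0516), 2), is_significant(0.2894, 0.0516))
```

Because the tabulated coefficients and standard errors are rounded, t-values recomputed this way can differ from the Minitab output in the last digit.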
4.4.3
Diagnostic Checking
The next step of the time series modeling approach is
diagnostic checking. It examines the accuracy of the chosen tentative model
to ensure that the modeling assumptions are satisfied. Several procedures can be
applied to check the adequacy of the model, that is, whether the model satisfies the
stability or stationarity condition required in stochastic modeling work (Ayob and Amat,
2004).
For this stage, the Ljung-Box test is used to test whether the residuals are white noise.
The null hypothesis is that the residuals are white noise. In other words, the residual series
should be independent, homoscedastic (having constant variance), and normally
distributed. We reject the null hypothesis if the p-value of the chi-square statistic is less
than an alpha of 5%.
In this study, both tentative ARIMA models have p-values less than the alpha level at
every lag tested (Table 4.9 and Table 4.10). Under this rule the null hypothesis is
rejected for both tentative models, so their residuals still show some autocorrelation and
are not strictly white noise. Both tentative models are therefore compared further using the
error measures below.
Table 4.9: Modified Box-Pierce (Ljung-Box) Chi-Square statistic
for ARIMA (1,1,1)(1,1,1)12

Lag         12     24     36     48
Chi-Square  21.2   61.8   82.7   98.1
DF          8      20     32     44
p-Value     0.007  0.000  0.000  0.000

Table 4.10: Modified Box-Pierce (Ljung-Box) Chi-Square statistic
for ARIMA (1,1,1)(0,1,1)12

Lag         12     24     36     48
Chi-Square  23.1   62.2   82.7   97.9
DF          9      21     33     45
p-Value     0.006  0.000  0.000  0.000
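The Ljung-Box statistic reported in these tables can be sketched as below, assuming the standard form Q = n(n + 2) Σ r_k² / (n - k) summed over lags 1 to m, where r_k is the lag-k autocorrelation of the residuals. The function names and the short residual series are hypothetical; the critical value in the usage line is the 5% chi-square value for 9 degrees of freedom.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic for lags 1 .. max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

def rejects_white_noise(q, chi_square_critical):
    """True if Q exceeds the critical value, i.e. residuals are not white noise."""
    return q > chi_square_critical

# Hypothetical alternating residuals: strongly autocorrelated at lag 1.
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], 1))
# Lag 12 of Table 4.10: Q = 23.1 with 9 degrees of freedom (critical value 16.919).
print(rejects_white_noise(23.1, 16.919))
```

Comparing Q with the critical value is equivalent to comparing the p-value with alpha: Q above the critical value corresponds to a p-value below 5%.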
In addition, the best tentative model can be determined using the Least
Square Error (LSE) and the Root Mean Square Error (RMSE). The results for the
tentative models are summarized in Table 4.11. The best fit in the least-squares sense
minimizes the sum of squared residuals, a residual being the difference between an
observed value and the fitted value provided by a model. RMSE is also a good measure
of accuracy. The smaller the values of LSE and RMSE, the more accurate the tentative
model.
Table 4.11: LSE and RMSE Test for ARIMA Tentative Model

Test   ARIMA (1,1,1)(1,1,1)12   ARIMA (1,1,1)(0,1,1)12
LSE    1798                     1760
RMSE   5.5                      5.4
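The two rows of Table 4.11 are consistent with each other if LSE is read as the sum of squared errors over the 60 validation months, since RMSE is then the square root of LSE / n. This reading is an assumption, not something the text states; the sketch below only shows the arithmetic, with a hypothetical function name.

```python
import math

def rmse_from_sse(sum_squared_errors, n):
    """RMSE implied by a sum of squared errors over n observations."""
    return math.sqrt(sum_squared_errors / n)

print(round(rmse_from_sse(1798, 60), 1))  # ARIMA (1,1,1)(1,1,1)12
print(round(rmse_from_sse(1760, 60), 1))  # ARIMA (1,1,1)(0,1,1)12
```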
Of the two possible tentative models, the one that best fits the criteria and
meets the requirements is ARIMA (1,1,1)(0,1,1)12. Forecasting is based on
the chosen model. The model identified as the best fit for Sg. Bernam streamflow
is:
\[ (1 - \phi_1 B)(1 - B)(1 - B^{12}) Y_t = (1 - \theta_1 B)(1 - \theta_2 B^{12}) a_t \tag{4.7} \]
Rewriting the model, we have the following:
\[ (1 - \phi_1 B)(1 - B^{12} - B + B^{13}) Y_t = (1 - \theta_2 B^{12} - \theta_1 B + \theta_1 \theta_2 B^{13}) a_t \]
\[ (1 - B^{12} - B + B^{13} - \phi_1 B + \phi_1 B^{13} + \phi_1 B^{2} - \phi_1 B^{14}) Y_t = (1 - \theta_2 B^{12} - \theta_1 B + \theta_1 \theta_2 B^{13}) a_t \]
\[ (1 - B^{12} - (1 + \phi_1) B + (1 + \phi_1) B^{13} + \phi_1 B^{2} - \phi_1 B^{14}) Y_t = (1 - \theta_1 B - \theta_2 B^{12} + \theta_1 \theta_2 B^{13}) a_t \]
\[ Y_t - (1 + \phi_1) Y_{t-1} + \phi_1 Y_{t-2} - Y_{t-12} + (1 + \phi_1) Y_{t-13} - \phi_1 Y_{t-14} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1 \theta_2 a_{t-13} \]
\[ Y_t = (1 + \phi_1) Y_{t-1} - \phi_1 Y_{t-2} + Y_{t-12} - (1 + \phi_1) Y_{t-13} + \phi_1 Y_{t-14} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1 \theta_2 a_{t-13} \]
Note that:
AR 1, \phi_1 = 0.2894
MA 1, \theta_1 = 0.8788
SMA 12, \theta_2 = 0.9553
\[ Y_t = (1 + 0.2894) Y_{t-1} - 0.2894 Y_{t-2} + Y_{t-12} - (1 + 0.2894) Y_{t-13} + 0.2894 Y_{t-14} + a_t - 0.8788 a_{t-1} - 0.9553 a_{t-12} + (0.8788 \times 0.9553) a_{t-13} \]
\[ Y_t = 1.2894 Y_{t-1} - 0.2894 Y_{t-2} + Y_{t-12} - 1.2894 Y_{t-13} + 0.2894 Y_{t-14} + a_t - 0.8788 a_{t-1} - 0.9553 a_{t-12} + 0.8395 a_{t-13} \]
\[ Y_t = Y_{t-12} + [1.2894 Y_{t-1} - 1.2894 Y_{t-13} - 0.2894 Y_{t-2} + 0.2894 Y_{t-14}] + [a_t - 0.8788 a_{t-1} - 0.9553 a_{t-12} + 0.8395 a_{t-13}] \tag{4.8} \]
Equation (4.8) can be used for streamflow forecasting with the ARIMA model.
Equation 4.8 also shows that the forecast for time period t is the sum of (1) the
value of the time series in the same month of the previous year; (2) a trend component
determined by the difference between the previous month's value and last year's previous
month's value, and the difference between last year's value two months previous and the
value two months previous; and (3) the effects of the random shocks (or residuals) of
periods t, t-1, t-12 and t-13 on the forecast.
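A one-step forecast from Equation 4.8 can be sketched as below. This is an illustration, not the Minitab procedure: the function name is hypothetical, the future shock a_t is assumed to be zero (its expected value), and the caller is assumed to supply the past observations and past residuals.

```python
PHI_1 = 0.2894    # AR 1 coefficient
THETA_1 = 0.8788  # MA 1 coefficient
THETA_2 = 0.9553  # SMA 12 coefficient

def forecast_next(y, a):
    """Forecast Y_t from observations y (up to t-1) and residuals a (up to t-1).

    y needs at least 14 values and a at least 13 values; a_t is taken as zero.
    """
    seasonal = y[-12]  # same month of the previous year
    trend = (1 + PHI_1) * (y[-1] - y[-13]) - PHI_1 * (y[-2] - y[-14])
    shocks = -THETA_1 * a[-1] - THETA_2 * a[-12] + THETA_1 * THETA_2 * a[-13]
    return seasonal + trend + shocks

# Hypothetical constant series with no past shocks: the forecast stays constant.
print(forecast_next([10.0] * 14, [0.0] * 13))
# A single unit shock in the previous month lowers the forecast by THETA_1.
print(forecast_next([10.0] * 14, [0.0] * 12 + [1.0]))
```

For forecasts more than one month ahead, the earlier forecasts take the place of the unknown observations and the unknown residuals are set to zero, which is why long-range forecasts from this model settle into a repeating seasonal pattern.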
In this study, Minitab is used to develop the ARIMA model for monthly flows.
As an example, the monthly streamflow model developed using Minitab for 2006 to 2007
is shown in Table 4.12, while the streamflow model for the other years (2008-2010) can be
found in Appendix F.
Table 4.12: Monthly streamflow model using Minitab, 2006 to 2007

No.  Actual Flow (m3/s)  Model Flow (m3/s)  Residual   Fit      Coefficient
1    13.08               9.6732             *          *        0.289364
2    8.12                7.1884             *          *        0.878761
3    6.11                7.2612             *          *        0.955283
4    29.72               9.0165             *          *
5    29.22               9.9281             *          *
6    17.82               7.6110             *          *
7    7.94                6.7046             *          *
8    9.95                7.0851             *          *
9    28.05               9.5168             *          *
10   17.63               12.2889            *          *
11   17.72               15.2005            *          *
12   11.23               12.3581            *          *
13   9.05                7.9227             *          *
14   6.80                6.6970             -1.57988   7.5299
15   7.62                7.1341             -1.39072   7.6507
16   13.46               8.9949             -1.05700   9.4570
17   12.05               9.9369             -0.14946   10.2195
18   11.38               7.6286             1.10867    7.3913
19   13.06               6.7248             -1.04180   7.8818
20   8.95                7.1060             1.04920    7.2208
21   9.36                9.5379             0.26505    11.0050
22   14.33               12.3101            -2.99026   13.4603
23   14.26               15.2217            -3.58500   15.6250
24   8.24                12.3794            4.03841    11.4816
4.4.5
The streamflow generated by the ARIMA model is compared with the observed
streamflow in the validation set, which covers the 60 months from January 2006 to
December 2010. Graphically, Figure 4.16 suggests that the ARIMA model
works quite well for streamflow forecasting for Sungai Bernam because many of the model
values match the actual streamflow well. The ability of the ARIMA model in
streamflow forecasting is inspected using several forecast evaluation measures.
As in the validation of the Markov model, the forecast evaluation measures Root
Mean Square Error (RMSE), Chi-square test and Mean Absolute Percentage Error
(MAPE) are used to examine the accuracy of the ARIMA model. The results are
summarized in Table 4.13 and the details of the calculation can be found in Appendix G.
4.5
Performance Evaluation Procedure   ARIMA model
MAPE                               27.50
RMSE                               5.41
Chi-square test                    191.11
term forecasting, such as hourly and daily forecasting, in order to give more accurate flood
warnings.
Meanwhile, if the forecast streamflow is lower than the actual data,
flood occurrence cannot be estimated. Lower streamflow forecasts are needed in some
agricultural applications to make sure that plants have sufficient water and grow well.
For short periods, the ARIMA model can reproduce a pattern that is the same as or
similar to the actual one. ARIMA cannot forecast accurately for longer periods, as it is best
used for short-term forecasting; the forecast usually tends to become flat over a
sufficiently long period. Because the ARIMA model is good at short-term forecasting, it can
also be used for flood control.
To inspect the forecasting accuracy of the different models, the
performance evaluation measures MAPE, RMSE and Chi-square test are compared for
the Markov and ARIMA models. Table 4.14 shows the
comparison of MAPE, RMSE and Chi-square test for each model.
                  Markov model   ARIMA model
MAPE              53.66          27.50
RMSE              7.29           5.4156
Chi-squared test  250.99         191.11
The model with the minimum values of MAPE, RMSE and Chi-squared test is
the best for streamflow forecasting. The results of the performance
evaluation procedure show that ARIMA has the lower value for every measure used to
find the accurate model. Therefore, in this study, the ARIMA model gives the better
performance of the two models for streamflow forecasting.
In this study, one reason the ARIMA model is better than the Markov model is that
the historical data for Sg. Bernam are non-stationary. If the historical data were stationary,
the Markov model might have an advantage, because it is a probabilistic method in
which the transition from one state to another depends on probability. The Markov model
cannot remove non-stationarity from the data, whereas an advantage of the ARIMA model
is that it can transform non-stationary data into stationary data.
The ARIMA model is selected as the best fit because it has the minimum mean squared
forecast error, which is also why it is often used in statistical practice. For forecasting one
period ahead, which is Yt+1, the equation is as follows:
(4.9)
Using Minitab, future values of the time series can easily be forecast
from current and past values. Figure 4.18 compares the patterns
of actual and model streamflow for Sungai Bernam. The first 5 years,
from January 2006 to December 2010, are the validation period. This time series plot
reveals the cyclic pattern of the ARIMA model. The model flows follow the pattern
of the observed streamflow quite well, although the data are non-stationary for several years.
[Figure 4.18: time series plot of actual (Yt-actual) and model (Yt-model) streamflow, Yt (m3/s); x-axis: month and year, Jan 2006 to 2015]
The next 5 years are the streamflow forecast using the ARIMA model, covering the 60
months from January 2011 to December 2015. The figure shows that the model
can forecast well, but the streamflow pattern repeats itself over the longer
period. This is because the ARIMA model is best suited to short-term
forecasting, since its forecasts are based on previous observations. For short-term
forecasting, the Box-Jenkins model can closely reproduce the details of the original series.
ARIMA cannot forecast accurately for longer periods.
CHAPTER 5
5.1
Conclusion
This study has fulfilled its objectives, which were to propose streamflow
forecasting methods using Markov and ARIMA models and to inspect the forecasting
accuracy of both models. The Box-Jenkins or ARIMA model is one of the most
popular time series forecasting methods. The Markov model has its own advantages in
forecasting.
In this study, the tentative model that best fits the criteria and meets the
requirements is ARIMA (1,1,1)(0,1,1)12. Analysis of the forecast values using
the performance evaluation procedure shows that the ARIMA model is better than the
Markov model for forecasting Sg. Bernam streamflow: ARIMA has the lower value for
every measure used (MAPE, RMSE and Chi-square test). Therefore, the ARIMA model
predicts the future monthly streamflow for Sungai Bernam with better accuracy than the
Markov model.
The critical part of ARIMA modeling is the identification of the best tentative
model. The tentative model that has been identified is then tested and checked to confirm
that it is the best fit.
The Markov model also has an advantage in that it forecasts higher streamflow
than the actual streamflow. Higher streamflow can cause disasters such as floods.
Therefore, the Markov model can be used for flood control.
Both Markov and ARIMA models are good for short-term forecasting. The
results show that both models forecast well for the earlier period but cannot forecast
accurately for longer periods.
Although both models are good for short-term forecasting and not for long-term
forecasting, comparison between the two models shows that ARIMA gives more
accurate forecasts.
5.2
Recommendations
Based on the results, both Markov and ARIMA models can be used for streamflow
forecasting. However, there are some weaknesses that can be overcome. The following
recommendations can be used to increase the accuracy of streamflow forecasting:
1. The amount of data, or equivalently the number of training patterns, also affects
the forecast performance. For long-memory series, more training patterns result
in more accurate forecasts. To forecast accurately, use a long input series.
2. To control floods efficiently, use the Markov model for short-term forecasting,
because short-term forecasting is very useful for flood control.
3. Use the ARIMA model for short-term forecasting only, including for streamflow
forecasting.
4. Compare the streamflow forecasts with those of other time series forecasting
methods, such as exponential smoothing, regression analysis or Fourier series
analysis.
5. Forecast the time series after removing the outliers.
6. Use a hybrid model combining ARIMA and an artificial neural network in streamflow
forecasting.
REFERENCES
Ayob, K. and Amat, S. D. (2004). Water Use Trend at Universiti Teknologi Malaysia:
Application of ARIMA Model. Jurnal Teknologi, 41 (B): 47-56.
Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control.
Holden Day, San Francisco.
Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis, Forecasting and Control.
Holden Day, San Francisco.
Dalphin, R. J. (1987). Markov-Weibull Model of Monthly Streamflow. Journal of Water
Resources Planning and Management, Vol. 113, No. 1.
Fortin, V., Perreault, L. and Salas, J. D. (2004). Retrospective Analysis and Forecasting
of Streamflows Using a Shifting Model. Journal of Hydrology, Vol. 296,
135-163.
Hasmida, H. (2009). Water Quality Trend at The Upper Part of Johor River in Relation
to Rainfall and Runoff Pattern. Universiti Teknologi Malaysia.
Heiko, B. (2000). Markov Chain Model for Vegetation Dynamics. Ecological Modeling,
Vol. 126, pp. 139-154.
Ho, S. L. and Xie, M. (1998). The Use of ARIMA Models for Reliability Forecasting
and Analysis. Computers ind. Engng, Vol. 35, Nos 1-2, pp. 213-216.
Joomizan, N. (2010). Reservoir Storage Simulation and Forecasting Models for Muda
Irrigation Scheme, Malaysia. Universiti Teknologi Malaysia.
Lee, C. and Ko, C. (2011). Short-term Load Forecasting Using Lifting Scheme and
ARIMA Models. Expert Systems with Applications, Vol. 38, pp. 5902-5911.
Conference, ed. D. J. Medeiros, E. F. Watson, J. S. Carson, and M. S.
Manivannan, 1522. Piscataway, New Jersey: Institute of Electrical and
Electronics Engineers, Inc.
Maass, A., Hufschmidt, M. M., Dorfman, R., Thomas, H. A., Marglin, S. A. and Fair, G.
M. (1962). The Design of Water-Resource Systems. Harvard University Press,
Cambridge, Mass., pp 467.
Modarres, R. (2007). Streamflow Drought Time Series Forecasting. Stoch Environ Res
Risk Assess.
Nazuha, M., Ruzaidah, S. and Zamzulani, M. (2010). Malaysia Crude Oil Production
Estimation: an Application of ARIMA Model. International Conference on
Science and Social Research (CSSR 2010)
O'Donovan, T. M. (1983). Short Term Forecasting: An Introduction to the Box-Jenkins
Approach. New York: Wiley.
Shalamu, A. (2009). Monthly and Seasonal Streamflow Forecasting in the Rio Grande
Basin. New Mexico State University
Tang, Z., Almeida, C. and Fishwick, P. A. (1991). Time Series Forecasting Using
Neural Networks vs. Box-Jenkins Methodology. Simulation.
Yaffee, R. and McGee, M. (2000). Introduction to Time Series Analysis and Forecasting
with Applications of SAS and SPSS. Academic Press, Inc., New York.
APPENDIX A
Jan
10.62
8.72
11.98
8.06
10.31
6.65
11.41
14.99
4.87
9.61
9.69
17.40
8.09
6.27
3.62
8.49
7.88
6.84
5.14
4.08
4.05
8.07
4.95
4.02
5.86
9.55
7.95
4.38
4.81
4.56
5.80
3.89
7.37
7.10
10.66
10.85
12.29
16.24
15.95
9.18
14.05
13.58
6.04
7.45
8.08
3.87
13.08
9.05
11.29
9.73
6.83
Feb
8.38
5.95
6.66
6.35
7.08
6.12
7.38
9.69
4.12
6.95
3.49
6.48
8.04
5.26
7.44
7.38
4.36
4.90
3.54
4.33
3.95
6.99
3.93
2.47
9.04
8.80
5.79
4.07
7.86
2.91
3.14
3.51
6.28
8.00
10.16
9.37
11.50
19.71
16.02
9.77
11.67
9.33
4.35
6.91
6.60
2.73
8.12
6.80
6.76
9.67
4.86
Mac
11.23
6.26
10.28
6.37
7.90
7.96
8.96
6.91
2.94
4.83
3.83
8.20
7.84
5.29
6.80
10.91
5.59
3.34
3.36
3.87
4.91
4.38
5.26
3.84
8.07
9.26
4.94
3.76
6.48
6.13
2.65
5.83
5.56
7.56
10.32
11.74
12.12
20.96
14.69
11.69
15.70
8.05
4.84
5.85
6.67
4.38
6.11
7.62
9.58
15.10
4.36
Apr
14.08
8.40
11.06
6.02
8.73
15.45
11.25
12.13
5.14
6.75
6.48
5.99
9.31
9.88
8.65
11.46
6.05
3.80
6.39
5.70
5.01
10.73
8.94
3.18
8.10
6.99
8.87
5.67
6.22
11.96
3.46
7.66
7.18
11.41
10.59
13.89
18.45
20.22
14.42
11.82
12.87
13.36
11.45
7.34
8.68
4.51
29.72
13.46
12.86
13.72
7.18
May
10.04
10.07
10.07
6.20
10.35
16.39
6.45
11.64
11.72
12.50
8.71
7.06
11.60
11.09
9.52
11.50
4.92
5.61
7.93
5.86
10.07
11.90
9.40
5.42
9.83
11.31
6.31
6.03
9.92
11.79
10.15
11.97
9.65
12.62
10.87
14.36
15.91
17.51
14.90
13.56
9.26
10.84
12.75
9.23
11.12
6.26
29.22
12.05
9.73
8.75
6.17
Jun
6.68
8.50
6.62
6.76
6.44
8.23
8.70
5.89
8.16
8.24
4.41
5.06
9.16
8.87
7.24
8.29
7.92
6.51
3.67
5.24
9.37
7.37
6.94
3.65
9.93
5.53
4.32
4.51
12.09
8.73
5.94
9.24
5.64
7.69
10.78
14.15
16.73
20.14
15.72
9.54
7.45
7.30
7.89
6.69
4.13
6.07
17.82
11.38
12.28
7.31
7.51
Jul
11.01
6.84
6.36
6.45
11.09
4.64
9.38
5.79
5.64
4.26
5.73
4.77
5.88
5.04
7.54
10.70
5.94
4.10
3.99
5.30
6.51
4.83
4.84
4.39
5.98
5.34
3.21
4.75
8.45
8.27
4.55
5.59
6.94
8.96
8.43
13.34
13.12
19.05
16.14
7.52
4.21
5.72
6.99
7.32
6.49
4.56
7.94
13.06
10.89
8.05
7.45
Aug
7.87
8.27
7.79
9.29
7.52
5.81
9.49
5.94
5.91
6.21
6.49
8.17
5.75
7.74
7.79
6.90
8.74
3.73
2.99
5.01
8.79
4.62
6.55
5.68
5.07
4.85
3.68
9.03
8.57
5.09
3.31
5.09
6.01
6.15
10.89
16.59
14.24
15.69
20.16
8.99
9.25
5.05
7.26
6.71
4.44
5.63
9.95
8.95
7.83
9.03
8.04
Sep
11.11
11.27
10.54
12.09
12.39
9.32
11.04
9.91
9.91
5.52
9.79
11.53
8.74
7.42
8.43
11.33
7.55
4.38
4.72
8.86
9.64
9.23
7.73
10.02
5.64
7.14
7.39
12.35
18.19
8.02
6.83
7.83
6.43
9.60
16.08
12.99
13.39
18.91
23.75
10.50
8.85
8.66
8.96
8.80
11.62
3.53
28.05
9.36
9.85
10.08
7.16
Oct
10.83
10.47
16.97
17.82
11.24
16.98
19.17
12.40
11.05
11.84
9.96
6.94
11.88
10.81
7.11
6.14
14.43
18.96
7.86
7.79
12.48
7.44
10.01
5.57
7.82
12.75
14.19
18.64
9.24
12.06
16.42
13.20
7.65
12.40
14.22
13.99
20.88
21.15
19.72
13.37
9.31
6.43
15.31
13.25
14.57
14.99
17.63
14.33
13.14
7.99
6.30
Nov
13.83
12.04
21.14
23.87
14.58
15.76
22.08
21.74
11.56
10.30
11.16
14.11
16.21
9.60
8.15
10.71
12.72
13.17
20.75
12.88
8.48
15.02
18.93
7.60
14.58
21.16
13.57
12.26
11.01
17.90
15.36
13.55
12.25
12.87
13.97
17.08
17.08
26.92
27.30
11.77
14.58
10.47
16.94
19.15
21.65
16.39
17.72
14.26
16.74
12.73
9.56
Dec
14.62
15.52
11.29
15.37
14.11
18.94
18.80
9.84
12.16
9.08
12.61
18.31
9.22
7.13
8.68
13.69
7.66
8.54
5.60
6.31
19.07
10.03
8.03
8.18
19.78
15.96
8.74
9.13
7.39
9.49
11.83
8.27
8.49
16.91
16.15
16.12
29.78
18.46
18.59
16.19
19.88
7.83
7.30
9.72
8.07
18.46
11.23
8.24
10.96
6.88
11.01
APPENDIX B
Jan
0.056
0.052
0.059
0.050
0.056
0.046
0.058
0.065
0.040
0.054
0.054
0.069
0.050
0.045
0.035
0.051
0.049
0.046
0.041
0.037
0.037
0.050
0.040
0.036
0.043
0.054
0.050
0.038
0.040
0.039
0.043
0.036
0.048
0.047
0.056
0.057
0.060
0.068
0.067
0.053
0.064
0.063
0.044
0.048
0.050
0.036
0.050
Feb
0.051
0.044
0.046
0.045
0.047
0.044
0.048
0.054
0.037
0.047
0.034
0.045
0.050
0.041
0.048
0.048
0.038
0.040
0.034
0.038
0.036
0.047
0.036
0.029
0.053
0.052
0.043
0.037
0.049
0.031
0.032
0.034
0.045
0.050
0.055
0.053
0.058
0.073
0.067
0.054
0.059
0.053
0.038
0.047
0.046
0.030
0.045
Mar
0.058
0.045
0.056
0.045
0.050
0.050
0.052
0.047
0.031
0.040
0.036
0.050
0.049
0.041
0.046
0.057
0.042
0.033
0.033
0.036
0.040
0.038
0.041
0.036
0.050
0.053
0.040
0.035
0.045
0.044
0.030
0.043
0.042
0.049
0.056
0.059
0.060
0.075
0.065
0.059
0.067
0.050
0.040
0.043
0.046
0.038
0.047
Apr
0.064
0.051
0.057
0.044
0.052
0.066
0.058
0.060
0.041
0.046
0.045
0.044
0.053
0.055
0.052
0.058
0.044
0.035
0.045
0.043
0.040
0.057
0.052
0.033
0.050
0.047
0.052
0.043
0.045
0.059
0.034
0.049
0.047
0.058
0.056
0.063
0.071
0.074
0.064
0.059
0.061
0.062
0.058
0.048
0.052
0.038
0.052
May
0.055
0.055
0.055
0.044
0.056
0.068
0.045
0.059
0.059
0.060
0.052
0.047
0.059
0.057
0.054
0.058
0.040
0.042
0.050
0.043
0.055
0.059
0.053
0.042
0.055
0.058
0.045
0.044
0.055
0.059
0.055
0.059
0.054
0.061
0.057
0.064
0.067
0.070
0.065
0.063
0.053
0.057
0.061
0.053
0.058
0.045
0.055
Jun
0.046
0.051
0.046
0.046
0.045
0.050
0.052
0.043
0.050
0.050
0.038
0.041
0.053
0.052
0.048
0.051
0.050
0.045
0.035
0.041
0.053
0.048
0.047
0.035
0.055
0.042
0.038
0.038
0.060
0.052
0.044
0.053
0.043
0.049
0.057
0.064
0.068
0.074
0.067
0.054
0.048
0.048
0.050
0.046
0.037
0.044
0.049
Jul
0.057
0.046
0.045
0.045
0.057
0.039
0.053
0.043
0.043
0.037
0.043
0.039
0.043
0.040
0.049
0.057
0.044
0.037
0.036
0.041
0.045
0.040
0.040
0.038
0.044
0.042
0.033
0.039
0.051
0.051
0.039
0.042
0.047
0.052
0.051
0.062
0.062
0.072
0.067
0.048
0.037
0.043
0.047
0.048
0.045
0.039
0.046
Aug
0.049
0.051
0.049
0.053
0.048
0.043
0.054
0.044
0.043
0.044
0.045
0.050
0.043
0.049
0.049
0.047
0.052
0.035
0.032
0.040
0.052
0.039
0.046
0.043
0.041
0.040
0.035
0.053
0.051
0.041
0.033
0.041
0.044
0.044
0.057
0.068
0.064
0.067
0.074
0.052
0.053
0.040
0.048
0.046
0.038
0.043
0.047
Sep
0.058
0.058
0.056
0.060
0.060
0.053
0.057
0.055
0.055
0.042
0.054
0.058
0.052
0.048
0.051
0.058
0.049
0.038
0.039
0.052
0.054
0.053
0.049
0.055
0.043
0.047
0.048
0.060
0.071
0.050
0.046
0.049
0.045
0.054
0.067
0.061
0.062
0.072
0.079
0.056
0.052
0.052
0.052
0.052
0.059
0.034
0.054
Oct
0.057
0.056
0.069
0.070
0.058
0.069
0.072
0.060
0.057
0.059
0.055
0.047
0.059
0.057
0.047
0.044
0.064
0.072
0.049
0.049
0.060
0.048
0.055
0.042
0.049
0.061
0.064
0.071
0.053
0.060
0.068
0.062
0.049
0.060
0.064
0.063
0.075
0.075
0.073
0.062
0.053
0.045
0.066
0.062
0.065
0.065
0.060
Nov
0.063
0.060
0.075
0.079
0.065
0.067
0.077
0.076
0.058
0.056
0.058
0.064
0.067
0.054
0.050
0.057
0.061
0.062
0.075
0.061
0.051
0.065
0.072
0.049
0.065
0.075
0.063
0.060
0.057
0.070
0.066
0.063
0.060
0.061
0.063
0.069
0.069
0.083
0.083
0.059
0.065
0.056
0.069
0.072
0.076
0.068
0.065
Dec
0.065
0.066
0.058
0.066
0.064
0.072
0.072
0.055
0.060
0.053
0.061
0.071
0.053
0.047
0.052
0.063
0.049
0.051
0.042
0.045
0.072
0.055
0.050
0.050
0.073
0.067
0.052
0.053
0.048
0.054
0.059
0.051
0.051
0.069
0.067
0.067
0.086
0.071
0.071
0.067
0.073
0.049
0.048
0.054
0.050
0.071
0.060
APPENDIX C
Date    RAND()     (unlabelled)  erf^{-1}    t_{i,j}
Jan-06  0.699645   0.399289      0.370085    1.523379
Feb-06  0.45481   -0.090379     -0.08027     0.886483
Mar-06  0.063732  -0.872536     -1.0558     -0.49313
Apr-06  0.224711  -0.550577     -0.53482     0.243657
May-06  0.236038  -0.527923     -0.50847     0.280915
Jun-06  0.471912  -0.056176     -0.04983     0.929536
Jul-06  0.999341   0.998683      1.443813    3.041859
Aug-06  0.533139   0.066278      0.058805    1.083163
Sep-06  0.095672  -0.808656     -0.91763    -0.29772
Oct-06  0.044674  -0.910651     -1.15355    -0.63136
Nov-06  0.997494   0.994989      1.429319    3.021363
Dec-06  0.407816  -0.184368     -0.16487     0.766834
Jan-07  0.656401   0.312802      0.284724    1.402661
Feb-07  0.32176   -0.35648      -0.32724     0.537217
Mar-07  0.733219   0.466438      0.440226    1.622573
Apr-07  0.724521   0.449041      0.421663    1.596322
May-07  0.401592  -0.196816     -0.17623     0.750771
Jun-07  0.010641  -0.978717     -1.36824    -0.93498
Jul-07  0.096817  -0.806366     -0.91316    -0.2914
Aug-07  0.516508   0.033016      0.029268    1.041391
Sep-07  0.053638  -0.892724     -1.1059     -0.56398
Oct-07  0.222905  -0.554191     -0.53909     0.237618
Nov-07  0.612597   0.225195      0.2023      1.286095
Dec-07  0.663435   0.32687       0.298297    1.421856
Jan-08  0.143889  -0.712222     -0.75074    -0.0617
Feb-08  0.070315  -0.85937      -1.02497    -0.44952
Mar-08  0.523247   0.046495      0.041228    1.058306
Apr-08  0.919276   0.838551      0.978848    2.384299
May-08  0.705168   0.410335      0.381358    1.539321
Jun-08  0.237308  -0.525384     -0.50556     0.28503
Jul-08  0.877403   0.754806      0.819547    2.159015
Aug-08  0.425101  -0.149797     -0.13354     0.81114
Sep-08  0.402188  -0.195624     -0.17514     0.752312
Oct-08  0.338947  -0.322107     -0.29369     0.584661
Nov-08  0.687608   0.375216      0.345832    1.489081
Dec-08  0.014286  -0.971427     -1.34224    -0.89822
Jan-09  0.684203   0.368406      0.339046    1.479484
Feb-09  0.305343  -0.389314     -0.35998     0.490905
Mar-09  0.627906   0.255813      0.230738    1.326313
Apr-09  0.641724   0.283447      0.256729    1.36307
May-09  0.751243   0.502486      0.479699    1.678397
Jun-09  0.729118   0.458237      0.431438    1.610146
Jul-09  0.289185  -0.421629     -0.39299     0.444235
Aug-09  0.954236   0.908473      1.147587    2.622933
Sep-09  0.428914  -0.142173     -0.12667     0.820859
Oct-09  0.264273  -0.471453     -0.44563     0.369778
Nov-09  0.687481   0.374963      0.34558     1.488724
Dec-09  0.765445   0.530889      0.511878    1.723905
Jan-10  0.846072   0.692144      0.720449    2.018868
Feb-10  0.27472   -0.45056      -0.42327     0.401403
Mar-10  0.555255   0.110509      0.098252    1.138949
Apr-10  0.800866   0.601733      0.597223    1.8446
May-10  0.779092   0.558183      0.543827    1.769087
Jun-10  0.847218   0.694435      0.723842    2.023667
Jul-10  0.420992  -0.158017     -0.14097     0.800643
Aug-10  0.996074   0.992148      1.418338    3.005833
Sep-10  0.600695   0.20139       0.180416    1.255146
Oct-10  0.32158   -0.35684      -0.32759     0.536714
Nov-10  0.630127   0.260254      0.234893    1.332189
Dec-10  0.323203  -0.353593     -0.32439     0.541241
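
The values tabulated in this appendix are consistent with the following chain: the column that follows RAND() equals 2·RAND() − 1, the erf^{-1} column is the inverse error function of that quantity, and t_{i,j} = 1 + √2·erf^{-1}(2·RAND() − 1). This relation is inferred from the numbers and is not stated in the appendix. The Python sketch below assumes this reading; the function names are illustrative only, and the exact inverse error function is obtained from the standard normal quantile. For arguments close to ±1 the tabulated erf^{-1} values are smaller in magnitude than the exact inverse (for example 1.443813 at 0.998683), which suggests that a truncated approximation was used, so the sketch is expected to agree with the table only for moderate arguments.

```python
from math import sqrt
from statistics import NormalDist


def erfinv(x: float) -> float:
    """Exact inverse error function for -1 < x < 1.

    Uses the identity Phi^{-1}(p) = sqrt(2) * erfinv(2p - 1),
    where Phi^{-1} is the standard normal quantile function.
    """
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / sqrt(2.0)


def deviate_from_uniform(u: float) -> float:
    """Map a uniform number u in (0, 1) to t = 1 + sqrt(2) * erfinv(2u - 1).

    The offset of 1 is an assumption read off the tabulated values.
    """
    return 1.0 + sqrt(2.0) * erfinv(2.0 * u - 1.0)


# First RAND() value listed for Jan-06; the table gives t = 1.523379.
t = deviate_from_uniform(0.699645)
print(round(t, 4))
```

In a spreadsheet the same step would be repeated once per month, with a fresh uniform number for each row.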
APPENDIX D
Column (a): $\bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1})$
Column (b): $S_j\,t_{i,j}\,(1 - r_j^2)$

Deterministic Component      Random Component   Model Flow
q_{i-1,j-1}  (a)             t_{i,j}    (b)     q_{i,j} (Log)
0.050        0.049541669     1.523379   0.013   0.063
0.063        0.045386033     0.886483   0.007   0.053
0.053        0.04653475     -0.49313   -0.004   0.043
0.043        0.051865643     0.243657   0.002   0.054
0.054        0.054889433     0.280915   0.002   0.057
0.057        0.048803168     0.929536   0.007   0.055
0.055        0.046082272     3.041859   0.022   0.068
0.068        0.04726108      1.083163   0.008   0.055
0.055        0.053859993    -0.29772   -0.002   0.052
0.052        0.059642058    -0.63136   -0.005   0.055
0.055        0.065034911     3.021363   0.024   0.089
0.089        0.059661808     0.766834   0.007   0.067
0.067        0.049559131     1.402661   0.012   0.062
0.062        0.045384746     0.537217   0.004   0.050
0.050        0.046529892     1.622573   0.013   0.060
0.060        0.051883571     1.596322   0.014   0.066
0.066        0.054896185     0.750771   0.005   0.060
0.060        0.04880782     -0.93498   -0.007   0.042
0.042        0.046063986    -0.2914    -0.002   0.044
0.044        0.047223647     1.041391   0.007   0.055
0.055        0.053859658    -0.56398   -0.004   0.049
0.049        0.059640286     0.237618   0.002   0.062
0.062        0.065039039     1.286095   0.010   0.075
0.075        0.059650993     1.421856   0.013   0.073
0.073        0.049565308    -0.0617    -0.001   0.049
0.049        0.04536888     -0.44952   -0.004   0.042
0.042        0.046516145     1.058306   0.009   0.055
0.055        0.051878774     2.384299   0.021   0.073
0.073        0.05490011      1.539321   0.011   0.065
0.065        0.048815617     0.28503    0.002   0.051
0.051        0.046075966     2.159015   0.015   0.062
0.062        0.047251163     0.81114    0.006   0.053
0.053        0.053858046     0.752312   0.006   0.060
0.060        0.059649041     0.584661   0.005   0.064
0.064        0.065040692     1.489081   0.012   0.077
0.077        0.059652259    -0.89822   -0.008   0.051
0.051        0.049543394     1.479484   0.013   0.062
0.062        0.045385559     0.490905   0.004   0.049
0.049        0.046529249     1.326313   0.011   0.057
0.057        0.051881059     1.36307    0.012   0.064
0.064        0.054895021     1.678397   0.012   0.066
0.066        0.048816984     1.610146   0.012   0.060
0.060        0.046088968     0.444235   0.003   0.049
0.049        0.047231941     2.622933   0.019   0.066
0.066        0.053870927     0.820859   0.006   0.060
0.060        0.059649508     0.369778   0.003   0.063
0.063        0.065039672     1.488724   0.012   0.077
0.077        0.059652256     1.723905   0.016   0.076
0.076        0.049568162     2.018868   0.017   0.067
0.067        0.045391438     0.401403   0.003   0.049
0.049        0.046528015     1.138949   0.009   0.056
0.056        0.05187947      1.8446     0.016   0.068
0.068        0.05489742      1.769087   0.012   0.067
0.067        0.048817884     2.023667   0.014   0.063
0.063        0.046093027     0.800643   0.006   0.052
0.052        0.047235947     3.005833   0.021   0.069
0.069        0.053873657     1.255146   0.010   0.064
0.064        0.0596524       0.536714   0.004   0.064
0.064        0.065040467     1.332189   0.011   0.076
0.076        0.059651281     0.541241   0.005   0.065
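
The column headings of this appendix describe a lag-one Markov recursion of the Thomas-Fiering type: the model flow is the sum of a deterministic component and a random component. The Python sketch below illustrates one step of that recursion. All names and the numerical parameters in the example call are hypothetical and are not the values fitted in this study. The random-component heading as extracted carries no exponent on (1 − r_j^2); the sketch follows the standard Thomas-Fiering form, which takes the square root of that factor, on the assumption that the exponent was lost in extraction.

```python
from math import sqrt


def markov_step(q_prev, mean_prev, mean_cur, b_cur, s_cur, r_cur, t):
    """One step of a lag-one Markov (Thomas-Fiering type) flow model.

    q_prev    : flow generated for the previous month
    mean_prev : mean flow of the previous calendar month
    mean_cur  : mean flow of the current calendar month
    b_cur     : regression coefficient between the two months
    s_cur     : standard deviation of the current calendar month
    r_cur     : correlation coefficient between the two months
    t         : random deviate for the current step
    Returns (deterministic component, random component, model flow).
    """
    deterministic = mean_cur + b_cur * (q_prev - mean_prev)
    # Square root assumed, as in the standard Thomas-Fiering formulation.
    random_part = s_cur * t * sqrt(1.0 - r_cur ** 2)
    return deterministic, random_part, deterministic + random_part


# Hypothetical parameters, chosen only to show the arithmetic.
det, rnd, q = markov_step(q_prev=0.06, mean_prev=0.05, mean_cur=0.05,
                          b_cur=0.5, s_cur=0.01, r_cur=0.6, t=1.0)
print(det, rnd, q)
```

The same additive structure can be checked against the first tabulated row, where the deterministic component 0.049541669 plus the random component 0.013 gives the model flow 0.063 after rounding.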
APPENDIX E
        Actual Flow  Model Flow                        Chi-square
Date    (m³/s)       (m³/s)      MAPE      RMSE        Test
Jan-06  13.08        13.533      3.462     0.205001    0.015148
Feb-06  8.12         9.077       11.786    0.915831    0.100896
Mar-06  6.11         5.641       7.681     0.220272    0.039051
Apr-06  29.72        9.604       67.685    404.6583    42.13488
May-06  29.22        10.807      63.015    339.0362    31.37171
Jun-06  17.82        10.210      42.706    57.91601    5.672621
Jul-06  7.94         16.422      106.822   71.93914    4.380738
Aug-06  9.95         10.014      0.644     0.00411     0.00041
Sep-06  28.05        8.642       69.192    376.688     43.59034
Oct-06  17.63        9.821       44.298    61.00476    6.211446
Nov-06  17.72        32.326      82.432    213.3443    6.599874
Dec-06  11.23        15.849      41.109    21.31845    1.345112
Jan-07  9.05         13.020      43.872    15.7641     1.210723
Feb-07  6.80         7.992       17.535    1.421735    0.177887
Mar-07  7.62         12.065      58.336    19.76006    1.637769
Apr-07  13.46        15.262      13.384    3.245474    0.212657
May-07  12.05        12.299      2.069     0.06214     0.005052
Jun-07  11.38        5.514       51.550    34.41474    6.241801
Jul-07  13.06        6.059       53.607    49.01556    8.089859
Aug-07  8.95         9.874       10.339    0.855987    0.086689
Sep-07  9.36         7.877       15.846    2.199906    0.27929
Oct-07  14.33        13.038      9.013     1.668293    0.127953
Nov-07  14.26        21.161      48.391    47.61862    2.250341
Dec-07  8.24         19.599      137.858   129.0379    6.58374
Jan-08  11.29        7.719       31.625    12.74858    1.651481
Feb-08  6.76         5.384       20.351    1.89258     0.3515
Mar-08  9.58         10.032      4.721     0.20453     0.020387
Apr-08  12.86        19.404      50.886    42.82308    2.206928
May-08  9.73         15.098      55.171    28.81687    1.908638
Jun-08  12.28        8.379       31.768    15.21902    1.816363
Jul-08  10.89        13.007      19.439    4.481376    0.344538
Aug-08  7.83         9.219       17.734    1.928084    0.209153
Sep-08  9.85         12.127      23.119    5.185165    0.427585
Oct-08  13.14        14.502      10.398    1.865887    0.12866
Nov-08  16.74        22.302      33.224    30.93221    1.386991
Dec-08  10.96        8.531       22.161    5.899487    0.691526
Jan-09  9.73         13.343      37.128    13.05078    0.97813
Feb-09  9.67         7.856       18.764    3.292187    0.41909
Mar-09  15.10        10.970      27.352    17.05788    1.554974
Apr-09  13.72        14.160      3.205     0.193326    0.013653
May-09  8.75         15.629      78.619    47.32323    3.027875
Jun-09  7.31         12.423      69.942    26.14056    2.104243
Jul-09  8.05         7.800       3.111     0.062732    0.008043
Aug-09  9.03         15.337      69.845    39.77884    2.593644
Sep-09  10.08        12.388      22.897    5.327024    0.430014
Oct-09  7.99         13.587      70.046    31.32252    2.305389
Nov-09  12.73        22.299      75.168    91.56381    4.106203
Dec-09  6.88         21.522      212.820   214.3895    9.961389
Jan-10  6.83         15.834      131.827   81.06853    5.119965
Feb-10  4.86         7.597       56.317    7.491085    0.98606
Mar-10  4.36         10.312      136.513   35.42605    3.435427
Apr-10  7.18         16.493      129.701   86.72375    5.258356
May-10  6.17         15.985      159.084   96.34386    6.026957
Jun-10  7.51         13.908      85.191    40.93266    2.943131
Jul-10  7.45         8.744       17.405    1.680274    0.192169
Aug-10  8.04         16.912      110.346   78.70912    4.65409
Sep-10  7.16         14.091      96.798    48.03522    3.408991
Oct-10  6.30         14.296      126.927   63.94217    4.472611
Nov-10  9.56         21.417      124.026   140.5846    6.564209
Dec-10  11.01        14.672      33.261    13.41047    0.914016
                                 53.659    7.29        250.9884
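
The monthly entries listed under MAPE, RMSE and Chi-square Test appear to be, respectively, the absolute percentage error, the squared error and the term (actual − model)²/model; for the first month, |13.08 − 13.533| / 13.08 is about 3.46 %, and (13.08 − 13.533)² is about 0.205. The three figures that close the table (53.659, 7.29 and 250.9884) are then the mean percentage error, the root of the mean squared error and the sum of the chi-square terms. The Python sketch below computes the three criteria under this reading; the function name and the dictionary keys are illustrative assumptions, not part of the original spreadsheet.

```python
from math import sqrt


def forecast_metrics(actual, model):
    """Return MAPE (%), RMSE and the chi-square statistic for two series.

    actual : observed flows
    model  : forecast flows for the same months (same length, non-zero)
    """
    n = len(actual)
    # Absolute percentage error of each month, relative to the observation.
    ape = [abs(a - m) / a * 100.0 for a, m in zip(actual, model)]
    # Squared error of each month.
    sq_err = [(a - m) ** 2 for a, m in zip(actual, model)]
    # Chi-square term of each month, relative to the model value.
    chi_terms = [(a - m) ** 2 / m for a, m in zip(actual, model)]
    return {
        "mape": sum(ape) / n,
        "rmse": sqrt(sum(sq_err) / n),
        "chi_square": sum(chi_terms),
    }


# First three months of the validation period, as tabulated above.
actual = [13.08, 8.12, 6.11]
model = [13.533, 9.077, 5.641]
print(forecast_metrics(actual, model))
```

Applying the same function to the actual and model flows of another model gives criteria that can be compared directly, since a lower value of each criterion indicates a closer fit.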
APPENDIX F
Actual Flow (m³/s)
13.08
8.12
6.11
29.72
29.22
17.82
7.94
9.95
28.05
17.63
17.72
11.23
9.05
6.80
7.62
13.46
12.05
11.38
13.06
8.95
9.36
14.33
14.26
8.24
11.29
6.76
9.58
12.86
9.73
12.28
10.89
7.83
9.85
13.14
16.74
10.96
9.73
9.67
15.10
13.72
8.75
7.31
8.05
9.03
10.08
7.99
12.73
Model Flow (m³/s)
9.6732
7.1884
7.2612
9.0165
9.9281
7.6110
6.7046
7.0851
9.5168
12.2889
15.2005
12.3581
7.9227
6.6970
7.1341
8.9949
9.9369
7.6286
6.7248
7.1060
9.5379
12.3101
15.2217
12.3794
7.9439
6.7182
7.1553
9.0161
9.9581
7.6499
6.7460
7.1273
9.5592
12.3314
15.2429
12.4006
7.9651
6.7394
7.1765
9.0373
9.9794
7.6711
6.7673
7.1485
9.5804
12.3526
15.2642
Residual
Fit
Coefficient
*
*
*
*
*
*
*
*
*
*
*
*
*
-1.57988
-1.39072
-1.05700
-0.14946
1.10867
-1.04180
1.04920
0.26505
-2.99026
-3.58500
4.03841
1.99786
-1.67458
2.57792
0.10621
-1.42906
-1.18154
-1.02019
0.57523
-0.37209
3.89633
3.01737
-4.56349
-1.32522
-0.91615
-1.58516
-3.54478
-3.07188
1.04294
-0.27357
2.58702
1.07675
4.26644
5.43986
*
*
*
*
*
*
*
*
*
*
*
*
*
7.5299
7.6507
9.4570
10.2195
7.3913
7.8818
7.2208
11.0050
13.4603
15.6250
11.4816
9.9828
8.3305
7.7005
10.9538
11.4991
7.8015
7.3802
7.2148
10.9121
13.0737
18.1226
15.8535
9.3852
7.2661
7.9552
9.5648
9.2719
5.7171
6.7266
6.7039
11.0133
13.5536
18.4266
0.289364
0.878761
0.955283
Dec-09
Jan-10
Feb-10
Mar-10
Apr-10
May-10
Jun-10
Jul-10
Aug-10
Sep-10
Oct-10
Nov-10
Dec-10
6.88
6.83
4.86
4.36
7.18
6.17
7.51
7.45
8.04
7.16
6.30
9.56
11.01
12.4218
7.9864
6.7607
7.1978
9.0586
10.0006
7.6923
6.7885
7.1698
9.6017
12.3739
15.2854
12.4431
-1.30056
-0.79690
-1.45611
-0.78662
-1.79769
-0.43999
-1.69829
3.62122
-1.95910
1.06042
-3.37562
-2.06698
1.18330
16.6716
11.1109
8.5383
8.6866
10.5277
10.7900
8.1383
7.4688
9.4791
11.3296
14.6156
16.6470
12.9267
APPENDIX G
        Actual Flow  Model Flow                        Chi-square
Date    (m³/s)       (m³/s)      MAPE      RMSE        Test
Jan-06  13.08        9.6732      26.046    11.606      1.200
Feb-06  8.12         7.1884      11.473    0.868       0.121
Mar-06  6.11         7.2612      18.841    1.325       0.183
Apr-06  29.72        9.0165      69.662    428.633     47.538
May-06  29.22        9.9281      66.023    372.178     37.487
Jun-06  17.82        7.6110      57.290    104.224     13.694
Jul-06  7.94         6.7046      15.559    1.526       0.228
Aug-06  9.95         7.0851      28.793    8.208       1.158
Sep-06  28.05        9.5168      66.072    343.480     36.092
Oct-06  17.63        12.2889     30.303    28.547      2.323
Nov-06  17.72        15.2005     14.215    6.344       0.417
Dec-06  11.23        12.3581     10.029    1.269       0.103
Jan-07  9.05         7.9227      12.457    1.271       0.160
Feb-07  6.80         6.6970      1.515     0.011       0.002
Mar-07  7.62         7.1341      6.377     0.236       0.033
Apr-07  13.46        8.9949      33.173    19.937      2.217
May-07  12.05        9.9369      17.536    4.465       0.449
Jun-07  11.38        7.6286      32.965    14.073      1.845
Jul-07  13.06        6.7248      48.508    40.135      5.968
Aug-07  8.95         7.1060      20.594    3.397       0.478
Sep-07  9.36         9.5379      1.901     0.032       0.003
Oct-07  14.33        12.3101     14.095    4.080       0.331
Nov-07  14.26        15.2217     6.744     0.925       0.061
Dec-07  8.24         12.3794     50.235    17.134      1.384
Jan-08  11.29        7.9439      29.638    11.196      1.409
Feb-08  6.76         6.7182      0.618     0.002       0.000
Mar-08  9.58         7.1553      25.310    5.879       0.822
Apr-08  12.86        9.0161      29.890    14.776      1.639
May-08  9.73         9.9581      2.345     0.052       0.005
Jun-08  12.28        7.6499      37.705    21.438      2.802
Jul-08  10.89        6.7460      38.053    17.172      2.546
Aug-08  7.83         7.1273      8.975     0.494       0.069
Sep-08  9.85         9.5592      2.948     0.084       0.009
Oct-08  13.14        12.3314     6.129     0.648       0.053
Nov-08  16.74        15.2429     8.943     2.241       0.147
Dec-08  10.96        12.4006     13.144    2.075       0.167
Jan-09  9.73         7.9651      18.138    3.115       0.391
Feb-09  9.67         6.7394      30.306    8.588       1.274
Mar-09  15.10        7.1765      52.473    62.781      8.748
Apr-09  13.72        9.0373      34.130    21.927      2.426
May-09  8.75         9.9794      14.050    1.511       0.151
Jun-09  7.31         7.6711      4.940     0.130       0.017
Jul-09  8.05         6.7673      15.934    1.645       0.243
Aug-09  9.03         7.1485      20.836    3.540       0.495
Sep-09  10.08        9.5804      4.956     0.250       0.026
Oct-09  7.99         12.3526     54.601    19.032      1.541
Nov-09  12.73        15.2642     19.907    6.422       0.421
Dec-09  6.88         12.4218     80.550    30.712      2.472
Jan-10  6.83         7.9864      16.931    1.337       0.167
Feb-10  4.86         6.7607      39.109    3.613       0.534
Mar-10  4.36         7.1978      65.087    8.053       1.119
Apr-10  7.18         9.0586      26.164    3.529       0.390
May-10  6.17         10.0006     62.085    14.674      1.467
Jun-10  7.51         7.6923      2.428     0.033       0.004
Jul-10  7.45         6.7885      8.848     0.434       0.064
Aug-10  8.04         7.1698      10.824    0.757       0.106
Sep-10  7.16         9.6017      34.102    5.962       0.621
Oct-10  6.30         12.3739     96.411    36.892      2.981
Nov-10  9.56         15.2854     59.889    32.780      2.145
Dec-10  11.01        12.4431     13.016    2.054       0.165
                                 27.497    5.416       191.114