Energy and Buildings 37 (2005) 1250–1259

www.elsevier.com/locate/enbuild

On-line building energy prediction using adaptive artificial neural networks
Jin Yang (a), Hugues Rivard (b,*), Radu Zmeureanu (a)
(a) Department of Building, Civil and Environmental Engineering, Centre for Building Studies, Concordia University, 1455 Maisonneuve Blvd. West, Montreal, Canada H3G 1M8
(b) Department of Construction Engineering, ETS, 1100 Notre-Dame Street West, Montreal, Canada H3C 1K3
Received 9 December 2004; received in revised form 28 January 2005; accepted 10 February 2005

Abstract

While most of the existing artificial neural network (ANN) models for building energy prediction are static in nature, this paper evaluates the performance of adaptive ANN models that are capable of adapting themselves to unexpected pattern changes in the incoming data, and can therefore be used for real-time on-line building energy prediction. Two adaptive ANN models are proposed and tested: accumulative training and sliding window training. The computational experiments presented in the paper use both simulated (synthetic) data and measured data. In the case of synthetic data, the accumulative training technique appears to deliver nearly the same performance as the sliding window training approach in terms of training time and accuracy. In the case of real measurements, the sliding window technique gives better results than accumulative training if the coefficient of variation is used as an indicator.
© 2005 Elsevier B.V. All rights reserved.

Keywords: On-line prediction; Electric demand; Energy demand prediction; Building cooling; Artificial neural networks; Adaptive models
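
The two retraining strategies named in the abstract differ only in which past data are handed to the training routine at each cycle. The sketch below illustrates that difference with day indices. The function names, the half-open range convention, and the 20-day lengths are illustrative assumptions, not the authors' implementation.

```python
from typing import Tuple


def accumulative_range(day: int, baseline_days: int) -> Tuple[int, int]:
    """Half-open range [start, end) of past days used to train before predicting `day`.

    Accumulative training keeps everything observed so far, so the range
    always starts at day 0 and grows by one day per retraining cycle.
    """
    if day < baseline_days:
        raise ValueError("not enough history for the baseline model")
    return 0, day


def sliding_range(day: int, window_days: int) -> Tuple[int, int]:
    """Half-open range [start, end) of past days used to train before predicting `day`.

    Sliding window training keeps only the most recent `window_days` days,
    so the range has constant length and moves forward with `day`.
    """
    if day < window_days:
        raise ValueError("not enough history to fill the window")
    return day - window_days, day


if __name__ == "__main__":
    # Illustrative values only: a 20-day baseline and a 20-day window.
    for day in (20, 21, 22):
        print(day, accumulative_range(day, 20), sliding_range(day, 20))
```

With these assumptions both strategies start from the same training set, after which the accumulative set grows by one day per cycle while the sliding set keeps a fixed length.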

* Corresponding author. Tel.: +1 514 396 8667; fax: +1 514 396 8584. E-mail address: hugues.rivard@etsmtl.ca (H. Rivard).

1. Introduction

The prediction of building energy consumption can play an important role in building management since it can help optimize the daily operation of the building and select better control strategies [1]. An automated energy prediction system is often built on top of a mathematical prediction model consisting of several parameters. The model parameters are estimated using existing data that typically include energy demand or consumption and temperature measurements recorded in the past. A variety of prediction models have been proposed in the literature, including time-series models, Fourier series models, regression models, artificial neural network (ANN) models and fuzzy logic models. Each model type has its own features, advantages and disadvantages, and in addition, its performance varies from one application to another.

With the exception of a few ANN models, most of the surveyed literature focuses on static prediction, a prediction scheme that involves a single prediction model that does not evolve over time: when the estimation of the model parameters is completed, the model is fixed; the most recently collected data is not used to update the model parameters. To obtain an accurate static model, a large volume of historical data is required to estimate the model parameters. The alternative approach is the use of a dynamic (adaptive) model that constantly updates model parameters based on newly available data. As the energy data collection process is automated, the entire process of retrieving new measurements, updating the model and making short-term energy prediction can be performed in 'real time' on-line.

The objective of this paper is to evaluate the performance of several adaptive neural network models for the on-line prediction of building energy demand using both simulated (synthetic) and measured data.

2. Literature survey

Regression models have been demonstrated to be effective for building energy predictions in a number of

0378-7788/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.enbuild.2005.02.005

experiments (e.g., [2–4]). Relatively few parameters must be identified, thus reducing the time required for the model development. The regression models do not, however, accurately reflect the hourly or sub-hourly energy demand. They are best suited for predicting the average consumption over longer periods such as days or months. For different buildings with different environment and weather conditions, much effort and time must be spent on selecting time scales and regressors to find a best fit model. Also, autocorrelation or multicollinearity problems must be considered when evaluating the performance of prediction because they tend to lead to model uncertainty.

The use of time-series analysis techniques to forecast energy use is logical because the history of energy use can be represented by a time series. Kimbara et al. [5] experimented with the autoregressive integrated moving average (ARIMA) model and found the performance of ARIMA to be better than a two-dimensional autoregressive (AR) model. Several models and applications have been implemented based on the autoregressive moving average with exogenous input (ARMAX) model (e.g., [6]). On one hand, time-series models can capture the relationship between the hourly energy use and time variation given a set of time-series data. On the other hand, both ARMA and AR models work under the assumption that the present value is a linear combination of the previous ones. In most cases, this assumption is invalid. The ARIMA and ARMAX models can handle the changes in a nonstationary process, but require the estimation of many parameters. Also, the autocorrelation between variables must be considered because it strongly impacts the accuracy of the prediction. Dhar et al. [7] used a Fourier series model to predict the energy demand in an institutional building. Fourier series models provide better performance compared to the above time-series models; however, they are based on the assumption that energy use in most buildings is periodic. If dramatic changes happen, high-frequency Fourier components must be included in the model, thus dramatically increasing the computational cost.

An artificial neural network (ANN) is a type of artificial intelligence technique that mimics the behavior of the human brain. It can approximate a nonlinear relationship between the input variables and the output of a complicated system. The main advantage of an ANN model is its self-learning capability. The use of ANN in building energy prediction has been investigated by many researchers (e.g., [8,9]). Their models share some similarities, but each differs significantly in implementation details because each is tailored toward a specific type of energy prediction under a specific building environment. On one hand, ANN models estimate parameters faster by learning from examples automatically. On the other hand, because it is hard to distinguish structure from noise in the data, an ANN tends to memorize noise. Also, an ANN might not be able to adapt to dramatic changes such as unstable behavior in the power load–temperature relationship.

Most of the literature focuses primarily on static prediction. In a static prediction, the prediction model is set up in advance using historical data and does not change afterward, when new information becomes available. It is highly possible that such a model becomes invalid when new patterns emerge and more recent data becomes available. In this case, a dynamic prediction model that can adapt itself to such changes in the energy consumption pattern is desirable. This is especially true for short-term energy prediction. Only a few dynamic prediction systems were found in the literature, all in the field of electric load forecasting for power systems. They all used a sliding window approach in which the size of the data set used for training is kept constant; however, the data set is periodically updated as the window moves forward in time. Djukanovic et al. [10] used an adaptive system for short-term load forecasting with a moving window consisting of data associated with the 4 previous weeks as well as with 8 weeks at the same time in the previous year. Khotanzad et al. [11] used a combination of three separate models (i.e., weekly, daily and hourly models) for short-term load forecasts. Each model is updated at the end of each period with the data associated with that period (e.g., at the end of each week, the weekly model is updated using the last week's data). Mohammed et al. [12] used three ANN adaptive models and a two-stage training algorithm. The first training stage produces a set of initial ANN weights that capture the general, day-by-day trend of the electric load. During the second-stage training, the ANN is refined and enhanced to capture special features of the particular day for which the forecast is made, by using a subset of data consisting of those that share similar temperature conditions with the day being forecasted as well as data from the previous 5 days. Charytoniuk and Chen [13] used an ANN model with an adaptive scheme that involves a moving window, which consists of training data from the last 3 weeks as well as 2 weeks around the same day in the previous year. These approaches have potential in the area of on-line building energy prediction.

The regression and time-series models are based on classical mathematical theory. Thus, the behavior of these models is well understood and the model parameters are straightforward to estimate. However, these models tend to work well only for energy systems that are well behaved. The ANN model generally works better for buildings that exhibit highly nonlinear energy consumption patterns. However, the success of using ANN depends on a number of design issues such as the choice of input and output data, the number of hidden layers, the number of neurons used in each layer and the training algorithms used. A few studies have compared these forecasting methods. Kawashima et al. [14] found that ANN models provide the best performance. Dhar et al. [7] compared Fourier series models, ANN models and the winners of the Great Energy Prediction Shootout [15]. The performance of Fourier series was found to be comparable to the performance of ANN models and that of the winners of the shootout competition. Fourier series

models and artificial neural network models are the two most recommended methods found in the literature.

An ANN model has the unique advantage that no clear relationship between the input variable and output needs to be defined before the model is used in the prediction process, since this relationship is identified through a self-learning process. Because of this unique feature, the time and effort that are normally required to establish a proper mathematical model in a conventional prediction methodology can be significantly reduced. The ANN-based models appeared to perform well in the two Great Energy Shootout competitions organized by ASHRAE. Therefore, this research adopts ANN and explores ways to develop adaptive ANN models for the on-line prediction of building energy demand.

3. Proposed artificial neural network models for building energy prediction

This section presents two ANN models used for building energy prediction.

3.1. Architecture issues for the ANN models

The electric, gas or chilled water demand or consumption is usually the desired output of an ANN model used for building energy prediction. Typical input elements include the environmental data (e.g., outdoor dry-bulb temperature, wet-bulb temperature, horizontal solar flux), time (e.g., hour of the day) and operating parameters (e.g., status of chillers or temperature of hot water leaving the boiler). For some applications, not all environmental variables that contribute to variations of energy usage may be available. In this case, the accuracy of the ANN prediction will be limited by the incompleteness of the data. On the other hand, if some measurements are irrelevant to the energy usage to be predicted, they carry no useful information and contribute only noise to the ANN input. Removing these types of measurements from the list of inputs can improve the accuracy of the prediction and reduce the training time. Feuston and Thurtell [16] used principal component analysis (PCA), a multivariate statistical analysis technique, to assemble, synthesize and select relevant input variables among a large number of measurements. As a consequence, the neural network input vector does not consist of the original input variables, but of linear combinations of these variables. The PCA technique reduces the dimensionality of data and removes redundancy by seeking clusters of data points that can be used to represent the main features of the data. This method is adopted in this study.

Typically one or two hidden layers are sufficient in an ANN designed for building energy prediction. The number of neurons in each layer may vary depending on the quantities to be predicted. The presence of redundant neurons does not pose a significant problem. A good learning algorithm tends to ignore the excessive parameters in the system. To ensure the accuracy of the ANN prediction, a subset of the data is reserved for cross-validation. The magnitude of the cross-validation error can be used as a stopping criterion for the ANN training.

3.2. Adaptive ANN models

In a static prediction model, one tries to establish the model in advance and estimate the model parameters using historical data. Once the model is built, it is rarely changed even though the presence of new data may indicate that the model is no longer valid. To address this problem, this paper evaluates the performance of adaptive ANN models that can be constantly updated as new environmental and operational data becomes available. The adaptive ANN prediction models have an inherent self-revision capability to adapt to weather and other condition changes. Two adaptive ANN models are proposed: (1) the accumulative training and (2) the sliding window training.

An ANN can be retrained periodically by a set of augmented data infused with newly collected measurements. This type of training strategy is referred to as accumulative training. Accumulative training has the obvious advantage of being able to identify both the local (for example, daily) and the global (seasonal) trends of energy variation. Its main disadvantage lies in the fact that the volume of data accumulated continuously increases and may become too large to be manageable. The larger the volume of the training data, the longer it takes to train the ANN. It is also likely that the latest changes in the accumulative training data set have a smaller impact on the model training because the new data are few compared to the older data.

Alternatively, the size of the training data set can be kept constant: new measurements are added while some of the oldest data are dropped from the training set. This approach can be graphically viewed as periodically sliding a time window across a time series of measurements to select the training data. An adaptive ANN-based model based on this training technique will be referred to as a sliding window ANN. The relatively small and constant size of the training data makes it possible to perform fast ANN re-training. The drawback of this approach is that the training data may only contain recent information, and the prediction result may not accurately reflect the annual or seasonal change in energy usage pattern. Also, determining the optimum window size ahead of time is difficult. The window cannot be too large; otherwise it defeats the purpose of limiting the amount of training data. If the window size is too small, the resulting ANN model may not completely capture the trend of the energy demand.

4. Computational experiments using synthetic data

This section contains the computational results obtained from applying the ANN models to a data set that was

obtained from the simulation of an office building using the DOE 2.1E software. This data set is called synthetic data. The Laval office building, located in Montreal, was built in 1972 [17]. The building has a total floor area of 10,400 m² spread over a seven-floor office tower, an underground garage and a ground floor. There is a central variable air volume system, which provides cooling in the summer and ventilation all year to the office spaces. Direct expansion cooling coils are connected to four condensing units, each equipped with two compressors with a refrigeration capacity of about 90 kW. The supply air temperature is controlled as a function of the outdoor air temperature. The system is also equipped with a dry-bulb temperature economizer system.

The simulated data associated with the Laval building is noise free: the building is assumed to operate under normal conditions, which do not change from week to week, and from season to season. In addition, there are no measuring errors, operation mistakes, faults or degradation of the energy performance of equipment over time. Because the simulated data is generated from a well-behaved energy model defined in the simulation software (DOE 2.1E), it provides an ideal scenario under which the variation of the energy consumption is more "predictable". Performing experiments on this data set serves as the first step towards understanding, developing and testing a realistic ANN model.

Several experiments are performed on this data set. Because a static ANN model serves as the building block for a dynamic ANN, its construction and testing are described in Section 4.3, before the experiments associated with the dynamic models are shown in Section 4.4. Before presenting these results, Section 4.1 describes the architecture of the ANN models used, followed by Section 4.2, which describes data processing.

4.1. ANN architecture

The quantity of interest here is the hourly electric demand of the chiller installed in the Laval building. The hourly data set consists of: outdoor dry-bulb temperature, $T_d(t)$; outdoor wet-bulb temperature, $T_w(t)$; temperature of water leaving the chiller, $T_l(t)$; chiller electric demand (compressor and fans of the condensing unit), $E(t)$. It is assumed that the electric demand of the chiller, at time $t$, is a function of some of the above-mentioned environmental and operation variables collected at hours $t - i$, for $i = 1, 2, \ldots, n$. Those variables will become inputs to an ANN to predict the electric demand of the chiller at time $t$.

The ANN model used in the experiments consists of one hidden layer in addition to the input and output layers. The input layer contains $n$ neurons used to feed $n$ different inputs into the network ($n$ is a variable because it varies with the experiments). The output layer contains one neuron from which the predicted chiller electric demand is extracted. The hidden layer consists of $2n + 1$ neurons [18]. This three-layer ANN model was found to be sufficient for making a reasonably accurate prediction of the chiller electric demand for the Laval building. Adding more layers and/or neurons can potentially improve the prediction accuracy, but also increases the complexity and training time of the ANN.

The MATLAB ANN toolbox is used to build, configure and train the network. The toolbox offers several algorithms for training the network. All algorithms were tested in this study. When the data size is small, the Levenberg–Marquardt (LM) algorithm appeared to be the fastest training algorithm (the mean square error of the ANN output approaches zero at a quadratic rate). However, because the LM method must solve a linear system of equations in order to obtain the search direction, the computation becomes expensive when the number of input elements and the volume of the training data increase. Therefore, when the volume of the data is large, the standard gradient descent algorithm is used for training the ANN. Logistic sigmoid functions are used as the activation function in each neuron. In the output layer, a linear transfer function is used to allow the network to produce values outside of the range $[-1, 1]$. The training process is terminated when the mean square error (MSE) between the ANN output and the target values becomes less than $10^{-5}$, or when a maximum of 500 epochs is reached. The initial weights and biases of the ANN are generated randomly.

4.2. Data processing

Besides defining the architecture parameters of the ANN as discussed above, the data needs to be processed prior to the training of the ANN. It has been recognized in the literature that it is important to adopt a day-typing procedure to separate energy data with distinct load patterns into different prediction groups. Energy prediction can be made within each group instead of on the entire data set. An obvious separation can be drawn between weekdays and weekends (holidays). The Laval building operates from 7:30 a.m. to 11:00 p.m., Monday–Friday. Since the chiller is not used outside this interval, the non-working hours are removed from the data set since prediction is only to be made for the working hours. Without day-typing, the implementation is easier. However, the prediction errors increase if all data is included, as demonstrated below.

Two experiments were conducted by using data collected in June to predict the electric demand of the chiller in July. The prediction accuracy is measured by the coefficient of variation (CV) and the root mean square error (RMSE). In the first test, when the training process is carried out using only the working-hour data, the statistics are the following: CV = 0.20 and RMSE = 27.0 kW. In the second test, the results for the entire data set are: CV = 0.67 and RMSE = 34.4 kW. Since the use of working-hour data significantly improves the accuracy of prediction, all non-working hour data were removed for the experiments presented below.

Because the numerical range of the input and output variables may be quite different for some applications, it is often useful to normalize the input and output variables so that

the training process does not suffer from severe numerical round-off effects. In the following experiments, all input variables are normalized to values between $-1$ and $1$.

4.3. Training and testing of static prediction models

Three experiments were carried out with static prediction models. The first experiment assesses the accuracy of the ANN in predicting the chiller electric demand at time $t$ based on input data at time $t$. The second experiment predicts the demand using data lagged by one hour ($t - 1$). Finally, the third experiment tests various combinations of time-lagged data to improve the prediction. Seventy-five percent of the data is used to train the ANN models, and 25% is set aside for testing. The test set corresponds to the second week of each month.

Table 1
Statistical indices, CV and RMSE, used for comparing the difference between the predicted and measured energy demand

Experiment        CV (–)    RMSE (kW)

Static models for chiller electric demand using synthetic data
No. 1             0.04      6.10
No. 2             0.16      25.5
No. 3             0.15      22.0
No. 3 (PCA)       0.07      11.4

On-line models with accumulative training for chiller electric demand using synthetic data
No. 4 (PCA)       0.15      28.3
No. 5 (PCA)       0.17      28.9

On-line models with sliding window training for chiller electric demand using synthetic data
No. 6             0.40      8.05
No. 6 (PCA)       0.15      27.7
No. 7 (PCA)       0.16      27.8

Static models for chiller electric demand using measured data
No. 8             0.23      3.73
No. 9 (PCA)       0.26      4.28

On-line models with accumulative training for chiller electric demand using measured data
No. 10            2.53      13.29

On-line models with sliding window training for chiller electric demand using measured data
No. 11            0.26      12.88

4.3.1. Experiment no. 1: Without time-lagged measurements as input

The chiller electric demand at time $t$ is predicted in terms of $T_d(t)$, $T_w(t)$ and $T_l(t)$ at time $t$. Because the number of input elements is small, the Levenberg–Marquardt algorithm was used to train the ANN. Training takes 4.9 s on a personal computer equipped with a Pentium III running at 933 MHz, with 128 MB of RAM and Windows 98 as the operating system. The mean square error (MSE) between the output of the ANN and the target electric demand decreases below $10^{-3}$ in less than 10 epochs. Once the training process is completed, the quality of the ANN prediction model is assessed by predicting the chiller electric demand associated with the second week of each month (the test data). The accuracy is measured by using the CV and RMSE values of the prediction error associated with the test: CV = 0.04, RMSE = 6.10 kW (Table 1). Experiment no. 1 indicates that the ANN model constructed in that experiment can effectively capture the direct nonlinear relationship between the chiller electric demand of the Laval building and various temperature measurements at a given time. However, this model cannot be used for the on-line prediction of the electric demand at time $t$, since the input variables $T_d(t)$, $T_w(t)$ and $T_l(t)$ are not available until the end of that hour.

4.3.2. Experiment no. 2: Using time-lagged measurements as input

One possibility is to use $h(t)$, $T_d(t-1)$, $T_w(t-1)$ and $T_l(t-1)$ as input variables to train the ANN to predict $E(t)$. The variable $h(t)$ indicates the hour for which the prediction is made. It is observed that this change of input vectors to the ANN leads to a longer training period of 11.4 s. Even when the Levenberg–Marquardt algorithm is used, the mean square error between the ANN output and the target decreases slowly to zero. This difficulty is even more pronounced when the ANN model is applied to the test data for prediction. The error between the predicted electric demand and the measured electric demand is characterized by CV = 0.16 and RMSE = 25.46 kW (Table 1).

4.3.3. Experiment no. 3: Using additional time-lagged measurements as input and PCA to reduce the dimension of the input

To improve the prediction accuracy, longer past history is included as additional input data to train the ANN model. Several combinations of time-lagged input data, up to 6 h delay, are tried and tested (Table 2). It is assumed that $h(t)$ and $T_l(t-1)$ are always chosen as input variables. The inclusion of all temperature measurements collected at time $t - i$, for $i = 1, 2, \ldots, 6$ (case D), does not necessarily give the best performance: CV = 0.22 and RMSE = 34.4 kW. The best prediction result is obtained for case A, where the CV and RMSE values are 0.15 and 22.0 kW, respectively.

The use of all available time-lagged measurements is not effective because of undesirable redundancy. The redundancy in the ANN input makes it difficult for the back-propagation algorithm to capture the optimal weights and biases for the desired ANN model. Thus, a meticulous selection of time-lagged data must be used to train the ANN model. However, in practice, it is not possible to try all possible combinations of time-lagged temperature measurements before a prediction is made. This problem can be eliminated by using the principal component analysis (PCA) technique to select appropriate input data from $h(t)$, $T_l(t-1)$, $T_d(t-k)$, $T_w(t-k)$, for $k = 1, 2, \ldots, 6$. Six principal components that contribute to more than 1% of the variance of all past history data are retained. The training time is 9.9 s, and the CV and RMSE values are reduced to 0.07 and 11.4 kW, respectively (Table 1). The CV and RMSE

Table 2
Prediction of the chiller electric demand at time t using different combinations of time-lagged temperature measurements (Experiment no. 3)

Case   Outdoor dry-bulb temperature, Td     Outdoor wet-bulb temperature, Tw     CV (–)   RMSE (kW)
       t−1  t−2  t−3  t−4  t−5  t−6         t−1  t−2  t−3  t−4  t−5  t−6
A                                                                                0.15     22.0
B                                                                                0.17     26.5
C                                                                                0.17     26.7
D                                                                                0.22     34.4
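
Each case in Table 2 amounts to a choice of lags for the dry-bulb and wet-bulb temperatures, with the hour of the prediction and the one-hour-lagged leaving-water temperature always included, as stated for Experiment no. 3. The sketch below shows one way such input vectors could be assembled. The helper name `lagged_inputs`, its argument names, and the lag sets in the example are hypothetical, and it assumes consecutive hourly samples; because the paper removes non-working hours, a real implementation would have to handle the gaps between days.

```python
from typing import List, Sequence


def lagged_inputs(
    hour: Sequence[int],
    td: Sequence[float],
    tw: Sequence[float],
    tl: Sequence[float],
    td_lags: Sequence[int],
    tw_lags: Sequence[int],
) -> List[List[float]]:
    """Build one input vector per predictable time step t.

    Each vector is [h(t), Tl(t-1), Td(t-k) for k in td_lags, Tw(t-k) for k in tw_lags].
    Steps without enough history for the largest lag are skipped.
    """
    if not (len(hour) == len(td) == len(tw) == len(tl)):
        raise ValueError("all series must have the same length")
    max_lag = max([1, *td_lags, *tw_lags])  # Tl is always lagged by one step
    rows: List[List[float]] = []
    for t in range(max_lag, len(td)):
        row = [float(hour[t]), tl[t - 1]]
        row += [td[t - k] for k in td_lags]
        row += [tw[t - k] for k in tw_lags]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up numbers, used only to show the shape of the result.
    rows = lagged_inputs(
        hour=[8, 9, 10, 11],
        td=[20.0, 21.0, 22.0, 23.0],
        tw=[15.0, 15.5, 16.0, 16.5],
        tl=[7.0, 7.1, 7.2, 7.3],
        td_lags=(1, 2),
        tw_lags=(1,),
    )
    for row in rows:
        print(row)
```

The target paired with each row would be the demand E(t) at the same step t. Passing lags 1 to 6 for both temperatures corresponds to using all time-lagged measurements, which the text identifies as case D.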

values of the prediction obtained in this experiment are much better than the ones obtained in Experiment nos. 2 or 3 (without PCA).

4.4. Training and testing of on-line prediction models

The previous static ANN model is modified to take advantage of new measurements that become available on a continuous basis. This research evaluates two different approaches to implement adaptive ANN. The first approach simply accumulates all the measurements collected up to time $t$, and retrains the ANN periodically using the entire set of measurements. This is referred to as accumulative training (presented in Section 4.4.1). The second approach maintains a fixed amount of training data by discarding old measurements while adding new measurements. This is referred to as sliding window training (presented in Section 4.4.2).

4.4.1. Accumulative training

Data about temperatures and chiller electric demand from the month of June are set aside, and this portion of the data file is used to establish, through training, what is called a baseline ANN model for the chiller electric demand. The baseline ANN model is then used to predict the chiller electric demand associated with the first day of July. Once the prediction has been performed, the hourly temperature and electric demand measurements associated with the day being predicted are added to the initial data set allocated for the baseline training. This updated data set is used to retrain the ANN model for carrying out subsequent predictions. The weights and biases that emerge from the baseline model are used as the initial weights and biases during the retraining process. Since these initial weights and biases are likely to capture some features of the nonlinear mapping between the independent temperature variables and the electric energy to be predicted, it is conceivable that retraining will not take as long as the baseline training. This behavior is confirmed in these experiments. The results presented below compare the impact of different choices of input variables on the accuracy of the accumulative on-line prediction model. In particular, the fourth experiment uses only past temperature measurements as input while the fifth experiment adds past electric energy measurements as input.

4.4.1.1. Experiment no. 4: Accumulative training with time-lagged temperature measurements as input. The input to the ANN model consists of a combination of the past history temperature measurements $T_d(t-k)$, $T_w(t-k)$ and $T_l(t-1)$, where $1 \le k \le 6$. The output of the ANN is the chiller electric demand $E(t)$. PCA is applied to remove the redundancy in the input, and only components that contribute more than 1% of the variance are retained. Six principal components emerged as the input to the ANN after PCA was applied.

The baseline model is modified and retrained daily by including the electric demand and temperature measurements collected on the previous day in the training data set. The value of MSE between the target and ANN training output converges to zero rapidly. The predicted chiller energy demand matches the actual usage reasonably well (Fig. 1). The CV and RMSE values obtained from testing are 0.15 and 28.3 kW, respectively.

4.4.1.2. Experiment no. 5: Accumulative training with time-lagged chiller energy usage as input. In this experiment, $E(t-k)$ is added to the set of input variables. The ANN model is constructed and trained in a way similar to that carried out in Experiment no. 4. The predicted electric demand matches the measurements reasonably well (CV = 0.17 and RMSE = 28.9 kW) (Table 1). Since the CV and RMSE values obtained in Experiment nos. 4 and 5 are comparable to those obtained with static ANN models without PCA (Experiment nos. 2 and 3), one can conclude that the on-line model with accumulative training provides reasonable accuracy when compared with static models.

4.4.2. Sliding window training

The training data consists of the temperature and electric demand measurements enclosed within the sliding window. The ANN model used in the sliding window approach has the same architecture as the one used in the accumulative prediction model. The same parameter settings (such as the learning algorithm, training convergence tolerance and the maximum number of epochs allowed) are used for the following experiments. The sixth experiment uses only past temperature measurements as input while the seventh experiment adds past electric demand measurements as input.

4.4.2.1. Experiment no. 6: Training with sliding window using temperature data collected in previous hours. Experiments show that a window size of 20 working days
1256 J. Yang et al. / Energy and Buildings 37 (2005) 1250–1259

Fig. 1. Comparison between the predicted chiller electric demand and synthetic data using an on-line model with accumulative training and time-lagged
temperature measurements (Experiment no. 4).

Table 3 However, the number of principal components may change


Comparison between the predicted and synthetic data using an on-line when the sliding window is updated. More or fewer principal
model with sliding window training (Experiment no. 6)
components may appear as daily training and prediction
Window size (days) CV ( ) RMSE (kW)
move forward. When the number of principal components
10 0.25 45.5 associated with the new training data set is different from the
20 0.40 8.05
one associated with the previous training data, one cannot
30 0.20 3.00
40 0.46 82.0 restart from the ANN model obtained from previous training
cycle. Weights and biases must be reinitialized randomly,
and the training may take more time.
provides a reasonable balance between accuracy and Fig. 2 shows that overall the training of sliding window
computational complexity per online prediction cycle produces a qualitatively good prediction. The predicted
(Table 3). Hence, all subsequent experiments use this electric demand matches the actual demand curve reason-
window size. Measurements of temperature and electric ably well except at some hours where the actual load shows
demand associated with the first 20 days of June were some unexpected fluctuation. The overall CV and RMSE
selected as the initial set of training data. The prediction is values obtained in this experiment are 0.15 and 27.7 kW,
made on a daily basis. Thus, once the initial training is respectively (Table 1).
completed and a prediction has been made for the electric
demand on the 21 working days in June, the hourly 4.4.2.2. Experiment no. 7—Training with sliding window
temperature and electric demand measurements correspond- using temperature and electric demand measured in the
ing to the first working day of June are removed from the previous hours as input. In this experiment, we investigate
training data. The measurements of temperatures and whether adding E(t k) to the list of input variables: h(t),
electric demand associated with the 21 days are added into Tl(t k), Td(t - k), Tw(t - k), k = 1, 2, . . ., 6, can improve
the training data and retraining is carried out. Consequently, prediction accuracy. The PCA technique is used to remove
the volume of training data does not change, and the the potential redundancy in the data. There is no clear
selection window is shifted forward in time by 1 day. The improvement in prediction accuracy when E(t k) is
overall CV and RMSE values of the prediction are 0.40 and added. The CV and RMSE values of the prediction are 0.16
8.05 kW, respectively. and 27.78 kW (Table 1). Results from the on-line models
After PCA is applied to the initial set of time-lag using the sliding window training, applied to synthetic data,
temperature measurements to select principal components, are as accurate as those obtained from the accumulative
six principal components emerged as the ANN input. training.
J. Yang et al. / Energy and Buildings 37 (2005) 1250–1259 1257

Fig. 2. Comparison between the predicted chiller electric demand and synthetic data using an on-line model with sliding window training and time-lagged
temperature measurements (Experiment no. 6).
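The two on-line schemes of Section 4.4 differ only in how the training set evolves from one daily cycle to the next: accumulative training keeps every past day, while sliding window training drops the oldest day each time a new one is added. The following Python sketch illustrates that bookkeeping only. It is not the authors' MATLAB implementation; the function names (`accumulative_update`, `sliding_window_update`, `run_online`) and the use of integer day labels in place of one day of hourly measurements are assumptions made for illustration, and the ANN retraining step itself is left out.

```python
def accumulative_update(training_days, new_day):
    """Accumulative training: keep every past day and append the newest one."""
    return training_days + [new_day]


def sliding_window_update(training_days, new_day, window_size=20):
    """Sliding window training: append the newest day and drop the oldest
    ones so that at most `window_size` days remain."""
    return (training_days + [new_day])[-window_size:]


def run_online(days, n_initial, update):
    """Replay daily prediction cycles and record which days are in the
    training set when each new day is predicted.

    `days` is a list of day labels, each standing in for one day of hourly
    measurements (an assumption of this sketch); retraining is omitted.
    """
    training = list(days[:n_initial])
    history = []
    for day in days[n_initial:]:
        history.append((day, list(training)))  # training set used to predict `day`
        training = update(training, day)       # add the day once it has been measured
    return training, history


days = list(range(1, 31))  # 30 hypothetical working days
acc_final, _ = run_online(days, 20, accumulative_update)
win_final, win_history = run_online(days, 20, sliding_window_update)
print(len(acc_final), len(win_final))  # prints: 30 20
print(win_final[0], win_final[-1])     # prints: 11 30
```

With a 20-day window, day 21 is predicted from days 1 to 20 and day 22 from days 2 to 21, so the volume of training data stays constant, whereas the accumulative set grows by one day per cycle.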

5. Computational experiments using measured data

To evaluate the effectiveness of the proposed ANN models, some computational experiments are performed with measured data from a real environment. Measurements contain a number of anomalies that make it more challenging to produce highly accurate prediction results. The computational experiments present the typical difficulties encountered in developing an ANN model to predict the energy demand in a real building.

This section contains the results of using the proposed ANN techniques to predict the chiller electric demand of the building housing the CANMET Energy Technology Centre located in Varennes, Que., Canada. Because the original data provided in this experiment is not prepared in a format that can be directly used by the MATLAB code developed in this research, the raw data was first preprocessed and converted into the desired format. During the process of conversion, several problems associated with the completeness and fidelity of the data were discovered. Problems encountered and methodologies for addressing these problems are described below, before discussing the ANN architecture, experiments and results.

5.1. Data

The data set contains measurements between 12:00 p.m. June 21, 2002 and 12:00 a.m. March 27, 2003 and between 11:00 a.m. May 8, 2003 and 0:00 a.m. July 10, 2003. The objective of this study is to predict the chiller electric demand E(t), in kW, at a particular time t. Variables listed in Table 4 are initially identified to be the independent variables that can potentially affect the variation of E(t). A closer examination of data reveals that not all variables listed in the data file are measured at every hour. Furthermore, the number of available measurements is different from one variable to another. The number of hourly measurements for some variables is roughly 50% of the total number of hours between the beginning and ending period of the measurements due to problems related to data collection. The low quality of the raw data makes it difficult to design, train and test an accurate ANN model. However, this situation could occur in a monitoring system installed in buildings.

Table 4
Input and output variables at time t related to the chiller electric demand

Variable    Description
SC1(t)      On/off status of compressor 1
SC2(t)      On/off status of compressor 2
SC3(t)      On/off status of compressor 3
SC4(t)      On/off status of compressor 4
SC5(t)      On/off status of compressor 5
SC6(t)      On/off status of compressor 6
Te(t)       Temperature of the water entering the ice tank
Tev(t)      Temperature of the water entering the evaporator
Tlv(t)      Temperature of the water leaving the evaporator
Hum(t)      Outdoor relative humidity
TOD(t)      Outdoor temperature
SV1(t)      Is the chilled water prepared in the ice tanks? (yes/no)
SV2(t)      Percentage of chilled water prepared in the ice tanks
HD(t)       Holiday indicator
WS(t)       Weekday schedule
CR(t)       Electric current used by the chiller

Therefore, the challenge is to develop, in the absence of

complete information, an ANN model able to provide reasonably accurate on-line predictions of electric demand. The prediction of chiller electric demand is made only when at least one compressor is turned on (indicated by SCi(t) = 1 for i = 1, 2, ..., 6). As a result, both training and testing are only performed using measurements associated with non-zero SCi(t) values.

5.2. ANN architecture

Similar to the ANN models used to predict the chiller electric demand using synthetic data (Section 4), the ANN models used in this section consist of one hidden layer in addition to the input and output layers. The input layer contains n neurons used to feed n different inputs into the network. The output layer contains one neuron from which the chiller electric demand is extracted. The hidden layer consists of 2n + 1 neurons [18]. Logistic sigmoid functions are used as the activation function in each neuron. In the output layer, a linear transfer function is used to allow the network to produce values outside of the range [-1, 1]. The training process is terminated when the mean square error (MSE) between the ANN output and the target values becomes less than 10⁻³, or when a maximum of 500 epochs is reached. The initial weights and biases of the ANN are generated randomly. Whenever possible, the Levenberg–Marquardt (LM) algorithm is specified to train the network. When the number of input elements or the volume of the data is large, the standard gradient descent algorithm is used for training the ANN.

5.3. Training and testing for static prediction of chiller electric demand

Two experiments were first carried out with static prediction models. The eighth experiment assesses the accuracy of ANN to predict the chiller electric demand at time t based on input data at time t. The ninth experiment predicts the demand using time-lagged data as input.

5.3.1. Experiment no. 8: Without time-lagged measurements as input

In this experiment, the chiller electric demand at time t is predicted in terms of all the variables listed in Table 4, at time t, except the status variables (SCi(t), i = 1, 2, ..., 6). About 80% of the non-zero measurement data between September 2002 and May 2003 are reserved for training. After training the network for a maximum of 500 epochs, the MSE becomes less than 10⁻³. The predicted chiller energy usage matches the actual chiller electric demand well. The CV and RMSE values are 0.23 and 3.73 kW, respectively (Table 1). Training takes only 2.4 s.

5.3.2. Experiment no. 9: Using time-lagged measurements as input

In this experiment, an ANN model is developed that predicts E(t) based on measurements collected in previous hours. The PCA is used to reduce the dimension of the input and to remove redundancy in the data. Six principal components that contribute to more than 1% of the total variation are retained. Training time is 5.7 s. The MSE of the ANN output at the end of the training process is around 10⁻². The overall accuracy of this ANN model is satisfactory: CV is 0.26 and RMSE is 4.28 kW (Table 1).

5.4. On-line prediction of electric demand with accumulative training

Experiment no. 10 was carried out to test the accumulative training ANN model that uses the chiller operating status at time t - 1.

5.4.1. Experiment no. 10: Prediction using SCi(t - 1)

In this experiment, the chiller-related variables, measured between September 2002 and May 2003, are selected for baseline training. The volume of the training data is small (it corresponds to 130 h). After the baseline training, the initial ANN model is used to predict the chiller electric usage for the next 24 h. In this accumulatively trained on-line model, the ANN is updated daily by adding measurements that become available on the day chiller energy is to be predicted into the training data set.

The CV and RMSE values are 2.53 and 13.29 kW, respectively (Table 1).

5.5. On-line prediction of electric demand with sliding window training

Experiment no. 11 was carried out to test the sliding window training ANN model that uses the chiller operating status at time t - 1 as input.

As in Experiment no. 10, the chiller-related variables measured between September 2002 and May 2003 are set aside for baseline training. Once the baseline training is completed, the initial ANN model is used to predict the chiller electric usage for the next 24 h. The ANN is updated daily by adding new measurements into the training data set and deleting some previous measurements from the training set. Only data recorded during the hours during which a chiller is on are added to the training data set.

5.5.1. Experiment no. 11: Prediction using SCi(t - 1)

The difference between the predicted and measured data becomes large: CV and RMSE values are 0.26 and 12.88 kW, respectively (Table 1). The drawback with this approach is that the prediction assumes that if in the last hour the chiller was ON, it will be ON in the next hour. So when the chiller is turned OFF, the system will miss it by 1 h since it assumes it is ON based on the previous hour. Only in the following hour will the system finally "see" it as OFF and then work properly. The prediction of the ON/

OFF status of the chiller is always lagged by 1 h. One way to address this would be to develop another ANN that would predict when the chiller will be turned ON or OFF. This could not be achieved at the moment due to the lack of data.

6. Conclusions

Most of the surveyed literature focused on using static ANN models to predict energy demand at time t, when all independent parameters are known at the same time t. Although this modeling approach has serious drawbacks for on-line prediction, it is presented in this paper for the sake of comparison. With synthetic, noise-free data, the training time is between 10 and 11 s, and the coefficient of variance (CV) is between 0.07 and 0.16. The static models applied to real measurements lead to lower accuracy (CV = 0.23–0.26) than in the case of synthetic data (CV = 0.07–0.16). This is due to the lower quality of available input data, the smaller amount of data and the complex operation strategies of chillers in the existing building, compared to the simulated one. Therefore, one can expect that the adaptive models would have an even lower accuracy. This is not always the case, as presented below.

Two types of adaptive ANN training schemes are presented in this paper. In the case of synthetic data, the accumulative training technique appears to perform as well as the sliding window training approach in terms of CV (0.15–0.17). In the case of real measurements, the sliding window technique gives better results, compared with the accumulative training, if the coefficient of variance is used as an indicator: CV of 0.26, compared with 2.53 (for accumulative training).

Future research work will use another institutional building, having a larger volume of operational data over several years, to estimate the optimal window size for the sliding window training approach. Other architectures and types of neural networks, such as recurrent neural networks, will also be considered.

Acknowledgement

The authors acknowledge the technical assistance and financial support received from CANMET Energy Technology Centre of Natural Resources Canada, in Varennes, Quebec.

References

[1] A. Dhar, T.A. Reddy, D.E. Claridge, A Fourier series model to predict hourly heating and cooling energy use in commercial buildings with outdoor temperature as the only weather variable, ASME Journal of Solar Energy Engineering 121 (1) (1999) 47–53.
[2] M. Fels, Special Issues devoted to measuring energy savings: the scorekeeping approach, Energy and Buildings 9 (1–2) (1986) 5–18.
[3] D. Ruch, D.E. Claridge, A four-parameter change-point model for predicting energy consumption in commercial buildings, ASME Journal of Solar Energy Engineering 114 (2) (1992) 77–83.
[4] S. Katipamula, T.A. Reddy, D.E. Claridge, Multivariate regression modeling, ASME Journal of Solar Energy Engineering 120 (1998) 177–184.
[5] A. Kimbara, S. Kurosu, R. Endo, K. Kamimura, T. Matsuba, A. Yamada, On-line prediction for load profile of an air-conditioning system, ASHRAE Transactions 101 (2) (1995) 198–207.
[6] Y. Hong-Tzer, H. Chao-Ming, H. Ching-Lien, Identification of ARMAX model for short term load forecasting: an evolutionary programming approach, IEEE Transactions on Power Systems 11 (1) (1996) 403–408.
[7] A. Dhar, T.A. Reddy, D.E. Claridge, Modeling hourly energy use in commercial buildings with Fourier series functional form, ASME Journal of Solar Energy Engineering 120 (3) (1998) 217–223.
[8] M. Anstett, J.F. Kreider, Application of neural networking models to predict energy use, ASHRAE Transactions 99 (1) (1993) 505–517.
[9] P.S. Curtiss, J.F. Kreider, M.J. Brandemuehl, Energy management in central HVAC plants using neural networks, ASHRAE Transactions 100 (1) (1994) 476–493.
[10] M.R.S. Djukanovic, B. Babic, D.J. Sobajic, Y.H. Pao, A neural-based short term load forecasting using moving window procedure, Electric Power and Energy System 17 (6) (1995) 391–397.
[11] A. Khotanzad, R.C. Hwang, A. Abaye, A.D. Maratukulam, An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities, IEEE Transactions on Power Systems 10 (3) (1995) 1716–1722.
[12] O. Mohammed, D. Park, R. Merchant, T. Dinh, C. Tong, A. Azeem, J. Farah, C. Drake, Practice experience with an adaptive neural networks short-term load forecasting system, IEEE Transactions on Power Systems 10 (1) (1995) 254–265.
[13] W. Charytoniuk, M.S. Chen, Very short-term load forecasting using artificial neural networks, IEEE Transactions on Power Systems 15 (1) (2000) 263–268.
[14] M. Kawashima, C.E. Dorgan, J. Mitchell, Hourly thermal load prediction for the next 24 h by ARIMA, EWMA, LR, and an artificial neural network, ASHRAE Transactions 101 (1) (1995) 186–200.
[15] J.S. Haberl, S. Thamilseran, The great energy predictor shootout II: measuring retrofit savings, overview and discussion of results, ASHRAE Transactions 102 (1) (1996) 419–435.
[16] B. Feuston, J. Thurtell, Generalized non-linear regression with ensemble of neural nets, ASHRAE Transactions 100 (2) (1994) 1075–1080.
[17] R. Zmeureanu, L. Pasqualetto, F. Bilas, Comparison of cost and energy savings in an existing large building as predicted by three simulation programs, in: J.W. Mitchell, W.A. Beckman (Eds.), Proceedings of the IBPSA Building Simulation'95 Conference, Madison, 1995, pp. 485–492.
[18] R. Hecht-Nielsen, Theory of the backpropagation neural network, Proceedings of IEEE International Conference on Neural Networks 1 (1989) 593–605.
