
Expert Systems with Applications 38 (2011) 14846–14851

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

A comparative study of artificial neural networks, and decision trees for digital
game content stocks price prediction
Tsung-Sheng Chang
Department of Information Management, National Chung Cheng University, Min-Hsiung, Chia-Yi 621, Taiwan, ROC

Article info

Keywords:
Artificial neural networks (ANN)
C&RT
Decision tree
Stock price forecasting

Abstract
Precise prediction of stock prices is difficult, chiefly because of the many intervening factors. Unpredictability is particularly notable in the aftermath of the global financial crisis. Data mining may, however, be used to discover highly correlated estimation models. This study looks at artificial neural networks (ANN), decision trees and a hybrid model of ANN and decision trees (hybrid model), three common algorithm methods used for numerical analysis, to forecast stock prices. The author compared the stock price forecasting models derived from the three methods, and applied the models to 10 different stocks in 320 data sets in an empirical forecast. The average accuracy of ANN, measured as the match with real market stock prices, is the highest at 15.31%, followed by decision trees at 14.06% and the hybrid model at 13.75%. The study also finds that, compared to the other two methods, ANN is a more stable method for predicting stock prices in the volatile post-crisis stock market.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction
Since mid-2008, when the collapse of Lehman Brothers led to global economic repercussions, the stock market has been hard hit. Stock indices fell, and economies went into recession. More than one year later, stock indices have witnessed sharp fluctuations, especially in emerging markets. In addition, globalization has made forecasting of stock prices increasingly difficult (Albuquerque, Francisco, & Marques, 2008; Stock & Watson, 2007). We therefore need to know whether study samples from this period, and the forecasting results of previous researchers, are in line with our expectations. Among Asian emerging markets, China's economic growth has been the engine spurring on the development of stock markets in the region; hence the greater volatility of stock prices in these markets. Thus, we need to attach greater importance to emerging stock markets (Dutta, Jha, Laha, & Mohan, 2006; Fidrmuc & Korhonen, 2009). The Taiwan stock market, which is closely connected with Mainland China, is another example worth observing (Lai, Fan, Huang, & Chang, 2009; Lin & Yeh, 2009).
Many algorithm methods are used to predict stock prices. Examples are artificial neural networks (ANN) (Desai & Bharati, 2007; Kim & Shin, 2007; Pino, Parreno, Gomez, & Priore, 2008; Zhu, Wang, Xu, & Li, 2008), fuzzy methods (Khashei, Hejazi, & Bijari, 2008; Lee & Kim, 2007), and other statistical or forecasting methods (Chen, Gou, Guo, & Gao, 2008; Hu & He, 2007; Ince & Trafalis, 2008). All these methods attempt to predict stock prices under different market and economic conditions. Of them, ANN has produced rather good outcomes and has been the favourite method for many.

Tel.: +886 973 519943.
E-mail address: china@mis.ccu.edu.tw
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.05.063
Ou and Wang (2009), and Lai et al. (2009) believe that the decision tree (DT) method is good for forecasting stock prices. Levin and Zahavi (2001) found that problem-correlation using DT is much clearer than with traditional methods. In fact, DT is a very good forecasting method. The Bayes Theorem may be used as a basis for scientific forecasts. However, DT studies have focused on commercial activities (Aitkenhead, 2008; Reyck, Degraeve, & Vandenborre, 2008). In recent years, there has been a lack of research that predicts stock prices using both DT and ANN and compares the results of one method against the other. Most ANN studies have focused on its evolution and improvement (Ihme, Marsden, & Pitsch, 2008; Lin & Yeh, 2009; Paliwal & Kumar, 2009), and on integrating fuzzy models into the forecasting of stock prices (Khashei et al., 2008; Lai et al., 2009). This study aims to fill the research gap by adopting a broader approach through in-depth empirical studies.
The study adopts a hybrid model, using ANN and DT as the foundation, to forecast stock prices, and seeks to find out whether this model produces better forecasts of stock prices than the two earlier methods. Hence, the stock prices forecasted by the three abovementioned methods (ANN, DT and hybrid model) are compared against each other to find out the differences. In doing so, we can see whether the results of our forecasts match our expectations, and discover the most stable model.


2. Digital game content stocks


Taiwan began promoting the digital content industry around the year 2000. Taiwan consumers were receptive to digital games. In addition, some companies were early investors in Mainland China. The absence of a language barrier allowed them to earn higher overseas revenue, and hence to perform well in terms of stock prices. However, no stock price index is available for reference for this stock cluster. Instead, these stocks are classified as "information software stocks".
A stock price index comprises the prices of stocks of companies in similar businesses; hence, price movements of these stocks tend to affect one another, and investors can gauge how the stock prices of companies in a cluster fluctuate against each other from the key constituent indices. However, an industry stock index may not be available for some industries, especially in emerging stock markets, because of their lack of maturity and their few listings. As a result, investors can rely only on subjective norms and personal perception, and are unable to achieve better forecasting of stock prices (Hong, Torous, & Valkanov, 2007). Hence, to have a better grasp of digital game content stocks in Taiwan, we need to observe the business composition of a company and, based on prior expert consultations, determine whether the company is indeed in the business of digital game content.
There is no precise definition of the term "digital game content stocks". The term "digital game stocks" as defined in the Taiwan stock market basically includes the manufacturing of game spare parts and components, and online games. Manufacturers of game spare parts and components do not derive their main income from game content; instead, their income is generated mainly from the production of parts and components for console games. Examples of OEM manufacturers for console games are PixArt Imaging, Delta Group, and Genius. Digital game stocks of this nature, which are mainly stocks of OEM manufacturers, are not included in this study. On the other hand, as summer vacation approaches every year, most investors think of digital content game stocks along the lines of online games, PC games or arcade games, most of which are either information software or actually contain designed game content. These are the digital game content companies whose stocks are studied herein.
3. Literature review
The study uses the data mining method to derive actual forecasted stock prices. Data mining is concerned with the development and application of algorithms for the discovery of a priori unknown relationships. Han and Kamber (2006, p. 7) define data mining as "the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories". Berry and Linoff (1997) point out that data mining is the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules, and to establish effective models and rules. Data mining is currently widely used in many applications, as users attempt to discover patterns or predict the future using historical data.
The data mining methods used in this study are mainly ANN and DT, in order to derive the most relevant results. The hybrid model is developed by integrating the results from the ANN and DT analyses to deduce a new model.
3.1. Artificial neural networks
The artificial neural network, or ANN, is a popular prediction tool. It is a technique that simulates the learning process of biological neural networks by developing models from extremely complex and non-linear formulae. By using different variables and assumed parameters, it trains a neural network to perform better analysis and predictions.
ANN has been fruitful in predicting stock prices. Antonio, Claudio, Manuel, and Nelson (1996) were 63.3% accurate in predicting the range of rises of Santiago's stock market, and 74.7% accurate for falls. When Steiner and Wittkemper (1997) predicted the European stock market between 1991 and 1997, the accuracy of their predictions was generally good. When Shachmurove and Witkowska (2000) compared results using the Ordinary Least Squares method (OLS) and ANN, they found ANN a better prediction tool. Dutta et al. (2006) predicted the Indian stock market with ANN, and found that the root mean square error (RMSE) and mean absolute error (MAE) were as small as expected. Zhu et al. (2008) forecasted the NASDAQ, DJIA and STI indices, and found the ANN model to have good prediction performance.
Previous papers thus offer good research and recommendations on the use of ANN as a prediction method for stock prices. Among the ANN learning models, the back-propagation neural network (BPN) is the most popular and widely used. Most users of BPN have to carry out multiple tests to arrive at a better model. Hence, there are many methods that help optimize the network and determine its parameters. Multiple, Prune and K-means are highly accurate methods with fast responses, and are often used to train the ANN. Thus, this study will use BPN as a prediction model of stock prices.
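To make the back-propagation idea concrete, the following is a minimal sketch of a three-layer network trained by gradient descent. It is not the Clementine implementation used in the study: the function names `train_bpn` and `predict_bpn`, the tanh hidden units, the linear output, the learning rate and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def _forward(w_h, w_o, x):
    """Forward pass: tanh hidden layer followed by a linear output unit."""
    h = [math.tanh(sum(w * v for w, v in zip(row[:-1], x)) + row[-1]) for row in w_h]
    out = sum(w * v for w, v in zip(w_o[:-1], h)) + w_o[-1]
    return h, out


def train_bpn(samples, targets, hidden=3, lr=0.1, epochs=500, seed=0):
    """Train an input-hidden-output network with per-sample back-propagation.

    Hypothetical helper: returns hidden weights, output weights and the mean
    squared error recorded in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each weight row ends with a bias term.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in zip(samples, targets):
            h, out = _forward(w_h, w_o, x)
            err = out - t  # gradient of 0.5 * (out - t)^2 with respect to out
            sq_err += err * err
            for j in range(hidden):
                # Propagate the error back through hidden unit j (tanh derivative).
                grad_h = err * w_o[j] * (1.0 - h[j] ** 2)
                for i in range(n_in):
                    w_h[j][i] -= lr * grad_h * x[i]
                w_h[j][-1] -= lr * grad_h
                w_o[j] -= lr * err * h[j]
            w_o[-1] -= lr * err
        history.append(sq_err / len(samples))
    return w_h, w_o, history


def predict_bpn(w_h, w_o, x):
    """Predict the output for one input vector."""
    return _forward(w_h, w_o, x)[1]


# Toy usage: two scaled inputs standing in for previous-day values.
samples = [[i / 10, 1 - i / 10] for i in range(11)]
targets = [0.9 * a + 0.1 * b for a, b in samples]
w_h, w_o, history = train_bpn(samples, targets)
print(history[0], history[-1])
```

In practice, inputs such as prices would be scaled before training, which is why the toy inputs lie between 0 and 1.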
3.2. Decision tree
Modeled after the structure of a tree, DTs are able to provide a good explanation applicable to the prediction of stock prices, and interpret problems very much according to mathematical and statistical principles (Brida & Risso, 2010). DT is a fairly mature technique which includes models such as C5.0, C&RT, CHAID, QUEST and ID3. These differ in the formulae, such as entropy and information gain, that determine which attributes result in a split. However, most decision trees consist of many nodes, which, under certain circumstances, hinder analysis or interpretation of information (Aitkenhead, 2008; Ture, Tokatli, & Kurt, 2009). The classification and regression trees (CART or C&RT) method of Breiman, Friedman, Olshen, and Stone (1984) generates binary decision trees. In the real world, the chances of biased binary outcomes are few, but the binary method allows for easy interpretation and analysis (Ture et al., 2009; Yang et al., 2003). Thus, the study uses the binary method in the DT prediction of stock prices, and attempts to deduce a correlated prediction factor.
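As an illustration of the splitting criteria mentioned above, the sketch below computes entropy and the information gain of one binary split. It illustrates the entropy-based criterion only; C&RT itself commonly relies on other impurity measures, and the function names and toy labels here are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Reduction in entropy when `labels` is split into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Toy usage: a split that separates rising days from falling days perfectly.
labels = ["up", "up", "down", "down"]
print(information_gain(labels, ["up", "up"], ["down", "down"]))
```

A tree-growing algorithm would evaluate such a gain for every candidate split and keep the attribute with the largest value.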
4. Experiment design
4.1. Data collection and methods
The study uses digital game content stocks in Taiwan as the sample. To choose the digital game stocks, the key words "game stock" were keyed into the UDNDATA online database to search for information between 1 January 2008 and 31 May 2009, and the game stocks described in the database were extracted. As a pre-test, we interviewed managers of securities companies and investors who traded more than US$50,000 in May 2009, so that every game stock was verified by a total of 30 experts and investors. We then compiled and screened the results, and selected 10 game stocks as shown in Table 1.
The 10 game stocks are traded on the OTC Exchange of Taiwan, and have fairly similar IPO backgrounds. Daily closing stock prices of each of the 10 stocks were gathered from the Gre Tai Securities Market (GTSM) database. Key information collected includes the daily closing OTC index and the daily closing prices of each stock for the period between 1 July 2008 and 30 June 2009. A total of 5229 records were extracted. The unit cluster for all data (for each stock and the OTC index) is based on the trading date. Finally, 249 cases were selected as our data set.

Table 1
Digital content stock corporations.

| Stock ID | Name of corporation |
|---|---|
| (3064) | Astro |
| (3083) | Chinesegamer |
| (3086) | Wayi |
| (3293) | International Games System (IGS) |
| (3546) | UserJoy |
| (4415) | Mega Biotech & Electronics |
| (5478) | Soft-World |
| (6111) | SoftStar |
| (6169) | Interserv |
| (6180) | Gamania |
4.2. Symbol description
To better describe the results of our data analysis, mathematical symbols and alphabetical representations are used for simplification purposes. The symbols represent the following descriptions:
(stockid): Stock code, a numerical value. For example, 3064 represents the company Astro.
S(stockid): Current day's closing price of a stock. The opening price of a stock cannot be used as the benchmark for input or analysis, as it does not represent the previous day's closing price, owing to uncertainties during the period when the market is closed for trading. At the same time, comparing the day's closing price with the previous day's closing price provides an indication of the magnitude of any rise or fall. Hence, the daily closing price is used as the parameter for our analysis. In addition, when collecting information on stock prices, the day's closing price is treated as equal to the closing price of the previous day if no real transaction is concluded on that day. The previous day's closing price is indicated as Sday−1(stockid).
O: OTC index. Refers to the current day's closing OTC index; the OTC index of the previous day is represented by Oday−1.
P(stockid): Current day's predicted stock closing price.
PER: Prediction error ratio, the absolute value of the current day's predicted stock closing price divided by the current day's closing price, minus 1. A smaller ratio indicates a more accurate and better prediction. Its mathematical formula is shown as follows:




\[ \mathrm{PER} = \left| \frac{P(\mathrm{stockid})}{S(\mathrm{stockid})} - 1 \right| \]
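The PER definition can be sketched as a small function. The function name and the example prices are hypothetical and serve only to show the arithmetic.

```python
def prediction_error_ratio(predicted, actual):
    """PER = |P / S - 1| for one predicted and one actual closing price."""
    if actual == 0:
        raise ValueError("actual closing price must be non-zero")
    return abs(predicted / actual - 1.0)


# Toy usage: a predicted close of 102 against an actual close of 100.
print(prediction_error_ratio(102.0, 100.0))
```

Because the ratio is taken in absolute value, over-prediction and under-prediction of the same relative size give the same PER.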

4.3. Experimental procedure


Data mining is carried out in SPSS Clementine 11.1, where the ANN and C&RT analyses are performed. SPSS 15.0 is then used to develop a regression model. Prediction analysis is based on three methods, the procedures of which are described as follows:
(i) Artificial neural networks: ANN is a type of artificial intelligence. We used a standard three-layer, fully connected back-propagation neural network. Input layer nodes represent the previous day's stock closing prices and the previous day's OTC closing index (Oday−1). Output layer nodes represent the current day's stock closing prices. Supervised learning is used to train the back-propagation neural network, with the current day's actual stock prices used to correct the model and arrive at a better prediction. The pruning method is used for training, with hidden layer 1 set to 12 units, in line with the basic selection criterion for the hidden layer, (input layer + output layer)/2. As training progresses, the worst units in the hidden layers and input layer are pruned, and the final outcome is expected to be better than that of other training methods. The stopping rule is the default one, which selects the best training mode. After ANN analysis, we checked whether the estimated accuracy was greater than 90%; if so, we went on to examine the relative importance of the inputs. A higher value of relative importance indicates a more important input, and the values vary with the stock. Hence, to avoid differences in the number of input variables across the stock prediction models, the inputs are ranked by relative importance and the top three are taken as the independent variables (X) of the subsequent models used for predicting the dependent variable (Y), the future stock price. This study employs a linear regression model using the forced entry method to validate the accuracy of the forecasted stock prices.
(ii) C&RT: C&RT is used for the prediction tree models. C&RT does not require a long time to train the model, and its interpretation is comparatively easy to understand. Binary nodes are generated from analyzing the C&RT results. Assume that the initial binary node (node 0) is X1, and that this parent node splits into two child nodes (X2, X3). If the tree depth is 2, the tree has 1 + 2 + 4 = 7 nodes, X1, X2, ..., X7, i.e. a total of seven significant reference entities. The study uses these seven reference entities (X1, X2, ..., X7) as the input variables (X) of the subsequent prediction models for predicting future stock prices (Y). However, there may be duplication among these seven reference entities, because the information differs from stock to stock. Thus, the final number of input variables (X) may not always be the same.
(iii) Hybrid model: Besides the above two algorithms, the study also proposes a hybrid model to predict stock prices. The hybrid model does not consist of complex algorithms. It first derives the prediction models from the original ANN and C&RT models, and then carries out a union operation on the independent variables of these two models. As such, the regression equation of this model will usually include more independent variables than those of the other algorithm methods. Where the union adds no new independent variables, the hybrid model may be the same as the original ANN or C&RT model.
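The union operation in step (iii) can be sketched as follows. The helper name `union_predictors` is hypothetical, and the predictor lists are the stock codes that appear in the ANN and C&RT regression models reported for stocks 3083 and 3546 in Tables 2 and 3; the subsequent regression fit on the merged set is not shown.

```python
def union_predictors(ann_predictors, cart_predictors):
    """Ordered union: ANN predictors first, then C&RT predictors not already present."""
    merged = list(ann_predictors)
    for predictor in cart_predictors:
        if predictor not in merged:
            merged.append(predictor)
    return merged


# Stock 3083: the two models share only the stock's own lagged price.
ann_3083 = ["3083", "4415", "5478"]
cart_3083 = ["3083", "6169", "3546", "3293"]
print(union_predictors(ann_3083, cart_3083))

# Stock 3546: the C&RT predictors are a subset of the ANN predictors,
# so the union adds nothing and the hybrid set equals the ANN set.
ann_3546 = ["3546", "5478", "3293"]
cart_3546 = ["3546", "3293"]
print(union_predictors(ann_3546, cart_3546))
```

The second case shows why a hybrid model can coincide with the ANN model when C&RT contributes no additional variables.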
Finally, we studied relative prediction performance and absolute prediction performance to see whether they match our expectations, so as to verify the accuracy of the models. Relative prediction performance refers mainly to the comparison between the three algorithm methods, to determine which of them produces relatively stable predictions that are closest to the actual stock prices. Absolute prediction performance identifies the algorithm method whose predicted outcomes are the most accurate, i.e. which predicts the current day's closing stock prices with the greatest precision.

5. Experimental analysis and results


5.1. Artificial neural networks analysis
Take stock code 3064 as an example. Using ANN analysis, we found the estimated accuracy to be 96.58%, and the three inputs with the highest relative importance are represented as X1, X2 and X3. The regression model is

\[ Y = 0.87 X_1 + 0.198 X_2 + (-0.138) X_3 + 0.73. \]

To enable clearer and easier calculation of the stock prediction model, based on the previous symbolic representations, we rewrite the model as:

\[ S(3064) = 0.870\, S_{\mathrm{day}-1}(3064) + 0.198\, S_{\mathrm{day}-1}(3293) + (-0.138)\, S_{\mathrm{day}-1}(3083) + 0.73 \]

(refer to Table 2 for the full list).
From Table 2, the estimated accuracy is greater than 95% for every stock, indicating a good fit from ANN training. Also, R² > 0.9 and p < 0.05 apply to all regression models. Thus, the models may be used for the prediction of stock prices.
5.2. C&RT analysis
For DT analysis, the tree diagram of each stock is first drawn. Based on a tree depth of 2, we obtain seven independent variables X1, X2, ..., X7. After screening out the duplicated independent variables, the final regression models are shown in Table 3. All models are found with R² > 0.9 and p < 0.05, which matches our expectations. In this study, at least two and not more than four independent variables are retained. Among the stocks, the prediction model of (3064) is the same as the ANN prediction model, which indicates that its hybrid model will also be the same.
5.3. Hybrid model analysis
Finally, the first two models are combined to obtain the new hybrid model (refer to Table 4). The hybrid models show R² > 0.9 and p < 0.05. The (3083) regression model incorporates the most independent variables. (3064), (3546) and (6180) are the three stocks whose regression models are identical to the original ANN prediction models. The regression models for (3546) and (6180) are the same as the ANN models because there are fewer independent variables in the C&RT models. Apart from (3064), whose model is the same under all three methods, the hybrid model does not contain any regression model that is the same as the corresponding C&RT model. Although the C&RT model assumes seven independent variables for observation, we found more independent variables for prediction with the ANN model than with the C&RT model.
5.4. Comparative analysis of performance
In the actual prediction of stock prices, observation is done in two parts: relative prediction performance, and absolute prediction performance.
Relative prediction performance looks at the stability of the prediction. Observation dates are 1 July to 14 August 2009. A total of 32 sets of data are used to verify the prediction capability of the model. Because of the difficulty of achieving 100% accuracy in predicting the current day's closing stock prices, whether for a rise or a fall, we use the prediction error ratio (PER) to measure the prediction performance.
As the PERs of different stocks differ, and in order to avoid inconsistent measurements, the PER is used to derive the data range. Values falling within the 20% of the range closest to the minimum value (implying that the prediction is closer to being accurate) are rated A, values falling between 20% and 50% are rated B, and values greater than 50% are rated C. The more A ratings there are, the greater the number of accurately predicted values, and the greater the stability. Please refer to Fig. 1 for details.
According to Fig. 1, ANN outperforms both C&RT (97 > 79) and the hybrid model (97 > 88) in terms of overall prediction results. On the other hand, the hybrid model did better than C&RT in terms of stability (88 > 79). The results show that the three models produce clearly different prediction results.
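The A/B/C rating can be sketched as below. This is one possible reading of the rule, not a confirmed implementation: it assumes that each PER is placed within the observed minimum-to-maximum range of the PERs it is compared with, and that the 20% and 50% boundaries are inclusive. The function name and the toy values are hypothetical.

```python
def rate_predictions(pers):
    """Rate each PER as A, B or C by its position in the observed PER range."""
    lo, hi = min(pers), max(pers)
    span = hi - lo
    ratings = []
    for per in pers:
        # Position within the range: 0 is the smallest PER, 1 the largest.
        position = (per - lo) / span if span > 0 else 0.0
        if position <= 0.2:
            ratings.append("A")
        elif position <= 0.5:
            ratings.append("B")
        else:
            ratings.append("C")
    return ratings


# Toy usage with made-up PER values for one stock.
print(rate_predictions([0.0, 0.01, 0.03, 0.10]))
```

Counting the A ratings across all stocks and days for each method then gives the stability comparison described in the text.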

Table 2
ANN forecasting regression models for different stocks.

| Stock ID | Estimated accuracy (%) | Regression model | R² |
|---|---|---|---|
| (3064) | 96.58 | S(3064) = 0.870 Sday−1(3064) + 0.198 Sday−1(3293) + (−0.138) Sday−1(3083) + 0.73 | .962* |
| (3083) | 98.45 | S(3083) = 1.047 Sday−1(3083) + (−0.022) Sday−1(4415) + (−0.031) Sday−1(5478) + 2.212 | .993* |
| (3086) | 96.35 | S(3086) = 0.900 Sday−1(3086) + (−0.139) Sday−1(5478) + (−0.053) Sday−1(3083) + 0.925 | .948* |
| (3293) | 97.63 | S(3293) = 0.924 Sday−1(3293) + 0.017 Sday−1(6180) + 2.488 | .981* |
| (3546) | 98.34 | S(3546) = 1.05 Sday−1(3546) + 0.087 Sday−1(5478) + 0.034 Sday−1(3293) + 5.978 | .992* |
| (4415) | 98.75 | S(4415) = 0.933 Sday−1(4415) + 0.07 Sday−1(3083) + (−0.003) Sday−1(3293) − 1.473 | .996* |
| (5478) | 97.65 | S(5478) = 0.972 Sday−1(5478) + 0.08 Sday−1(3293) + 0.133 Sday−1(3083) + 4.194 | .983* |
| (6111) | 97.72 | S(6111) = 0.915 Sday−1(6111) + 0.107 Sday−1(6169) + (−0.026) Sday−1(3083) + 0.789 | .988* |
| (6169) | 98.75 | S(6169) = 1.021 Sday−1(6169) + (−0.014) Sday−1(4415) + (−0.015) Sday−1(5478) + 1.204 | .993* |
| (6180) | 98.45 | S(6180) = 0.938 Sday−1(6180) + 0.017 Sday−1(5478) + 0.048 Sday−1(6169) + 0.187 | .991* |

* p < 0.05.
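As a worked example of how a regression model from Table 2 turns previous-day closing prices into a forecast, the sketch below evaluates the model listed for stock 3293. The function name and the input prices are hypothetical; only the coefficients come from the table.

```python
def predict_3293(prev_close_3293, prev_close_6180):
    """Evaluate the Table 2 model for stock 3293 on previous-day closing prices."""
    return 0.924 * prev_close_3293 + 0.017 * prev_close_6180 + 2.488


# Toy usage with made-up previous-day closes of 100 and 200.
print(predict_3293(100.0, 200.0))
```

The resulting forecast can then be compared with the actual close of the day through the prediction error ratio.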

Table 3
C&RT forecasting regression models for different stocks.

| Stock ID | Regression model | R² |
|---|---|---|
| (3064) | S(3064) = 0.870 Sday−1(3064) + 0.198 Sday−1(3293) + (−0.138) Sday−1(3083) + 0.73 | .962* |
| (3083) | S(3083) = 0.972 Sday−1(3083) + (−0.016) Sday−1(6169) + (−0.003) Sday−1(3546) + 0.007 Sday−1(3293) + (−1.710) | .993* |
| (3086) | S(3086) = (−0.032) Sday−1(6180) + 0.918 Sday−1(3086) + 0.105 Sday−1(5478) + 0.619 | .948* |
| (3293) | S(3293) = 0.001 Sday−1(6180) + 0.985 Sday−1(3293) + 0.007 Sday−1(6111) + 1.495 | .981* |
| (3546) | S(3546) = 0.987 Sday−1(3546) + (−0.087) Sday−1(3293) − 1.335 | .992* |
| (4415) | S(4415) = 0.968 Sday−1(4415) + 0.033 Sday−1(3546) − 0.611 | .995* |
| (5478) | S(5478) = 0.053 Sday−1(3546) + 0.776 Sday−1(5478) + 0.1 Sday−1(6169) + 0.082 Sday−1(3293) + 6.349 | .984* |
| (6111) | S(6111) = 0.913 Sday−1(6111) + 0.074 Sday−1(6169) + 0.014 Sday−1(3293) + 0.355 | .988* |
| (6169) | S(6169) = 0.989 Sday−1(6169) + 0.010 Sday−1(3293) − 0.656 | .993* |
| (6180) | S(6180) = 0.949 Sday−1(6180) + 0.054 Sday−1(6169) + 0.436 | .991* |

* p < 0.05.


Table 4
Hybrid model forecasting regression models for different stocks.

| Stock ID | Regression model | R² |
|---|---|---|
| (3064) | S(3064) = 0.870 Sday−1(3064) + 0.198 Sday−1(3293) + (−0.138) Sday−1(3083) + 0.73 | .962* |
| (3083) | S(3083) = 1.037 Sday−1(3083) + 0.033 Sday−1(6169) + 0.011 Sday−1(3546) + 0.049 Sday−1(3293) + (−0.009) Sday−1(4415) + (−0.117) Sday−1(5478) + 6.685 | .994* |
| (3086) | S(3086) = 0.9 Sday−1(3086) + 0.139 Sday−1(5478) + (−0.052) Sday−1(3083) + (−0.001) Sday−1(6180) + 0.632 | .948* |
| (3293) | S(3293) = 0.986 Sday−1(3293) + 0.065 Sday−1(6169) + (−0.009) Sday−1(6180) + (−0.048) Sday−1(6111) + 4.371 | .981* |
| (3546) | S(3546) = 1.05 Sday−1(3546) + 0.087 Sday−1(5478) + 0.034 Sday−1(3293) + 5.978 | .992* |
| (4415) | S(4415) = 0.933 Sday−1(4415) + (−0.002) Sday−1(3546) + 0.071 Sday−1(3083) + (−0.002) Sday−1(3293) − 1.509 | .996* |
| (5478) | S(5478) = 0.729 Sday−1(5478) + 0.03 Sday−1(3546) + 0.099 Sday−1(3293) + 0.089 Sday−1(6169) + 0.066 Sday−1(3083) + 7.173 | .984* |
| (6111) | S(6111) = 0.946 Sday−1(6111) + 0.044 Sday−1(6169) + 0.047 Sday−1(3293) + (−0.041) Sday−1(3086) + 1.019 | .989* |
| (6169) | S(6169) = 1.057 Sday−1(6169) + 0.074 Sday−1(3293) + 0.006 Sday−1(4415) + (−0.13) Sday−1(5478) + 2.692 | .993* |
| (6180) | S(6180) = 0.938 Sday−1(6180) + 0.017 Sday−1(5478) + 0.048 Sday−1(6169) + 0.187 | .991* |

* p < 0.05.

[Figure: bar chart of total score (vertical axis, 0 to 150) for ANN, C&RT and the hybrid model, with one bar per rating (A, B, C).]

Fig. 1. Bar graph of stock price prediction performance ratings.

[Figure: bar chart of total prediction accuracy (number of valid predictions) by method: ANN 49, C&RT 45, hybrid model 44.]

Fig. 2. Stock price total prediction accuracy of different methods.

Fig. 1 does not represent absolute prediction performance. Hence, we need to calculate the absolute prediction performance: the closer the prediction error ratio is to zero, the greater the degree of accuracy. Daily opening prices fluctuate around the median value of the previous day's closing prices. The current day's price interval allows a largest rise or largest fall of 7% from the median value, which means that the current day's greatest magnitude of price fluctuation could be as high as 14%, and more than 14% if compared to the previous day's prices. The study counts a prediction as valid when PER is smaller than 0.02, i.e. when the predicted stock price differs from the actual stock price by less than 2%. The total number of valid predictions divided by the total number of predictions is the actual prediction performance (refer to Fig. 2).
We can therefore deduce that the closing prices predicted by ANN are more in line with actual stock prices. A total of 49 of the 320 data sets are accurately predicted. Therefore, for ANN, there is a 15.31% probability that a valid prediction will occur across all samples; for C&RT it is 14.06%, and for the hybrid model 13.75%. This ranking differs from that of prediction stability. Although the total prediction accuracy of C&RT and the hybrid model differs only by 1, C&RT produces a more even rating distribution among the 10 sample stocks than the hybrid model. Hence, we believe that prediction accuracy is better for C&RT than for the hybrid model.
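The absolute prediction performance described above reduces to a count of valid predictions divided by the total. The sketch below assumes a strict PER < 0.02 threshold, as stated in the text; the function name and the toy PER list are hypothetical, while the counts 49, 45 and 44 out of 320 are the figures reported in the study.

```python
def prediction_accuracy(pers, threshold=0.02):
    """Share of predictions whose PER is strictly below the threshold."""
    valid = sum(1 for per in pers if per < threshold)
    return valid / len(pers)


# Toy usage: two of four made-up PER values fall below 0.02.
print(prediction_accuracy([0.01, 0.019, 0.02, 0.05]))

# Reported counts of valid predictions out of 320 data sets per method.
for method, valid in [("ANN", 49), ("C&RT", 45), ("Hybrid", 44)]:
    print(method, round(valid / 320 * 100, 2))
```

The loop reproduces the percentages quoted in the text from the reported counts.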

6. Conclusion
Among the three models, ANN demonstrates the greatest stability and accuracy at predicting stock prices during the post-crisis period. Although C&RT is less stable than the hybrid model, it is more accurate. The hybrid model proposed by the study clearly produces different prediction results. This implies that the integrated use of C&RT and ANN requires improved efficiency, and that further development and more in-depth study are necessary.
By simply using the closing prices of stocks, the study has managed to discover stocks with good investment value. However, the ever-changing market also implies a broad spectrum of impact factors. In terms of stock analysis, data mining is an effective tool for discovering the correlation between stocks of a different cluster, so as to provide investors with a simple method for predicting stock prices. In future studies, other stock selection criteria may be included and, when combined with the characteristics of the relevant algorithm methods, could yield the most appropriate prediction criteria. However, although predicted stock prices may be used as a reference, they are not the main deciding factors. Prudent investment decisions must be supported by a company's financial statements.
There are some limitations to the study. First, the samples collected are for the period July 2008–June 2009 (one year). This is not a complete fiscal year, and the time period is relatively short. Nonetheless, it may be viewed as a full business cycle, and the inclusion of stock prices from the past 3–5 years would upset the original sample. Moreover, during the period of data collection, the stock market exhibited fairly large fluctuations, and difficulty of prediction also implies more meaningful results. Secondly, in verifying the models' performance, we used 32 days as the actual verification period for accuracy. We took into consideration the impact of market movements on the prediction models, and the possibility that the original prediction model might lose its predictive ability over a longer period. Thirdly, the samples were digital content stocks. The high correlation between the companies' businesses results in a rather high R² value. If all OTC stocks had been used as input factors for our prediction, R² would have been lower. Also, different stock markets may result in different outcomes.

References
Aitkenhead, M. J. (2008). A co-evolving decision tree classification method. Expert Systems with Applications, 34(1), 18–25.
Albuquerque, R. A., Francisco, E. De., & Marques, L. B. (2008). Marketwide private information in stocks: Forecasting currency returns. Journal of Finance, 63(5), 2297–2343.
Antonio, G. B., Claudio, O. U., Manuel, M. S., & Nelson, O. M. (1996). Stock market indices in Santiago de Chile: Forecasting using neural networks. In Proceedings of the 1996 IEEE international conference on neural networks (pp. 2172–2175). Washington, DC: IEEE Press.
Berry, M. J. A., & Linoff, G. (1997). Data mining techniques: For marketing, sales, and customer support. New York: Wiley.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.
Brida, J. G., & Risso, W. A. (2010). Hierarchical structure of the German stock market. Expert Systems with Applications, 37(5), 3846–3852.
Chen, F., Gou, C., Guo, X., & Gao, J. (2008). Prediction of stock markets by the evolutionary mix-game model. Physica A, 387(14), 3594–3604.
Desai, V. S., & Bharati, R. (2007). The efficacy of neural networks in predicting returns on stock and bond indices. Decision Sciences, 29(2), 405–423.
Dutta, G., Jha, P., Laha, A. K., & Mohan, N. (2006). Artificial neural network models for forecasting stock price index in the Bombay stock exchange. Journal of Emerging Market Finance, 5(3), 283–295.


Fidrmuc, J., & Korhonen, I. (2009). The impact of the global financial crisis on business cycles in Asian emerging economies. Journal of Asian Economics. doi:10.1016/j.asieco.2009.07.007.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (second ed.). San Francisco, CA: Morgan Kaufmann.
Hong, H., Torous, W., & Valkanov, R. (2007). Do industries lead stock markets? Journal of Financial Economics, 83(2), 367–396.
Hu, C., & He, L. T. (2007). An application of interval methods to stock market forecasting. Reliable Computing, 13(5), 423–434.
Ihme, M., Marsden, A. L., & Pitsch, H. (2008). Generation of optimal artificial neural networks using a pattern search algorithm: Application to approximation of chemical systems. Neural Computation, 20(2), 573–601.
Ince, H., & Trafalis, T. B. (2008). Short term forecasting with support vector machines and application to stock price prediction. International Journal of General Systems, 37(6), 677–687.
Kim, H. J., & Shin, K. S. (2007). A hybrid approach based on neural networks and genetic algorithms for detecting temporal patterns in stock markets. Applied Soft Computing, 7(2), 569–576.
Khashei, M., Hejazi, S. R., & Bijari, M. (2008). A new hybrid artificial neural networks and fuzzy regression model for time series forecasting. Fuzzy Sets and Systems, 159(7), 769–786.
Lai, R. K., Fan, C. Y., Huang, W. H., & Chang, P. C. (2009). Evolving and clustering fuzzy decision tree for financial time series data forecasting. Expert Systems with Applications, 36(2P2), 3761–3773.
Lee, K. C., & Kim, W. C. (2007). Integration of human knowledge and machine knowledge by using fuzzy post adjustment: Its performance in stock market timing prediction. Expert Systems, 12(4), 331–338.
Levin, N., & Zahavi, J. (2001). Predictive modeling using segmentation. Journal of Interactive Marketing, 15(2), 2–22.
Lin, C. T., & Yeh, H. Y. (2009). Empirical of the Taiwan stock index option price forecasting model – Applied artificial neural network. Applied Economics, 41(15), 1965–1972.
Ou, P., & Wang, H. (2009). Prediction of stock market index movement by ten data mining techniques. Modern Applied Science, 3(12), 28–42.
Paliwal, M., & Kumar, U. A. (2009). Neural networks and statistical techniques: A review of applications. Expert Systems with Applications, 36(1), 2–17.
Pino, R., Parreno, J., Gomez, A., & Priore, P. (2008). Forecasting next-day price of electricity in the Spanish energy market using artificial neural networks. Engineering Applications of Artificial Intelligence, 21(1), 53–62.
Reyck, B. De., Degraeve, Z., & Vandenborre, R. (2008). Project options valuation with net present value and decision tree analysis. European Journal of Operational Research, 184(1), 341–355.
Shachmurove, Y., & Witkowska, D. (2000). Utilizing artificial neural network model to predict stock markets (CARESS working paper #00-11). Philadelphia, PA: University of Pennsylvania.
Steiner, M., & Wittkemper, H. G. (1997). Portfolio optimization with a neural network implementation of the coherent market hypothesis. European Journal of Operational Research, 100(1), 27–40.
Stock, J. H., & Watson, M. W. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39(S1), 3–33.
Ture, M., Tokatli, F., & Kurt, I. (2009). Using Kaplan–Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients. Expert Systems with Applications, 36(2P1), 2017–2026.
Yang, C. C., Prasher, S. O., Enright, P., Madramootoo, C., Burgess, M., Goel, P. K., et al. (2003). Application of decision tree technology for image classification using remote sensing data. Agricultural Systems, 76(3), 1101–1117.
Zhu, X., Wang, H., Xu, L., & Li, H. (2008). Predicting stock index increments by neural networks: The role of trading volume under different horizons. Expert Systems with Applications, 34(4), 3043–3054.
