
Applications of Data Mining in the Banking Sector

Abstract
In today's globalized and cut-throat competitive environment, banks are struggling to gain a
competitive edge over each other. Apart from the execution of business processes, the creation
of a knowledge base and its utilization for the benefit of the bank is becoming a strategic tool
for competing. In recent years the ability to generate, capture and store data has increased
enormously, and the information contained in this data can be very valuable. The wide
availability of huge amounts of data, and the need to transform such data into knowledge,
encourage the IT industry to use data mining. The banking industry around the world has
undergone a tremendous change in the way business is conducted and has started to realize
the need for techniques such as data mining that can help it compete in the market. Leading
banks are using Data Mining (DM) tools for customer segmentation and profitability, credit
scoring and approval, predicting payment default, marketing, detecting fraudulent
transactions, etc. This paper provides an overview of the concept of DM and highlights the
applications of data mining to enhance the performance of some of the core business
processes in the banking industry.
1. Introduction
Technological innovations have helped the banking industry open up efficient delivery
channels.
IT has helped the banking industry deal with the challenges the new economy poses.
Banks have now realized that customer relationships are a very important factor in their
success. Customer relationship management (CRM) is a strategy that can help them build
long-term associations with their customers and increase their revenues and profits. CRM
therefore has great significance in the banking sector. The focus of CRM is shifting from
customer acquisition to customer retention, and to ensuring that appropriate amounts of time,
money and managerial resources are directed at both of these key tasks. The challenge banks
face is how to retain their most profitable customers at the lowest cost. At the same time,
they need to find and implement a solution quickly, and that solution needs to be flexible.
Traditional methods of data analysis have long been used to detect fraud. They require
difficult and time-consuming investigations that draw on different realms of knowledge such
as finance, economics, business practices and law. Fraud instances can be similar in content
and appearance but are usually not identical. In developing countries like India, bankers face
more problems with fraudsters.
2. Literature Review
Data Mining is the process of extracting hidden, previously unknown, valid and actionable
information from large databases and then using this information to make crucial business
decisions. Previously unknown means that the patterns are not hypothesized in advance; valid
means that the patterns found must actually hold, since scrutinizing a large collection of data
can turn up patterns that are not really there; actionable means that the findings must be
translatable into some business advantage (Han et al., 2011). Data mining is the application
of statistical and machine-learning techniques for extracting interesting patterns from raw
data (Hsu et al., 2012). Data Mining is also referred to as knowledge mining from data,
knowledge extraction, data/pattern analysis, data archaeology or data dredging. It turns a
large collection of data into knowledge (Liao et al., 2012). With the mounting growth of data
in every application, data mining meets the need for effective, scalable and flexible data
analysis. Data Mining is the process of identifying and discovering interesting patterns in
massive amounts of data (Mabroukeh and Ezeife, 2010). Data Mining can be conducted on
any kind of data as long as the data are meaningful for a target application. Data Mining can
be considered a natural evolution of information technology and a confluence of several
related disciplines and application domains (Blake and Mangiameli, 2011).
Vivek Bhambri, in his paper "Application of Data Mining in Banking Sector", said that Data
Mining techniques can be of immense help to banks and financial institutions for better
targeting and acquiring new customers, fraud detection in real time, providing segment-based
products for better targeting of customers, analysis of customers' purchase patterns over time
for better retention and relationships, and detection of emerging trends so as to take a
proactive approach in a highly competitive market, adding a lot more value to existing
products and services and launching new product and service bundles. Data mining has a
wide application domain in almost every industry where data is generated; that is why data
mining is considered one of the most important frontiers in database and information systems
and one of the most promising interdisciplinary developments in Information Technology.
Dr. Madan Lal Bhasin, in his paper "Data Mining: A Competitive Tool in the Banking and
Retail Industries", concluded that data mining is a tool used to extract important information
from existing data and enable better decision-making throughout the banking and retail
industries. These industries use data warehousing to combine data from various databases
into an acceptable format so that the data can be mined. The data is then analysed, and the
information that is captured is used throughout the organisation to support decision-making.
3. Data Mining
Data mining is a knowledge discovery process. It helps us understand the substance of the
data in a special unsuspected way. It unearths patterns and trends in the raw data we never
knew existed. Data mining centres around the automated discovery of new facts and
relationships in data. With traditional query tools, we search for known information. Data
mining tools enable us to uncover hidden information. The assumption is that more useful
knowledge lies hidden beneath the surface.
Data might be one of the most valuable resources of any bank, but only if the bank knows
how to expose the valuable knowledge hidden in its raw data. Data mining allows knowledge
to be extracted from historical data and the outcomes of future situations to be predicted. It
helps optimize business decisions, increase the value of each customer and each
communication, and improve customer satisfaction.

4. Data Mining Algorithms and Techniques


Several data mining techniques and algorithms have been developed and are in use. Many
data mining practitioners seem to agree on a set of data mining functions that can be used in
specific application areas. Various data mining techniques are applicable to each type of
function. These techniques consist of the specific algorithms that can be used for each
function. The following figure shows the application areas, examples of mining functions,
mining processes, and mining techniques.

We will now briefly examine some of these data mining techniques.


4.1 Cluster detection
Clustering means forming groups. Clustering helps us take specific and proper action for the
individual pieces that make up the cluster. Clustering, or cluster detection, is one of the
earliest data mining techniques. This technique is designated as undirected knowledge
discovery or unsupervised learning.
In the cluster detection technique, we do not search pre-classified data. No distinction is made
between independent and dependent variables. The cluster detection algorithm searches for
groups or clusters of data elements that are similar to one another. We expect similar
customers or similar products to behave in the same way. Then we can take a cluster and do
something useful with it. There is one important aspect of clustering that should be noted.
When the mining algorithm produces a cluster, we must understand exactly what that cluster
means. Only then will we be able to do something useful with that cluster. A bank may get as
many as twenty clusters but be able to interpret the meanings of only two. But the return for
the bank from the use of just these two clusters may be large enough that it can simply ignore
the other eighteen clusters.
If there are only two or three variables or dimensions, it is fairly easy to spot the clusters,
even when dealing with many records. But if we are dealing with 500 variables from 100,000
records, we need a special tool. Common clustering techniques include K-means and
self-organizing maps. K-nearest neighbour and Naïve Bayes, which are sometimes listed
alongside them, are classification techniques rather than clustering techniques.
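As an illustration of how a cluster detection algorithm groups similar records without any
pre-classified data, the following is a minimal sketch of K-means, a widely used clustering
algorithm. The choice of K-means, the customer attributes and the values are assumptions
made for this example; they are not taken from any bank's data.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Group points (tuples of numbers) into k clusters of similar records."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    clusters = []
    for _ in range(iterations):
        # Assignment step: each record joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (average balance in thousands, transactions per month)
customers = [(1.0, 5), (1.2, 6), (0.8, 4), (9.5, 40), (10.2, 42), (9.9, 38)]
centroids, clusters = k_means(customers, k=2)
for members in clusters:
    print(members)
```

The algorithm only produces the groups; deciding what each group means for the bank (for
example, "low-balance, low-activity accounts") remains the analyst's job, as noted above.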
4.2 Decision Trees
This technique applies to classification and prediction. The major attraction of decision trees
is their simplicity. By following the tree, we can decipher the rules and understand why a
record is classified in a certain way. Decision trees represent rules. We can use these rules to
retrieve records falling into a certain category.
A decision tree represents a series of questions. Each question determines what follow-up
question is best to be asked next. Good questions produce a short series. Trees are drawn with
the root at the top and the leaves at the bottom, an unnatural convention. The question at the
root must be the one that best differentiates among the target classes. A database record enters
the tree at the root node. The record works its way down until it reaches a leaf. The leaf node
determines the classification of the record.
Decision tree algorithms build trees in the following manner. First, the algorithm attempts to
find the test that best splits the records among the target classes. At each lower-level node,
whatever rule works best to split that node's subset of records is applied. This process
continues level by level. The tree is allowed to grow until no better way to split the input
records can be found.
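The step of finding the test that best splits the records can be sketched as follows. This
example assumes numeric attributes, binary tests of the form "attribute <= threshold", and
Gini impurity as the measure of how well a test separates the classes; real decision tree
algorithms may use other measures. The applicant data is hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 0.0 when all labels agree."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(records, labels):
    """Return (attribute_index, threshold, weighted_impurity) of the best binary test."""
    best = None
    for f in range(len(records[0])):
        for threshold in sorted({r[f] for r in records}):
            left = [lab for r, lab in zip(records, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(records, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a test that sends every record one way is useless
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical loan applicants: (annual income in thousands, years with the bank)
applicants = [(20, 1), (25, 6), (60, 2), (80, 7), (90, 3)]
outcomes = ["default", "default", "repaid", "repaid", "repaid"]
split = best_split(applicants, outcomes)
print(split)
```

Applying best_split again to each resulting subset, until the subsets cannot be improved,
yields the level-by-level growth described above.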
4.3 Memory-Based Reasoning
We are all good at making decisions on the basis of our experiences. We depend on the
similarities of the current situation to what we know from past experience. We solve the
current problem by identifying similar instances in the past and applying the information
about those instances to the present. The same principles apply to the memory-based
reasoning (MBR) algorithm.
MBR uses known instances of a model to predict unknown instances. This data mining
technique maintains a dataset of known records. The algorithm knows the characteristics of
the records in this training dataset. When a new record arrives for evaluation, the algorithm
finds neighbours similar to the new record, then uses the characteristics of the neighbours for
prediction and classification. First, the distance function of the data mining tool calculates
the distance between the new record and the records in the training dataset. The results
determine which data records in the training dataset qualify to be considered as neighbours
of the incoming data record. Next, the algorithm uses a combination function to combine the
results of the various distance functions to obtain the final answer. The distance function and
the combination function are key components of the memory-based reasoning technique.
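The two components named above can be sketched in a few lines. This example assumes a
single Euclidean distance function and a majority vote as the combination function; both are
illustrative choices, and the training records are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Distance function: Euclidean distance between two records."""
    return math.dist(a, b)


def combine(neighbour_labels):
    """Combination function: majority vote among the neighbours' labels."""
    return Counter(neighbour_labels).most_common(1)[0][0]


def classify(training, new_record, k=3):
    """Find the k nearest known records and combine their labels."""
    ranked = sorted(training, key=lambda item: distance(item[0], new_record))
    return combine([label for _, label in ranked[:k]])


# Hypothetical training dataset: ((income in thousands, years with the bank), outcome)
training = [
    ((20, 1), "default"), ((25, 2), "default"), ((30, 1), "default"),
    ((70, 8), "repaid"), ((80, 6), "repaid"), ((90, 9), "repaid"),
]
prediction = classify(training, (75, 7))
print(prediction)
```

In practice the attributes would usually be rescaled first, so that one attribute with a large
numeric range does not dominate the distance.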
4.4 Link Analysis
This algorithm is extremely useful for finding patterns in relationships. If we look at the
business world closely, we clearly notice all types of relationships. Airlines link cities
together. Telephone calls connect people and establish relationships. We notice relationships
everywhere. The link analysis technique mines relationships and discovers knowledge.
Depending upon the type of knowledge discovery, link analysis techniques have three types
of applications: associations discovery, sequential pattern discovery, and similar time
sequence discovery. Let us briefly discuss each of these applications.

a) Associations Discovery. Associations are affinities between items. Association
discovery algorithms find combinations where the presence of one item suggests the
presence of another. The algorithms derive the association rules systematically and
efficiently. The two factors to be interpreted, the support factor and the confidence
factor, indicate the strength of the association. Rules with high support and confidence
factor values are more valid, relevant, and useful. Simplicity makes association
discovery a popular data mining algorithm.
b) Sequential Pattern Discovery. As the name implies, these algorithms discover
patterns where one set of items follows another specific set. Time plays a role in these
patterns. When we select records for analysis, we must have date and time as data
items to enable discovery of sequential patterns.
c) Similar Time Sequence Discovery. This technique depends on the availability of time
sequences. In the previous technique, the results indicate sequential events over time.
This technique, however, finds a sequence of events and then comes up with other
similar sequences of events. For example, in retail department stores, this data mining
technique comes up with a second department that has a sales stream similar to the
first.
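The support and confidence factors used in associations discovery can be computed directly.
The sketch below uses the usual definitions: support is the share of all transactions that
contain the items, and confidence is the share of transactions containing the first item set
that also contain the second. The product names and holdings are hypothetical.

```python
from fractions import Fraction


def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(1 for t in transactions if items <= t), len(transactions))


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical product holdings, one set per customer
holdings = [
    {"savings", "credit_card"},
    {"savings", "credit_card", "mortgage"},
    {"savings"},
    {"credit_card"},
    {"savings", "credit_card"},
]
rule_support = support(holdings, {"savings", "credit_card"})
rule_confidence = confidence(holdings, {"savings"}, {"credit_card"})
print(rule_support, rule_confidence)  # prints: 3/5 3/4
```

Here the rule "savings account implies credit card" holds for three of the five customers
(support) and for three of the four customers who have a savings account (confidence).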
4.5 Neural Networks
Neural networks mimic the human brain by learning from a training dataset and applying the
learning to generalize patterns for classification and prediction. These algorithms are
effective when the data is shapeless and lacks any apparent pattern. The basic unit of an
artificial neural network is modelled after the neurons in the brain. This unit is known as a
node and is one of the two main structures of the neural network model. The other structure is
the link that corresponds to the connection between neurons in the brain.
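The two structures can be illustrated with a single node. In this sketch each link carries a
weight, and the node sums its weighted inputs and passes the total through a sigmoid
activation; the sigmoid, the weights and the input values are assumptions for illustration.
In a trained network the weights would be learned from the training dataset.

```python
import math


def node_output(inputs, weights, bias):
    """One node: weighted sum over its incoming links, then a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical inputs arriving over two links with illustrative weights
weak_signal = node_output([0.0, 0.0], weights=[0.4, -0.2], bias=0.0)
strong_signal = node_output([5.0, 0.0], weights=[0.4, -0.2], bias=0.0)
print(weak_signal, strong_signal)
```

A network is built by feeding the outputs of such nodes into further nodes over more links.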
4.6 Genetic Algorithm
This technique uses a highly iterative process of selection, cross-over, and mutation operators
to evolve successive generations of models. At each iteration, every model competes with
every other, inheriting traits from previous ones, until only the most predictive model
survives.
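The cycle of selection, cross-over and mutation can be sketched as follows. In this toy
example a "model" is simply a list of bits, and the fitness function is a stand-in for
predictive accuracy; the population size, mutation rate and the practice of carrying the best
model forward unchanged (elitism) are all assumptions made for illustration.

```python
import random


def evolve(fitness, length=8, population_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit-string models; return the best model and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # selection: fitter half survives
        children = [population[0][:]]                 # elitism: keep the best unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)          # cross-over point
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history


def toy_fitness(model):
    """Stand-in for predictive accuracy: the number of bits set to 1."""
    return sum(model)


best, history = evolve(toy_fitness)
print(best, history[0], history[-1])
```

Because the best model of each generation is carried forward, the best fitness recorded in
history never decreases from one generation to the next.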
5. Data Mining Applications in Banking
The banking industry across the world has undergone tremendous changes in the way
business is conducted. With the recent implementation, greater acceptance and usage of
electronic banking, the capturing of transactional data has become easier and,
simultaneously, the volume of such data has grown considerably. It is beyond human
capability to analyse this huge amount of raw data and to effectively transform it into useful
knowledge for the organization. Data mining can help solve business problems by finding
patterns, associations and correlations that are hidden in the business information stored in
databases. By using data mining to analyse patterns and trends, bank executives can predict,
with increased accuracy, how customers will react to adjustments in interest rates, which
customers will be likely to accept new product offers, which customers will be at a higher
risk of defaulting on a loan, and how to make customer relationships more profitable.
The banking industry is widely recognizing the importance of the information it has about its
customers. Undoubtedly, it has among the richest and largest pools of customer information,
covering customer demographics, transactional data, credit card usage patterns, and so on. As
banking is a service industry, maintaining strong and effective CRM is a critical issue. To do
this, banks need to invest their resources to better understand their existing and prospective
customers. By using suitable data mining tools, banks can subsequently offer tailor-made
products and services to those customers. There are numerous areas in which data mining
can be used in the banking industry, including customer segmentation and profitability,
credit scoring and approval, predicting payment default, marketing, detecting fraudulent
transactions, cash management and forecasting operations, optimizing stock portfolios, and
ranking investments. In addition, banks may use data mining to identify their most profitable
credit card customers or high-risk loan applicants. Data mining also helps banks retain credit
card customers: by analysing past data, banks can predict which customers are likely to
change their credit card affiliation, and can then plan and launch special offers to retain those
customers. Credit card spending by customer group can also be identified using data mining.
The following are some examples of how the banking industry has been using data mining
effectively in these areas.
5.1 Marketing
One of the most widely used areas of data mining for the banking industry is marketing. The
bank's marketing department can use data mining to analyse customer databases. Data
mining carries out various analyses on the collected data to determine consumer behaviour
with reference to product, price and distribution channel. The reaction of customers to
existing and new products can also be learned, and on that basis banks can promote the
product, improve the quality of products and services, and gain competitive advantage. Bank
analysts can also analyse past trends, determine present demand and forecast customer
behaviour for various products and services in order to capture more business opportunities
and anticipate behaviour patterns. Data mining techniques also help to distinguish profitable
customers from non-profitable ones. They can be used to determine how customers will react
to adjustments in interest rates and the risk profile of a customer segment for defaulting on
loans.
5.2 Risk Management
Data mining is widely used for risk management in the banking industry. Bank executives
need to know whether the customers they are dealing with are reliable or not. Offering new
customers credit cards, extending existing customers' lines of credit, and approving loans can
be risky decisions for banks if they do not know anything about their customers. Banks
provide loans to their customers after verifying various details relating to the loan, such as
the amount of the loan, lending rate, repayment period, type of property mortgaged, and the
demography, income and credit history of the borrower. Customers who have been with the
bank for longer periods and who belong to high income groups are likely to get loans very
easily. Even though banks are cautious while providing loans, there is still a chance of default
by customers. Data mining techniques help to distinguish borrowers who repay loans
promptly from those who do not. Using data mining techniques, bank executives can also
analyse the behaviour and reliability of customers when selling credit cards, and assess
whether a customer is likely to make prompt or delayed payments if a credit card is sold to
them. Credit scoring, in fact, was one of the earliest financial risk management tools
developed. Credit scoring can be valuable to lenders in the banking industry when making
lending decisions. Data mining can also derive the credit behaviour of individual borrowers
with instalment, mortgage and credit card loans, using characteristics such as credit history,
length of employment and length of residency. A score is thus produced that allows a lender
to evaluate the customer and decide whether the person is a good candidate for a loan, or
whether there is a high risk of default. By knowing the chances of default for a customer, the
bank is in a better position to reduce its risks.
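How a score can be produced from such characteristics is sketched below using a simple
logistic formula. The attribute names, weights, intercept and cutoff are entirely hypothetical
and chosen only to make the example run; a real scorecard would be fitted to a bank's own
historical loan data.

```python
import math

# Hypothetical weights, for illustration only (not fitted to any real data).
WEIGHTS = {"missed_payments": 0.8, "years_employed": -0.15, "years_at_address": -0.05}
INTERCEPT = -1.0


def default_probability(applicant):
    """Estimated chance of default from credit history, employment and residency."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant, cutoff=0.5):
    """Approve when the estimated chance of default is below the cutoff."""
    return "refer" if default_probability(applicant) >= cutoff else "approve"


steady = {"missed_payments": 0, "years_employed": 10, "years_at_address": 8}
risky = {"missed_payments": 4, "years_employed": 1, "years_at_address": 1}
print(decide(steady), decide(risky))  # prints: approve refer
```

The lender can move the cutoff up or down depending on how much default risk it is willing
to accept.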

5.3 Fraud Detection


Another popular area where data mining can be used in the banking industry is fraud
detection. Being able to detect fraudulent actions is an increasing concern for many
businesses, and with the help of data mining more fraudulent actions are being detected and
reported. Two different approaches have been developed by financial institutions to detect
fraud patterns. In the first approach, a bank taps the data warehouse of a third party and uses
data mining programs to identify fraud patterns. The bank can then cross-reference those
patterns with its own database for signs of internal trouble. In the second approach, fraud
pattern identification is based strictly on the bank's own internal information. Most banks
use a hybrid approach. One system that has been successful in detecting fraud is Falcon's
fraud assessment system. It is used by nine of the top ten credit card issuing banks. Data
mining techniques help the organization focus on ways and means of analysing customer
data in order to identify the patterns that can lead to fraud.
5.4 Customer Relationship Management
In this era of cut-throat competition, the customer is considered king. Data mining can be
useful in all three phases of the customer relationship cycle: customer acquisition, increasing
the value of the customer, and customer retention. Customer acquisition and retention are
very important concerns for any industry, especially the banking industry. Today customers
have a wide range of products and services provided by different banks. Hence, banks have
to cater to the needs of customers by providing the products and services they prefer. This
will result in customer loyalty and customer retention. Data mining techniques help to
distinguish customers who are loyal from those who shift to other banks for better services.
If a customer is shifting from one bank to another, the reasons for the shift and the last
transaction performed before it can be identified, which will help banks perform better and
retain their customers.
6. Suggestions and Conclusion
To make the process of implementation of data mining techniques easier in the banking
sector, the following framework has been proposed:
A. Identify the application areas where data mining tools can be used.
B. Identify the objectives of these application areas.
C. Convert the objectives into data mining objectives.
D. Identify the various sources of data, such as financial statements, audit reports, account
ledgers, etc.
E. Clean the data to make it suitable for loading, and integrate the data from the various
sources.
F. Apply suitable data mining techniques.
G. Evaluate the patterns that emerge from the application of the techniques.
H. Evaluate performance by finding the error rate or other suitable metrics.
I. Convert the results into useful information or knowledge for future use.
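The error rate mentioned in the performance evaluation step can be computed as the share of
records for which the predicted outcome differs from the actual outcome. The following is a
minimal sketch; the labels are hypothetical.

```python
def error_rate(actual, predicted):
    """Share of records whose predicted label differs from the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical outcomes for four loans, and a model's predictions for them
actual = ["repaid", "default", "repaid", "repaid"]
predicted = ["repaid", "repaid", "repaid", "repaid"]
rate = error_rate(actual, predicted)
print(rate)  # prints: 0.25
```

The error rate should be measured on records that were not used to build the model,
otherwise it will tend to look better than it really is.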
Data mining is a tool used to extract important information from existing data and enable
better decision-making throughout the banking and retail industries. These industries use
data warehousing to combine data from various databases into an acceptable format so that
the data can be mined. The data is then analysed, and the information that is captured is used
throughout the organization to support decision-making. Data mining techniques can be very
helpful to banks for better targeting and acquiring new customers, fraud detection in real
time, providing segment-based products, and analysis of customers' purchase patterns over
time for better retention and relationships. Those banks that have realized the usefulness of
data mining and are in the process of building a data mining environment for their
decision-making process will obtain huge benefits and derive considerable competitive
advantage in future.
7. References
1) Vivek Bhambri, "Application of Data Mining in Banking Sector", International
Journal of Computer Science and Technology, Vol. 2, Issue 2, June 2011.
2) Dr. Madan Lal Bhasin, "Data Mining: A Competitive Tool in the Banking and Retail
Industries", The Chartered Accountant, October 2006.
3) Han, J., M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd Ed.,
Elsevier, Burlington, ISBN: 9780123814807, pp: 744, 2011.
4) Hsu, F.M., L.P. Lu and C.M. Lin, Segmenting customers by transaction data with
concept hierarchy, 2012.
5) Liao, S.H., P.H. Chu and P.Y. Hsiao, Data mining techniques and applications: A
decade review from 2000 to 2011, 2012.
6) Mabroukeh, N.R. and C.I. Ezeife, A taxonomy of sequential pattern mining
algorithms, 2010.
7) Blake, R. and P. Mangiameli, The effects and interactions of data quality and
problem complexity on classification, 2011.
