Data Mining Principles

QUESTION 1:
Data mining is the use of computer software to extract useful knowledge from the large
volumes of data available. Data mining is typically used in the retail industry, the banking
sector, mathematics and fraud detection.
Use of data mining in mathematics.
According to Patricia B. Cerrito (2003), in Data mining applications in department of
mathematics, data mining is typically used in mathematics. Statistics is a part of
mathematics in which the calculated values fluctuate constantly. Among all the fluctuations,
some are important, and these can be collected by using data mining.
Use of data mining in science and technology.
According to Data Mining for Scientific and Engineering Applications (Robert L. Grossman,
Chandrika Kamath, Vipin Kumar and Raju 2001), computer simulation has improved to the
point where a computer can produce terabytes of data in a few hours. It may take a human a
few weeks or months to extract the useful information from that data.
Data mining is used in computer security
According to Data Mining in Computer Security (Daniel Barbara and Sushil Jajodia 2002), the
volume of data relating to both the network and the host is very large. Many users may log in
to the same host and the same network. Data mining techniques are a natural candidate for
analysing this data.
Data mining is used in business intelligence
According to Data Mining for Business Intelligence (Galit Shmueli, Nitin R. Patel and Peter C.
Bruce 2007), large amounts of data are available in the market for use in a particular
business. Data mining helps in choosing the particular information in that data that is useful
for the business.
QUESTION 2:
The whole data mining process is based on the plan, do, check, act cycle.
Plan: formulate the data
Do: experiment
Check: evaluate the results
Act: implement the results if successful.
For a better understanding, the plan stage is subdivided into four steps: problem
identification, collation of data, data pre-processing and choosing an algorithm.
Problem identification:
We need a clear idea of the question before choosing a data mining approach, so that we can
set the goals and reach them in a better way. It is very hard to set up data mining for an
organisation without knowing the requirements of the organisation.
Collation of data: The data has to be gathered from many sources. As long as the data in the
data warehouse is sufficient, there is no need for collation of data. If it is not sufficient,
it has to be collected from the other data available, which is a big task.
Data pre-processing: In this step the data collected from the data warehouse is cleaned,
which means replacing the missing values, and the data is transformed into a useful form.
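As an illustration of the cleaning step, here is a minimal sketch that replaces missing values with the column mean. Mean replacement is only one possible strategy and is an assumption here; the function name `fill_missing_with_mean` and the sample records are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(rows, column):
    """Return a copy of rows with None in `column` replaced by the column mean."""
    known = [row[column] for row in rows if row[column] is not None]
    if not known:
        raise ValueError(f"no known values in column {column!r}")
    column_mean = mean(known)
    # Build new dictionaries so the original records are left unchanged.
    return [
        {**row, column: column_mean if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical records; None marks a missing value.
records = [
    {"customer": "A", "spend": 10.0},
    {"customer": "B", "spend": None},
    {"customer": "C", "spend": 20.0},
]
cleaned = fill_missing_with_mean(records, "spend")
print(cleaned[1]["spend"])
```

Other replacement rules, such as the median or the most frequent value, would fit the same structure.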
Algorithm selection: Trial and error is the best way to choose an algorithm for a business,
as many algorithms are available in the market. Choosing the best algorithm for the
organisation gives the best result for the organisation.
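The trial-and-error approach can be sketched as trying each candidate on data with known outcomes and keeping the one with the highest accuracy. This is a minimal sketch under assumptions: the two candidate rules, the income figures and the names `accuracy` and `select_best` are hypothetical and stand in for real algorithms and real data.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) examples that `predict` labels correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def select_best(candidates, validation):
    """Try every candidate and return (name, score) for the highest accuracy."""
    scores = {name: accuracy(fn, validation) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidates: each maps an income figure to "approve" or "reject".
candidates = {
    "always_approve": lambda income: "approve",
    "threshold_30k": lambda income: "approve" if income >= 30000 else "reject",
}

# Hypothetical validation examples: (income, known outcome).
validation = [
    (15000, "reject"),
    (25000, "reject"),
    (40000, "approve"),
    (60000, "approve"),
]

best_name, best_score = select_best(candidates, validation)
print(best_name, best_score)
```

In practice the candidates would be scored on data that was not used to build them, so that the comparison reflects how each one behaves on new cases.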
Data processing: The data collected from the different sources is processed by the computer
according to the chosen procedures. This is the data that has to be processed by the
organisation.
Model construction and evaluation: After the collected data has been processed, it is time to
evaluate the results. What are the goals of the organisation, and to what extent does the
project meet the requirements of the organisation? The organisation should have a clear view
of this.
Taking action:
Last but not least, this is the major step in any project. If the project did not meet the
requirements of the organisation, this step deals with what went wrong and what steps need to
be taken to meet the goals.
QUESTION 3:
(a)
In supervised learning, the input attributes of the forecast are the humidity, the barometric
pressure and the temperature.
The output is the forecast: windy, or possible hail or storms in the night.
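To make the input and output roles concrete, here is a minimal sketch of a nearest-neighbour classifier that maps the three input attributes to a forecast. Nearest neighbour is an assumed choice of technique, and the training rows and the function name `nearest_neighbour` are hypothetical.

```python
import math


def nearest_neighbour(training, query):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    _, label = min(training, key=lambda row: math.dist(row[0], query))
    return label


# Hypothetical labelled observations:
# (humidity %, barometric pressure hPa, temperature C) -> forecast.
training = [
    ((85.0, 995.0, 18.0), "storm"),
    ((60.0, 1005.0, 22.0), "windy"),
    ((90.0, 990.0, 8.0), "hail"),
]

# A new observation whose forecast is not yet known.
forecast = nearest_neighbour(training, (88.0, 992.0, 10.0))
print(forecast)
```

The attributes are used on their raw scales here; a fuller version would normally rescale them so that no single attribute dominates the distance.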
(b)
In supervised learning, if a medical practitioner wants to predict the survival rate of breast
cancer patients, the input attributes are the blood pressure of the patient, the heart rate of
the patient, the percentage of haemoglobin in the blood and the strength of the bone.
The outcome is whether the survival rate of the breast cancer patient is low or high.
(c)
In supervised learning, if a company wants to identify fraud cases to minimise the risks in
its loan system, the possible inputs are the age of the person, the visa status of the person
and the monthly or annual income.
The possible outcomes are that the credit card is rejected due to poor credit history or that
the credit card is approved.
(d)
In unsupervised learning, if a supermarket manager wants to improve the success rate of
direct mail targeting, the possible input attributes are the name of the person, the address
of the person and the contact number of the person.
The outcome is that the supermarket gets a good response from its customers.
(e)
It is a data query.
QUESTION 4:
(A)
The missing values are 5.95 and 110.85.
(B)
(C)
Create a new file in Notepad.
Identify the attributes and define the relation between them.
Change the file extension to .arff.
Open WEKA, select Open file and select the .arff file.
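The steps above produce a file in the ARFF layout, which has a relation name, a list of attributes and a data section. As a minimal sketch, the text of such a file can also be generated in Python. The relation name, attributes and rows below are hypothetical and are not the data from the assignment, and `to_arff` is an illustrative helper, not part of WEKA.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from a relation name, (name, type) attributes and data rows."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} {kind}" for name, kind in attributes]
    lines += ["", "@data"]
    # ARFF marks a missing value with a question mark.
    lines += [",".join("?" if v is None else str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical relation: values contain no spaces, so no quoting is needed.
attributes = [("item", "string"), ("price", "numeric"), ("in_stock", "{yes,no}")]
rows = [("pen", 1.5, "yes"), ("book", None, "no")]

arff_text = to_arff("shop", attributes, rows)
print(arff_text)
```

Saving the printed text in a file whose name ends in .arff gives a file that can be selected through Open file in WEKA.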
References:
1. Patricia B. Cerrito (2003), Data mining applications in department of mathematics, viewed
29 March 2010.
http://www.math.louisville.edu/people/faculty/Cerrito/DataMine.pdf
2. Robert L. Grossman, Chandrika Kamath, Vipin Kumar and Raju (2001), Data Mining for
Scientific and Engineering Applications, Kluwer Academic Publishers, The Netherlands,
29 March, p. 128
http://books.google.com.au/books?id=K9bRLRpGM2cC&dq=applications+of+data+mining&printsec=frontcover&source=in&hl=en&ei=f4C1S7GFF9CHkQWSj4yXDQ&sa=X&oi=book_result&ct=result&resnum=11&ved=0CDoQ6AEwCg#v=onepage&q=&f=false
3. Daniel Barbara and Sushil Jajodia (2002), Data Mining in Computer Security, Kluwer
Academic Publishers, North and Central America, 29 March, p. 25
id=QXNj15Lp1OsC&dq=applications+of+data+mining&printsec=frontcover&source=in&hl=en&ei=f4C1S7GFF9CHkQWSj4yXDQ&sa=X&oi=book_result&ct=result&resnum=12&ved=0CDwQ6AEwCw#v=onepage&q=&f=false
4. Galit Shmueli, Nitin R. Patel and Peter C. Bruce (2007), Data Mining for Business
Intelligence, John Wiley and Sons, Canada, 29 March, p. 13
id=cM3hN0mvzLsC&dq=applications+of+data+mining&printsec=frontcover&source=in&hl=en&ei=hYO1S-HKApiekQWsp8WPDQ&sa=X&oi=book_result&ct=result&resnum=13&ved=0CEIQ6AEwDA#v=onepage&q=&f=false
