Volume: 3 Issue: 7
ISSN: 2321-8169
4730 - 4734
_______________________________________________________________________________________________
Analysis of Web Log from Database System utilizing E-web Miner Algorithm
Mukul B. Chavan
Abstract— The enormous volume of content on the World Wide Web makes it a clear candidate for data mining research. The application of data mining techniques to the World Wide Web is referred to as Web mining, a term that has been used in diverse ways. Web log mining is one such web-based application, and it must confront very large volumes of log data. To recognize usage patterns and user behaviour from the web logs of a portal, this work applies an efficient web mining algorithm to web log analysis; the results are used to identify context relevant to the design of an e-business web portal that requires security. Because of tremendous web usage, web log files tend to grow large and noisy. The analysis can reveal the browsing patterns of clients and certain classes of relationships between web pages. We analyze the logs with a web mining algorithm and compare the results across the Apriori, AprioriAll and E-Web Miner algorithms. The analysis shows that the E-Web Miner (Efficient Web Mining) algorithm performs better in terms of both space and time complexity. The comparison also verifies that its candidate sets are much smaller, and our results show that the number of database scans is reduced by the E-Web Miner algorithm.
Keywords- Web Mining, Web Log Analysis, Server Log File, E-Web Miner, Apriori Algorithm, AprioriAll Algorithm.
__________________________________________________*****_________________________________________________
I. INTRODUCTION
particular web site. Web usage mining uses web logs that record user access patterns. Log files are created by web servers and filled with information about user requests on a particular web site. We attempt to apply an efficient web mining algorithm to web log analysis. The results obtained may be applied to a class of problems, from search engines that identify context on the basis of association, to the design of an e-commerce web portal that demands security. This work intends to show that the E-Web Miner performs much better, in terms of time and space complexity, than the Apriori and AprioriAll algorithms, and it confirms the appropriateness of the results obtained by providing a trace-back route for candidate-set pruning in the algorithms. In the proposed algorithm the number of database scans decreases significantly and the candidate sets are found to be much smaller.
II. LITERATURE SURVEY

III. PROPOSED SYSTEM
This system is an attempt to apply an efficient web mining algorithm to web log analysis. The results obtained from the analysis may be applied to different search-engine problems in order to identify context, and to design an e-commerce web site according to user behaviour. The main objective of this work is the Web Usage Mining process, specifically:
Server Log File: The server log files are retrieved from the server. A web log file is created and maintained automatically by a web server. Every hit to the web site, including every view of an HTML document, image or other object, is logged. The unprocessed web log file format is essentially one line of text per hit, so the server log file is chosen for further analysis. The information in the web log file corresponds to the access patterns of the different users in the overall web traffic, ranging from single-user, single-site browsing behaviour to multi-user, multi-site access patterns. A log file record contains the essential information about a request: the client-side host name or IP address, the date and time of the request, the requested file name, the HTTP response status and size, the referring URL [12], and the browser information.
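As a sketch of how such a record can be read, the following parses one line in the widely used "combined" log format; the field names mirror the description above, and the sample line is illustrative only.

```python
import re

# One capture group per field described above: host/IP, timestamp,
# request, status, size, referring URL, and browser (user-agent) string.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('192.168.1.10 - - [10/Jul/2015:13:55:36 +0530] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/start.html" "Mozilla/5.0"')

record = LOG_PATTERN.match(line).groupdict()
print(record["host"], record["status"], record["request"])
# → 192.168.1.10 200 GET /index.html HTTP/1.1
```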
Data Selection: In the data selection phase, the server log files must be chosen carefully; the portal offers several facilities, such as My Portfolio and Resources, and the server log file mixes the entries for every transaction between these facilities.
Data Preprocessing: Data pre-processing comprises all the significant actions taken before the actual pattern analysis phase starts. The pre-processing steps include cleaning, user identification and session identification. Cleaning removes all entries that will be of no use during analysis or mining. The major tasks in this phase include handling missing values, identifying outliers, smoothing noisy data and correcting inconsistent data. The sheer size of the server log file is the most challenging problem during this phase. Our implementation uses a Clean and Sort step, which yields the total number of users in a particular log.
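A minimal sketch of such a cleaning step is shown below; the noise suffixes and the tuple layout are illustrative assumptions, not the system's actual implementation.

```python
# Hypothetical cleaning step: drop entries with no analytical value
# (images, stylesheets) and count distinct users by IP address.
NOISE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def clean(entries):
    """entries: list of (ip, url, status) tuples from the raw log."""
    kept = []
    for ip, url, status in entries:
        if url.lower().endswith(NOISE_SUFFIXES):
            continue          # image/style requests add noise, not behaviour
        if not status.startswith("2"):
            continue          # keep only successful requests
        kept.append((ip, url, status))
    return kept

raw = [
    ("10.0.0.1", "/index.html", "200"),
    ("10.0.0.1", "/logo.png", "200"),
    ("10.0.0.2", "/cart.html", "404"),
    ("10.0.0.3", "/cart.html", "200"),
]
cleaned = clean(raw)
users = {ip for ip, _, _ in cleaned}
print(len(cleaned), len(users))   # → 2 2
```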
Pattern Discovery: Pattern discovery involves three major operations of concern: association (which pages are accessed together), clustering (finding groups of users, transactions, pages, etc.), and sequential analysis (the order in which web pages tend to be accessed). All the algorithms are applied here to the preprocessed data.
In the session-wise view we work with the following log fields:
Host: the IP address from which the user sends the request to the web server (logged together with the user-agent string identifying the browser).
Date: the date and time at which the user accessed a page on the web site; this is used to identify the session.
URL: the resource accessed by the user. It may be an HTML page, a CGI program, or a script.
Request type: the method used for the transfer, such as GET or POST.
Response type: the HTTP status class returned:
100 HTTP_INFO
200 HTTP_SUCCESS
300 HTTP_REDIRECT
400 HTTP_CLIENT_ERROR
500 HTTP_SERVER_ERROR
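The mapping from a concrete status code to its class can be sketched as follows (class names normalized with underscores; the helper itself is an illustration, not part of the original system):

```python
def status_class(code):
    """Map an HTTP status code to the response classes listed above."""
    classes = {1: "HTTP_INFO", 2: "HTTP_SUCCESS", 3: "HTTP_REDIRECT",
               4: "HTTP_CLIENT_ERROR", 5: "HTTP_SERVER_ERROR"}
    # 404 // 100 == 4, so any 4xx code falls in the client-error class
    return classes.get(code // 100, "UNKNOWN")

print(status_class(404))   # → HTTP_CLIENT_ERROR
```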
Pattern Analysis: The aim of pattern analysis is to filter out uninteresting rules or patterns found in the earlier phase. During this phase a descriptive method is used to analyse the data after the various algorithm implementations, giving, for example, a general summary of web usage and customer behaviour. The analysis also tries to find the top visitors for each facility or option provided by the portal. Besides this option analysis, the server log files also trace which documents were downloaded.
Applying Algorithms: The Apriori, AprioriAll and E-Web Miner algorithms are applied individually to the same log file, and the results are compared with respect to time and number of log records. This also exposes the differences between the algorithms in terms of time and space complexity.
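One way such a comparison can be run (a sketch, not the authors' actual harness) is to wrap each mining function in a timer over the same log; `dummy_miner` below is a placeholder standing in for Apriori, AprioriAll or E-Web Miner.

```python
import time

def timed(miner, log):
    """Run one mining function over the log and report elapsed milliseconds."""
    start = time.perf_counter()
    result = miner(log)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Placeholder miner; the real algorithms would be plugged in here.
def dummy_miner(log):
    return sorted(set(log))

res, ms = timed(dummy_miner, ["A", "B", "A"])
print(res, round(ms, 3))
```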
Security: The log analysis result is secured using AES. In this module the analysed data is stored in encrypted form so that it cannot be attacked or revealed by an attacker. The data can be viewed only by an authenticated person, such as an Admin who holds the required decryption key.
AES Encryption Algorithm:
The Advanced Encryption Standard (AES), is a
specification for the encryption of electronic data established
by the U.S. National Institute of Standards and Technology in
2001. AES is based on the Rijndael cipher developed by two
Belgian cryptographers, Joan Daemen and Vincent Rijmen,
who submitted a proposal to NIST during the AES selection
process. Rijndael is a family of ciphers with different key and
block sizes. For AES, NIST selected three members of the
Rijndael family, each with a block size of 128 bits, but three
different key lengths: 128, 192 and 256 bits.
AES has been adopted by the U.S. government and is now
used worldwide. It supersedes the Data Encryption Standard
(DES), which was published in 1977. The algorithm described
by AES is a symmetric-key algorithm, meaning the same key
is used for both encrypting and decrypting the data. The block
cipher Rijndael is designed to use only simple whole-byte
operations. Also, it provides extra flexibility over that required
of an AES candidate, in that both the key size and the block
size may be chosen to be any of 128, 192, or 256 bits. During
an early stage of the AES process, a draft version of the
requirements would have required each algorithm to have
three versions, with both the key and block sizes equal to each
of 128, 192, and 256 bits. This was later changed to make the
three required versions have those three key sizes, but only a
block size of 128 bits, which is more easily accommodated by
many types of block cipher design. In our system AES is applied in the result-generation phase of the analysis, so that only the intended user can view the results.
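A minimal sketch of protecting the analysis result with symmetric AES, assuming the widely used third-party `cryptography` package; the key handling here is illustrative only, not the system's actual key management.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # 256-bit AES key held by the Admin
aesgcm = AESGCM(key)
nonce = os.urandom(12)                      # must be unique per encryption

result = b"example analysis result"         # illustrative payload
ciphertext = aesgcm.encrypt(nonce, result, None)
plaintext = aesgcm.decrypt(nonce, ciphertext, None)  # requires the same key
print(plaintext == result)                  # → True
```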
Results: As stated above, this implementation focuses on web usage mining of the portal. The results of this study are divided into two areas: the first discusses the general access patterns and user behaviour of the portal (descriptive statistics), while the second displays the supports and confidences at the different levels of the portal. All results are displayed as charts, graphs and tabular data to make them easier to understand.
Proposed Algorithm
E-Web Miner is an improvement over existing web mining algorithms that removes the loopholes in the AprioriAll algorithm. The steps of the algorithm are:
1. Arrange the set of web pages in ascending order for the various users.
2. Assign the set of pages to the string array a for user u.
3. Initialize f = 0 and max = 0, where f is the frequency and max the maximum frequency.
4. Let I vary from 1 to n, and J from 0 to (n-1).
5. IF a[J] is a substring of a[I]
   f = f + 1;
   END IF
   b[I] = f;
   IF max < f
   max = f;
   END IF
6. Find the positions in array b where the value is equal (or nearly equal) to the maximum value, and choose the corresponding substrings from array a.
7. Repeat step 6 and output every such substring with its positions; this is the intended output.
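The steps above can be sketched in Python. This is a hedged reconstruction of the pseudocode: it assumes the substring test asks whether a[J] occurs inside a[I], excludes the trivial self-match, and uses 0-based indexing.

```python
def e_web_miner(pages):
    """Reconstruction of the E-Web Miner steps described above.

    pages: one page-sequence string per user.
    Returns (position, sequence) pairs whose frequency is maximal.
    """
    a = sorted(pages)                 # step 1: ascending order
    n = len(a)
    b = [0] * n                       # per-sequence frequency counts
    max_f = 0                         # step 3: max = 0
    for i in range(n):                # steps 4-5
        f = 0
        for j in range(n):
            # assumed reading: does a[j] occur as a substring of a[i]?
            if i != j and a[j] in a[i]:
                f += 1
        b[i] = f
        if max_f < f:
            max_f = f
    # steps 6-7: positions where the frequency equals the maximum
    return [(i, a[i]) for i in range(n) if b[i] == max_f]

print(e_web_miner(["AB", "X", "ABC", "ABCD"]))   # → [(2, 'ABCD')]
```

Here "ABCD" contains both "AB" and "ABC" as prefixes, so it scores the maximal frequency of 2 and is reported with its position in the sorted array.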
Mathematical Model

[The six numbered steps of the mathematical model are not recoverable from this copy.]
IV. EXPERIMENTAL RESULTS
[Figure: time (ms) required for log analysis by the Apriori, AprioriAll and E-Web Miner algorithms; y-axis from 0 to 1800 ms.]
A. Dataset
In this work we used a server log file as the input to the system.
B. Results
With the proposed system the log analyzer achieves several improvements: a time- and space-efficient web mining algorithm, discovery of user behaviour, and secure result transmission.
The comparison graph shows that the proposed E-Web Miner analyzes the log in less time than the existing Apriori and AprioriAll implementations, which demonstrates that the proposed E-Web Miner has lower time complexity than the other mining algorithms.
V.
IJRITCC | July 2015, Available @ http://www.ijritcc.org
_______________________________________________________________________________________