
2010 International Conference On Computer Design And Applications (ICCDA 2010)

Research on Semantic Web Mining

WANG Yong-gui (1), JIA Zhen (2)
(1) Dept of Software, Liaoning Technical University, Huludao, Liaoning, China, yghI2000@163.net
(2) Dept of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning, China, liazh555@126.com

Abstract-Semantic-based Web mining has attracted wide attention as a way to raise the level of Web services and to address the lack of semantics in existing Web services. Semantic-based Web data mining is a combination of the Semantic Web and Web mining. The results of Web mining help to build the Semantic Web, and the knowledge in the Semantic Web not only makes Web mining easier to carry out but also improves its effectiveness. This paper first introduces the related knowledge of the Semantic Web and Web mining, then discusses semantic-based Web mining, and finally proposes a semantic-based Web mining model under the framework of Agent.

Keywords-Web Mining; Semantic Web; Ontology; Agent

I. INTRODUCTION

With the rapid development and wide application of the Internet, the Web has become an effective tool for exchanging and sharing information and for collaborative work [1]. People's attention to and frequent use of the Web promote the development of this technology, but also cause Web information resources to grow rapidly. However, the flood of information resources distributed across the Web, while bringing convenience to people, also makes in-depth application of the network very difficult. On the one hand, a person is concerned with only a small part of the information on the Web and is not interested in the rest, yet the desired results are submerged in the output of traditional keyword-based search engines. On the other hand, since the majority of Web data is unstructured, the results of traditional data mining are unsatisfactory. To solve these problems, people have started to use semantic information to improve the Internet's capacity to provide services to humans. Machine-processable semantic information allows intelligent software such as Agents to interact effectively. Web mining based on semantics is a combination of the Semantic Web and Web mining, which can improve the intelligence level of information access.

II. WEB MINING AND SEMANTIC WEB-RELATED KNOWLEDGE

A. Web mining

Web mining can be generally defined as the extraction of interesting, useful patterns and implicit information from WWW resources and user behavior.
In general, Web mining can be divided into three categories: Web content mining, Web structure mining and Web usage mining.
Figure 1 shows the classification of Web mining:

Figure 1 Web Mining Classification

Web content mining is used to extract knowledge from the text, images, or other components of Web content. Which sites sell cars? Which pages are in Chinese? Which pages introduce music or news? Search engines, intelligent agents, and some recommender systems use content mining to help users find the content they need in the vast space of the network. Web content mining has two strategies: mining the text of pages directly, and further processing the results of search engine queries to obtain more accurate and useful information.
Web structure mining is used to extract network topology information, that is, information about the links between pages. It mines knowledge from the organization and links of the WWW. Which pages are linked to other pages? Which pages point to other pages? Which collection of pages

978-1-4244-7164-5/$26.00 2010 IEEE VI-67 Volume 1



constitutes an independent entity? The pages can then be ranked to find the important ones.
Web usage mining is used to extract information about how customers use the browser and follow page links. It extracts interesting patterns from Web access records. For example, which pages does a client access? How long is spent on each page? What is clicked next? What are the entry and exit routes? Each WWW server keeps a Web access log that records information about user access and interaction. Analysis of these data helps in understanding user behavior, which in turn makes it possible to improve the structure of the site or to provide users with personalized services.

B. Semantic Web

The basic idea of the Semantic Web [2] is to embed machine-readable markup that represents certain kinds of knowledge in Web information, so that data on the Web is not only displayed but also understood by machines, which enhances the quality of information services and opens up a variety of new, intelligent information services. If the knowledge that reflects the links between data and applications is embedded in different information sources in a manner transparent to the user, Web pages, databases, and programs will be able to link up through agents and collaborate with each other.
According to Berners-Lee's vision, the Semantic Web has a layered architecture made up of seven layers [3], as shown in Table 1. The first layer, Unicode and URI, is the basis of the entire system: Unicode is responsible for encoding resources, and URI is responsible for identifying resources, which makes precise retrieval of information possible. The second layer, XML + NS (Namespace) + XML Schema, is responsible for representing the content and structure of data; by using a standard format language, it separates the presentation format, data structure, and content of network information. The third layer, RDF + RDF Schema, provides a semantic model for describing information on the Web and its types. The fourth layer, the ontology vocabulary layer, is responsible for defining shared knowledge and describing the semantic relationships between the various kinds of information, in order to reveal the semantics of the information itself and between pieces of information. The fifth layer, the logic layer, is responsible for providing axioms and inference rules, which form the basis for intelligent services. The sixth layer, Proof, and the seventh layer, Trust, are responsible for providing authentication and trust mechanisms. Digital signatures and encryption, which are used to detect changes to documents, are a means of enhancing Web security.
This is a hierarchy in which functionality increases layer by layer. XML, RDF(S), and Ontology are the core of the Semantic Web architecture; these three core technologies form the technical support system of the Semantic Web. They support the semantic description of network information and knowledge and play a central role in achieving semantic-level knowledge sharing and reuse.

TABLE 1 SEMANTIC WEB ARCHITECTURE

Layers are listed from low (Layer 1) to high (Layer 7).

Layer     Name                  Description
Layer 1   Unicode and URI       The basis of the Semantic Web: Unicode handles the encoding of resources; URI (Uniform Resource Identifier) is responsible for identifying resources
Layer 2   XML+NS+XML Schema     Used to represent the content and structure of data
Layer 3   RDF+RDF Schema        Used to describe resources on the Web and their types
Layer 4   Ontology Vocabulary   Describes the various types of resources and the relationships between resources
Layer 5   Logic                 Performs logical reasoning on the basis of the four layers below
Layer 6   Proof                 Verifies statements according to logic in order to draw conclusions
Layer 7   Trust                 Establishes a trust relationship between users
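
To make Layers 1 and 3 of Table 1 more concrete, the following minimal Python sketch models RDF-style statements as (subject, predicate, object) triples whose names are URIs. It is only an illustration: the `http://example.org/` URIs, the `topic` and `language` predicates, and the `match` helper are assumptions made for this example and do not come from the paper or from any RDF library.

```python
# Minimal sketch of RDF-style statements: (subject, predicate, object) triples.
# All URIs and predicate names below are hypothetical examples.
EX = "http://example.org/"

triples = {
    (EX + "page/cars", EX + "vocab/topic", EX + "concept/Car"),
    (EX + "page/cars", EX + "vocab/language", "zh"),
    (EX + "page/news", EX + "vocab/topic", EX + "concept/News"),
    (EX + "page/news", EX + "vocab/language", "en"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which pages are in Chinese?" becomes a pattern query over the triples.
chinese_pages = [s for s, _, _ in match(triples, predicate=EX + "vocab/language", obj="zh")]
print(chinese_pages)
```

Because every resource is named by a URI and every statement has the same three-part shape, a question such as "Which pages are in Chinese?" can be answered by pattern matching rather than by keyword search.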

The Semantic Web is known as Web 3.0. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML as the syntax and Uniform Resource Identifiers (URIs) as the naming mechanism. The Semantic Web is an extension of the current Web, not a new Web. The research focus is how to turn information from a form that a computer can only read into a form that a computer can understand and process, that is, information with semantics, so that computers and people can work together.
Annotating Web resources (such as Web pages and Web services) with ontology terms is an important prerequisite for achieving the Semantic Web. Ontology sits in the fourth layer of the seven-layer Semantic Web architecture proposed by Tim Berners-Lee. It aims to capture the knowledge of a domain, provide a common understanding of that knowledge, determine the vocabulary jointly recognized in the domain, and give clear definitions of the terms and of the relationships between them, describing the semantics of concepts through the relationships between concepts. In ontology-based semantic annotation, ontologies defined by experts support content creators in adding semantic metadata to Web pages, so that the content can be understood by both people and machines; compared with tagging by the general public, this is a top-down approach to classification. The Semantic Web can be seen as a new generation of information infrastructure: a new distributed intelligent network platform based on semantic information processing.
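
The idea of ontology-based semantic annotation can be illustrated with a small Python sketch. It assumes a toy ontology stored as a child-to-parent mapping and a few annotated pages; the concept names, URLs, and the helper functions `ancestors` and `pages_about` are hypothetical and chosen only for illustration.

```python
# Hypothetical mini-ontology: each concept maps to its parent (None for the root).
PARENT = {
    "Thing": None,
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "Music": "Thing",
}

# Hypothetical semantic annotations: page URL -> ontology concepts added by its author.
ANNOTATIONS = {
    "http://example.org/sedans": {"Car"},
    "http://example.org/haulage": {"Truck"},
    "http://example.org/concerts": {"Music"},
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors in the hierarchy."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def pages_about(concept):
    """Find pages annotated with the concept or with any of its sub-concepts."""
    return sorted(
        url for url, concepts in ANNOTATIONS.items()
        if any(concept in ancestors(c) for c in concepts)
    )


print(pages_about("Vehicle"))
```

A query for "Vehicle" returns the pages annotated with "Car" and "Truck" even though neither page mentions the word "Vehicle", which is the kind of machine understanding that keyword matching alone cannot provide.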


III. WEB MINING BASED ON SEMANTIC NETWORK

Semantic mining [4] performs semantic analysis of information resources and user queries using advanced intelligence theory and technology. By mining their deep semantics, it expresses knowledge resources and user needs fully and accurately, then searches across distributed, heterogeneous databases, data warehouses, and knowledge bases, and finally processes the retrieved information intelligently so that the most relevant results are returned. In this sense it is a semantic retrieval mechanism.
Semantic-based Web data mining combines Web mining with semantics that are either extracted from existing Web data or taken from existing semantic structures. The results of Web mining help to build the Semantic Web, and the knowledge in the Semantic Web makes Web mining easier to carry out and more effective. Corresponding to the categories of Web mining, semantic-based Web mining can be divided into Semantic Web content mining, Semantic Web structure mining, and Semantic Web usage mining.
Semantic Web content and structure mining. In the Semantic Web, content and structure are intertwined, so the difference between content mining and structure mining almost vanishes; we therefore refer to them collectively as Semantic Web content and structure mining. As a result, traditional techniques for relational data mining can easily be transferred to Semantic Web content and structure mining.
Semantic Web usage mining. In the Semantic Web environment, we can give clear semantics to user behavior based on the ontology knowledge attached to log files. On this basis, mining has been shown to be effective in identifying groups of users with the same interests, which provides users with ontology-based personalized views and improves the results of Web usage mining.
An Agent is an intelligent software entity that can complete a specific function autonomously and can communicate with other Agents under certain circumstances. An Agent is usually characterized by autonomy, sociability, proactive and reactive behavior, adaptability, and mobility. An intelligent Agent can complete intelligent reasoning tasks according to the semantic information on the Web and can improve the accuracy of information retrieval. Agent technology is therefore now widely used in building intelligent systems.

Semantic Web Mining Model under the framework of Agent

Based on the knowledge above, we can create a Semantic Web mining model under the framework of Agent to better understand the combination of the Semantic Web and Web mining techniques. The whole process of this model is completed in five steps. The Semantic Web mining model under the framework of Agent is shown in Figure 2.

Figure 2 Semantic Web Mining Model under the framework of Agent

The first step: At the beginning, an initial ontology needs to be built. Building an initial ontology first requires a relevant set of atomic concepts, which we obtain by applying a clustering algorithm to documents from the Web. The concept hierarchy can then be obtained in several different ways. One way is to generate it with knowledge acquisition methods such as ONTEX (Ontology Exploration), which takes a set of concepts as input and, relying on the knowledge acquisition technique of attribute exploration, outputs the hierarchy of those concepts. Another way is to use one of the many ontology models that ontology researchers have already developed. These include both general knowledge ontology models and descriptions of knowledge in a specific field. The ontology model, combined with the knowledge of experts in the field, builds a conceptual hierarchy (the initial ontology). This ontology hierarchy is stored in the ontology library system to support the next phase of work.
The second step: The resource acquisition module collects task-related data sets from the Web according to the task instructions it receives from the ontology Agent. This step is essential: because data on the Web is scattered, dynamic, and often inconsistent, the quality of data collection has a direct impact on the results of Web mining.
The third step: The RDF clustering module performs ontology clustering learning on the data collected by the resource acquisition module. The resource nodes with the closest characteristics are grouped together in the RDF data repository.


The fourth step: The data stored in the RDF data repository is mined by the Semantic Web Mining module, and the mining results are provided to the ontology Agent.
The fifth step: The ontology Agent performs semantic filtering and clustering on the results obtained by the Semantic Web Mining module in order to improve the relevance of the returned information. In addition, ontology learning can take advantage of the Semantic Web Mining module to extend and modify the ontology knowledge.
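
The five steps above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the model's implementation, which the paper leaves to future work: the ontology, the `WEB` data, the use of Jaccard similarity with a greedy grouping rule for the clustering step, and all function names are hypothetical choices made for this example. The ontology-learning feedback mentioned in step five is not covered.

```python
# Illustrative sketch only: every name and data value here is hypothetical.

# Step 1: an initial ontology as a concept hierarchy (concept -> parent).
ONTOLOGY = {"Vehicle": None, "Car": "Vehicle", "Truck": "Vehicle", "Music": None}

# Stand-in for the Web: each resource is described by (property, value) pairs.
WEB = {
    "page1": {("topic", "Car"), ("lang", "en")},
    "page2": {("topic", "Car"), ("lang", "en"), ("price", "listed")},
    "page3": {("topic", "Music"), ("lang", "en")},
    "page4": {("topic", "Truck"), ("lang", "zh")},
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or lies below it in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def topic_under(props, concept):
    """True if any topic among the (property, value) pairs falls under concept."""
    return any(p == "topic" and is_a(v, concept) for p, v in props)


def acquire(task_concept):
    """Step 2: collect only the resources relevant to the task."""
    return {name: props for name, props in WEB.items()
            if topic_under(props, task_concept)}


def jaccard(a, b):
    """Similarity of two descriptions: shared pairs over all pairs."""
    return len(a & b) / len(a | b)


def cluster(resources, threshold=0.5):
    """Step 3: greedily group resources with similar descriptions."""
    groups = []
    for name in sorted(resources):
        for group in groups:
            if jaccard(resources[name], resources[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def mine(resources, groups):
    """Step 4: for each group, extract the pairs shared by all members."""
    return [(group, set.intersection(*(resources[n] for n in group)))
            for group in groups]


def semantic_filter(results, focus_concept):
    """Step 5: keep only results whose shared topic falls under the focus."""
    return [(group, sorted(shared)) for group, shared in results
            if topic_under(shared, focus_concept)]


def run_pipeline(task_concept, focus_concept):
    """Chain steps 2 to 5 for one task, using the ontology from step 1."""
    resources = acquire(task_concept)
    groups = cluster(resources)
    return semantic_filter(mine(resources, groups), focus_concept)


print(run_pipeline("Vehicle", "Car"))
```

With the task concept "Vehicle" and the narrower focus "Car", the sketch collects three pages, groups page1 and page2 together, and returns only that group, which shows how the ontology is used both to guide acquisition and to filter the mined results.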

IV. SUMMARY

This paper first briefly introduces Web mining and the Semantic Web, then describes their integration, Web mining based on semantics, and proposes a Semantic Web mining model under the framework of Agent, giving the build process and a brief description of the function of each module. Because the relevant technologies are immature and subject to various limitations, this paper does not provide a concrete implementation of the model, which remains for future work.

REFERENCES

[1] Wen-Wei Chen, "Data Warehouse and Data Mining Tutorial," [M], Beijing: Tsinghua University Press, 2008, 4.
[2] Zhong Xue Ling, "Semantic Web in the core layer of technical analysis," [M], South China Financial Computer Applications Technology, 2007, 10.
[3] Lu Jian-Jiang, "Semantic Web principles and technology," [M], Beijing: Science Press, 2007, 3.
[4] Zhang Hui, ed., "Ontology-based Semantic Web Mining Technology," [D], Computer Development and Applications, 2009, 2.
