
SEMANTIC WEB

Submitted in partial fulfillment of the requirements for

the award of the degree of

MASTER OF COMPUTER APPLICATIONS

[SOFTWARE ENGINEERING]

Guide(s): Submitted By:

MR. S.K. MALIK MANIT PANWAR

MCA (SE) 1ST SEM.

Roll No. 001164105409

University School of Information Technology

GGS Indraprastha University, Delhi – 6

(2009-2012)

CERTIFICATE

This is to certify that the Term Paper (IT-655) entitled “SEMANTIC

WEB” done by Mr. MANIT PANWAR, Roll No. 00116404509 is an

authentic work carried out by him at USIT, GGSIPU under my guidance.

The matter embodied in this term work has not been submitted earlier for

the award of any degree or diploma to the best of my knowledge and

belief.

Dated: (Signature of the Guide)

Mr. S.K. MALIK

Lecturer, USIT

GGSIPU, Delhi-6

ACKNOWLEDGEMENT

I owe a great many thanks to a great many people who helped and supported

me during the writing of this term paper.

My deepest thanks go to Mr. S.K. MALIK, the guide of my term paper, for guiding me and correcting my various documents with attention and care. He has taken pains to go through the term paper and make the necessary corrections as and when needed.

I express my thanks to the Dean of USIT, GGSIPU, for extending his support.

My deep sense of gratitude to Mr. Amit Prakash Singh, Teacher Incharge of

Term Paper for his support and guidance. Thanks and appreciation to the

helpful people at UIRC and Computer Centre, for their support.

I would also like to thank my institution and faculty members, without whom this term paper would have remained a distant reality. I also extend my heartfelt thanks to my family, friends and well-wishers.

MANIT PANWAR

ABSTRACT

This paper presents a basic analysis of the Semantic Web and of how this vision can bring about a revolution everywhere: in the Web, in businesses, in enterprises, in AI, in security systems and beyond. The Semantic Web is an effort to make the biggest basket of information and data that we have, the Internet, intelligent enough to give us exactly the information we want, much as a person asked a question will tell us exactly what was asked for, thereby saving both time and effort. The Semantic Web will bring a new dimension to the field of information technology by providing better search facilities on the Web, and interest in it is spreading rapidly through the IT industry. It is a collective effort towards building the biggest database from which we can retrieve information in an intelligent and meaningful manner. This paper covers various aspects of the Semantic Web: its history, an introduction, tools, and the layered architecture.

INTRODUCTION

Almost everyone uses the Internet today for their own purposes: they surf, they chat, they search. If you look at the Internet as a sea of data, the data can be anything, and the medium is the water (the Internet). Most of the time we know what data we want to search for, but we still do not know the exact location where that data is floating. The water (the Internet) is just a medium: we get in and go looking for the data we want, but where, how, and what exactly does our data look like? The Semantic Web, understood in a very broad sense, will bring that dead data to life. It will tell us where the data is (its location) and what it looks like (the exact data we are looking for) on a simple call; in other words, the data will understand us. It will be smart: it will carry information about itself and will be able to tell us what it is, why it exists and where it comes from, just as a person gives an introduction. Now, in that unending sea, I can see where my data is, and it will respond to my call.

The next question is how a machine will understand the data it is communicating with. We may have made the data smart enough to introduce itself, but to whom? Not to me: as the end user I am not interested in knowing about the data, I want to use it. The entity that needs to understand the data is the machine. We therefore have to make our machines understand what the data is saying, and certainly not by changing the hardware. Instead, if the information that the data carries can be expressed in a form that the machine can process directly, everything will work. So our focus is on the data and the information it carries. In its journey from a dead entity that carried no information about itself to a live, smart entity that carries enough semantic information for a machine to understand it (or at least process it), data has gone through many changes, both in the way we think of it and in the way it is used.

THE EARLY SEMANTIC WEB

The original idea of the Semantic Web was to bring machine-readable descriptions to

the data and documents already on the Web, in order to improve search and data

usage. The Web was, and in most cases still is, a vast set of static and dynamically

generated Web pages linked together. Pages are written in HTML (Hyper Text

Markup Language), a language that is useful for publishing information intended

only for human consumption. Humans can read Web pages and understand them, but

the inherent meaning is not available in a way that allows interpretation by

computers. The Semantic Web aims at defining ways to allow Web information to be

used by computers not only for display purposes, but also for interoperability and

integration between systems and applications. One way to enable machine-to-

machine exchange and automated processing is to provide the information in such a

way that computers can understand it. To give meaning to Web information, new

standards and languages are being investigated and developed. Well-known

examples include the Resource Description Framework (RDF) (RDF 2002) and the

Web Ontology Language (OWL) (OWL 2004). The descriptive information made available by these languages makes it possible to characterize, individually and precisely, the types of resources on the Web and the relationships between resources.

Today, the Semantic Web is not only about increasing the expressiveness of Web

information to enable the automatic or semiautomatic processing of Web resources

and Web pages. Academia and industry have realized that the Semantic Web can

facilitate the integration and interoperability of intra- and inter-business processes

and systems, as well as enable the creation of global infrastructures for sharing documents and data, and make searching for and reusing information easier.

THE SEMANTIC WEB

A major drawback of XML is that XML documents do not convey the meaning of

the data contained in the document. Exchange of XML documents over the Web is

only possible if the parties participating in the exchange agree beforehand on the

exact syntactical format (expressed in XML Schema) of the data. The Semantic Web

allows the representation and exchange of information in a meaningful way,

facilitating automated processing of descriptions on the Web. Annotations on the

Semantic Web express links between information resources on the Web and connect

information resources to formal terminologies – these connective structures are

called ontologies. Ontologies form the backbone of the Semantic Web; they allow

machine understanding of information through the links between the information

resources and the terms in the ontologies. Furthermore, ontologies facilitate

interoperation between information resources through links to the same ontology or

links between ontologies. The term “ontology” originates from philosophy and has

been adopted in the field of Computer Science with a slightly different meaning:

An ontology is a formal explicit specification of a shared conceptualization.

In the late 1990s the idea of a Semantic Web boosted interest in the development of

ontologies even further. The general conviction held by the W3C is that the Semantic

Web needs an ontology language that is compatible with current Web standards and

is in fact layered on top of them. The language needs to be expressed in XML and,

preferably, should be layered on top of RDF(S).
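
To make the contrast above concrete, here is a minimal sketch in Python using the rdflib library (the element names, URIs and literal values are invented for illustration): two parties can encode the same fact in structurally different XML, whereas an RDF triple built on a shared vocabulary URI carries its meaning independently of any one serialization.

    # A hypothetical illustration: the same fact in two XML shapes vs. one RDF triple.
    from rdflib import Graph, Literal, Namespace, URIRef

    # Two syntactically different XML documents describing the same fact; without
    # prior agreement on the schema, a program cannot tell that they mean the same thing.
    xml_a = "<book><author>Tim Berners-Lee</author></book>"
    xml_b = "<book writtenBy='Tim Berners-Lee'/>"

    # The RDF version: one subject-predicate-object triple whose predicate is a
    # shared, globally identified vocabulary term (URIs invented for illustration).
    EX = Namespace("http://example.org/vocab#")
    book = URIRef("http://example.org/books/weaving-the-web")

    g = Graph()
    g.add((book, EX.author, Literal("Tim Berners-Lee")))

    # Any RDF-aware tool can process this triple regardless of which XML shape
    # the publisher originally preferred.
    print(g.serialize(format="turtle"))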

Ontologies and the Semantic Web

A key feature of ontologies is that, through formal, real-world semantics and

consensual terminologies, they interweave human and machine understanding. This

important property of ontologies facilitates the sharing and reuse of ontologies

among humans, as well as among machines.

A major reason for the recent increasing interest in ontologies is the development of

the Semantic Web, which can be seen as knowledge management on a global scale.

Tim Berners-Lee, inventor of the current World Wide Web and director of the World

Wide Web Consortium (W3C), envisions the Semantic Web as the next generation of

the current Web. This “next generation” will expand upon the prowess of the current

Web by adding machine-readable information and automated services. As it has been put,

“The explicit representation of the semantics underlying data, programs, pages, and

other Web resources will enable a knowledge-based Web that provides a

qualitatively new level of service.” Ontologies provide such an explicit

representation of semantics. The combination of ontologies with the Web has the

potential to overcome many of the problems in knowledge sharing and reuse and in

information integration. Ontologies interweave human and computer understanding

of symbols. These symbols, also called terms and relations, can be interpreted by

both humans and machines. The meaning for a human is represented by the term

itself, which is usually a word in natural language, and by the semantic relationships

between terms. An example of such a human-understandable relationship is a

superconcept – subconcept relationship (often referred to by the term “is-a”). Such a

relationship denotes the fact that one concept (the superconcept) is more general than

another (the subconcept). For instance, the concept Person is more general than

Student. The figure below shows an example “is-a” hierarchy (or taxonomy), where the

more general concepts are located above the more specialized concepts.

Concepts describe a set of objects in the real world. For example, the concept PhD-

Student aims to capture all existing PhD students. One such PhD student is Mary,

who is modeled in the figure as a box and has an instance-of relation to the concept PhD-Student. This instance-of relationship means that the actual object is captured by the concept PhD-Student. Because of the formal is-a relationships between the concepts PhD-Student, Researcher, Student, and Person, Mary must also be an instance of the concepts Researcher, Student, and Person. These relationships are

fairly easy to understand for the human reader and, because the meanings of the

relationships are formally defined, a machine can reason with them and draw the

same conclusions as a human can. These relationships, which are implicitly known to

humans (e.g. a human knows that every student is a person) are encoded in a

formal, explicit way so that they can be understood by a machine. In a sense, the

machine does not gain real “understanding”, but the understanding of humans is

encoded in such a way that a machine can process it and draw conclusions through

logical reasoning.
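
As a minimal sketch of how such a hierarchy can be encoded and traversed in software, the snippet below uses Python and the rdflib library; the namespace URI and the class and instance names are invented and simply mirror the figure.

    # Minimal sketch of the Person / Student / PhD-Student hierarchy from the figure,
    # using Python and rdflib; the URIs are invented for illustration.
    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/ontology#")
    g = Graph()

    # is-a (subclass) relationships: a PhD-Student is a Student and a Researcher,
    # both of which are Persons.
    g.add((EX.Student, RDFS.subClassOf, EX.Person))
    g.add((EX.Researcher, RDFS.subClassOf, EX.Person))
    g.add((EX.PhDStudent, RDFS.subClassOf, EX.Student))
    g.add((EX.PhDStudent, RDFS.subClassOf, EX.Researcher))

    # instance-of relationship: Mary is a PhD-Student.
    g.add((EX.Mary, RDF.type, EX.PhDStudent))

    # A machine can now draw the same conclusion a human would: following the
    # subClassOf links upward shows that Mary is also a Student, a Researcher and a Person.
    for direct_class in g.objects(EX.Mary, RDF.type):
        for cls in g.transitive_objects(direct_class, RDFS.subClassOf):
            print("Mary is an instance of", cls)

The loop only follows the explicitly stored rdfs:subClassOf links; a full RDFS or OWL reasoner would derive the same memberships automatically.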

The Resource Description Framework

The Resource Description Framework (RDF) is the first language developed

especially for the Semantic Web. RDF was developed as a language for adding

machine-readable metadata to existing data on the Web. RDF uses XML for its

serialization in order to realize the layering depicted in the Semantic Web language layer cake (see the layer architecture section below). RDF Schema [20] extends RDF with some basic (frame-based)

ontological modeling primitives. There are primitives such as classes, properties, and

instances. Also, the instance-of, subclass-of, and subproperty-of relationships have

been introduced, allowing structured class and property hierarchies. RDF has the

subject–predicate–object triple, commonly written as P(S,O), as its basic data model.

An object of a triple can, in turn, function as the subject of another triple, yielding a

directed labeled graph, where resources (subjects and objects) correspond to nodes,

and predicates correspond to edges. Furthermore, RDF allows a form of reification (a

statement about a statement), which means that any RDF statement can be used as a

subject in a triple.
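
A minimal sketch of this data model, in Python with rdflib (all example.org URIs are invented for illustration), builds one subject–predicate–object triple, reifies it as an rdf:Statement, and serializes the graph as RDF/XML to show the layering on XML:

    # Minimal sketch of the RDF triple model and reification with rdflib;
    # all example.org URIs are invented for illustration.
    from rdflib import Graph, Namespace, URIRef, RDF

    EX = Namespace("http://example.org/")
    g = Graph()

    # One triple, P(S,O): worksFor(Mary, USIT)
    s, p, o = EX.Mary, EX.worksFor, EX.USIT
    g.add((s, p, o))

    # Reification: a statement about the statement above.
    stmt = URIRef("http://example.org/statement/1")
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, s))
    g.add((stmt, RDF.predicate, p))
    g.add((stmt, RDF.object, o))
    g.add((EX.John, EX.claims, stmt))  # John claims that Mary works for USIT

    # RDF uses XML for its serialization; print the graph as RDF/XML.
    print(g.serialize(format="xml"))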

LAYER ARCHITECTURE

Tim Berners-Lee proposed a layered architecture, shown in figure 1. It includes Unicode, URI, XML, Namespace, XML Schema, RDF, RDF Schema, Ontology, Digital Signature, Logic, Proof and Trust.

The various layers are described below:

Unicode: Unicode is a standard way of allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems.

URI: A Uniform Resource Identifier (URI) is a compact string of characters used to identify or name a resource on the Internet.

XML: Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language because it allows its users to define their own elements. Its primary purpose is to help information systems share structured data, particularly via the Internet.

XML Schema: An XML schema is a description of a type of XML document, typically expressed in terms of constraints on its structure.

XML Namespace: An XML namespace is a collection of names (identified by a URI) used in an XML document as element types and attribute names.

RDF: The Resource Description Framework creates metadata about a document as a single entity, e.g. the author of the document, its creation date, its type, etc. (see the sketch after this list).

Ontology Vocabulary: This is the main layer. It consists of a hierarchical arrangement of the important concepts in the domain and descriptions of their properties. Some basic ontology languages are OWL, DAML-ONT and DAML+OIL.

Digital Signature: Digital signatures support the notion of trust. Their purpose is (a) to digitally sign documents and (b) to allow encryption to be applied to prevent unauthorized access.

Logic: This layer uses a monotonic logic. In this layer any rule can export code but cannot import it.

Proof: The goal of this layer is to make content smarter, so that it is machine-understandable.

Trust: This is the topmost layer, where the trustworthiness of information is subjectively evaluated.
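
As referenced in the RDF layer above, the following minimal sketch (Python with rdflib; the document URI and date are invented for illustration) describes a document's author, creation date and type as RDF metadata using the Dublin Core vocabulary, and serializes it as RDF/XML to show how the RDF layer rests on XML, XML namespaces and URIs:

    # Minimal sketch: RDF metadata about a document (author, creation date, type),
    # using the Dublin Core element vocabulary; the document URI and date are invented.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC, XSD

    doc = URIRef("http://example.org/termpaper/semantic-web")

    g = Graph()
    g.bind("dc", DC)  # bind the Dublin Core namespace prefix for the XML output

    g.add((doc, DC.creator, Literal("Manit Panwar")))
    g.add((doc, DC.date, Literal("2009-11-01", datatype=XSD.date)))
    g.add((doc, DC.type, Literal("Term Paper")))

    # Serializing as RDF/XML shows how the RDF layer rests on the XML, namespace
    # and URI layers below it in the stack.
    print(g.serialize(format="xml"))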

CHALLENGES FOR A NEW SEMANTIC WORLD

As with every technological evolution, the Semantic Web and ontologies need to

promote their unique value proposition for specific target groups in order to achieve

adoption. A common pitfall made in the studies of the Semantic Web is the limited

focus on “technological perspectives” or, in the other extreme, the difficulty to

communicate the underlying capacity of semantics and ontologies to meet critical

real world challenges.

Some of the challenges for the Semantic Web include vastness, vagueness,

uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal

with all of these issues in order to deliver on the promise of the Semantic Web.

Vastness: The World Wide Web contains at least 48 billion pages as of this writing

(August 2, 2009). The SNOMED CT medical terminology ontology contains

370,000 class names, and existing technology has not yet been able to eliminate all

semantically duplicated terms. Any automated reasoning system will have to deal

with truly huge inputs.

Vagueness: These are imprecise concepts like "young" or "tall". This arises from the

vagueness of user queries, of concepts represented by content providers, of matching

query terms to provider terms and of trying to combine different knowledge bases

with overlapping but subtly different concepts. Fuzzy logic is the most common

technique for dealing with vagueness.

Uncertainty: These are precise concepts with uncertain values. For example, a patient

might present a set of symptoms which correspond to a number of different distinct

diagnoses each with a different probability. Probabilistic reasoning techniques are

generally employed to address uncertainty.

Inconsistency: These are logical contradictions which will inevitably arise during the

development of large ontologies, and when ontologies from separate sources are

combined. Deductive reasoning fails catastrophically when faced with inconsistency,

because "anything follows from a contradiction". Defeasible reasoning and

paraconsistent reasoning are two techniques which can be employed to deal with

inconsistency.

Deceit: This is when the producer of the information is intentionally misleading the

consumer of the information. Cryptography techniques are currently utilized to

ameliorate this threat.

This list of challenges is illustrative rather than exhaustive, and it focuses on the

challenges to the "unifying logic" and "proof" layers of the Semantic Web. The

World Wide Web Consortium (W3C) Incubator Group for Uncertainty Reasoning

for the World Wide Web (URW3-XG) final report lumps these problems together

under the single heading of "uncertainty". Many of the techniques mentioned here

will require extensions to the Web Ontology Language (OWL), for example to annotate conditional probabilities. This is an area of active research.

Semantic Web Application Areas

As a result of the pervasive and user-friendly digital technologies emerging within

our information society, web content is increasingly multiform, inconsistent and very

dynamic. Such content is unsuitable for machine processing, and necessitates human

interpretation and its respective costs in time and money for business. To remedy

this, approaches aim at abstracting away this complexity (e.g. by using ontologies) and

offering new and enriched services able to process those abstractions (e.g., by

mechanized reasoning) in a fully automated way. This abstraction layer is the subject

of a very dynamic activity in research, industry and standardization which is usually

called "Semantic Web".

The initial application of Semantic Web technology has focused on Information

Retrieval (IR) where access through semantically annotated content, instead of

classical (even sophisticated) statistical analysis, aimed to give far better results (in

terms of precision and recall indicators). The next natural extension was to apply IR

in the integration of enterprise legacy databases in order to leverage existing

company information in new ways. Present research has turned to focusing on the

seamless integration of heterogeneous and distributed applications and services.

Some of the application areas of Semantic Web are:

1) Knowledge Management

Knowledge is one of the key success factors for enterprises, both today and in the

future. Therefore, company knowledge management has been identified as a strategic

tool. However, while information technology is one of the foundational elements of KM, KM is, in turn, also interdisciplinary by nature. In particular, it includes human

resource management, enterprise organization and culture. We view KM as the

management of the knowledge arising from business activities, aiming at leveraging

both the use and the creation of that knowledge for two main objectives:

capitalization of corporate knowledge and durable innovation fully aligned with the

strategic objectives of the organization. The development of knowledge portals

serving the needs of companies or communities is still a manual process. Ontologies

and related metadata provide a promising conceptual basis for generating parts of

such knowledge portals. Obviously, among others, conceptual

models of the domain, of the users and of the tasks are needed. The generation of

knowledge portals has to be supplemented with the semi-automated evolution of

portals. As business environments and strategies change rather rapidly, KM portals

have to be kept up to date. The evolution of portals should also include some mechanisms

to ‘forget’ outdated knowledge.

KM solutions based on a combination of intranet-based functionalities and mobile

functionalities will be available in the very near future. Semantic Web technology is a

promising approach to meet the needs of mobile environments, like location-aware

personalization and adaptation of the presentation to the specific needs of mobile

devices, i.e. the presentation of the required information at an appropriate level of

granularity.

Knowledge Management is obviously a very promising area for exploiting Semantic

Web technology. Document-based KM solutions have already reached their limits,

whereas semantic technology opens the way to meet KM requirements in the future.

2) E-Commerce

Electronic commerce is mainly based on the exchange of information between

involved stakeholders using a telecommunication infrastructure. There are two main

scenarios: Business-to-Customer (B2C) and Business-to-Business (B2B). B2C

applications enable service providers to promote their offers, and for customers to

find offers which match their demands. By providing unified access to a large

collection of frequently updated offers and customers, an electronic marketplace can

match the demand and supply processes within a commercial mediation

environment. B2B applications have a long history of using electronic messaging

to exchange information related to services previously agreed among two or more

businesses. A knowledge-based approach has the potential to significantly accelerate

the penetration of electronic commerce within vertical industry sectors, by enabling

interoperability at the business level, and reducing the need for standardization at the

technical level. This will enable services to adapt to the rapidly changing online

environment. Knowledge based applications of this kind use one or more shared

ontologies to integrate heterogeneous information systems and allow common access

for humans and computers. This enforces the shared ontology as the standard

ontology for all participating systems, thereby removing the semantic heterogeneity

from the information system. The heterogeneity is a problem because the systems to

be integrated are already operational and it is too costly to redevelop them. A

linguistic ontology is sometimes used to assist in the generation of the shared

ontology or is used as a top-level ontology, describing very general concepts like

space, time, matter, object, event, action, etc., which the shared ontologies can inherit
from. Benefits are the integration of heterogeneous information sources, which can

improve interoperability, and more effective use and reuse of knowledge resources.

3) Biosciences and Medical Applications

The medical domain is a favourite target for Semantic Web applications just as the

expert system was for Artificial Intelligence applications 20 years ago. The medical

domain is very complex: medical knowledge is difficult to represent in a computer

format, making the sharing of information even more difficult. Semantic Web

solutions have become very promising in this context. One of the main mechanisms

of the Semantic Web - resource description using annotation principles - is of major

importance in the medical informatics domain, especially as regards the sharing of

these resources (e.g. medical knowledge in the Web or genomic database). The web

services technology allows us to imagine some solutions to the interoperability

problem, which is substantial in medical informatics.

4) Other Areas

The diverse application areas of Semantic Technologies also include the following:

• Ambient Intelligence

• Cognitive Systems

• Data Integration

• Multimedia Data Management

• Software Engineering

• Machine Learning

• eScience

• Information Extraction

• Grid Computing

• Peer-to-Peer Systems

• eGovernment

HISTORY OF SEMANTIC WEB

Following are the key milestones, year by year, in the history of the Semantic Web:

1989 Tim Berners-Lee proposed the WWW to CERN as a development project.

1991 A portable browser was made available and distributed.

1994 • Netscape was released as a commercial browser.
• Yahoo! acted as a search engine.
• There were 2,500 web servers at that time.

1995 • There were 73,500 web servers at that time.
• Microsoft released Internet Explorer, and the W3C was established as a standards body.

1996 The Semantic Web effort was initiated.

1997 The first working draft of the RDF language for defining metadata became available.

1998 Tim Berners-Lee published a roadmap to the Semantic Web that included a query language, inference rules and proof validation.

1999 RDF became a W3C Recommendation, a crucial step towards the Web's interoperability and functionality.

2001 The vision of the Semantic Web was broadened further to include trust.

TOOLS USED IN SEMANTIC WEB

The World Wide Web is an interesting paradox -- it's made with computers but for

people. The sites you visit every day use natural language, images and page layout to

present information in a way that's easy for you to understand. Even though they are

central to creating and maintaining the Web, the computers themselves really can't

make sense of all this information. They can't read, see relationships or make

decisions like you can.

The Semantic Web proposes to help computers "read" and use the Web. The big idea

is pretty simple -- metadata added to Web pages can make the existing World Wide

Web machine readable. This won't bestow artificial intelligence or make computers

self-aware, but it will give machines tools to find, exchange and, to a limited extent,

interpret information. It's an extension of, not a replacement for, the World Wide

Web.

That probably sounds a little abstract, and it is. While some sites are already using

Semantic Web concepts, a lot of the necessary tools are still in development. In this

section, we bring the concepts and tools behind the Semantic Web down to earth by looking at some of the tools that are available.

A Listing of a Few Semantic Web Tools

AMALGAM: The AMALGAM (Automatic Mapping Among Lexico-Grammatical Annotation Models) project is an attempt to create a set of mapping algorithms to map between the main tagsets and phrase structure grammar schemes used in various research corpora. Software has been developed to tag text with up to 8 annotation schemes.

Amine: Amine is a multi-layer platform implemented in Java. It provides various engines and GUIs to build a wide variety of ontology-based applications, Conceptual Graph based applications, intelligent systems and multi-agent systems.

Anacubis: Anacubis is a visual analysis tool that lets its users visualize the relationships between entities in a collection of information. The visualization is rather similar to concept maps.

Exteca: The Exteca platform is an ontology-based technology written in Java for high-quality knowledge management and document categorisation. It can be used in conjunction with search engines.

Jena: Jena is a Java framework for constructing Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL and includes a rule-based inference engine. It also has the ability to be used as an RDF database via its Joseki layer. See the Jena discussion list for more information.

Pedro: Pedro is an application that creates data entry forms based on a data model written in a particular style of XML Schema. Users can enter data through the forms to create data files that conform to the schema. They can use controlled vocabularies to mark up text fields and have the application perform basic validation on field data.

Platypus Wiki: Platypus Wiki is an enhanced Wiki Wiki Web with ideas taken from the Semantic Web. It offers a simple user interface to create a wiki page plus metadata in accordance with W3C standards. It uses RDF/RDFS and OWL to create ontologies and manage metadata.

POR: Protege+OWL+Ruby (POR) Utilities provides an ontology and a set of Ruby classes and methods to simplify the development of Protege+OWL ontology-driven applications. At the moment the project is limited to JRuby.

ASIS for GNAT: ASIS (Ada Semantic Interface Specification) for GNAT on gcc. ASIS is a published international ISO standard (ISO/IEC 15291:1999). ASIS-based tools are available as well.

ATLAS: ATLAS (Architecture and Tools for Linguistic Analysis Systems) is a joint initiative of NIST, MITRE and the LDC to build a general-purpose annotation architecture and a data interchange format. The starting point is the annotation graph model, with some significant generalizations.

Swoogle: A Semantic Web search engine with 1.5 M resources.

SWOOP: A lightweight ontology editor.

Raptor: The Raptor RDF parser toolkit is a free software / Open Source C library that provides a set of parsers and serializers that generate Resource Description Framework (RDF) triples by parsing syntaxes, or serialize the triples into a syntax. The supported parsing syntaxes are RDF/XML, N-Triples, Turtle, RSS tag soup including Atom 1.0 and 0.3, and GRDDL for XHTML and XML. The serializing syntaxes are RDF/XML (regular and abbreviated), N-Triples, RSS 1.0, Atom 1.0 and Adobe XMP.

Rasqal: Rasqal is a C library for querying RDF, supporting the RDQL and SPARQL languages. It provides APIs for creating a query and parsing query syntax. It features pluggable triple-store source and matching interfaces, an engine for executing the queries and an API for manipulating results as bindings. It uses the Raptor RDF parser to return triples from RDF content and can alternatively work with the Redland RDF library's persistent triple stores. It is portable across many POSIX systems.
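
To give a feel for what RDF parser and query toolkits such as Raptor, Rasqal and Jena do, here is a minimal sketch using Python's rdflib instead (those tools are C and Java libraries; the data and URIs below are invented for illustration). It parses a few triples written in Turtle and re-serializes them as N-Triples and RDF/XML:

    # Minimal sketch of what an RDF parser/serializer toolkit does, using rdflib;
    # the C and Java tools listed above offer this kind of functionality natively.
    from rdflib import Graph

    turtle_data = """
    @prefix ex: <http://example.org/> .
    ex:Mary  ex:supervisedBy ex:Malik .
    ex:Malik ex:worksAt      ex:USIT .
    """

    g = Graph()
    g.parse(data=turtle_data, format="turtle")  # parse one syntax ...

    print(g.serialize(format="nt"))   # ... and serialize to N-Triples
    print(g.serialize(format="xml"))  # ... or to RDF/XML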

Protégé

Protégé provides a growing user community with a suite of tools to construct domain

models and knowledge-based applications with ontologies. Protégé is a free, open-

source platform developed by Stanford Medical Informatics with support from:

- Defense Advanced Research Projects Agency

- National Cancer Institute

- National Institute of Standards and Technology

- National Library of Medicine

- National Science Foundation

with additional support from its affiliates:

- DaimlerChrysler

- iSOCO: Intelligent Software for the Networked Economy

At its core, Protégé implements a rich set of knowledge-modeling structures and

actions that support the creation, visualization, and manipulation of ontologies in

various representation formats. Protégé can be customized to provide domain-

friendly support for creating knowledge models and entering data. Further, Protégé

can be extended by way of a plug-in architecture and a Java-based Application

Programming Interface (API) for building knowledge-based tools and applications.

The Protégé platform supports two main ways of modeling ontologies:

Protégé-Frames editor enables users to build and populate ontologies that are frame-

based, in accordance with the Open Knowledge Base Connectivity protocol (OKBC).

In this model, an ontology consists of a set of classes organized in a subsumption

hierarchy to represent a domain's salient concepts, a set of slots associated to classes

to describe their properties and relationships, and a set of instances of those classes -

individual exemplars of the concepts that hold specific values for their properties.

Protégé-OWL editor enables users to build ontologies for the Semantic Web, in

particular in the W3C's Web Ontology Language (OWL). "An OWL ontology may

include descriptions of classes, properties and their instances. Given such an

ontology, the OWL formal semantics specifies how to derive its logical

consequences, i.e. facts not literally present in the ontology, but entailed by the

semantics. These entailments may be based on a single document or multiple

distributed documents that have been combined using defined OWL mechanisms".

Protégé ontologies can be exported into a variety of formats including RDF(S),

OWL, and XML Schema.
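
As a rough illustration of the kind of OWL content such an editor produces, here is a minimal sketch in Python with rdflib rather than Protégé itself; the class, property and instance names are invented:

    # Minimal sketch of an OWL ontology with classes, a property and an individual,
    # built with rdflib; invented names, but the same kind of RDF/OWL content
    # that a Protégé-OWL ontology can be exported to.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/university#")
    g = Graph()
    g.bind("ex", EX)

    # Classes and a subclass axiom
    g.add((EX.Person, RDF.type, OWL.Class))
    g.add((EX.Student, RDF.type, OWL.Class))
    g.add((EX.Course, RDF.type, OWL.Class))
    g.add((EX.Student, RDFS.subClassOf, EX.Person))

    # An object property with a domain and a range
    g.add((EX.enrolledIn, RDF.type, OWL.ObjectProperty))
    g.add((EX.enrolledIn, RDFS.domain, EX.Student))
    g.add((EX.enrolledIn, RDFS.range, EX.Course))

    # An individual (instance) with a human-readable label
    g.add((EX.Mary, RDF.type, EX.Student))
    g.add((EX.Mary, RDFS.label, Literal("Mary")))

    # Export the ontology, for example as RDF/XML.
    print(g.serialize(format="xml"))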

Outstanding Protégé features

Some of the particular features of Protégé, not available in many other ontology-building tools, are the following:

Automatic generation of graphical-user interfaces, based on user-defined models, for

acquiring domain instances

Extensible knowledge model and architecture

Scalability to very large knowledge bases

Semantic MediaWiki (Semantic annotation tool)

Semantic MediaWiki (SMW) is a free extension of MediaWiki – the wiki-system

powering Wikipedia – that helps to search, organize, tag, browse, evaluate, and share

the wiki's content. While traditional wikis contain only texts which computers can

neither understand nor evaluate, SMW adds semantic annotations that bring the

power of the Semantic Web to the wiki.

Introduction to Semantic Mediawiki

Wikis have become a great tool for collecting and sharing knowledge in

communities. This knowledge is mostly contained within texts and multimedia files,

and is thus easily accessible for human readers. But wikis get bigger and bigger, and

it can be very time-consuming to look for an answer inside a wiki. As a simple

example, consider the following question a user might have:

« What are the hundred world-largest cities with a female mayor? »

Wikipedia should be able to provide the answer: it contains all large cities, their

mayors, and articles about the mayor that tell us about their gender. Yet the question

is almost impossible to answer for a human, since one would have to read all articles

about all large cities first! Even if the answer is found, it might not remain valid for

very long. Computers can deal with large datasets much more easily, yet they are not able

to support us very much when seeking answers from a wiki: Even sophisticated

programs cannot yet read and «understand» human-language texts unless the topic

and language of the text is very restricted. The wiki's keyword search does not help

either in discovering complex relationships.

Semantic MediaWiki enables wiki communities to make some of their knowledge

computer-processable, e.g. to answer the above question. The hard problem for the

computer is to find out what the words in a wiki page (e.g. about cities) mean.

Articles contain many names, but which one is the current mayor? Humans can

easily grasp the problem by looking into a language edition of Wikipedia that they do

not understand (Korean is a good start unless you are fluent there). While single

tokens (names, numbers,…) might be readable, it is impossible to understand their

relevance in the article. Similarly, computers need some help for making sense of

wiki texts.

In Semantic MediaWiki, editors therefore add «hints» to the information in wiki

pages. For example, someone can mark a name as being the name of the current

mayor. This is done by editors who modify a page and put some special text-markup

around the mayor's name. After this, computers can access this information (of

course they still do not «understand» it, but they can search for it if we ask them to),

and support users in many different ways.
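
As a rough analogy to what such annotations make possible, the sketch below answers a question like the "female mayor" example over a handful of annotated facts. Note that SMW itself uses wiki markup for annotations and its own inline query syntax rather than SPARQL; this Python/rdflib version, with invented city, mayor and population data, only illustrates the underlying idea:

    # Rough analogy to querying semantic annotations: invented city/mayor facts in
    # Turtle, queried with SPARQL via rdflib. SMW itself uses wiki markup and its
    # own inline queries; this only illustrates the underlying idea.
    from rdflib import Graph

    facts = """
    @prefix ex: <http://example.org/> .
    ex:Springfield ex:population 612000 ; ex:mayor ex:JaneDoe .
    ex:Riverton    ex:population 980000 ; ex:mayor ex:JohnRoe .
    ex:JaneDoe ex:gender "female" .
    ex:JohnRoe ex:gender "male" .
    """

    g = Graph()
    g.parse(data=facts, format="turtle")

    query = """
    PREFIX ex: <http://example.org/>
    SELECT ?city ?pop WHERE {
      ?city  ex:population ?pop ;
             ex:mayor      ?mayor .
      ?mayor ex:gender     "female" .
    }
    ORDER BY DESC(?pop)
    LIMIT 100
    """

    for city, pop in g.query(query):
        print(city, pop)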

Where SMW can help

Semantic MediaWiki introduces some additional markup into the wiki-text which

allows users to add "semantic annotations" to the wiki. While this first appears to

make things more complex, it can also greatly simplify the structure of the wiki, help

users to find more information in less time, and improve the overall quality and

consistency of the wiki. To illustrate this, we provide some examples from the daily

business of Wikipedia:

Manually generated lists. Wikipedia is full of manually edited listings. Those lists are prone to errors, since they have to be updated manually. Furthermore, the number of potentially interesting lists is huge, and it is impossible to provide all of them in acceptable quality. In SMW, lists are generated automatically from queries. They are always up-to-date and can easily be customized to

obtain further information.

Searching information. Much of Wikipedia's knowledge is hopelessly buried within

millions of pages of text, and can hardly be retrieved at all. For example, at the time

of this writing, there is no list of female physicists in Wikipedia. When trying to find

all women of this profession that are featured in Wikipedia, one has to resort to

textual search. Obviously, this attempt is doomed to fail miserably. Note that among

the first 20 results, only 5 are about people at all, and that Marie Curie is not

contained in the whole result set (since "female" does not appear on her page).

Again, querying in SMW easily solves this problem (in this case even without further

annotation, since existing categories suffice to find the results).

Inflationary use of categories. The need for better structuring becomes apparent by

the enormous use of categories in Wikipedia. While this is generally helpful, it has

also led to a number of categories that would be mere query results in SMW. For

some examples consider the categories Rivers in Buckinghamshire, Asteroids named

for people, and 1620s deaths, all of which could easily be replaced by simple queries

that use just a handful of annotations. Indeed, in this example Category:Rivers,

Property:located in, Category:Asteroids, Category:People, Property:named after, and

Property:date of death would suffice to create thousands of similar listings on the fly,

and to remove hundreds of Wikipedia categories.

Inter-language consistency. Most articles in Wikipedia are linked to corresponding pages

in different languages, and this can be done for SMW's semantic annotation as well.

With this knowledge, you can ask for the population of Beijing that is given in

Chinese Wikipedia without reading a single word of this language. This can be

exploited to detect possible inconsistencies that can then be resolved by editors. For

example, the population of Edinburgh at the time of this writing is different in

English, German, and French Wikipedia.

External reuse. Some desktop tools today make use of Wikipedia's content, e.g. the

media player Amarok displays articles about artists during playback. However, such

reuse is limited to fetching some article for immediate reading. The program cannot

exploit the information (e.g. to find songs of artists that have worked for the same

label), but can only show the text in some other context. SMW leverages a wiki's

knowledge to be usable outside the context of its textual article. Since semantic data

can be published under a free license, it could even be shipped with software to save

bandwidth and download time.

AceWiki (A Natural and Expressive Semantic Wiki)

AceWiki is a semantic wiki that is powerful and at the same time easy to use.

Making use of the controlled natural language Attempto Controlled English (ACE), the

formal statements of the wiki are shown in a way that looks like natural English. The

use of controlled natural language makes it easy for everybody to understand the

semantics of the wiki.

Introduction

AceWiki shows the formal semantics in controlled English. Thus, the users do not

need to cope with complicated formal languages like RDF or OWL. Unlike most

other semantic wikis, the semantics are contained directly in the article texts and not

in some form of annotations. Ontological entities like individuals, concepts, and

properties are mapped one-to-one to linguistic entities like proper names, nouns, of-

constructs, and verbs. In order to help the users to write correct ACE sentences,

AceWiki provides a predictive editor.

Design

The main goal of AceWiki is to improve knowledge aggregation and representation.

AceWiki should be easier to use and understand than other semantic wikis. In

addition, it should support a higher degree of expressivity. Unlike other semantic

wikis, the formal statements are not contained in “annotations” and are not

considered “metadata”, but they are the main content of our wiki. In order to achieve

a good usability and still support a high degree of expressivity, AceWiki follows

three design principles: naturalness, uniformity, and strict user guidance.

By naturalness we mean that the formal semantics has a direct connection to natural

language. Uniformity means that only one language is used at the user-interface

level. Strict user guidance, finally, means that a predictive editor ensures that only

well-formed statements are created by the user. We will now discuss these three

principles and show how they are achieved in AceWiki.

Naturalness

AceWiki is natural in the sense that the ontology is represented in a form that is very

close to natural language. This requires a direct mapping of ontological entities to

natural language words. In AceWiki, individuals are represented as proper names

(e.g. “Switzerland”), concepts are represented as nouns (e.g. “country”), and roles

are represented as transitive verbs (e.g. “overlaps-with”) or as of-constructs (e.g.

“part of”). Using those words together with the predefined function words of ACE

(e.g. “every”, “if”, “then”, “something”, “and”, “or”, “not”), we can express

ontological statements as ACE sentences. Since every ACE sentence is a valid

English sentence, those ontological statements can be immediately understood by

any English speaker.

Uniformity

The Semantic Web community defines three categories of languages on the logic

level of the Semantic Web stack: ontology languages (e.g. OWL), rule languages

(e.g. SWRL), and query languages (e.g. SPARQL). Most languages cover only one

of those categories, and languages of different categories usually look very different.

We claim that at the user-interface level ideally one single language should cover all

those categories. In the background, there might be several internal languages, but

the users should need to learn only one. For many users who are not familiar with

formal conceptualizations, learning one formal language is already a hard task. We

should not make this learning effort harder than necessary.

ACE is able to represent those different kinds of formal statements in a very natural

way. In the case of queries, this distinction does not need to be made explicit: If a

sentence ends with a question mark then it is clear for the user that this is a query and

not an assertion. However, queries are still future work for AceWiki.

AceWiki classifies declarative ACE sentences into three categories: Some can be

translated into OWL, others can be translated into SWRL, and finally there are ACE

sentences that have no representation in OWL or SWRL at all. In ACE, this

distinction is not visible and we think that users should not bother about it. The only

thing they need to know is that if using an OWL reasoner only the OWL-compliant

sentences are considered.

Strict User Guidance

Learning a new formal language is normally accompanied by frequent syntax error

messages from the parser. Wikis are supposed to enable easy and quick

modifications of the content, and syntax errors can certainly be a major hindrance in

this respect, especially for new users.

This problem can be solved by guiding the users during the creation of new

statements in a strict manner. By strict we mean that the creation of syntactically

incorrect sentences is simply made impossible. This can be achieved by a predictive

editor that guides the user step by step and ensures syntactic correctness.

Syntactic correctness can be subdivided into lexical correctness and grammatical

correctness. Lexical correctness means that only the words that are defined in a

certain lexicon are used. Grammatical correctness means that the grammar rules are

respected.

To some degree, a predictive editor could also take care of the semantic correctness.

It could prevent the users from adding statements that introduce inconsistency into an

underlying ontology. If the verb “meets”, for example, is defined in the ontology as a

relation between humans then the predictive editor could prevent the user from

writing sentences like “a man meets a car”, assuming that the ontology says that

“car” is not human.

AceWiki has a predictive editor that is used for the creation and modification of ACE

sentences. It ensures lexical and grammatical correctness of the resulting sentences.

The semantic correctness is not enforced, but the words that seem to be semantically

suitable are shown first in the list. The suitable words are retrieved on the basis of the

hierarchy of concepts and roles and the domain and range restrictions of roles. For

example, if a user creates the incomplete sentence

“Limmat flows-through” and there is a range restriction that says “if something

flows-through something Y then Y is a city” then the individuals that are known to

be cities are shown first in the list.

CONCLUSION

The evolution of the Semantic Web has opened a new window in IT and a hope for better search results on the Web. Tim Berners-Lee rightly says that the Semantic Web will be the next generation of the current Web and the next IT revolution. It is based on the fundamental idea that Web resources should be annotated with "semantic markup" that captures information about their meaning. The Semantic Web is not far away if we understand and work on the various ways of making the current Web more meaningful and intelligent. This can be achieved by knowing about the various tools, technologies, layers, etc. of the Semantic Web, which have been summarized in this paper.

References

1. en.wikipedia.org/wiki/Semantic_Web

2. www.w3.org/2001/sw/

3. http://semanticweb.org/wiki/Tools

4. semantic-mediawiki.org

5. attempto.ifi.uzh.ch/acewiki

6. http://www.w3c.com

7. http://en.wikipedia.org/wiki/Taxonomy

8. http://www.m-w.com/dictionary/taxonomy

9. Web Ontology Working Group (http://www.w3.org/2001/sw/WebOnt/)

10. http://www.w3.org/TR/owl-features/

11. The Semantic Web: Real-World Applications from Industry

12. Enabling Semantic Web Services, Springer, November 2006
