[SOFTWARE ENGINEERING]
(2009-2012)
CERTIFICATE
The matter embodied in this term work has not been submitted earlier for the award of any other degree or diploma, to the best of my knowledge and belief.
Lecturer, USIT
GGSIPU, Delhi-6
ACKNOWLEDGEMENT
I owe a great many thanks to the many people who helped and supported me during the writing of this term paper. My guide supported me throughout, guiding and correcting various documents of mine with attention and care, and has taken pains to go through the term paper and make the necessary corrections as and when needed. I express my thanks to the Dean of USIT, GGSIPU, for extending his support and guidance. I would also like to thank my institution and the faculty members, without whom this term paper would have remained a distant dream. I also extend my heartfelt thanks to my family and well-wishers.
MANIT PANWAR
ABSTRACT
This paper presents a basic analysis of the Semantic Web and of how this vision can bring a revolution to the Web, business, enterprise, AI, security systems, and more. It is an effort toward making the Internet, the biggest basket of information and data, so intelligent that we can get exactly the information we want, much as a person, when asked for something, tells us exactly what was asked for, saving both time and effort. The Semantic Web is going to bring a new generation of the Web and is a fire whose flame is spreading through the IT industry. It is a collective effort toward making the biggest database from which we can retrieve information in an intelligent and meaningful manner. This paper focuses on the various layers, tools, and technologies of the Semantic Web.
INTRODUCTION
Almost everyone uses the Internet today for their own purposes: they surf, they chat, they search. If you look at the Internet as a sea of data, the data can be anything, and the medium is the water (the Internet). Most of the time we know what data to search for, but we do not know the exact location where it is floating. The water (the Internet) is just a medium: you get in and go searching for the data you want, but where? And how? And what exactly does our data look like?

The Semantic Web, as I understand it in a very broad sense, will make this dead data live. It will tell us where the data is (its location) and what it looks like (the exact data we are looking for) on a simple call. In other words, the data understands us; it is smart now. It carries information about itself and can tell us what it is, why it is, and where it comes from, just as a normal person gives his introduction. Now, in the unending sea, I can see where my data is, and it will respond to my call.

The question, then, is how the machine will understand the data it is communicating with. We have made the data so smart that it can introduce itself, but to whom? Not to me: as the end user, I am not interested in knowing about the data, I want to use it. The entity that needs to know about the data is the machine. We therefore have to make our machines understand what the data is saying, but certainly not by changing the hardware. Instead, if the information the data carries can be expressed in a form that the machine can directly understand, then everything will work fine. So our focus is basically on the data and the information it carries. The data has gone through many changes, in the way we think of it and the way it is used, on its journey from a dead entity that carries no information about itself to a live and smart entity that carries enough semantic information for a machine to understand (or process) it.
THE EARLY SEMANTIC WEB
The original idea of the Semantic Web was to bring machine-readable descriptions to the data and documents already on the Web, in order to improve search and data usage. The Web was, and in most cases still is, a vast set of static and dynamically generated Web pages linked together. Pages are written in HTML (HyperText Markup Language), a language intended mainly for human consumption. Humans can read Web pages and understand them, but their meaning is not apparent to computers. The Semantic Web aims at defining ways to allow Web information to be used by computers not only for display purposes, but also for interoperability and integration between systems and applications, by annotating Web information in such a way that computers can understand it. To give meaning to Web information, new languages have been developed; examples include the Resource Description Framework (RDF) (RDF 2002) and the Web Ontology Language (OWL) (OWL 2004). The descriptive information made available through these languages can then be processed by machines.

Today, the Semantic Web is not only about increasing the expressiveness of the information on Web sites and Web pages. Academia and industry have realized that the Semantic Web can also facilitate the integration of heterogeneous data and systems, as well as enable the creation of global infrastructures for sharing documents and data.
THE SEMANTIC WEB
A major drawback of XML is that XML documents do not convey the meaning of the data contained in the document. Exchange of XML documents over the Web is only possible if the parties participating in the exchange agree beforehand on the exact syntactical format (expressed in XML Schema) of the data. The Semantic Web addresses this drawback. Languages of the Semantic Web express links between information resources on the Web and connect information resources to formal terminologies called ontologies. Ontologies form the backbone of the Semantic Web; they allow machines to reason over Web information and make it possible to define links between ontologies. The term "ontology" originates from philosophy and has been adopted in the field of Computer Science with a slightly different meaning: an ontology is commonly defined as a formal, explicit specification of a shared conceptualization.

In the late 1990s the idea of a Semantic Web boosted interest in the development of ontologies even further. The general conviction held by the W3C is that the Semantic Web needs an ontology language that is compatible with current Web standards and is in fact layered on top of them. The language needs to be expressed in XML and, ideally, built on top of RDF and RDF Schema.
Ontologies and the Semantic Web
A major reason for the recent increase in interest in ontologies is the development of the Semantic Web, which can be seen as knowledge management on a global scale. Tim Berners-Lee, inventor of the current World Wide Web and director of the World Wide Web Consortium (W3C), envisions the Semantic Web as the next generation of the current Web. This "next generation" will expand upon the prowess of the current Web through the explicit representation of the semantics underlying data, programs, pages, and other Web resources. The combination of ontologies with the Web has the potential to overcome many of the problems in knowledge sharing and reuse and in the explicit representation of semantics.

An ontology consists of symbols. These symbols, also called terms and relations, can be interpreted by both humans and machines. The meaning for a human is represented by the term itself, which is usually a word in natural language, and by the semantic relationships in which the term participates. The most important of these is the "is-a" relationship, which denotes the fact that one concept (the superconcept) is more general than another (the subconcept). For instance, the concept Person is more general than Student. The figure below shows an example "is-a" hierarchy (or taxonomy), in which the more general concepts are located above the more specialized concepts.
Concepts describe a set of objects in the real world. For example, the concept PhD-Student aims to capture all existing PhD students. One such PhD student is Mary, who is modeled in the figure as a box and has an instance-of relation to the concept PhD-Student. This instance-of relationship means that the actual object is captured by the concept PhD-Student. And because of the formal is-a relationships between the concepts, Mary is also an instance of the concepts Researcher, Student, and Person. These relationships are fairly easy to understand for the human reader and, because the meanings of the relationships are formally defined, a machine can reason with them and draw the same conclusions as a human can. These relationships, which are implicitly known to humans (e.g. a human knows that every student is a person), are encoded in a formal, explicit way so that they can be understood by a machine. In a sense, the machine does not gain real "understanding"; rather, the understanding of humans is encoded in such a way that a machine can process it and draw conclusions through logical reasoning.
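The reasoning just described can be sketched in a few lines of Python. This is only an illustration (the concept names follow the Mary example above): the machine stores the is-a and instance-of relationships as plain data and derives that Mary is a Person by walking up the hierarchy.

```python
# A minimal sketch of "is-a" reasoning over a taxonomy.
# Concept names follow the example in the text.

is_a = {                                  # subconcept -> superconcepts
    "PhD-Student": ["Researcher", "Student"],
    "Researcher": ["Person"],
    "Student": ["Person"],
}
instance_of = {"Mary": "PhD-Student"}     # individual -> direct concept

def all_concepts(concept):
    """Return the concept plus every superconcept reachable via is-a."""
    found = {concept}
    stack = [concept]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# The machine draws the same conclusions as a human:
# Mary is a PhD-Student, hence also a Researcher, a Student, and a Person.
print(sorted(all_concepts(instance_of["Mary"])))
```

The encoded relationships are just strings in dictionaries; the "understanding" lies entirely in the traversal procedure, which mirrors the transitivity of the is-a relationship.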
The Resource Description Framework
The Resource Description Framework (RDF) is a W3C recommendation developed especially for the Semantic Web. RDF was developed as a language for adding machine-readable metadata to existing data on the Web. RDF uses XML for its serialization in order to realize the layering depicted in the Semantic Web language layer cake (Fig. 3.1). RDF Schema [20] extends RDF with some basic (frame-based) ontological modeling primitives. Primitives such as classes, properties, and subclass and subproperty relationships have been introduced, allowing structured class and property hierarchies. RDF has the statement as its basic building block: a triple consisting of a subject, a predicate, and an object. An object of a triple can, in turn, function as the subject of another triple, yielding a directed labeled graph, where resources (subjects and objects) correspond to nodes and predicates to edge labels. RDF also allows a form of reification (a statement about a statement), which means that any RDF statement can be used as a subject in a triple.
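The triple model can be illustrated with a toy Python sketch. This is not a real RDF library; the statements and the supervisor relation are invented for illustration, but the idea is the same: facts are (subject, predicate, object) triples, and queries are patterns over them, with a wildcard matching anything.

```python
# A toy RDF-style triple store (illustrative sketch, not a real RDF library).
# Statements are (subject, predicate, object) triples; None acts as a
# wildcard in queries, in the spirit of RDF graph pattern matching.

triples = [
    ("Mary", "type", "PhD-Student"),
    ("Mary", "supervisor", "JohnDoe"),     # hypothetical data
    ("JohnDoe", "type", "Professor"),
]

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# "Who is Mary's supervisor?"
print(match(triples, s="Mary", p="supervisor"))
```

Because objects can themselves appear as subjects of other triples ("JohnDoe" above), the store is really a directed labeled graph, exactly as described in the text.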
LAYER ARCHITECTURE
Tim Berners-Lee proposed a layered architecture, as shown in Figure 1.
Unicode and URI: Unicode is the standard for representing and manipulating text expressed in most of the world's writing systems, while URIs provide unique identifiers for resources on the Web.
XML: It allows its users to define their own elements. Its primary purpose is to help information systems share structured data, particularly via the Internet. XML Schema constrains the element types and attribute names used in XML documents.
RDF: The Resource Description Framework creates metadata about a document as a single entity, i.e. the author of the document, its creation date, its type, etc.
Ontology: It formally defines the important concepts in the domain and describes their properties and relationships.
Logic: It is a monotonic logic. In this layer any rule can export its conclusions but cannot import them.
Trust: This is the topmost layer, where the trustworthiness of information is subjectively evaluated.
CHALLENGES FOR A NEW SEMANTIC WORLD
As with every technological evolution, the Semantic Web and ontologies need to promote their unique value proposition for specific target groups in order to achieve adoption. A common pitfall in studies of the Semantic Web is a limited view of the obstacles that stand in its way.

Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.

Vastness: The World Wide Web contains at least 48 billion pages as of this writing. The SNOMED CT medical terminology ontology alone contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.

Vagueness: These are imprecise concepts like "young" or "tall". This arises from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms, and of trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.

Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms that correspond to many different distinct diagnoses, each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.

Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Defeasible reasoning and paraconsistent reasoning are two techniques which can be employed to deal with inconsistency.

Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptographic techniques are currently utilized to alleviate this threat.

This list of challenges is illustrative rather than exhaustive, and it focuses on the challenges to the "unifying logic" and "proof" layers of the Semantic Web. The World Wide Web Consortium (W3C) Incubator Group for Uncertainty Reasoning for the World Wide Web (URW3-XG) final report lumps these problems together under the single heading of "uncertainty". Many of the techniques mentioned here will require extensions to the Web Ontology Language (OWL), for example to annotate conditional probabilities.
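The fuzzy-logic treatment of vagueness can be made concrete with a small sketch. The breakpoints below (160 cm and 190 cm) are made-up illustrative values, not part of any standard: instead of a yes/no answer to "is this person tall?", every height gets a degree of membership in the fuzzy set "tall".

```python
# A sketch of how fuzzy logic models a vague concept such as "tall".
# The 160 cm / 190 cm breakpoints are invented illustrative values.

def tallness(height_cm):
    """Degree of membership in the fuzzy set "tall", between 0.0 and 1.0."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0   # linear ramp between the breakpoints

# Rather than a crisp yes/no, every height gets a degree of "tallness":
for h in (150, 175, 195):
    print(h, round(tallness(h), 2))
```

A reasoner can then combine such degrees (for example by taking minima for conjunctions), which is how fuzzy approaches cope with vague query terms and subtly different concepts.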
Semantic Web Application Areas
In our information society, Web content is increasingly multiform, inconsistent and very dynamic. Such content is unsuitable for machine processing and necessitates human interpretation, with its respective costs in time and money for business. To remedy this, current approaches aim at abstracting away this complexity (e.g. by using ontologies) and at offering new and enriched services able to process those abstractions (e.g. by mechanized reasoning) in a fully automated way. This abstraction layer is the subject of intensive research. Early applications applied semantic techniques to information retrieval, aiming to give far better results than classical (even sophisticated) statistical analysis (in terms of precision and recall indicators). The next natural extension was to apply IR in an enterprise context to search company information in new ways. Present research has turned to focusing on the application areas described below.
1) Knowledge Management
Knowledge is one of the key success factors for enterprises, both today and in the future. Corporate knowledge management therefore touches human resource management, enterprise organization and culture. We view KM as the management of the knowledge arising in corporate activities, covering both the use and the creation of that knowledge, for two main objectives: the capitalization of corporate knowledge and durable innovation fully aligned with the strategic objectives of the organization. Ontologies and related metadata provide a promising conceptual basis for generating parts of knowledge portals automatically; for this, models of the domain, of the users and of the tasks are needed. The generated portals and their underlying models have to be kept up to date, and the evolution of portals should also include mechanisms for maintaining knowledge at the right level of granularity. Traditional technology struggles to meet these requirements, whereas semantic technology opens the way to meet KM requirements in the future.
2) E-Commerce
E-commerce applications enable service providers to promote their offers, and customers to find offers which match their demands. By providing unified access to a large number of heterogeneous and distributed offers, semantic technology can improve interoperability at the business level and reduce the need for standardization at the technical level. This will enable services to adapt to the rapidly changing online environment. Knowledge-based applications of this kind use one or more shared ontologies as a common vocabulary that is accessible both for humans and computers. This enforces the shared ontology as the standard ontology for all participating systems, thereby removing the semantic heterogeneity from the information system. Heterogeneity is a problem because the systems to be integrated typically describe the same general notions of space, time, matter, object, event, action, etc. in different ways; shared ontologies can inherit such notions from common upper ontologies. The benefits are the integration of heterogeneous information sources, which can improve interoperability, and more effective use and reuse of knowledge resources.
3) Medicine
The medical domain is a favourite target for Semantic Web applications, just as the expert system was for Artificial Intelligence applications 20 years ago. Medical information is voluminous and heterogeneous, and much of it is stored in unstructured format, making the sharing of information even more difficult. Semantic Web solutions have become very promising in this context. One of the main mechanisms is the use of ontologies to annotate and connect these resources (e.g. medical knowledge on the Web or genomic databases).
4) Other Areas
The diverse application areas of Semantic Technologies also include the following:
• Ambient Intelligence
• Cognitive Systems
• Data Integration
• Software Engineering
• Machine Learning
• eScience
• Information Extraction
• Grid Computing
• Peer-to-Peer Systems
• eGovernment
Following are the key milestones, year-wise, in the history of the Semantic Web:
1997 First working draft of the RDF language to define metadata became available
1998 Tim Berners-Lee published a roadmap to the Semantic Web that includes query and inference capabilities
1999 RDF became a W3C recommendation - a crucial step towards the Web's machine-readability
2001 Berners-Lee, Hendler and Lassila broadened the vision of the Semantic Web further to include trust
The World Wide Web is an interesting paradox -- it's made with computers but for
people. The sites you visit every day use natural language, images and page layout to
present information in a way that's easy for you to understand. Even though they are
central to creating and maintaining the Web, the computers themselves really can't
make sense of all this information. They can't read, see relationships or make
decisions the way people can.
is pretty simple -- metadata added to Web pages can make the existing World Wide
Web machine readable. This won't bestow artificial intelligence or make computers
self-aware, but it will give machines tools to find, exchange and, to a limited extent,
interpret information. It's an extension of, not a replacement for, the World Wide
Web.
That probably sounds a little abstract, and it is. While some sites are already using
Semantic Web concepts, a lot of the necessary tools are still in development. In this
article, we'll bring the concepts and tools behind the Semantic Web down to earth by
applying them to a familiar example: maps.
SEMANTIC WEB TOOLS
• Jena: a Java-based framework for high-quality knowledge management; it can serve an RDF database via its Joseki layer and provides APIs for the RDQL and SPARQL query languages.
• Platypus Wiki: lets users create a wiki page plus metadata in accordance with W3C standards.
• ASIS (Ada Semantic Interface Specification): available for GNAT.
• GRDDL: gleans RDF from XHTML and XML documents, including Atom 1.0 and 0.3 feeds.
Protégé
Protégé provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. At its core, Protégé implements a rich set of knowledge-modeling structures and actions that support the creation, visualization, and manipulation of ontologies. Protégé can be customized to provide domain-friendly support for creating knowledge models and entering data. Further, Protégé can be extended by way of a plug-in architecture.

The Protégé-Frames editor enables users to build and populate ontologies that are frame-based, in accordance with the Open Knowledge Base Connectivity protocol (OKBC). In this model, an ontology consists of a set of classes organized in a hierarchy, a set of slots associated with classes to describe their properties and relationships, and a set of instances of those classes - individual exemplars of the concepts that hold specific values for their properties.

The Protégé-OWL editor enables users to build ontologies for the Semantic Web, in particular in the W3C's Web Ontology Language (OWL). "An OWL ontology may include descriptions of classes, properties and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. These entailments may be based on a single document or multiple distributed documents that have been combined using defined OWL mechanisms."
Semantic MediaWiki
Semantic MediaWiki (SMW) is a free, open-source extension to MediaWiki - the wiki software powering Wikipedia - that helps to search, organize, tag, browse, evaluate, and share the wiki's content. While traditional wikis contain only texts which computers can neither understand nor evaluate, SMW adds semantic annotations that bring the power of the Semantic Web to the wiki.
Wikis have become a great tool for collecting and sharing knowledge in communities. This knowledge is mostly contained within texts and multimedia files, and is thus easily accessible for human readers. But wikis get bigger and bigger, and it becomes increasingly difficult to find and compare the information they contain. Consider, for example, the question «What are the hundred world-largest cities with a female mayor?». Wikipedia should be able to provide the answer: it contains all large cities, their mayors, and articles about the mayors that tell us about their gender. Yet the question is almost impossible to answer for a human, since one would have to read all articles about all large cities first! Even if the answer is found, it might not remain valid for very long. Computers can deal with large datasets much more easily, yet they are not able to support us very much when seeking answers from a wiki: even sophisticated programs cannot yet read and «understand» human-language texts unless the topic and language of the text is very restricted. The wiki's keyword search does not help in such cases either. Semantic MediaWiki enables wikis to make their knowledge computer-processable, e.g. to answer the above question. The hard problem for the computer is to find out what the words in a wiki page (e.g. about cities) mean. Articles contain many names, but which one is the current mayor? Humans can easily grasp this problem by looking into a language edition of Wikipedia that they do not understand (Korean is a good start unless you are fluent in it). While single pieces of information may be easy to spot, their meaning remains obscure without understanding their relevance in the article. Similarly, computers need some help for making sense of wiki texts.
SMW lets editors add annotations to wiki pages. For example, someone can mark a name as being the name of the current mayor. This is done by editors who modify a page and put some special text-markup around the mayor's name. After this, computers can access the information (of course they still do not «understand» it, but they can search for it if we ask them to), and the wiki can use it to answer structured queries.
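As a rough illustration of how such markup makes page content searchable by a computer, the Python sketch below extracts SMW-style «[[property::value]]» annotations from a piece of wiki-text. The page text, the mayor's name and the population figure are invented for illustration; a real SMW installation does far more than this.

```python
import re

# Hypothetical wiki-text with SMW-style annotations ([[property::value]]).
page = ("Berlin is the capital of [[country::Germany]]. "
        "Its current mayor is [[mayor::Jane Doe]] and it has "
        "[[population::3,500,000]] inhabitants.")

def annotations(wikitext):
    """Return {property: value} for every [[property::value]] annotation."""
    return dict(re.findall(r"\[\[([^:\]]+)::([^\]]+)\]\]", wikitext))

facts = annotations(page)
print(facts["mayor"])   # a computer can now *search* for the mayor
```

The computer still does not understand what a mayor is, but the annotation turns an arbitrary name buried in prose into a retrievable fact.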
Semantic MediaWiki introduces some additional markup into the wiki-text which allows users to add "semantic annotations" to the wiki. While this at first appears to make things more complex, it can also greatly simplify the structure of the wiki, help users to find more information in less time, and improve the overall quality and consistency of the wiki. To illustrate this, we provide some examples from the daily business of Wikipedia:
Manually generated lists. Wikipedia is full of manually edited listings. Those lists are prone to errors, since they have to be updated manually. In SMW, such lists can be generated automatically: they are always up-to-date and can easily be customized.

Searching for information. Much of Wikipedia's knowledge is buried within millions of pages of text and can hardly be retrieved at all. For example, at the time of this writing, there is no list of female physicists in Wikipedia. When trying to find all women of this profession who are featured in Wikipedia, one has to resort to textual search. Obviously, this attempt is doomed to fail miserably: among the first 20 results, only 5 are about people at all, and Marie Curie is not contained in the whole result set (since "female" does not appear on her page). Again, querying in SMW easily solves this problem (in this case even without further annotation effort).
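The gain from structured annotations over keyword search can be sketched in Python. The facts below are invented for illustration (a real system would run a query language over the wiki's annotations), but the principle is the same: the answer is computed by intersecting facts, not by matching words in prose.

```python
# Toy illustration of answering "which physicists are female?" from
# structured facts rather than keyword search. The data is invented.

facts = [
    ("Marie Curie", "profession", "physicist"),
    ("Marie Curie", "gender", "female"),
    ("Lise Meitner", "profession", "physicist"),
    ("Lise Meitner", "gender", "female"),
    ("Niels Bohr", "profession", "physicist"),
    ("Niels Bohr", "gender", "male"),
]

def having(prop, value):
    """All subjects for which the fact (subject, prop, value) is recorded."""
    return {s for (s, p, o) in facts if p == prop and o == value}

female_physicists = having("profession", "physicist") & having("gender", "female")
print(sorted(female_physicists))   # found without any keyword search
```

Note that the word "female" never needs to appear in any article text; the fact is recorded once as an annotation and every query can use it.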
Inflationary use of categories. The need for better structuring becomes apparent from the enormous use of categories in Wikipedia. While this is generally helpful, it has also led to a number of categories that would be mere query results in SMW: for example, categories combining a profession or a date with a class of people, such as 1620s deaths, all of which could easily be replaced by simple queries. A single annotation such as Property:date of death would suffice to create thousands of similar listings on the fly.

Interlanguage consistency. Wikipedia articles are interlinked between their editions in different languages, and this can be done for SMW's semantic annotations as well. With this knowledge, you can ask for the population of Beijing that is given in the Chinese Wikipedia without reading a single word of this language. This can be exploited to detect possible inconsistencies that can then be resolved by editors.
External reuse. Some desktop tools today make use of Wikipedia's content, e.g. the
media player Amarok displays articles about artists during playback. However, such
reuse is limited to fetching some article for immediate reading. The program cannot
exploit the information (e.g. to find songs of artists that have worked for the same
label), but can only show the text in some other context. SMW makes a wiki's knowledge usable outside the context of its textual article. Since semantic data can be published under a free license, it could even be shipped with such software.
AceWiki
AceWiki is a semantic wiki that is powerful and at the same time easy to use. Making use of the controlled natural language Attempto Controlled English (ACE), the formal statements of the wiki are shown in a way that looks like natural English. The use of controlled natural language makes it easy for everybody to understand the content of the wiki.
Introduction
AceWiki shows the formal semantics in controlled English. Thus, the users do not need to cope with complicated formal languages like RDF or OWL. Unlike most other semantic wikis, the semantics are contained directly in the article texts and not in separate annotations. Ontological entities like individuals, concepts, and properties are mapped one-to-one to linguistic entities like proper names, nouns, of-constructs, and verbs. In order to help the users to write correct ACE sentences, AceWiki provides a predictive editor.
Design
AceWiki should be easier to use and understand than other semantic wikis. In contrast to many other semantic wikis, the formal statements are not contained in "annotations" and are not considered "metadata"; rather, they are the main content of the wiki. In order to achieve good usability and still support a high degree of expressivity, AceWiki follows three design principles: naturalness, uniformity, and strict user guidance. By naturalness we mean that the formal semantics has a direct connection to natural language. Uniformity means that only one language is used at the user-interface level. Strict user guidance, finally, means that a predictive editor ensures that only well-formed statements are created by the user. We will now discuss these three principles.
Naturalness
AceWiki is natural in the sense that the ontology is represented in a form that is very close to natural language: individuals are represented as proper names (e.g. "Switzerland"), concepts are represented as nouns (e.g. "country"), and roles are represented as verbs or of-constructs (e.g. "part of"). Using those words together with the predefined function words of ACE (e.g. "every", "if", "then", "something", "and", "or", "not"), we can express ontological statements as English sentences. Since every such statement is a valid English sentence, those ontological statements can be immediately understood by the reader.
Uniformity
The Semantic Web community defines three categories of languages on the logic level of the Semantic Web stack: ontology languages (e.g. OWL), rule languages (e.g. SWRL), and query languages (e.g. SPARQL). Most languages cover only one of those categories, and languages of different categories usually look very different. We claim that, at the user-interface level, ideally one single language should cover all those categories. In the background there might be several internal languages, but the users should need to learn only one. For many users who are not familiar with formal languages, learning several of them is too much effort. ACE is able to represent those different kinds of formal statements in a very natural and uniform way. In the case of queries, the distinction does not even need to be made explicit: if a sentence ends with a question mark then it is clear for the user that it is a query and not an assertion. However, queries are still future work for AceWiki.
AceWiki classifies declarative ACE sentences into three categories: some can be translated into OWL, others can be translated into SWRL, and finally there are ACE sentences that can be translated into neither. For the user, this distinction is not visible, and we think that users should not have to bother about it. The only thing they need to know is that, when using an OWL reasoner, only the OWL-compliant sentences are considered for reasoning.
Strict User Guidance
Semantic wikis that let users write formal statements freely confront them with error messages from the parser when they make mistakes. Wikis are supposed to enable easy and quick modifications of the content, and syntax errors can certainly be a major hindrance in this respect. This problem can be solved by guiding the users during the creation of new sentences. AceWiki provides a predictive editor that guides the user step by step and ensures syntactic correctness. Syntactic correctness comprises lexical and grammatical correctness. Lexical correctness means that only the words that are defined in a certain lexicon are used. Grammatical correctness means that the grammar rules are respected.
To some degree, a predictive editor could also take care of semantic correctness. It could prevent the users from adding statements that introduce inconsistency into the underlying ontology. If the verb "meets", for example, is defined in the ontology as a relation between humans, then the predictive editor could prevent the user from writing sentences like "a man meets a car", assuming that the ontology says that a car is not a human. AceWiki's predictive editor is used for the creation and modification of ACE sentences. Semantic correctness is not enforced, but the words that seem to be semantically suitable are shown first in the list. The suitable words are retrieved on the basis of the hierarchy of concepts and roles and the domain and range restrictions of roles. For example, if the user has typed "Limmat flows-through" and there is a range restriction that says "if something flows-through something Y then Y is a city", then the individuals that are known to be cities are shown first.
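This kind of suggestion ranking can be sketched in a few lines of Python, following the "flows-through" example in the text. The vocabulary and the range restriction are invented for illustration: individuals whose known concept satisfies the role's range are listed first, and all others afterwards, mirroring how a predictive editor can favor (without enforcing) semantically suitable words.

```python
# Sketch of a predictive editor's suggestion ranking, following the
# "flows-through" example in the text. Vocabulary is illustrative.

instance_of = {                 # individual -> concept
    "Zurich": "city",
    "Berlin": "city",
    "Switzerland": "country",
    "Limmat": "river",
}
range_of = {"flows-through": "city"}   # role -> required range concept

def suggestions(role, candidates):
    """Semantically suitable candidates first; all others afterwards."""
    required = range_of.get(role)
    suitable = [c for c in candidates if instance_of.get(c) == required]
    others = [c for c in candidates if instance_of.get(c) != required]
    return suitable + others

# After typing "Limmat flows-through", cities are suggested first:
print(suggestions("flows-through", ["Switzerland", "Zurich", "Berlin"]))
```

Because unsuitable words are only demoted rather than removed, the sketch preserves the design decision described above: semantic correctness is encouraged, not enforced.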
CONCLUSION
The evolution of the Semantic Web has opened a new window in IT and a hope for better search results on the Web. Tim Berners-Lee rightly says that the Semantic Web will be the next generation of the current Web and the next IT revolution. It is based on the fundamental idea that Web resources should be annotated with "semantic markup" that captures information about their meaning. The Semantic Web is not far away once we understand and work on the various ways to make the current Web a more meaningful and intelligent Web. This can be achieved by knowing about the various tools, technologies, and layers of the Semantic Web, which have been summarized in this paper.
References
1. en.wikipedia.org/wiki/Semantic_Web
2. www.w3.org/2001/sw/
3. http://semanticweb.org/wiki/Tools
4. semantic-mediawiki.org
5. attempto.ifi.uzh.ch/acewiki
6. http://www.w3c.com
7. http://en.wikipedia.org/wiki/Taxonomy
8. http://www.m-w.com/dictionary/taxonomy
10. http://www.w3.org/TR/owl-features/
11. The Semantic Web - The Real World Applications from Industry
12. Enabling Semantic Web Services, Springer, November 2006