

Submitted in partial fulfillment of the requirements for

the award of the degree of



Guide(s): Submitted By:



Roll No. 001164105409

University School of Information Technology

GGS Indraprastha University, Delhi – 6



This is to certify that the Term Paper (IT-655) entitled “SEMANTIC

WEB” done by Mr. MANIT PANWAR, Roll No. 00116404509 is an

authentic work carried out by him at USIT, GGSIPU under my guidance.

The matter embodied in this term work has not been submitted earlier for

the award of any degree or diploma to the best of my knowledge and belief.


Dated: (Signature of the Guide)


Lecturer, USIT



I owe a great many thanks to a great many people who helped and supported

me during the writing of this term paper.

My deepest thanks to Mr. S.K. MALIK, the Guide of my term paper, for guiding and correcting various documents of mine with attention and care. He has taken pains to go through the term paper and make the necessary corrections as and when needed.

I express my thanks to the Dean of USIT, GGSIPU, for extending his support.

My deep sense of gratitude to Mr. Amit Prakash Singh, Teacher Incharge of

Term Paper for his support and guidance. Thanks and appreciation to the

helpful people at UIRC and Computer Centre, for their support.

I would also like to thank my institution and the faculty members, without whom this term paper would have remained a distant reality. I also extend my heartfelt thanks to my family, friends and well-wishers.



This paper presents a basic analysis of the Semantic Web and of how this vision can bring a revolution to the Web, businesses, enterprises, AI, security systems and more. The Semantic Web is an effort to make the biggest basket of information and data, which we call the Internet, so intelligent that we get exactly the information we want, much as a person who is asked for something tells exactly what was asked for, saving both time and effort. The Semantic Web is going to bring a new dimension to the field of information technology by providing better search facilities on the Web, and it is a fire whose flame is growing steadily in the IT industry. It is a collective effort towards building the biggest database, from which we can retrieve information in an intelligent and meaningful manner. This paper focuses on various aspects of the Semantic Web: its history, an introduction, tools and architecture layers.


Almost everyone uses the Internet today for their own purposes: they surf, they chat, they search. If you look at the Internet as a sea of data, the data can be anything, and the medium is the water (the Internet). Most of the time we know what data to search for, but we still do not know the exact location where it is floating. The water is just a medium: you get in and go in search of the data you want, but where? How? And what exactly does our data look like? The Semantic Web, as I understand it in a very broad sense, will make that dead data come alive and tell us where it is (its location) and what it looks like (the exact data we are looking for) on a simple call. In other words, the data understands us: it is now smart, it carries information about itself, and it can tell us what it is, why it is there and where it comes from, just as a person introduces himself. Now, in the unending sea, I can see where my data is, and it responds when I call. The next question is how the machine will understand the data it is communicating with. We have made the data smart enough to introduce itself, but to whom? Not to me: I am the end user, and I am not interested in knowing about the data; I want to use it. The entity that needs to know about the data is the machine, so we have to make the machine understand what the data is saying. This should definitely not be done by changing the hardware; instead, if the information the data gives about itself can be understood directly by the machine, everything will work fine. So our focus is basically on the data and the information it carries. On its way from a dead entity with no information about itself to a live, smart entity that carries enough semantic information for a machine to understand (or process) it, data has gone through many changes, both in the way we think of it and in the way it is used.


The original idea of the Semantic Web was to bring machine-readable descriptions to

the data and documents already on the Web, in order to improve search and data

usage. The Web was, and in most cases still is, a vast set of static and dynamically

generated Web pages linked together. Pages are written in HTML (Hyper Text

Markup Language), a language that is useful for publishing information intended

only for human consumption. Humans can read Web pages and understand them, but

the inherent meaning is not available in a way that allows interpretation by

computers. The Semantic Web aims at defining ways to allow Web information to be

used by computers not only for display purposes, but also for interoperability and

integration between systems and applications. One way to enable machine-to-

machine exchange and automated processing is to provide the information in such a

way that computers can understand it. To give meaning to Web information, new

standards and languages are being investigated and developed. Well-known

examples include the Resource Description Framework (RDF) (RDF 2002) and the

Web Ontology Language (OWL) (OWL 2004). The descriptive information made available by these languages makes it possible to characterize, individually and precisely, the types of resources on the Web and the relationships between resources.

Today, the Semantic Web is not only about increasing the expressiveness of Web

information to enable the automatic or semiautomatic processing of Web resources

and Web pages. Academia and industry have realized that the Semantic Web can

facilitate the integration and interoperability of intra- and inter-business processes

and systems, as well as enable the creation of global infrastructures for sharing documents and data and make searching and reusing information easier.


A major drawback of XML is that XML documents do not convey the meaning of

the data contained in the document. Exchange of XML documents over the Web is

only possible if the parties participating in the exchange agree beforehand on the

exact syntactical format (expressed in XML Schema) of the data. The Semantic Web

allows the representation and exchange of information in a meaningful way,

facilitating automated processing of descriptions on the Web. Annotations on the

Semantic Web express links between information resources on the Web and connect

information resources to formal terminologies – these connective structures are

called ontologies. Ontologies form the backbone of the Semantic Web; they allow

machine understanding of information through the links between the information

resources and the terms in the ontologies. Furthermore, ontologies facilitate

interoperation between information resources through links to the same ontology or

links between ontologies. The term “ontology” originates from philosophy and has

been adopted in the field of Computer Science with a slightly different meaning:

An ontology is a formal explicit specification of a shared conceptualization.

In the late 1990s the idea of a Semantic Web boosted interest in the development of

ontologies even further. The general conviction held by the W3C is that the Semantic

Web needs an ontology language that is compatible with current Web standards and

is in fact layered on top of them. The language needs to be expressed in XML and,

preferably, should be layered on top of RDF(S).

Ontologies and the Semantic Web

A key feature of ontologies is that, through formal, real-world semantics and

consensual terminologies, they interweave human and machine understanding. This

important property of ontologies facilitates the sharing and reuse of ontologies

among humans, as well as among machines.

A major reason for the recent increasing interest in ontologies is the development of

the Semantic Web, which can be seen as knowledge management on a global scale.

Tim Berners-Lee, inventor of the current World Wide Web and director of the World

Wide Web Consortium (W3C), envisions the Semantic Web as the next generation of

the current Web. This “next generation” will expand upon the prowess of the current

Web by adding machine-readable information and automated services. According to ,

“The explicit representation of the semantics underlying data, programs, pages, and

other Web resources will enable a knowledge-based Web that provides a

qualitatively new level of service.” Ontologies provide such an explicit

representation of semantics. The combination of ontologies with the Web has the

potential to overcome many of the problems in knowledge sharing and reuse and in

information integration. Ontologies interweave human and computer understanding

of symbols. These symbols, also called terms and relations, can be interpreted by

both humans and machines. The meaning for a human is represented by the term

itself, which is usually a word in natural language, and by the semantic relationships

between terms. An example of such a human-understandable relationship is a

superconcept – subconcept relationship (often referred to by the term “is-a”). Such a

relationship denotes the fact that one concept (the superconcept) is more general than

another (the subconcept). For instance, the concept Person is more general than

Student. The figure below shows an example “is-a” hierarchy (or taxonomy), where the

more general concepts are located above the more specialized concepts.

Concepts describe a set of objects in the real world. For example, the concept PhD-

Student aims to capture all existing PhD students. One such PhD student is Mary,

who is modeled in Fig. as a box and has an instance-of relation to the concept PhD-Student. This instance-of relationship means that the actual object is captured by the concept PhD-Student. And because of the formal is-a relationships between the concepts PhD-Student, Researcher, Student, and Person, Mary must also be an

instance of the concepts Researcher, Student, and Person. These relationships are

fairly easy to understand for the human reader and, because the meanings of the

relationships are formally defined, a machine can reason with them and draw the

same conclusions as a human can. These relationships, which are implicitly known to

humans (e.g. a human knows that every student is a person), are encoded in a formal, explicit way so that they can be understood by a machine. In a sense, the

machine does not gain real “understanding”, but the understanding of humans is

encoded in such a way that a machine can process it and draw conclusions through

logical reasoning.
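The is-a reasoning described above can be sketched in a few lines of Python. This is a minimal illustration, not a real ontology reasoner: the figure is not reproduced here, so the taxonomy below is an assumption reconstructed from the concepts named in the text, with PhD-Student taken to be a direct subconcept of both Researcher and Student. Only Mary's direct instance-of link is asserted; her other types are derived by following is-a links transitively.

```python
# Assumed taxonomy: each concept maps to its direct superconcepts ("is-a" links).
SUBCONCEPT_OF: dict[str, set[str]] = {
    "PhD-Student": {"Researcher", "Student"},
    "Researcher": {"Person"},
    "Student": {"Person"},
    "Person": set(),
}

# Asserted instance-of facts: only Mary's most specific concept is stated.
INSTANCE_OF: dict[str, set[str]] = {"Mary": {"PhD-Student"}}


def superconcepts(concept: str) -> set[str]:
    """All concepts reachable from `concept` by following is-a links (transitive closure)."""
    found: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in SUBCONCEPT_OF.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(individual: str) -> set[str]:
    """Asserted concepts of an individual plus every superconcept they imply."""
    types = set(INSTANCE_OF.get(individual, set()))
    for concept in list(types):
        types |= superconcepts(concept)
    return types


print(sorted(inferred_types("Mary")))
# ['Person', 'PhD-Student', 'Researcher', 'Student']
```

The program does not "understand" Person or Student; it only applies the rule that is-a is transitive, which is the sense in which human understanding is encoded so that a machine can draw the same conclusions.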

The Resource Description Framework

The Resource Description Framework (RDF) is the first language developed

especially for the Semantic Web. RDF was developed as a language for adding

machine-readable metadata to existing data on the Web. RDF uses XML for its

serialization in order to realize the layering depicted in the Semantic Web language

layer cake (Fig. 3.1). RDF Schema [20] extends RDF with some basic (frame-based)

ontological modeling primitives, such as classes, properties, and

instances. Also, the instance-of, subclass-of, and subproperty-of relationships have

been introduced, allowing structured class and property hierarchies. RDF has the

subject–predicate–object triple, commonly written as P(S,O), as its basic data model.

An object of a triple can, in turn, function as the subject of another triple, yielding a

directed labeled graph, where resources (subjects and objects) correspond to nodes,

and predicates correspond to edges. Furthermore, RDF allows a form of reification (a

statement about a statement), which means that any RDF statement can be used as a

subject in a triple.
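The triple model, its graph view and reification can be illustrated with plain Python tuples. The sketch below assumes hypothetical resources in an "ex:" namespace (a stand-in for full URIs) and uses the RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) to turn one statement into a resource that other triples can describe. It uses no RDF library and does not serialize to RDF/XML.

```python
from collections import defaultdict

# Hypothetical triples; "ex:" stands in for a full namespace URI.
TRIPLES = [
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:alice", "ex:memberOf", "ex:usit"),  # the object above reappears as a subject
    ("ex:usit", "ex:partOf", "ex:ggsipu"),
]


def as_predicate(triple):
    """Render a triple in the P(S,O) notation used in the text."""
    s, p, o = triple
    return f"{p}({s}, {o})"


def build_graph(triples):
    """Directed labeled graph: subject node -> list of (edge label, object node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def walk(graph, start):
    """Follow the first outgoing edge from each node, stopping at a leaf or a cycle."""
    path, node, seen = [start], start, {start}
    while graph.get(node):
        predicate, node = graph[node][0]
        path.append(f"-{predicate}-> {node}")
        if node in seen:
            break
        seen.add(node)
    return " ".join(path)


def reify(statement_id, triple):
    """Describe a triple with the RDF reification vocabulary so it can be a subject."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


graph = build_graph(TRIPLES)
print(as_predicate(TRIPLES[0]))  # ex:author(ex:paper1, ex:alice)
print(walk(graph, "ex:paper1"))
# ex:paper1 -ex:author-> ex:alice -ex:memberOf-> ex:usit -ex:partOf-> ex:ggsipu

# A statement about a statement: record who asserted the authorship triple.
meta = reify("ex:stmt1", TRIPLES[0]) + [("ex:stmt1", "ex:assertedBy", "ex:bob")]
```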


Tim Berners-Lee proposed a nine-layer architecture, as shown above in Figure 1. It includes Unicode, URI, XML, Namespace, XML Schema, RDF, RDF Schema, Ontology, Digital Signature, Logic, Proof and Trust.

The various layers are described below:

Unicode Unicode is a standard way of allowing computers to consistently represent and manipulate text expressed in most of the world’s writing systems.

URI Uniform Resource Identifier (URI) is a compact string of characters used to

identify or name a resource on the Internet.

XML Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language because it allows its users to define their own elements. Its primary purpose is to help information systems share structured data, particularly via the Internet.

XML Schema An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure.

XML Namespace An XML namespace is a collection of names (identified by a URI) used in XML documents as element types and attribute names.

RDF The Resource Description Framework creates metadata about the document as a single entity, e.g. the author of the document, its creation date, its type, etc.

Ontology Vocabulary This is the main layer. It consists of a hierarchy of the important concepts in the domain and describes their properties. Some basic ontology languages are OWL, DAML-ONT and DAML+OIL.

Digital Signature Digital signatures support the notion of trust:

a) a document can be digitally signed;

b) encryption can be applied to prevent unauthorized access.

Logic This layer uses monotonic logic. In this layer any rule can export the code but can’t be


Proof The goal is to make content smarter, so that it becomes machine-understandable.

Trust This is the topmost layer, where the trustworthiness of information is subjectively evaluated.
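The XML Namespace layer can be seen in practice with Python's standard xml.etree.ElementTree module. In this sketch the document and both namespace URIs are made up for illustration: two elements use different prefixes, and the parser resolves each prefix to its namespace URI, which is what actually identifies the name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; both namespace URIs are placeholders.
DOC = """<catalog xmlns:bk="http://example.org/books"
         xmlns:pr="http://example.org/pricing">
  <bk:title>A Sample Title</bk:title>
  <pr:price currency="USD">20</pr:price>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-URI}local-name",
# so the URI, not the prefix, identifies the element type.
tags = [child.tag for child in root]
print(tags)

# Queries also go through the URI; the prefix "books" is chosen freely here
# and does not have to match the "bk" prefix used inside the document.
title = root.find("books:title", {"books": "http://example.org/books"}).text
print(title)  # A Sample Title
```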


As with every technological evolution, the Semantic Web and ontologies need to

promote their unique value proposition for specific target groups in order to achieve

adoption. A common pitfall in studies of the Semantic Web is a limited focus on “technological perspectives” or, at the other extreme, the difficulty of communicating the underlying capacity of semantics and ontologies to meet critical real-world challenges.

Some of the challenges for the Semantic Web include vastness, vagueness,

uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal

with all of these issues in order to deliver on the promise of the Semantic Web.

Vastness: The World Wide Web contains at least 48 billion pages as of this writing

(August 2, 2009). The SNOMED CT medical terminology ontology contains

370,000 class names, and existing technology has not yet been able to eliminate all

semantically duplicated terms. Any automated reasoning system will have to deal

with truly huge inputs.

Vagueness: These are imprecise concepts like "young" or "tall". This arises from the

vagueness of user queries, of concepts represented by content providers, of matching

query terms to provider terms and of trying to combine different knowledge bases

with overlapping but subtly different concepts. Fuzzy logic is the most common

technique for dealing with vagueness.

Uncertainty: These are precise concepts with uncertain values. For example, a patient

might present a set of symptoms that correspond to a number of distinct

diagnoses each with a different probability. Probabilistic reasoning techniques are

generally employed to address uncertainty.

Inconsistency: These are logical contradictions which will inevitably arise during the

development of large ontologies, and when ontologies from separate sources are

combined. Deductive reasoning fails catastrophically when faced with inconsistency,

because "anything follows from a contradiction". Defeasible reasoning and

paraconsistent reasoning are two techniques which can be employed to deal with inconsistency.


Deceit: This is when the producer of the information is intentionally misleading the

consumer of the information. Cryptography techniques are currently utilized to

ameliorate this threat.

This list of challenges is illustrative rather than exhaustive, and it focuses on the

challenges to the "unifying logic" and "proof" layers of the Semantic Web. The

World Wide Web Consortium (W3C) Incubator Group for Uncertainty Reasoning

for the World Wide Web (URW3-XG) final report lumps these problems together

under the single heading of "uncertainty". Many of the techniques mentioned here

will require extensions to the Web Ontology Language (OWL), for example to annotate conditional probabilities. This is an area of active research.
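Two of the techniques named above, fuzzy logic for vagueness and probabilistic reasoning for uncertainty, can be contrasted in a short Python sketch. The breakpoints for "tall" and all probabilities below are invented for illustration; as noted above, OWL would need extensions to express either kind of knowledge.

```python
def tall_membership(height_cm: float, low: float = 160.0, high: float = 190.0) -> float:
    """Fuzzy degree (0.0 to 1.0) to which a height counts as "tall".

    The 160/190 cm breakpoints are assumptions; any monotone curve could be used.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)  # linear ramp between the breakpoints


def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: P(d | symptoms) is proportional to P(symptoms | d) * P(d), then normalised."""
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())
    return {d: p / total for d, p in joint.items()}


# Vagueness: "tall" is a matter of degree, not a yes/no answer.
print(tall_membership(175))  # 0.5

# Uncertainty: hypothetical numbers for three candidate diagnoses (not real medical data).
priors = {"diagnosis_A": 0.1, "diagnosis_B": 0.3, "diagnosis_C": 0.6}
likelihoods = {"diagnosis_A": 0.9, "diagnosis_B": 0.5, "diagnosis_C": 0.1}
print(posterior(priors, likelihoods))  # roughly A: 0.3, B: 0.5, C: 0.2
```

The contrast is the one drawn in the text: the fuzzy value describes how well a precise fact fits an imprecise concept, while the posterior describes how likely each precisely defined diagnosis is.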

Semantic Web Application Areas

As a result of the pervasive and user-friendly digital technologies emerging within

our information society, web content is increasingly multiform, inconsistent and very

dynamic. Such content is unsuitable for machine processing, and necessitates human

interpretation and its respective costs in time and money for business. To remedy

this, approaches aim at abstracting this complexity (e.g. by using ontologies) and

offering new and enriched services able to process those abstractions (e.g., by

mechanized reasoning) in a fully automated way. This abstraction layer is the subject

of a very dynamic activity in research, industry and standardization which is usually

called "Semantic Web".

The initial application of Semantic Web technology has focused on Information

Retrieval (IR) where access through semantically annotated content, instead of

classical (even sophisticated) statistical analysis, aimed to give far better results (in

terms of precision and recall indicators). The next natural extension was to apply IR

in the integration of enterprise legacy databases in order to leverage existing

company information in new ways. Current research focuses on the

seamless integration of heterogeneous and distributed applications and services.

Some of the application areas of Semantic Web are:

1) Knowledge Management

Knowledge is one of the key success factors for enterprises, both today and in the

future. Therefore, company knowledge management has been identified as a strategic

tool. However, while information technology is one of the foundational elements of KM, KM is also interdisciplinary by nature. In particular, it includes human

resource management, enterprise organization and culture. We view KM as the

management of the knowledge arising from business activities, aiming at leveraging

both the use and the creation of that knowledge for two main objectives:

capitalization of corporate knowledge and durable innovation fully aligned with the

strategic objectives of the organization. The development of knowledge portals

serving the needs of companies or communities is still a manual process. Ontologies

and related metadata provide a promising conceptual basis for generating parts of

such knowledge portals. Obviously, among others, conceptual

models of the domain, of the users and of the tasks are needed. The generation of

knowledge portals has to be supplemented with the semi-automated evolution of

portals. As business environments and strategies change rather rapidly, KM portals

have to be kept up to date. Evolution of portals should also include some mechanisms

to ‘forget’ outdated knowledge.

KM solutions based on a combination of intranet-based functionalities and mobile

functionalities will be available in the very near future. Semantic Web technology is a

promising approach to meet the needs of mobile environments, like location-aware

personalization and adaptation of the presentation to the specific needs of mobile

devices, i.e. the presentation of the required information at an appropriate level of


Knowledge Management is obviously a very promising area for exploiting Semantic

Web technology. Document-based KM solutions have already reached their limits,

whereas semantic technology opens the way to meet KM requirements in the future.

2) E-Commerce

Electronic commerce is mainly based on the exchange of information between

involved stakeholders using a telecommunication infrastructure. There are two main scenarios: Business-to-Customer (B2C) and Business-to-Business (B2B). B2C applications enable service providers to promote their offers and customers to

find offers which match their demands. By providing unified access to a large

collection of frequently updated offers and customers, an electronic marketplace can

match the demand and supply processes within a commercial mediation

environment. B2B applications have a long history of using electronic messaging

to exchange information related to services previously agreed among two or more

businesses. A knowledge-based approach has the potential to significantly accelerate

the penetration of electronic commerce within vertical industry sectors, by enabling

interoperability at the business level, and reducing the need for standardization at the

technical level. This will enable services to adapt to the rapidly changing online

environment. Knowledge based applications of this kind use one or more shared

ontologies to integrate heterogeneous information systems and allow common access

for humans and computers. This enforces the shared ontology as the standard

ontology for all participating systems, thereby removing the semantic heterogeneity

from the information system. The heterogeneity is a problem because the systems to

be integrated are already operational and it is too costly to redevelop them. A

linguistic ontology is sometimes used to assist in the generation of the shared

ontology or is used as a top-level ontology, describing very general concepts like

space, time, matter, object, event, action, etc., which the shared ontologies can inherit

from. Benefits are the integration of heterogeneous information sources, which can

improve interoperability, and more effective use and reuse of knowledge resources.
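A minimal sketch of the shared-ontology idea: each existing system keeps its own field names, and a mapping translates its records into the shared vocabulary so that offers from different systems can be compared. The ontology terms, the mappings and the product data are all hypothetical, and a real shared ontology would carry far more than flat field names (class hierarchies, units, constraints).

```python
# Hypothetical shared ontology terms agreed by all participants.
SHARED_TERMS = {"ProductName", "UnitPrice", "Supplier"}

# Hypothetical mappings from each legacy system's local field names to shared terms.
MAPPINGS = {
    "shop_a": {"title": "ProductName", "price": "UnitPrice", "vendor": "Supplier"},
    "shop_b": {"item": "ProductName", "cost": "UnitPrice", "maker": "Supplier"},
}


def to_shared(system: str, record: dict) -> dict:
    """Rewrite one legacy record using the shared ontology's vocabulary."""
    mapping = MAPPINGS[system]
    shared = {}
    for field, value in record.items():
        term = mapping.get(field)
        if term not in SHARED_TERMS:
            raise ValueError(f"{system}.{field} is not mapped to a shared term")
        shared[term] = value
    return shared


# The legacy systems stay unchanged; only their output is translated.
offers = [
    to_shared("shop_a", {"title": "Laptop", "price": 899, "vendor": "Acme"}),
    to_shared("shop_b", {"item": "Laptop", "cost": 879, "maker": "Acme"}),
]
cheapest = min(offers, key=lambda offer: offer["UnitPrice"])
print(cheapest)  # {'ProductName': 'Laptop', 'UnitPrice': 879, 'Supplier': 'Acme'}
```

This mirrors the point made above: the heterogeneity is removed at the level of meaning, without redeveloping the operational systems.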

3) Biosciences and Medical Applications

The medical domain is a favourite target for Semantic Web applications just as the

expert system was for Artificial Intelligence applications 20 years ago. The medical

domain is very complex: medical knowledge is difficult to represent in a computer

format, making the sharing of information even more difficult. Semantic Web

solutions have become very promising in this context. One of the main mechanisms

of the Semantic Web - resource description using annotation principles - is of major

importance in the medical informatics domain, especially as regards the sharing of

these resources (e.g. medical knowledge on the Web or genomic databases). The web

services technology allows us to imagine some solutions to the interoperability

problem, which is substantial in medical informatics.

4) Other Areas

The diverse application areas of Semantic Technologies also include the following:

• Ambient Intelligence

• Cognitive Systems

• Data Integration

• Multimedia Data Management

• Software Engineering

• Machine Learning

• eScience

• Information Extraction

• Grid Computing

• Peer-to-Peer Systems

• eGovernment


The key milestones in the history of the Semantic Web, year by year, are:

1989 Tim Berners-Lee proposed the WWW to CERN as a development project.

1991 Portable browser available and distributed.

1994 • Netscape was released as a commercial browser

• Yahoo acted as a search engine

• There were 2500 web servers at that time.

1995 • There were 73500 web servers at that time.

• Microsoft released IE, and the W3C was established as a standards body

1996 Semantic web initiated

1997 First working draft of the RDF language to define metadata was available

1998 Tim Berners-Lee published a roadmap to the Semantic Web that includes a query language, inference rules and proof validation

1999 RDF became a W3C recommendation, a crucial step towards the Web’s interoperability and functionality.

2001 The vision of the Semantic Web was broadened further to include trust.


The World Wide Web is an interesting paradox -- it's made with computers but for

people. The sites you visit every day use natural language, images and page layout to

present information in a way that's easy for you to understand. Even though they are

central to creating and maintaining the Web, the computers themselves really can't

make sense of all this information. They can't read, see relationships or make

decisions like you can.

The Semantic Web proposes to help computers "read" and use the Web. The big idea

is pretty simple -- metadata added to Web pages can make the existing World Wide

Web machine readable. This won't bestow artificial intelligence or make computers

self-aware, but it will give machines tools to find, exchange and, to a limited extent,

interpret information. It's an extension of, not a replacement for, the World Wide Web.
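The idea that metadata added to pages makes them machine-readable can be shown with Python's standard html.parser module. The markup convention used here (a property attribute on meta tags holding Dublin Core-style names such as dc:title) and the page URI are assumptions chosen for brevity; real pages may embed metadata with RDFa, microdata or other formats, which need proper parsers.

```python
from html.parser import HTMLParser


class MetaTripleExtractor(HTMLParser):
    """Turn <meta property="..." content="..."> tags into (page, property, value) triples."""

    def __init__(self, page_uri: str):
        super().__init__()
        self.page_uri = page_uri
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop, content = attributes.get("property"), attributes.get("content")
        if prop and content is not None:
            self.triples.append((self.page_uri, prop, content))


# Hypothetical page: people read the <body>, programs read the <meta> tags.
PAGE = """<html><head>
<meta property="dc:title" content="Semantic Web">
<meta property="dc:subject" content="ontologies">
</head><body><p>Text written for human readers.</p></body></html>"""

extractor = MetaTripleExtractor("http://example.org/semantic-web")
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

The page looks the same to a human reader, but a program now gets subject, property and value triples instead of having to guess from the visible text.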


That probably sounds a little abstract, and it is. While some sites are already using Semantic Web concepts, a lot of the necessary tools are still in development. The following section lists some of these tools.

A Listing of Some New Semantic Web Tools

The AMALGAM (Automatic Mapping Among Lexico-

Grammatical Annotation Models) project is an attempt to

create a set of mapping algorithms to map between the

main tagsets and phrase structure grammar schemes used in

various research corpora. Software has been developed to

tag text with up to 8 annotation schemes

Amine is a Multi-Layer Platform implemented in Java. It

provides various Engines and GUIs to build a wide variety

of Ontology-based applications, Conceptual Graph based

applications, Intelligent Systems and Multi-Agents Systems

Anacubis is a visual analysis tool that lets its users visualize

the relationships between entities in a collection of

information. The visualization is rather similar to concept


Exteca The Exteca platform is an ontology-based technology written in Java for high-quality knowledge management and document categorisation. It can be used in conjunction with search engines

Jena is a Java framework to construct Semantic Web

Applications. It provides a programmatic environment for

RDF, RDFS, OWL and SPARQL, and includes a rule-based

inference engine. It also has the ability to be used as an

RDF database via its Joseki layer. See the Jena discussion

list for more information

Pedro is an application that creates data entry forms based

on a data model written in a particular style of XML

Schema. Users can enter data through the forms to create

data files that conform to the schema. They can use

controlled vocabularies to mark-up text fields and have the

application perform basic validation on field data

Platypus Wiki is an enhanced Wiki Wiki Web with ideas taken from the Semantic Web. It offers a simple user interface to create a wiki page plus metadata according to W3C standards. It uses RDF/RDFS and OWL to create ontologies and manage metadata

Protege+OWL+Ruby (POR) Utilities provides an ontology,

a set of ruby classes and methods to simplify the

development of Protege+OWL Ontology Driven

applications. At the moment the project is limited to JRuby

ASIS (Ada Semantic Interface Specification) for GNAT on

gcc. ASIS is a published international ISO standard

(ISO/IEC 15291:1999). ASIS based tools are available as


ATLAS (Architecture and Tools for Linguistic Analysis Systems) is a joint initiative of NIST, MITRE and the LDC to build a general-purpose annotation architecture and a

data interchange format. The starting point is the annotation

graph model, with some significant generalizations

Swoogle A semantic Web search engine with 1.5 M resources

SWOOP A lightweight ontology editor

The Raptor RDF parser toolkit is a free software / Open

Source C library that provides a set of parsers and

serializers that generate Resource Description Framework

(RDF) triples by parsing syntaxes or serialize the triples

into a syntax. The supported parsing syntaxes are

RDF/XML, N-Triples, Turtle, RSS tag soup including

Atom 1.0 and 0.3, GRDDL for XHTML and XML. The

serializing syntaxes are RDF/XML (regular, and

abbreviated), N-Triples, RSS 1.0, Atom 1.0 and Adobe


Rasqal Rasqal is a C library for querying RDF, supporting the

RDQL and SPARQL languages. It provides APIs for

creating a query and parsing query syntax. It features

pluggable triple-store source and matching interfaces, an

engine for executing the queries and an API for

manipulating results as bindings. It uses the Raptor RDF

parser to return triples from RDF content and can

alternatively work with the Redland RDF library’s

persistent triple stores. It is portable across many POSIX



Protégé provides a growing user community with a suite of tools to construct domain

models and knowledge-based applications with ontologies. Protégé is a free, open-

source platform developed by Stanford Medical Informatics with support from:

- Defense Advanced Research Projects Agency

- National Cancer Institute

- National Institute of Standards and Technology

- National Library of Medicine

- National Science Foundation

with additional support from its affiliates:

- DaimlerChrysler

- iSOCO: Intelligent Software for the Networked Economy

At its core, Protégé implements a rich set of knowledge-modeling structures and

actions that support the creation, visualization, and manipulation of ontologies in

various representation formats. Protégé can be customized to provide domain-

friendly support for creating knowledge models and entering data. Further, Protégé

can be extended by way of a plug-in architecture and a Java-based Application

Programming Interface (API) for building knowledge-based tools and applications.

The Protégé platform supports two main ways of modeling ontologies:

Protégé-Frames editor enables users to build and populate ontologies that are frame-

based, in accordance with the Open Knowledge Base Connectivity protocol (OKBC).

In this model, an ontology consists of a set of classes organized in a subsumption

hierarchy to represent a domain's salient concepts, a set of slots associated with classes

to describe their properties and relationships, and a set of instances of those classes -

individual exemplars of the concepts that hold specific values for their properties.
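The frame-based model just described (classes in a subsumption hierarchy, slots, and instances holding slot values) can be sketched with Python dataclasses. This is a simplification and not Protégé's API: it allows only a single parent per class, treats a slot as a name plus an expected Python type, and uses hypothetical classes and values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FrameClass:
    """A frame class: a name, an optional parent class, and its own slots."""

    name: str
    parent: FrameClass | None = None
    slots: dict[str, type] = field(default_factory=dict)  # slot name -> expected type

    def all_slots(self) -> dict[str, type]:
        """Own slots plus those inherited down the subsumption hierarchy."""
        inherited = self.parent.all_slots() if self.parent is not None else {}
        return {**inherited, **self.slots}

    def is_subclass_of(self, other: FrameClass) -> bool:
        """True if `other` is this class or one of its ancestors."""
        current: FrameClass | None = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False


@dataclass
class Instance:
    """An individual exemplar holding specific values for its class's slots."""

    frame: FrameClass
    values: dict[str, object]

    def __post_init__(self) -> None:
        # Reject values for slots the class does not have, or of the wrong type.
        expected = self.frame.all_slots()
        for slot, value in self.values.items():
            if slot not in expected:
                raise ValueError(f"{self.frame.name} has no slot {slot!r}")
            if not isinstance(value, expected[slot]):
                raise TypeError(f"slot {slot!r} expects {expected[slot].__name__}")


person = FrameClass("Person", slots={"name": str})
student = FrameClass("Student", parent=person, slots={"enrolledAt": str})
mary = Instance(student, {"name": "Mary", "enrolledAt": "USIT"})

print(sorted(student.all_slots()))     # ['enrolledAt', 'name']
print(student.is_subclass_of(person))  # True
```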

Protégé-OWL editor enables users to build ontologies for the Semantic Web, in

particular in the W3C's Web Ontology Language (OWL). "An OWL ontology may

include descriptions of classes, properties and their instances. Given such an

ontology, the OWL formal semantics specifies how to derive its logical

consequences, i.e. facts not literally present in the ontology, but entailed by the

semantics. These entailments may be based on a single document or multiple

distributed documents that have been combined using defined OWL mechanisms".

Protégé ontologies can be exported into a variety of formats including RDF(S),

OWL, and XML Schema.

Outstanding Protégé features

Some of the particular features of Protégé, not available in many of the other

ontology building tools, are the following:

Automatic generation of graphical-user interfaces, based on user-defined models, for

acquiring domain instances

Extensible knowledge model and architecture

Scalability to very large knowledge bases

Semantic MediaWiki (Semantic annotation tool)

Semantic MediaWiki (SMW) is a free extension of MediaWiki, the wiki system powering Wikipedia, that helps users search, organize, tag, browse, evaluate, and share the wiki's content. While traditional wikis contain only text, which computers can neither understand nor evaluate, SMW adds semantic annotations that bring the power of the Semantic Web to the wiki.

Introduction to Semantic MediaWiki

Wikis have become a great tool for collecting and sharing knowledge in

communities. This knowledge is mostly contained within texts and multimedia files,

and is thus easily accessible for human readers. But wikis get bigger and bigger, and

it can be very time-consuming to look for an answer inside a wiki. As a simple

example, consider the following question a user might have:

« What are the hundred largest cities in the world with a female mayor? »

Wikipedia should be able to provide the answer: it contains all large cities, their

mayors, and articles about the mayors that tell us about their gender. Yet the question

is almost impossible to answer for a human, since one would have to read all articles

about all large cities first! Even if the answer is found, it might not remain valid for

very long. Computers can deal with large datasets much more easily, yet they cannot support us very much when seeking answers from a wiki: even sophisticated programs cannot yet read and «understand» human-language texts unless the topic and language of the text are very restricted. Nor does the wiki's keyword search help in discovering complex relationships.

Semantic MediaWiki enables wiki communities to make some of their knowledge

computer-processable, e.g. to answer the above question. The hard problem for the

computer is to find out what the words in a wiki page (e.g. about cities) mean.

Articles contain many names, but which one is the current mayor? Humans can

easily grasp the problem by looking at a language edition of Wikipedia that they do not understand (Korean is a good start, unless you are fluent in it). While single tokens (names, numbers, …) might be readable, it is impossible to understand their relevance in the article. Similarly, computers need some help to make sense of wiki texts.

In Semantic MediaWiki, editors therefore add «hints» to the information in wiki

pages. For example, someone can mark a name as being the name of the current

mayor. This is done by editors who modify a page and put some special text-markup

around the mayor's name. After this, computers can access this information (of

course they still do not «understand» it, but they can search for it if we ask them to),

and support users in many different ways.
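In SMW such a hint is written as a property annotation of the form [[Property::Value]], for example [[Has mayor::Jane Doe]] (the property name and the mayor are made up here). The Python sketch below only shows how such annotations can be picked out of wiki text with a regular expression; it is a simplified illustration, not SMW's actual parser, which handles many more cases.

```python
import re

# Matches [[Property::Value]] or [[Property::Value|displayed text]].
# Ordinary links such as [[River X]] or [[Category:City]] contain no "::"
# and are therefore skipped.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext: str) -> list[tuple[str, str]]:
    """Return the (property, value) pairs annotated in a page's wiki text."""
    return [(prop.strip(), value.strip()) for prop, value in ANNOTATION.findall(wikitext)]


page = (
    "[[Category:City]] Example City lies on the [[River X]]. "
    "Its mayor is [[Has mayor::Jane Doe|the current mayor]] and it has "
    "[[Population::250000]] inhabitants."
)
print(extract_annotations(page))
# → [('Has mayor', 'Jane Doe'), ('Population', '250000')]
```

The reader still sees "the current mayor" in the rendered page, while the machine-readable pair (Has mayor, Jane Doe) becomes available for search.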

Where SMW can help

Semantic MediaWiki introduces some additional markup into the wiki-text which

allows users to add "semantic annotations" to the wiki. While this first appears to

make things more complex, it can also greatly simplify the structure of the wiki, help

users to find more information in less time, and improve the overall quality and

consistency of the wiki. To illustrate this, we provide some examples from the daily

business of Wikipedia:

Manually generated lists. Wikipedia is full of manually edited lists. Such lists are prone to errors, since they have to be updated by hand. Furthermore, the number of potentially interesting lists is huge, and it is impossible to provide all of them in acceptable quality. In SMW, lists are generated automatically from the semantic annotations. They are always up to date and can easily be customized to obtain further information.
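The following Python sketch shows how a list such as the answer to the earlier question about the largest cities with a female mayor could be computed from annotations instead of being maintained by hand. SMW itself expresses such queries with its #ask parser function; here the pages, the property names "Has mayor", "Gender" and "Population", and the ask helper are all hypothetical, and the query engine is reduced to a category filter, property-chain conditions, sorting and a limit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    title: str
    categories: set[str] = field(default_factory=set)
    properties: dict[str, str] = field(default_factory=dict)


def resolve(pages: dict[str, Page], page: Page, path: tuple[str, ...]) -> Optional[str]:
    """Follow a property chain, e.g. ("Has mayor", "Gender") from a city page."""
    value: Optional[str] = None
    current: Optional[Page] = page
    for prop in path:
        if current is None:
            return None
        value = current.properties.get(prop)
        # A value may itself be the title of another page, as with mayors.
        current = pages.get(value) if value is not None else None
    return value


def ask(pages, category, conditions, sort_by, limit):
    """Titles of pages in `category` meeting all conditions, largest first."""
    hits = [
        p for p in pages.values()
        if category in p.categories
        and all(resolve(pages, p, path) == wanted for path, wanted in conditions)
    ]
    hits.sort(key=lambda p: float(p.properties.get(sort_by, 0)), reverse=True)
    return [p.title for p in hits[:limit]]


# Hypothetical wiki content: three cities and their mayors.
pages = {p.title: p for p in [
    Page("City A", {"City"}, {"Has mayor": "Mayor A", "Population": "300"}),
    Page("City B", {"City"}, {"Has mayor": "Mayor B", "Population": "500"}),
    Page("City C", {"City"}, {"Has mayor": "Mayor C", "Population": "100"}),
    Page("Mayor A", {"Person"}, {"Gender": "female"}),
    Page("Mayor B", {"Person"}, {"Gender": "male"}),
    Page("Mayor C", {"Person"}, {"Gender": "female"}),
]}

print(ask(pages, "City", [(("Has mayor", "Gender"), "female")], "Population", 100))
# → ['City A', 'City C']
```

Because the list is recomputed from the annotations each time, editing a single mayor's page changes every list that depends on it.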

Searching information. Much of Wikipedia's knowledge is hopelessly buried within

millions of pages of text, and can hardly be retrieved at all. For example, at the time

of this writing, there is no list of female physicists in Wikipedia. When trying to find

all women of this profession who are featured in Wikipedia, one has to resort to textual search, and this attempt is doomed to fail. Note that among the first 20 results, only 5 are about people at all, and that Marie Curie is not contained in the whole result set (since "female" does not appear on her page).

Again, querying in SMW easily solves this problem (in this case even without further

annotation, since existing categories suffice to find the results).

Inflationary use of categories. The need for better structuring is apparent from the enormous use of categories in Wikipedia. While this is generally helpful, it has also led to a number of categories that would be mere query results in SMW. For example, consider the categories Rivers in Buckinghamshire, Asteroids named for people, and 1620s deaths, all of which could easily be replaced by simple queries that use just a handful of annotations. Indeed, in these examples Category:Rivers, Property:located in, Category:Asteroids, Category:People, Property:named after, and Property:date of death would suffice to create thousands of similar listings on the fly, and to remove hundreds of Wikipedia categories.
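The claim that a few annotations can replace many hand-made categories can be sketched in Python as grouping annotated pages by a property value. The page names, the "located in" and "date of death" values, and the virtual_categories helper are hypothetical; in SMW the equivalent listings would be produced by queries, not by this code.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical annotated pages: title -> (categories, properties).
PAGES = {
    "River X": ({"Rivers"}, {"located in": "Buckinghamshire"}),
    "River Y": ({"Rivers"}, {"located in": "Buckinghamshire"}),
    "River Z": ({"Rivers"}, {"located in": "Oxfordshire"}),
    "Person P": ({"People"}, {"date of death": "1624"}),
    "Person Q": ({"People"}, {"date of death": "1627"}),
}


def virtual_categories(category: str, prop: str,
                       label: Callable[[str], str]) -> dict[str, list[str]]:
    """Build one listing per distinct (labelled) value of `prop` in `category`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title, (cats, props) in PAGES.items():
        if category in cats and prop in props:
            groups[label(props[prop])].append(title)
    return {name: sorted(titles) for name, titles in groups.items()}


print(virtual_categories("Rivers", "located in", lambda place: f"Rivers in {place}"))
# → {'Rivers in Buckinghamshire': ['River X', 'River Y'], 'Rivers in Oxfordshire': ['River Z']}
print(virtual_categories("People", "date of death", lambda year: f"{int(year) // 10 * 10}s deaths"))
# → {'1620s deaths': ['Person P', 'Person Q']}
```

One category plus one property yields a listing for every value that occurs, which is why a handful of annotations can stand in for hundreds of hand-made categories.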

Inter-language consistency. Most articles in Wikipedia are linked to corresponding pages in other languages, and this can be done for SMW's semantic annotations as well. With this knowledge, you can ask for the population of Beijing that is given in the Chinese Wikipedia without reading a single word of Chinese. This can be exploited to detect possible inconsistencies that can then be resolved by editors. For example, at the time of this writing, the population of Edinburgh differs between the English, German, and French Wikipedias.
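The consistency check described above can be sketched in Python as a comparison of one property across language editions. The sketch assumes that interlanguage links have already mapped each page to a shared key; the page names and population figures are invented for illustration, and find_inconsistencies is a hypothetical helper.

```python
def find_inconsistencies(editions: dict[str, dict[str, dict[str, str]]],
                         prop: str) -> dict[str, dict[str, str]]:
    """Report pages whose value for `prop` differs between language editions.

    `editions` maps a language code to {page key: {property: value}}.
    """
    report: dict[str, dict[str, str]] = {}
    keys = set().union(*(edition.keys() for edition in editions.values()))
    for key in sorted(keys):
        values = {
            lang: edition[key][prop]
            for lang, edition in editions.items()
            if prop in edition.get(key, {})
        }
        if len(set(values.values())) > 1:  # more than one distinct value
            report[key] = values
    return report


editions = {
    "en": {"Example City": {"population": "1000"}, "Other City": {"population": "200"}},
    "de": {"Example City": {"population": "1000"}, "Other City": {"population": "200"}},
    "fr": {"Example City": {"population": "1050"}},
}
print(find_inconsistencies(editions, "population"))
# → {'Example City': {'en': '1000', 'de': '1000', 'fr': '1050'}}
```

Pages missing in some editions are simply compared over the editions that do carry the value; deciding which value is right is left to human editors.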

External reuse. Some desktop tools today make use of Wikipedia's content, e.g. the

media player Amarok displays articles about artists during playback. However, such

reuse is limited to fetching some article for immediate reading. The program cannot

exploit the information (e.g. to find songs of artists that have worked for the same

label), but can only show the text in some other context. SMW makes a wiki's knowledge usable outside the context of its textual articles. Since semantic data

can be published under a free license, it could even be shipped with software to save

bandwidth and download time.
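The kind of reuse mentioned above (finding artists who have worked for the same label as the one currently playing) becomes a simple lookup once the wiki's knowledge is available as data rather than text. The Python sketch assumes the semantic data has been exported as (subject, property, object) triples; the artist and label names, the "recorded for" property and the labelmates helper are hypothetical.

```python
def labelmates(triples: list[tuple[str, str, str]], artist: str) -> list[str]:
    """Artists sharing at least one record label with `artist`."""
    labels_of: dict[str, set[str]] = {}
    for subject, prop, obj in triples:
        if prop == "recorded for":
            labels_of.setdefault(subject, set()).add(obj)
    mine = labels_of.get(artist, set())
    return sorted(other for other, labels in labels_of.items()
                  if other != artist and labels & mine)


triples = [
    ("Artist A", "recorded for", "Label 1"),
    ("Artist B", "recorded for", "Label 1"),
    ("Artist C", "recorded for", "Label 2"),
    ("Artist A", "recorded for", "Label 3"),
    ("Artist D", "recorded for", "Label 3"),
]
print(labelmates(triples, "Artist A"))
# → ['Artist B', 'Artist D']
```

A media player could then look up the songs of the returned artists, which is exactly what fetching an article as text does not allow.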

AceWiki (A Natural and Expressive Semantic Wiki)

AceWiki is a semantic wiki that is powerful and at the same time easy to use. It uses the controlled natural language Attempto Controlled English (ACE), so the formal statements of the wiki are shown in a way that looks like natural English. The use of controlled natural language makes it easy for everybody to understand the semantics of the wiki.


AceWiki shows the formal semantics in controlled English. Thus, the users do not

need to cope with complicated formal languages like RDF or OWL. Unlike most

other semantic wikis, the semantics are contained directly in the article texts and not

in some form of annotations. Ontological entities like individuals, concepts, and

properties are mapped one-to-one to linguistic entities like proper names, nouns, of-

constructs, and verbs. In order to help the users to write correct ACE sentences,

AceWiki provides a predictive editor.


The main goal of AceWiki is to improve knowledge aggregation and representation.

AceWiki should be easier to use and understand than other semantic wikis. In

addition, it should support a higher degree of expressivity. Unlike other semantic

wikis, the formal statements are not contained in “annotations” and are not considered “metadata”; instead, they are the main content of the wiki. In order to achieve good usability and still support a high degree of expressivity, AceWiki follows three design principles: naturalness, uniformity, and strict user guidance.

By naturalness we mean that the formal semantics has a direct connection to natural

language. Uniformity means that only one language is used at the user-interface

level. Strict user guidance, finally, means that a predictive editor ensures that only

well-formed statements are created by the user. We will now discuss these three

principles and show how they are achieved in AceWiki.


AceWiki is natural in the sense that the ontology is represented in a form that is very

close to natural language. This requires a direct mapping of ontological entities to

natural language words. In AceWiki, individuals are represented as proper names (e.g. “Switzerland”), concepts are represented as nouns (e.g. “country”), and roles

are represented as transitive verbs (e.g. “overlaps-with”) or as of-constructs (e.g.

“part of”). Using those words together with the predefined function words of ACE

(e.g. “every”, “if”, “then”, “something”, “and”, “or”, “not”), we can express

ontological statements as ACE sentences. Since every ACE sentence is a valid

English sentence, those ontological statements can be immediately understood by

any English speaker.
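The one-to-one mapping described above can be illustrated by a tiny verbalizer that turns three kinds of ontology axioms into ACE-style sentences. This is a hypothetical Python sketch, not AceWiki's implementation (which relies on a full ACE grammar and lexicon); the axiom tuples and the verbalize function are assumptions made for illustration.

```python
def article(noun: str) -> str:
    # Simplified English rule: "an" before a vowel letter, otherwise "a".
    return "an" if noun[0].lower() in "aeiou" else "a"


def verbalize(axiom: tuple[str, ...]) -> str:
    """Render an axiom as an ACE-style sentence.

    Individuals become proper names, concepts become nouns,
    and roles become transitive verbs.
    """
    kind, *args = axiom
    if kind == "SubClassOf":  # concept c is subsumed by concept d
        c, d = args
        return f"Every {c} is {article(d)} {d}."
    if kind == "ClassAssertion":  # individual i belongs to concept c
        c, i = args
        return f"{i} is {article(c)} {c}."
    if kind == "ObjectPropertyAssertion":  # role r relates individuals a and b
        r, a, b = args
        return f"{a} {r} {b}."
    raise ValueError(f"unsupported axiom type: {kind}")


for axiom in [
    ("SubClassOf", "city", "area"),
    ("ClassAssertion", "country", "Switzerland"),
    ("ObjectPropertyAssertion", "flows-through", "Limmat", "Zurich"),
]:
    print(verbalize(axiom))
# → Every city is an area.
# → Switzerland is a country.
# → Limmat flows-through Zurich.
```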


The Semantic Web community defines three categories of languages on the logic

level of the Semantic Web stack: ontology languages (e.g. OWL), rule languages

(e.g. SWRL), and query languages (e.g. SPARQL). Most languages cover only one of those categories, and languages of different categories usually look very different.

We claim that at the user-interface level ideally one single language should cover all

those categories. In the background, there might be several internal languages, but

the users should need to learn only one. For many users who are not familiar with

formal conceptualizations, learning one formal language is already a hard task. We

should not make this learning effort harder than necessary.

ACE is able to represent those different kinds of formal statements in a very natural

way. In the case of queries, this distinction does not need to be made explicit: If a

sentence ends with a question mark then it is clear for the user that this is a query and

not an assertion. However, queries are still future work for AceWiki.

AceWiki classifies declarative ACE sentences into three categories: Some can be

translated into OWL, others can be translated into SWRL, and finally there are ACE

sentences that have no representation in OWL or SWRL at all. In ACE, this

distinction is not visible, and we think that users should not have to bother about it. The only thing they need to know is that when an OWL reasoner is used, only the OWL-compliant sentences are considered.

Strict User Guidance

Learning a new formal language is normally accompanied by frequent syntax error

messages from the parser. Wikis are supposed to enable easy and quick

modifications of the content, and syntax errors can certainly be a major hindrance in

this respect, especially for new users.

This problem can be solved by guiding the users during the creation of new

statements in a strict manner. By strict we mean that the creation of syntactically

incorrect sentences is simply made impossible. This can be achieved by a predictive

editor that guides the user step by step and ensures the syntactic correctness.

Syntactic correctness can be subdivided into lexical correctness and grammatical correctness. Lexical correctness means that only the words that are defined in a certain lexicon are used. Grammatical correctness means that the grammar rules are respected.
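A predictive editor of this kind can be sketched with a tiny grammar and lexicon: after each word, the editor offers only the words that may legally come next, so lexically or grammatically incorrect input cannot be produced. The Python code below is a hypothetical toy covering a handful of sentence shapes; AceWiki's real editor works on the full ACE grammar, and the state names and lexicon here are illustrative assumptions.

```python
# Hypothetical lexicon: word categories and the words defined for them.
LEXICON = {
    "noun": {"city", "country", "river"},
    "verb": {"flows-through", "overlaps-with"},
    "proper_name": {"Limmat", "Switzerland", "Zurich"},
}

# Finite-state grammar: state -> [(word category or literal word, next state)].
GRAMMAR = {
    "start": [("every", "det"), ("a", "det"), ("proper_name", "subject")],
    "det": [("noun", "subject")],
    "subject": [("verb", "verb")],
    "verb": [("a", "object_det"), ("proper_name", "object")],
    "object_det": [("noun", "object")],
    "object": [(".", "end")],
    "end": [],
}


def words_for(symbol: str) -> set[str]:
    # A category name expands to its lexicon entries; anything else is a literal word.
    return LEXICON.get(symbol, {symbol})


def state_after(tokens: list[str]) -> str:
    """Run the grammar over the tokens, rejecting any word not allowed there."""
    state = "start"
    for token in tokens:
        for symbol, target in GRAMMAR[state]:
            if token in words_for(symbol):
                state = target
                break
        else:
            raise ValueError(f"{token!r} is not allowed here")
    return state


def next_words(tokens: list[str]) -> list[str]:
    """The menu of words the editor would offer after `tokens`."""
    return sorted(w for symbol, _ in GRAMMAR[state_after(tokens)] for w in words_for(symbol))


def is_complete(tokens: list[str]) -> bool:
    return state_after(tokens) == "end"


print(next_words(["Limmat"]))                   # → ['flows-through', 'overlaps-with']
print(next_words(["Limmat", "flows-through"]))  # → ['Limmat', 'Switzerland', 'Zurich', 'a']
```

Since the user can only pick from next_words, every finished sentence is lexically and grammatically correct by construction.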


To some degree, a predictive editor could also take care of the semantic correctness.

It could prevent the users from adding statements that introduce inconsistency into an

underlying ontology. If the verb “meets”, for example, is defined in the ontology as a

relation between humans, then the predictive editor could prevent the user from

writing sentences like “a man meets a car”, assuming that the ontology says that

“car” is not human.

AceWiki has a predictive editor that is used for the creation and modification of ACE

sentences. It ensures lexical and grammatical correctness of the resulting sentences.

The semantic correctness is not enforced, but the words that seem to be semantically

suitable are shown first in the list. The suitable words are retrieved on the basis of the

hierarchy of concepts and roles and the domain and range restrictions of roles. For

example, if a user creates the incomplete sentence

“Limmat flows-through” and there is a range restriction that says “if something

flows-through something Y then Y is a city” then the individuals that are known to

be cities are shown first in the list.
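The ordering in this example can be sketched as follows: individuals whose concept (or one of its superconcepts) matches the range restriction of the verb are listed first, and all other individuals still follow, since semantic correctness is not enforced. The concept hierarchy, the individuals and the rank_candidates function are hypothetical Python illustrations, not AceWiki's code.

```python
def superconcepts(concept, parent):
    """The concept itself followed by all its ancestors in the hierarchy."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent.get(concept)
    return chain


def rank_candidates(verb, individuals, concept_of, parent, range_of):
    """Order individuals so that those fitting the verb's range come first."""
    wanted = range_of.get(verb)

    def fits(individual):
        return wanted in superconcepts(concept_of[individual], parent)

    suitable = sorted(i for i in individuals if fits(i))
    others = sorted(i for i in individuals if not fits(i))
    return suitable + others


# Illustrative ontology fragment.
parent = {"capital": "city", "city": "area", "country": "area"}
concept_of = {"Zurich": "city", "Bern": "capital", "Switzerland": "country", "Limmat": "river"}
# Range restriction: "if something flows-through something Y then Y is a city".
range_of = {"flows-through": "city"}

print(rank_candidates("flows-through", list(concept_of), concept_of, parent, range_of))
# → ['Bern', 'Zurich', 'Limmat', 'Switzerland']
```

Bern is ranked among the suitable words only because the concept hierarchy says every capital is a city, which is how the hierarchy and the range restriction work together.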


The evolution of the Semantic Web has opened a new window in IT and a hope for better search results on the Web. Tim Berners-Lee rightly says that the Semantic Web will be the next generation of the current Web and the next IT revolution. It is based on the fundamental idea that web resources should be annotated with “semantic markup” that captures information about their meaning. The Semantic Web is not far away if we understand and work on the various ways to make the current web more meaningful and intelligent. This requires knowledge of the various tools, technologies, and layers of the Semantic Web, which have been summarized in this paper.











9. W3C Web Ontology Working Group

11. The Semantic Web: Real-World Applications from Industry


