
WEB TECHNOLOGIES

UNIT-II

Kishore Kumar M

Contents:

• XML:

   • Introduction to XML, Defining XML tags, their attributes and values
   • Document Type Definition
   • XML Schemas
   • Document Object Model
   • XHTML

• Parsing XML Data:

   • DOM and SAX Parsers in Java

Kishore.mamidala@gmail.com Page 1
XML stands for Extensible Markup Language. It is a text-based markup language
derived from Standard Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it, whereas HTML tags
specify how to display the data. XML is not going to replace HTML in the near
future, but it introduces new possibilities by adopting many
successful features of HTML.
There are three important characteristics of XML that make it useful in a variety
of systems and solutions:
• XML is extensible: XML allows you to create your own self-descriptive tags, or language, that suits your application.
• XML carries the data, does not present it: XML allows you to store the data irrespective of how it will be presented.
• XML is a public standard: XML was developed by an organization called the World Wide Web Consortium (W3C) and is available as an open standard.
XML Usage
A short list of XML uses says it all:

• XML can work behind the scenes to simplify the creation of HTML documents for large web sites.
• XML can be used to exchange information between organizations and systems.
• XML can be used for offloading and reloading of databases.
• XML can be used to store and arrange data in a way that suits your data handling needs.
• XML can easily be merged with style sheets to create almost any desired output.
• Virtually any type of data can be expressed as an XML document.

What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format
that is both human-readable and machine-readable. So what exactly is a markup language?
Markup is information added to a document that enhances its meaning in certain ways,
in that it identifies the parts and how they relate to each other. More specifically, a markup
language is a set of symbols that can be placed in the text of a document to demarcate and
label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of
text:

<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or tags, such as
<message>...</message> and <text>...</text>. The tags <message> and
</message> mark the start and the end of the XML code fragment. The tags <text> and
</text> surround the text Hello, world!.
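A minimal sketch of how a program sees this markup. Python and its standard-library xml.etree.ElementTree module are used here only for illustration; the notes themselves do not prescribe a language for this step.

```python
import xml.etree.ElementTree as ET

# The snippet from the text above, written on one line.
snippet = "<message><text>Hello, world!</text></message>"

root = ET.fromstring(snippet)   # parse the markup into an element tree
child = root.find("text")       # first child element named "text"

print(root.tag)                     # name of the outer element
print(child.tag, "->", child.text)  # name and character data of the inner element
```

The tags become element names, and the text they surround becomes the element's character data.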
Is XML a Programming Language?
A programming language consists of grammar rules and its own vocabulary, which are used to
create computer programs. These programs instruct the computer to perform specific tasks.
XML does not qualify as a programming language because it does not
perform any computation or algorithms. It is usually stored in a simple text file and
is processed by special software that is capable of interpreting XML.
Tags and Elements
An XML file is structured by several XML elements, also called XML nodes or XML tags.
The names of XML elements are enclosed in angle brackets < > as shown below:
<element>

Syntax Rules for Tags and Elements

Element Syntax: Each XML element must be closed, either with a matching end tag
as shown below:
<element>....</element>

or, for an element with no content, with the self-closing form:

<element />

Nesting of elements: An XML element can contain multiple XML elements as its children,
but the child elements must not overlap, i.e., an end tag of an element must have the same
name as that of the most recent unmatched start tag.
The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>VITS
</contact-info>
</company>

The following example shows correctly nested tags:

<?xml version="1.0"?>
<contact-info>
<company>VITS</company>
</contact-info>
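The nesting rule can be checked mechanically, because an XML parser refuses overlapping tags. The following is a small sketch in Python (standard library only); the helper name is_well_formed is hypothetical and chosen for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Overlapping tags: </contact-info> arrives while <company> is still open.
bad = "<contact-info><company>VITS</contact-info></company>"
# Properly nested tags: the inner element is closed first.
good = "<contact-info><company>VITS</company></contact-info>"

print(is_well_formed(bad))
print(is_well_formed(good))
```

The same helper also rejects tags that differ only in case, since the end tag no longer matches the open start tag.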
Let us learn about one of the most important parts of XML, the XML tags. XML
tags form the foundation of XML. They define the scope of an element in the
XML document. They can also be used to insert comments, declare settings required for
parsing the environment, and insert special instructions.
We can broadly categorize XML tags as follows:

Start Tag: The beginning of every non-empty XML element is marked by a
start-tag. An example of a start-tag is:
<address>

End Tag: Every element that has a start tag should end with an end-tag. An
example of an end-tag is:
</address>

Note that end tags include a solidus ("/") before the name of the element.

Empty Tag

The text that appears between a start-tag and an end-tag is called content. An element that has
no content is termed empty. An empty element can be represented in two ways:
(1) A start-tag immediately followed by an end-tag, as shown below:
<hr></hr>
(2) A complete empty-element tag, as shown below:
<hr />

Empty-element tags may be used for any element which has no content.
XML Tags Rules
The following rules must be followed when using XML tags:
Rule 1
XML tags are case-sensitive. The following line is an example of wrong syntax: the end tag
</Address> differs in case from the start tag <address>, which is an error in
XML.
<address>This is wrong syntax</Address>
The following code shows the correct way, where the start and
the end tag use the same case.
<address>This is correct syntax</address>
Rule 2
XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another
element must be closed before the outer element is closed. For example:
<outer_element>
<internal_element>
This tag is closed before the outer_element
</internal_element>
</outer_element>
XML Elements
XML elements can be defined as the building blocks of an XML document. Elements can
behave as containers to hold text, elements, attributes, media objects or all of
these.
Each XML document contains one or more elements, the scope of which is
either delimited by start and end tags or, for empty elements, by an
empty-element tag.
Syntax
Following is the syntax to write an XML element:
<element-name attribute1 attribute2>
....content
</element-name>

where
• element-name is the name of the element. The name and its case in the start and end tags
must match.
• attribute1, attribute2 are attributes of the element separated by white space.
An attribute defines a property of the element. It associates a name with a value, which
is a string of characters. An attribute is written as:
name = "value"

The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.
Empty Element
An empty element (an element with no content) has the following syntax:
<name attribute1 attribute2.../>

Example of an XML document using various XML elements:

<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
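As an illustration of how elements, attributes and content are exposed to a program, the sketch below parses the contact-info document with Python's standard-library ElementTree module. It assumes the address element is closed with </address>; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<contact-info>
  <address category="residence">
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </address>
</contact-info>"""

root = ET.fromstring(doc)
address = root.find("address")      # the single child of the root element

print(address.get("category"))      # value of the category attribute
for child in address:               # child elements in document order
    print(child.tag, "=", child.text)
```

The attribute is read by name, while the content of the element is reached by walking its children.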

Example of XML Document

XML documents use a simple, self-describing syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used
(ISO-8859-1 = Latin-1/West European character set).

The next line describes the root element of the document (like saying: "this document is a note").

The next 4 lines describe 4 child elements of the root (to, from, heading, and body).

The last line closes the root element.

XML Elements Rules
The following rules must be followed for XML elements:
• An element name can contain any alphanumeric characters. The only punctuation
marks allowed in names are the hyphen (-), underscore (_) and period (.). A name cannot
start with a digit, a hyphen or a period.
• Names are case sensitive. For example, Address, address, and ADDRESS are
different names.
• The names in the start and end tags of an element must be identical.
• An element, which is a container, can contain text or elements as seen in the above
example.
Root element: An XML document can have only one root element. For example, the following
is not a correct XML document, because both the x and y elements occur at the top level
without a root element:
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML
document:
<root>
<x>...</x>
<y>...</y>
</root>

Case sensitivity: The names of XML elements are case-sensitive. That means the names
in the start and the end tags need to be in exactly the same case.
For example, <contact-info> is different from <Contact-Info>.
XML Attributes

XML elements can have attributes. Attributes let us add information about the
element.

XML attributes enhance the properties of the elements.

Let us take an example of a book publisher. Here, book is the element and publisher is the
attribute.

<book publisher="Tata McGraw Hill"></book>

As a rule of thumb, metadata should be stored as attributes and data should be stored as elements.

<book category="computer">
<author> A &amp; B </author>
</book>

Why should we avoid XML attributes?

o Attributes cannot contain multiple values, but child elements can.
o Attributes cannot contain tree structure, but child elements can.
o Attributes are not easily expandable. If you want to change an attribute's values in the future, it
may be complicated.
o Attributes cannot describe structure, but child elements can.
o Attributes are more difficult to manipulate with program code.
o Attribute values are not easy to test against a DTD, which is used to define the legal
elements of an XML document.
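The first limitation can be seen directly in code. The sketch below (Python standard library; the book record with two author elements is a hypothetical example, not taken from the notes) shows that an attribute yields one string, that repeating an attribute is an error, and that child elements may repeat freely.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: one category attribute, several author child elements.
book = ET.fromstring(
    '<book category="computer">'
    "<author>A</author>"
    "<author>B</author>"
    "</book>"
)

category = book.get("category")                     # always a single string
authors = [a.text for a in book.findall("author")]  # child elements can repeat

# Repeating an attribute on one element is not well-formed XML.
try:
    ET.fromstring('<book category="computer" category="science"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(category, authors, duplicate_rejected)
```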
XML Comments

XML comments are just like HTML comments. Comments are used to make
code more understandable to other developers.

XML comments add notes or lines that explain the purpose of XML code. Although
XML is known as self-describing data, XML comments are sometimes necessary.

Syntax

An XML comment should be written as:

<!-- Write your comment -->


XML Tree Structure

An XML document has a self-descriptive structure. It forms a tree structure, which is referred to as
an XML tree. The tree structure makes it easy to describe an XML document.

A tree structure contains a root element (as parent), child elements and so on. It is very easy to
traverse all succeeding branches, sub-branches and leaf nodes starting from the root.

[Figure: tree-structure representation of the example college document]

In the example, the first line is the XML declaration. It defines the XML version 1.0. The next
line shows the root element (college) of the document. Inside that there is one more element
(student). The student element contains five branches named <firstname>, <lastname>, <contact>,
<Email> and <address>.

The <address> branch contains 3 sub-branches named <city>, <state> and <pin>.

XML Tree Rules

These rules are used to figure out the relationships between elements. They show whether an element is a
child or a parent of another element.

Descendants: If element A is contained by element B, then A is known as a descendant of B. In
the above example "College" is the root element and all the other elements are descendants of
"College".

Ancestors: An element that contains other elements is called an "ancestor" of those
elements. In the above example the root element (College) is the ancestor of all other elements.
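The ancestor and descendant relationships can be computed from a parsed tree. The sketch below rebuilds the college document from the description above; all element values are placeholders, and the helper names ancestors and descendants are hypothetical. It uses Python's standard-library ElementTree module.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the description above; all values are placeholders.
doc = """<college>
  <student>
    <firstname>First</firstname>
    <lastname>Last</lastname>
    <contact>0000000000</contact>
    <Email>student@example.com</Email>
    <address>
      <city>City</city>
      <state>State</state>
      <pin>000000</pin>
    </address>
  </student>
</college>"""

root = ET.fromstring(doc)

# ElementTree nodes do not store their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(element):
    """Return the tag names of all ancestors, nearest first."""
    names = []
    while element in parent_of:
        element = parent_of[element]
        names.append(element.tag)
    return names

def descendants(element):
    """Return the tag names of all descendants in document order."""
    return [e.tag for e in element.iter() if e is not element]

city = root.find("./student/address/city")
print(ancestors(city))
print(descendants(root.find("./student/address")))
```

Every element other than the root has the root among its ancestors, which matches the rule stated above.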

XML Validation

A well-formed XML document can be validated against a DTD or a Schema.

A well-formed XML document is an XML document with correct syntax. It is necessary to
know what a valid XML document is before learning about XML validation.

Valid XML document

o It must be well formed (satisfy all the basic syntax conditions).
o It must conform to a predefined DTD or XML schema.

Rules for well formed XML

o If an XML declaration is present, it must be the first thing in the document.
o It must have one unique root element.
o All start tags of XML documents must match end tags.
o XML tags are case sensitive.
o All elements must be closed.
o All elements must be properly nested.
o All attribute values must be quoted.
o XML entities must be used for special characters.
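The last rule is easy to get wrong when XML is built from strings. A minimal sketch, assuming Python and its standard-library xml.sax.saxutils helpers; the sample text and the author element are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw_text = "A & B < C"          # contains characters that are special in XML
raw_attr = 'Tom "T" Smith'      # contains double quotes

element_text = escape(raw_text)   # replaces &, < and > with entities
attr_value = quoteattr(raw_attr)  # adds surrounding quotes and escapes as needed

xml_text = "<author name=%s>%s</author>" % (attr_value, element_text)
author = ET.fromstring(xml_text)  # the parser expands the entities again

print(element_text)
print(author.get("name"), "|", author.text)
```

After parsing, the program sees the original characters again, so escaping only affects the serialized form.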

XML DTD

A DTD defines the legal elements of an XML document.

In simple words, a DTD defines the document structure with a list of legal
elements and attributes.

XML schema is an XML-based alternative to DTD.

Both DTD and XML schema are used to check that a well-formed XML document is also valid.

We should avoid errors in XML documents because a well-formedness error stops an XML parser from processing the document.

XML schema

o It is defined as an XML language.
o It uses namespaces to allow reuse of existing definitions.
o It supports a large number of built-in data types and the definition of derived data types.

An XML Document Type Definition, commonly known as a DTD, is a way to describe an XML
language precisely. DTDs check the vocabulary and the validity of the structure of XML
documents against the grammatical rules of the appropriate XML language.
An XML DTD can either be specified inside the document, or it can be kept in a
separate document and then linked separately.
Syntax
Basic syntax of a DTD is as follows:
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
• DTD identifier is an identifier for the document type definition, which may be
the path to a file on the system or the URL of a file on the internet. If the DTD
points to an external path, it is called the External Subset.
• The square brackets [ ] enclose an optional list of entity declarations called the
Internal Subset.

Purpose of DTD

Its main purpose is to define the structure of an XML document. It contains a list of legal
elements and defines the structure with the help of them.

Checking Validation

Before proceeding with XML DTD, you should understand how validity is checked. An XML document is called
"well-formed" if it contains the correct syntax.

A valid XML document is one that is well-formed and has also been validated against a DTD.

Visit http://www.xmlvalidation.com to validate the XML file.

Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file.
To refer to it as an internal DTD, the standalone attribute in the XML declaration must be set to
yes. This means the declaration works independently of any external source.
Syntax
The syntax of an internal DTD is as shown:
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and element-declarations is where you
declare the elements.
Example
Following is a simple example of internal DTD:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Kishore Kumar</name>
<company>VITS</company>
<phone>9666699000</phone>
</address>
Let us go through the above code:
Start Declaration - Begin the XML declaration with the following statement:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

DTD - Immediately after the XML header, the document type declaration follows,
commonly referred to as the DOCTYPE:
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name.
The DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body - The DOCTYPE declaration is followed by the body of the DTD, where you
declare elements, attributes, entities, and notations:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Several elements are declared here that make up the vocabulary of the <address> document.

<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA".
Here #PCDATA means parsable character data.
End Declaration - Finally, the declaration section of the DTD is closed using a closing
bracket and a closing angle bracket (]>). This effectively ends the definition, and
thereafter, the XML document follows immediately.
Rules
• The document type declaration must appear at the start of the document
(preceded only by the XML header); it is not permitted anywhere else within the
document.
• Similar to the DOCTYPE declaration, the element declarations must start with
an exclamation mark.
• The name in the document type declaration must match the element type of the
root element.
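The last rule can be inspected from code. The sketch below uses Python's standard-library xml.dom.minidom, which reads the DOCTYPE but is a non-validating parser: it does not check the document against the declared content models. The variable names are chosen for this example.

```python
from xml.dom import minidom

doc_text = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Kishore Kumar</name>
<company>VITS</company>
<phone>9666699000</phone>
</address>"""

dom = minidom.parseString(doc_text.encode("utf-8"))

doctype_name = dom.doctype.name            # name given after <!DOCTYPE
root_name = dom.documentElement.tagName    # name of the root element
names_match = doctype_name == root_name

print(doctype_name, root_name, names_match)
```

Full DTD validation needs a validating parser; this check only covers the name rule.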
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed
by specifying the system identifier, which may be either a legal .dtd file or a valid URL. To
refer to it as an external DTD, the standalone attribute in the XML declaration must be set to no.
This means the declaration includes information from the external source.
Syntax
Following is the syntax for an external DTD:
<!DOCTYPE root-element SYSTEM "file-name">

where file-name is the file with the .dtd extension.

Example
The following example shows external DTD usage:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Kishore Kumar</name>
<company>VITS</company>
<phone>9666699000</phone>
</address>
The contents of the DTD file address.dtd are as shown:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by using either system identifiers or
public identifiers.
SYSTEM IDENTIFIERS
A system identifier enables you to specify the location of an external
file containing DTD declarations. Syntax is as follows:
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains the keyword SYSTEM and a URI
reference pointing to the location of the document.
PUBLIC IDENTIFIERS
Public identifiers provide a mechanism to locate DTD resources
and are written as below:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">

As you can see, it begins with the keyword PUBLIC, followed by a
specialized identifier and then a system identifier that serves as a fallback location.
Public identifiers are used to identify an
entry in a catalog. Public identifiers can follow any format;
however, a commonly used format is called Formal Public
Identifiers, or FPIs.
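A parser exposes both kinds of identifier on the document type node. The sketch below uses Python's standard-library xml.dom.minidom, which records the identifiers without fetching the external file. It assumes the public identifier is followed by a system identifier, as the XML grammar requires; reusing "address.dtd" for that purpose is an assumption of this example.

```python
from xml.dom import minidom

system_doc = b'<!DOCTYPE address SYSTEM "address.dtd"><address/>'
public_doc = (
    b"<!DOCTYPE address PUBLIC "
    b'"-//Beginning XML//DTD Address Example//EN" "address.dtd">'
    b"<address/>"
)

system_doctype = minidom.parseString(system_doc).doctype
public_doctype = minidom.parseString(public_doc).doctype

print(system_doctype.systemId)                            # location of the DTD
print(public_doctype.publicId, "|", public_doctype.systemId)
```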

XML CSS

Purpose of CSS in XML

CSS (Cascading Style Sheets) can be used to add style and display information to an XML
document. It can format the whole XML document.

How to link XML file with CSS

To link XML files with CSS, you should use the following syntax:

<?xml-stylesheet type="text/css" href="cssemployee.css"?>

XML CSS Example

Let's see the css file.

cssemployee.css

employee
{
background-color: pink;
}
firstname,lastname,email
{
font-size:25px;
display:block;
color: blue;
margin-left: 50px;
}

Let's create the DTD file.

employee.dtd

<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Let's see the xml file using CSS and DTD.

employee.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>

XML Schema

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe
and validate the structure and the content of XML data. An XML schema defines the elements,
attributes and data types. The schema element supports namespaces. It is similar to a database
schema that describes the data in a database.

Checking Validation

An XML document is called "well-formed" if it contains the correct syntax. A valid XML document
is one that is well-formed and has also been validated against a Schema.

Visit http://www.xmlvalidation.com to validate the XML file against a schema or DTD.

Syntax
You need to declare a schema in your XML document as
follows:

<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
Example
The following example shows how to use
schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">

<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an
XML document can take.
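Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below (Python standard-library ElementTree) lists the element declarations in the contact schema. It only inspects the schema and does not validate any document against it; the standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"   # the XML Schema namespace URI

schema_text = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="company" type="xs:string"/>
        <xs:element name="phone" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text.encode("utf-8"))

# Collect every xs:element declaration as (name, declared type or None).
declared = [
    (e.get("name"), e.get("type"))
    for e in schema.iter("{%s}element" % XS)
]

for name, type_name in declared:
    print(name, type_name)
```

The contact element has no type attribute because its type is given inline as a complex type.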
Elements
As we saw in the XML Elements section, elements are the building blocks of an XML
document. An element can be defined within an XSD as follows:
<xs:element name="x" type="y"/>
Definition Types
You can define XML schema elements in the following ways:
Simple Type - A simple type element can contain only text. Some of the
predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date. For example:
<xs:element name="phone_number" type="xs:int" />
Complex Type - A complex type is a container for other element definitions. This allows you
to specify which child elements an element can contain and to provide some structure within
your XML documents. For example:
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>

In the above example, the Address element consists of child elements. It is a container for
other <xs:element> definitions, which allows you to build a simple hierarchy of elements in the
XML document.
Global Types - With a global type, you can define a single type in your document, which can be
used by all other references. For example, suppose you want to generalize the person and
company for different addresses of the company. In such a case, you can define a general type
as a named complex type, and elements refer to it through their type attribute:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>

<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>

Instead of having to define the name and the company twice (once for Address1 and once for
Address2), we now have a single definition. This makes maintenance simpler, i.e., if you
decide to add "Postcode" elements to the address, you need to add them at just one place.
Attributes
Attributes in XSD provide extra information within an element. Attributes
have name and type properties as shown below:
<xs:attribute name="x" type="y"/>

XML Schema Example

Let's create a schema file.

employee.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://www.w3schools.com/"
xmlns="https://www.w3schools.com/"
elementFormDefault="qualified">

<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

</xs:schema>

Let's see the XML file that uses the XML schema (XSD) file.

employee.xml

<?xml version="1.0"?>
<employee
xmlns="https://www.w3schools.com/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.w3schools.com/ employee.xsd">

<firstname>Kishore</firstname>
<lastname>Mamidala</lastname>
<email>kishore.mamidala@gmail.com</email>
</employee>

DTD vs XSD

There are many differences between DTD (Document Type Definition) and XSD (XML Schema
Definition). In short, DTD provides less control over XML structure, whereas XSD (XML schema)
provides more control.

The important differences are given below:

No.  DTD                                           XSD
1)   DTD stands for Document Type Definition.      XSD stands for XML Schema Definition.
2)   DTDs are derived from SGML syntax.            XSDs are written in XML.
3)   DTD doesn't support datatypes.                XSD supports datatypes for elements and attributes.
4)   DTD doesn't support namespaces.               XSD supports namespaces.
5)   DTD offers only basic control over the        XSD offers finer control over the order and
     order and number of child elements.           number of child elements.
6)   DTD is not extensible.                        XSD is extensible.
7)   DTD is not simple to learn.                   XSD is simple to learn because you don't need
                                                   to learn a new language.
8)   DTD provides less control over XML            XSD provides more control over XML structure.
     structure.

CDATA

CDATA (Unparsed Character Data): CDATA contains text that is not parsed further in an
XML document. Tags inside the CDATA text are not treated as markup and entities will not be
expanded.

Let's take an example for CDATA:

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
<firstname>kishore</firstname>
<lastname>Mamidala</lastname>
<email>kishore.mamidala@gmail.com</email>
]]>
</employee>

In the above CDATA example, CDATA is used just after the start tag of the employee element to make the
data/text unparsed, so the value of employee is the literal text:

<firstname>kishore</firstname><lastname>Mamidala</lastname><email>kishore.mamidala@gmail.com</email>
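The effect of a CDATA section can be confirmed with a parser. In the sketch below (Python standard-library ElementTree; the DOCTYPE line is left out so that the example is self-contained), the employee element ends up with no child elements, and the tags appear as plain text.

```python
import xml.etree.ElementTree as ET

doc = """<employee>
<![CDATA[
<firstname>kishore</firstname>
<lastname>Mamidala</lastname>
<email>kishore.mamidala@gmail.com</email>
]]>
</employee>"""

employee = ET.fromstring(doc)

child_count = len(employee)   # number of child elements
text = employee.text          # character data, including the CDATA content

print(child_count)
print(text.strip())
```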

PCDATA

PCDATA (Parsed Character Data): XML parsers parse all the text in an XML
document. PCDATA is the text that will be parsed
by a parser. Tags inside the PCDATA will be treated as markup and entities will be expanded.

In other words, parsed character data means that the XML parser examines the data
and replaces any entity references it contains with their values.

Let's take an example:

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>kishore</firstname>
<lastname>mamidala</lastname>
<email>kishore.mamidala@gmail.com</email>
</employee>

In the above example, the employee element contains 3 more elements, 'firstname', 'lastname'
and 'email', so the parser parses further to get the data/text of firstname, lastname and email and gives the
value of employee as:

kishore mamidala kishore.mamidala@gmail.com
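For contrast with CDATA, the sketch below parses the same data as PCDATA using Python's standard-library ElementTree. The extra note element with an entity reference is hypothetical and is added only to show entity expansion; the DOCTYPE line is left out so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

doc = """<employee>
<firstname>kishore</firstname>
<lastname>mamidala</lastname>
<email>kishore.mamidala@gmail.com</email>
<note>R &amp; D</note>
</employee>"""

employee = ET.fromstring(doc)

# Tags inside PCDATA are treated as markup, so they become child elements.
tags = [child.tag for child in employee]

# Join the text of all descendants, skipping whitespace-only pieces.
value = " ".join(t.strip() for t in employee.itertext() if t.strip())

print(tags)
print(value)
```

The entity reference &amp;amp; is expanded to a plain ampersand in the parsed value.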

Document Object Model

The document object represents the whole HTML document.

When an HTML document is loaded in the browser, it becomes a document object. It is the root
node that represents the HTML document. It has properties and methods. With the help of the
document object, we can add dynamic content to our web page.

Properties of document object

The properties of the document object can be accessed and modified through the document
object.

Methods of document object

We can access and change the contents of the document through its methods.

The important methods of the document object are as follows:

Method                      Description
write("string")             writes the given string to the document.
writeln("string")           writes the given string to the document with a newline character at the end.
getElementById()            returns the element having the given id value.
getElementsByName()         returns all the elements having the given name value.
getElementsByTagName()      returns all the elements having the given tag name.
getElementsByClassName()    returns all the elements having the given class name.

Let's see a simple example of the document object that prints a name with a welcome message.

<script type="text/javascript">
function printvalue(){
var name=document.form1.name.value;
alert("Welcome: "+name);
}
</script>

<form name="form1">
Enter Name:<input type="text" name="name"/>
<input type="button" onclick="printvalue()" value="print name"/>
</form>

Output of the above example: the page shows a text field labelled "Enter Name:" and a
"print name" button; clicking the button shows an alert with the welcome message.
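The same DOM ideas apply outside the browser. The sketch below uses Python's standard-library xml.dom.minidom, which offers DOM methods such as getElementsByTagName and createElement on a parsed document. The small page and the pre-filled value "Kishore" are hypothetical; this is an analogy to the browser's document object, not the browser API itself.

```python
from xml.dom import minidom

page = """<html><body>
<form name="form1">
<input type="text" name="name" value="Kishore"/>
<input type="button" value="print name"/>
</form>
</body></html>"""

document = minidom.parseString(page)

# Look up elements by tag name, as with document.getElementsByTagName().
inputs = document.getElementsByTagName("input")
types = [node.getAttribute("type") for node in inputs]

# Add dynamic content: create a <p> element and append it to <body>.
name = inputs[0].getAttribute("value")
body = document.getElementsByTagName("body")[0]
paragraph = document.createElement("p")
paragraph.appendChild(document.createTextNode("Welcome: " + name))
body.appendChild(paragraph)

welcome = document.getElementsByTagName("p")[0].firstChild.data
print(types, welcome)
```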

What is XHTML

XHTML stands for EXtensible HyperText Markup Language. It is a cross between HTML
and XML.

XHTML is almost identical to HTML but it is stricter than HTML. XHTML is HTML defined as
an XML application. It is supported by all major browsers.

Because XHTML is stricter than HTML in syntax and case sensitivity, it is more important to
write your code correctly. XHTML
documents are well-formed and can be parsed using standard XML parsers, unlike HTML, which
requires a lenient HTML-specific parser.

Why use XHTML

XHTML was developed to make HTML more extensible and increase interoperability with other
data formats. There are two main reasons behind the creation of XHTML:

o It creates a stricter standard for making web pages, reducing incompatibilities between
browsers.
o It creates a standard that can be used on a variety of different devices without changes.

Let's take an example to understand it.

HTML is mainly used to create web pages, but many pages on the internet
contain "bad" HTML (HTML that does not follow the HTML rules).

The following HTML code works fine in most browsers even though it does not follow the HTML rules:

<html>
<head>
<title>This is an example of bad HTML</title>
<body>
<h1>Bad HTML
<p>This is a paragraph
</body>

The above HTML code doesn't follow the HTML rules, although it runs. Nowadays, there are
different browser technologies. Some browsers run on computers, and some browsers run on
mobile phones or other small devices. The main issue with bad HTML is that smaller devices
often lack the resources to interpret it.

So, XHTML was introduced to combine the strengths of HTML and XML.

XHTML is HTML redesigned as XML. It helps you to create better formatted code on your site.

XHTML does not tolerate badly formed code. Unlike
with HTML, where simple errors (like a missing closing tag) are ignored by the browser,
XHTML code must be written exactly as specified.
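The difference between a lenient HTML parser and a strict XML parser can be demonstrated on the "bad" HTML shown earlier. This sketch uses Python's standard-library html.parser and xml.etree.ElementTree; the class name TagCollector is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

bad_html = """<html>
<head>
<title>This is an example of bad HTML</title>
<body>
<h1>Bad HTML
<p>This is a paragraph
</body>"""

class TagCollector(HTMLParser):
    """Lenient HTML parser that simply records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(bad_html)   # accepted without complaint
collector.close()

try:
    ET.fromstring(bad_html)  # a strict XML parser rejects the same text
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(collector.tags, xml_ok)
```

The HTML parser reports all six start tags, while the XML parser stops at the first unclosed element.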

HTML vs XHTML

There are some changes in XHTML as compared to HTML. These changes can be categorized in
three parts:

1. Changes in Document Structure

o All documents must have a DOCTYPE.
o The xmlns attribute in <html> is mandatory and must specify the XML namespace for the
document.
o <html>, <head>, <title>, and <body> are mandatory with their respective closing tags.

2. Changes in XHTML Tags

o All XHTML tags must be in lower case.
o All XHTML tags must be closed.
o All XHTML tags must be properly nested.
o The XHTML documents must have one root element.

3. Changes in XHTML Attributes

o All XHTML attributes must be added properly.
o All XHTML attributes must be in lower case.
o The name attribute has changed.
o XHTML attributes cannot be shortened.
o XHTML attribute values must be quoted.

XHTML Syntax

XHTML syntax is very similar to HTML syntax, and all the valid HTML elements are also valid
in XHTML. But XHTML is case sensitive, so you have to pay a bit of extra attention while writing
an XHTML document or making an HTML document compliant with XHTML.

You must remember the following important points while writing a new XHTML document or
converting an existing HTML document into an XHTML document:

o All documents must have a DOCTYPE.
o All tags must be in lower case.
o All documents must be properly formed.
o All tags must be closed.
o All attributes must be added properly.
o The name attribute has changed.
o Attributes cannot be shortened.
o All tags must be properly nested.

Tags must be in lower case

XHTML is case-sensitive markup language. So, all the XHTML tags and attributes must be
written in lower case.

<!-- Invalid in XHTML -->


<A Href="/xhtml/xhtml_tutorial.html">XHTML Tutorial</A>
<!-- Valid in XHTML -->
<a href="/xhtml/xhtml_tutorial.html">XHTML Tutorial</a>

Closing Tags are mandatory

Every XHTML element must have a matching closing tag. Even empty elements must be closed,
using the self-closing form shown below. Let's see an example:

<!-- Invalid in XHTML -->


<p>This paragraph is not written according to XHTML syntax.

<!-- Invalid in XHTML -->
<img src="/images/xhtml.gif" >
<!-- Valid in XHTML -->
<p>This paragraph is written according to XHTML syntax.</p>
<!-- Valid in XHTML-->
<img src="/images/xhtml.gif" />

Attribute Quotes

All XHTML attribute values must be quoted. Otherwise, the document is not well-formed and
is treated as an invalid document.

See this example:

<!-- Invalid in XHTML -->


<img src="/images/xhtml.gif" width=250 height=50 />
<!-- Valid in XHTML -->
<img src="/images/xhtml.gif" width="250" height="50" />

Attribute Minimization

XHTML doesn't allow you to minimize attributes. You have to explicitly state the attribute and
its value.

See this example:

<!-- Invalid in XHTML -->
<option selected>
<!-- Valid in XHTML -->
<option selected="selected">

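The same rule applies to every attribute that HTML allows to be minimized, such as checked,
disabled, readonly, and multiple: in XHTML, write checked="checked", disabled="disabled",
readonly="readonly", and multiple="multiple".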

The id Attribute

The id attribute replaces the name attribute as the way to identify an element. Instead of
name="...", XHTML prefers id="...". XHTML 1.0 formally deprecates the name attribute on
elements such as a, form, img, and map.

See this example:

<!-- Invalid in XHTML -->
<img src="/images/xhtml.gif" name="xhtml_logo" />
<!-- Valid in XHTML -->
<img src="/images/xhtml.gif" id="xhtml_logo" />

The language attribute

In XHTML, the language attribute of the script tag is deprecated, so you have to use the
type attribute instead.

See this example:

<!-- Invalid in XHTML -->


<script language="JavaScript" type="text/JavaScript">
document.write("Hello XHTML!");
</script>
<!-- Valid in XHTML -->
<script type="text/JavaScript">
document.write("Hello XHTML!");
</script>

Nested Tags

XHTML tags must be nested properly: elements must be closed in the reverse of the order in
which they were opened. Otherwise the document is not a well-formed XHTML document.

See this example:

<!-- Invalid in XHTML -->


<b><i> This text is bold and italic</b></i>
<!-- Valid in XHTML -->
<b><i> This text is bold and italic</i></b>
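
Because XHTML is XML, most of the syntax rules above can be checked with any XML parser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name XhtmlWellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration. Note that the upper-case fragment is still well-formed XML: the lower-case rule comes from the XHTML DTD, so only validation against the DTD would reject it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a fragment is well-formed XML.
class XhtmlWellFormedCheck {

    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the fragment: it is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] fragments = {
            "<p>Unclosed paragraph",                          // missing closing tag
            "<p>Closed paragraph</p>",
            "<img src=\"/images/xhtml.gif\" width=250 />",    // unquoted value
            "<img src=\"/images/xhtml.gif\" width=\"250\" />",
            "<option selected>x</option>",                    // minimized attribute
            "<b><i>bold and italic</b></i>",                  // wrong nesting
            "<b><i>bold and italic</i></b>",
            "<A Href=\"/xhtml/xhtml_tutorial.html\">XHTML Tutorial</A>" // well-formed, but not valid XHTML
        };
        for (String fragment : fragments) {
            System.out.println(isWellFormed(fragment) + "  " + fragment);
        }
    }
}
```

Checking well-formedness catches unclosed tags, unquoted or minimized attributes, and bad nesting. Rules that depend on the DTD (lower-case names, allowed attributes) need a validating parser.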

XHTML Doctypes

XHTML 1.0 defines three Document Type Definitions (DTDs). The easiest and most commonly
used is the XHTML Transitional DTD.

A list of the XHTML Doctypes:

o Strict

o Transitional
o Frameset

You should be careful when writing an XHTML document, because a few XHTML elements and
attributes are available in one DTD but not in another. Select your XHTML elements and
attributes according to the DTD you declare.

XHTML 1.0 Strict DTD

Use the Strict DTD when you want to rely on Cascading Style Sheets (CSS) for presentation
and avoid writing most of the presentational XHTML attributes.

Add the following DTD at the top of your XHTML document.

Syntax:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional DTD

Use the Transitional DTD when you want to use many presentational XHTML attributes together
with a few Cascading Style Sheets (CSS) properties.

Add the following DTD at the top of your XHTML document.

Syntax:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset DTD

Use the Frameset DTD when you want to use HTML frames to partition the browser window
into two or more frames.

Add the following DTD at the top of your XHTML document.

Syntax:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Note: You can use any one of these DTDs to write your XHTML document. If the document
validates against the DTD it declares, it is a valid XHTML document and is considered good
quality markup.

The Hello World page of XHTML looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Hello World</title>
</head>
<body>
<p>My first Web page.</p>
</body>
</html>
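
The document-structure rules (a DOCTYPE, the xmlns attribute on <html>, and the mandatory <head>, <title>, and <body> elements) can be checked programmatically. The sketch below assumes the JDK's built-in DOM parser; the class XhtmlStructureCheck and its methods are hypothetical names. As a simplifying assumption, it resolves the external DTD to an empty stream so that nothing is fetched from w3.org, which also means the document is not validated against the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: checks the structural requirements of an XHTML page.
class XhtmlStructureCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The Hello World page from the text above.
    static final String HELLO = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>",
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"",
        "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">",
        "<html xml:lang=\"en\" lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\">",
        "<head><title>Hello World</title></head>",
        "<body><p>My first Web page.</p></body>",
        "</html>");

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Assumption: skip the network fetch by treating the DTD as empty.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean hasRequiredStructure(Document doc) {
        if (doc.getDoctype() == null) {
            return false; // every XHTML document must have a DOCTYPE
        }
        Element root = doc.getDocumentElement();
        return "html".equals(root.getLocalName())
            && XHTML_NS.equals(root.getNamespaceURI())
            && count(doc, "head") == 1
            && count(doc, "title") == 1
            && count(doc, "body") == 1;
    }

    private static int count(Document doc, String name) {
        return doc.getElementsByTagNameNS(XHTML_NS, name).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(HELLO);
        System.out.println("Public ID: " + doc.getDoctype().getPublicId());
        System.out.println("Structure OK: " + hasRequiredStructure(doc));
    }
}
```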

XML Parsers

An XML parser is a software library or package that provides interfaces for client applications to
work with an XML document. The parser reads the XML and gives programs a way to use its
content.

An XML parser checks that the document is well-formed. A validating parser can additionally
validate the document against a DTD or schema.

Let's understand how an XML parser works with the help of the figure given below:

Types of XML Parsers

These are the two main types of XML Parsers:

1. DOM
2. SAX

DOM (Document Object Model)

A DOM document is an object that contains all the information of an XML document, organized
as a tree structure. The DOM parser implements the DOM API, which is very simple to use.

Features of DOM Parser

A DOM parser creates an internal structure in memory, a DOM document object, and client
applications get information about the original XML document by invoking methods on this
document object.

The DOM parser has a tree-based structure.
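
The following minimal sketch shows these features with the JDK's built-in DOM parser: the whole document is loaded into a tree, nodes can be reached in any order, and the tree can be modified. The sample <library> document, the class DomParserSketch, and its method names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParserSketch {
    // Hypothetical sample data.
    static final String SAMPLE =
        "<library>"
        + "<book id=\"b1\"><title>XML Basics</title></book>"
        + "<book id=\"b2\"><title>Java Parsing</title></book>"
        + "</library>";

    // Builds the in-memory DOM tree for the whole document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Read: collect the text of every <title> element.
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("title");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Random access: jump straight to the book at a given position.
    static String bookIdAt(Document doc, int index) {
        Element book = (Element) doc.getElementsByTagName("book").item(index);
        return book.getAttribute("id");
    }

    // Write: append a new <book> element to the tree.
    static void addBook(Document doc, String id, String title) {
        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        book.appendChild(titleElement);
        doc.getDocumentElement().appendChild(book);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(titles(doc));
        addBook(doc, "b3", "XHTML Notes");
        System.out.println(titles(doc) + " last id: " + bookIdAt(doc, 2));
    }
}
```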

Advantages

1) It supports both read and write operations and the API is very simple to use.

2) It is preferred when random access to widely separated parts of a document is required.

Disadvantages

1) It is memory inefficient: it consumes more memory because the whole XML document needs
to be loaded into memory.

2) It is comparatively slower than other parsers.

SAX (Simple API for XML)

A SAX parser implements the SAX API. This API is event-based and less intuitive than the DOM API.

Features of SAX Parser

It does not create any internal structure.

Clients do not call methods to fetch data from the document. Instead, they override the
callback methods of the API and place their own code inside those methods.

It is an event-based parser; it works like an event handler in Java.
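
A minimal sketch of this callback style, using the JDK's built-in SAX parser, is given here. The handler overrides startElement, characters, and endElement from org.xml.sax.helpers.DefaultHandler and collects the text of each <title> element as the events arrive. The sample document, the class SaxParserSketch, and the nested TitleHandler are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxParserSketch {
    // Hypothetical sample data.
    static final String SAMPLE =
        "<library>"
        + "<book id=\"b1\"><title>XML Basics</title></book>"
        + "<book id=\"b2\"><title>Java Parsing</title></book>"
        + "</library>";

    // The parser calls these methods as it meets each piece of the document.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <title>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append each one.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    // Streams through the document once; no tree is built in memory.
    static List<String> titles(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TitleHandler handler = new TitleHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE));
    }
}
```

The handler has to keep its own state (here, the current buffer) because the parser never hands over the document as a whole.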

Advantages

1) It is simple and memory efficient.

2) It is very fast and works for huge documents.

Disadvantages

1) It is event-based so its API is less intuitive.

2) Clients never see the whole document at once because the data is delivered in pieces.
