Introduction to ASP.NET

ASP.NET is more than the next version of Active Server Pages (ASP); it
provides a unified Web development model that includes the services
necessary for developers to build enterprise-class Web applications. While
ASP.NET is largely syntax compatible with ASP, it also provides a new
programming model and infrastructure for more scalable and stable
applications that help provide greater protection. You can augment
your existing ASP applications by incrementally adding ASP.NET
functionality to them.

ASP.NET is a compiled, .NET-based environment; you can author
applications in any .NET-compatible language, including Visual Basic
.NET, C#, and JScript .NET. Additionally, the entire .NET Framework is
available to any ASP.NET application. Developers can easily access the
benefits of these technologies, which include the managed common
language runtime environment, type safety, inheritance, and so on.

ASP.NET has been designed to work seamlessly with WYSIWYG HTML
editors and other programming tools, including Microsoft Visual Studio
.NET. Not only does this make Web development easier, but it also provides
all the benefits that these tools have to offer, including a GUI that developers
can use to drop server controls onto a Web page and fully integrated
debugging support.

Developers can use Web Forms or XML Web services when creating an
ASP.NET application, or combine these in any way they see fit. Each is
supported by the same infrastructure that allows you to use authentication
schemes, cache frequently used data, or customize your application's
configuration, to name only a few possibilities.
• Web Forms allow you to build powerful forms-based Web pages.
When building these pages, you can use ASP.NET server controls to
create common UI elements, and program them for common tasks.
These controls allow you to rapidly build a Web Form out of reusable
built-in or custom components, simplifying the code of a page. For
more information, see Web Forms Pages. For information on how to
develop ASP.NET server controls, see Developing ASP.NET Server
Controls.
• An XML Web service provides the means to access server
functionality remotely. Using XML Web services, businesses can
expose programmatic interfaces to their data or business logic, which
in turn can be obtained and manipulated by client and server
applications. XML Web services enable the exchange of data in
client-server or server-server scenarios, using standards like HTTP
and XML messaging to move data across firewalls. XML Web
services are not tied to a particular component technology or object-
calling convention. As a result, programs written in any language,
using any component model, and running on any operating system can
access XML Web services. For more information, see XML Web
Services Created Using ASP.NET and XML Web Service Clients.

Each of these models can take full advantage of all ASP.NET features, as well as the power of
the .NET Framework and .NET Framework common language runtime. These features and how
you can use them are outlined as follows:

• If you have ASP development skills, the new ASP.NET programming model will seem
very familiar to you. However, the ASP.NET object model has changed significantly
from ASP, making it more structured and object-oriented. Unfortunately this means
that ASP.NET is not fully backward compatible; almost all existing ASP pages will have
to be modified to some extent in order to run under ASP.NET. In addition, major
changes to Visual Basic .NET mean that existing ASP pages written with Visual Basic
Scripting Edition typically will not port directly to ASP.NET. In most cases, though, the
necessary changes will involve only a few lines of code. For more information, see
Migrating from ASP to ASP.NET.

• Accessing databases from ASP.NET applications is an often-used technique for
displaying data to Web site visitors. ASP.NET makes it easier than ever to access
databases for this purpose. It also allows you to manage the database from your code.
For more information, see Accessing Data with ASP.NET.

• ASP.NET provides a simple model that enables Web developers to write logic that runs
at the application level. Developers can write this code in the Global.asax text file or in
a compiled class deployed as an assembly. This logic can include application-level
events, but developers can easily extend this model to suit the needs of their Web
application. For more information, see ASP.NET Applications.

• ASP.NET provides easy-to-use application and session-state facilities that are familiar
to ASP developers and are readily compatible with all other .NET Framework APIs. For
more information, see ASP.NET State Management.

• For advanced developers who want to use APIs as powerful as the ISAPI programming
interfaces that were included with previous versions of ASP, ASP.NET offers the
IHttpHandler and IHttpModule interfaces. Implementing the IHttpHandler interface
gives you a means of interacting with the low-level request and response services of
the IIS Web server and provides functionality much like ISAPI extensions, but with a
simpler programming model. Implementing the IHttpModule interface allows you to
include custom events that participate in every request made to your application. For
more information, see HTTP Runtime Support.

• ASP.NET takes advantage of performance enhancements found in the .NET Framework
and common language runtime. Additionally, it has been designed to offer significant
performance improvements over ASP and other Web development platforms. All
ASP.NET code is compiled, rather than interpreted, which allows early binding, strong
typing, and just-in-time (JIT) compilation to native code, to name only a few of its
benefits. ASP.NET is also easily factorable, meaning that developers can remove
modules (a session module, for instance) that are not relevant to the application they
are developing. ASP.NET also provides extensive caching services (both built-in
services and caching APIs). ASP.NET also ships with performance counters that
developers and system administrators can monitor to test new applications and gather
metrics on existing applications. For more information, see ASP.NET Caching Features
and ASP.NET Optimization.

• Writing custom debug statements to your Web page can help immensely in
troubleshooting your application's code. However, they can cause embarrassment if
they are not removed. The problem is that removing the debug statements from your
pages when your application is ready to be ported to a production server can require
significant effort. ASP.NET offers the TraceContext class, which allows you to write
custom debug statements to your pages as you develop them. They appear only when
you have enabled tracing for a page or entire application. Enabling tracing also
appends details about a request to the page, or, if you so specify, to a custom trace
viewer that is stored in the root directory of your application. For more information,
see ASP.NET Trace.

• The .NET Framework and ASP.NET provide default authorization and authentication
schemes for Web applications. You can easily remove, add to, or replace these
schemes, depending upon the needs of your application. For more information, see
Securing ASP.NET Web Applications.

• ASP.NET configuration settings are stored in XML-based files, which are human
readable and writable. Each of your applications can have a distinct configuration file
and you can extend the configuration scheme to suit your requirements. For more
information, see ASP.NET Configuration.

• Applications are said to be running side by side when they are installed on the same
computer but use different versions of the .NET Framework. To learn how to use
different versions of ASP.NET for separate applications on your server, see Side-by-
Side Support in ASP.NET.

• IIS 6.0 uses a new process model called worker process isolation mode, which is
different from the process model used in previous versions of IIS. ASP.NET uses this
process model by default when running on Windows Server 2003. For information
about how to migrate ASP.NET process model settings to worker process isolation
mode, see IIS 6.0 Application Isolation Modes.
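Because ASP.NET configuration files are plain XML, any XML-aware tool can read them. The following is a minimal sketch in Python (chosen only for illustration; ASP.NET itself reads these files through the .NET Framework). The file content, the key names SiteTitle and PageSize, and their values are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical Web.config content; the key names and values are made up.
WEB_CONFIG = """<configuration>
  <appSettings>
    <add key="SiteTitle" value="Address Book" />
    <add key="PageSize" value="25" />
  </appSettings>
  <system.web>
    <trace enabled="true" pageOutput="false" />
  </system.web>
</configuration>"""

root = ET.fromstring(WEB_CONFIG)

# appSettings is a flat list of key/value pairs.
settings = {add.get("key"): add.get("value") for add in root.findall("appSettings/add")}
print(settings)

# Tracing is switched on or off by an attribute, not by editing page code.
trace = next(root.iter("trace"), None)
trace_enabled = trace is not None and trace.get("enabled") == "true"
print("tracing enabled:", trace_enabled)
```

The same file can be opened and changed in a text editor, which is what "human readable and writable" means in practice.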

History of ASP.NET
History of ASP.NET Web Application Development
• In order to understand why there are different types of Web
applications, a brief history lesson in
Visual Studio .NET is required.
• Visual Studio .NET 2002/2003 required the developer to use Internet
Information Services (IIS) when building ASP.NET projects.
• Typically, IIS was installed on the local development machine. FrontPage
Server Extensions (FPSE) had to be installed on IIS in order for Visual
Studio to create and modify Web sites.
• Creating a new Web application project involved creating a new IIS
virtual directory, which the IDE
would create at the root of an IIS Web site.
• You could also optionally connect to an existing IIS virtual directory
by creating a new “empty Web
project”.
• This dependency on IIS was problematic in many situations:
• Many IT shops preferred not to deploy IIS to workstations in the
enterprise because it is a security
risk to expose many unnecessary IIS Web servers.
• If there was a mismatch between the project declaration and the IIS
configuration on the local machine,
the project would simply fail to load.
• Every time a code-behind file was changed, the entire Web project had
to be recompiled and its
Dynamic Link Library (DLL) had to be redeployed to the /bin/ folder on the
Web server.
• Some IT shops required all developers to use one installation of IIS on
a dedicated server that all
developers had to share, which required developers to “take turns” attaching
to the IIS process to debug the
Web site!
• Acknowledging the problems above, Microsoft released Visual Studio
2005 with ASP.NET 2.0.
• Microsoft altered the development architecture by changing Visual
Studio “Web Application Projects”
to Visual Studio “Web Sites”.
• A “Web Site” allows you to develop your Web site on a local file
directory – IIS does not even have to
be installed.
• A “Web Site” allows each Web page to be “Just in Time” (JIT)
compiled as it is requested. Changes to a Web page do not require a
complete recompilation and deployment of the Web site!
• Web page requests were handled by a lightweight development Web
server that shipped with ASP.NET 2.0.
• The Web server (called WebDev.WebServer.exe) was automatically
started when you launched a new
debugging session in Visual Studio 2005.
• This lightweight server only handled a limited number of requests and
ran silently in the notification
area (tray).
• It automatically chose a port number that was unused on the local
machine.

• Web deployment was made easier, too.


• Developers could either create the Web pages using a “single file” or
“code-behind” model.
• Developers could connect to and deploy ASP.NET Web sites to any
File Transfer Protocol (FTP) Web
Server.
• Developers could also connect and deploy their Web sites to any IIS
server, so long as FPSEs were
installed.
• These deployment tools were meant to be used by Web developers
only, for moving Web pages to
staging servers.
• However, many enterprises were upset by this entire change to the
Web development and deployment
architecture.
• Some considered the solution to be less secure because they did not
want to deploy the code-behind
files (.NET source code) to their Web servers.
• Many organizations had created hundreds of “Web Application
Projects” and the upgrade to “Web
Sites” was going to be very expensive.
• Some decided to put off upgrading their Web sites from ASP.NET 1.1
altogether until Microsoft
“fixed” this problem.

• Hearing the voice of the enterprises, Microsoft created an interim
solution called the Web Application
Project (WAP) Visual Studio Add-In.
• This allowed developers to continue using the classic “Web
Application Project” architecture, making
upgrades from ASP.NET 1.1 less painful.
• The WAP Add-In was eventually rolled into Visual Studio 2005
Service Pack 1 (SP1). Installing SP1
automatically installed the WAP Add-In.
• Microsoft released Visual Studio 2008 with .NET 3.5. It supports both
development architectures that
were available in Visual Studio 2005 SP1.
• Developers still have the choice of developing a new Web Site (a
collection of JIT-compiled files in a
folder) or a new Web Application Project (all code files compiled into one
DLL).
• Developers also have the choice of what version of ASP.NET they
want to use with their Web sites (.NET 2.0, 3.0 and 3.5).
• Additional support was added to both ASP.NET application types, as
you’ll soon see.
• As of Visual Studio 2008, the J# language has been completely
dropped, but support for LINQ, Silverlight, AJAX and the W’s (WPF, WCF
and WF) has been added.
• ASP.NET 3.5 is actually just ASP.NET 2.0 with additional BCLs that
run alongside it.
• Some of the Web framework libraries were replaced in .NET 3.5 while
others are still version 2.0.
• For example, ASP.NET 3.5 added support for new Web Server
controls, Web server extensions, handlers, modules, DataSet extensions
and LINQ.
• However, System.Web, the root-level namespace in ASP.NET, is still
version 2.0.
• Thus, it is safe to state that:
• If the .NET 3.0 framework or .NET 3.5 framework is installed on a
Web server, you are guaranteed that
the .NET 2.0 framework is also installed.

Overview of XML

What Is XML?
Extensible Markup Language (XML) is a markup language used to describe the content
and structure of data in a document. It is a simplified version of Standard Generalized
Markup Language (SGML). XML is an industry standard for delivering content on the
Internet. Because it provides a facility to define new tags, XML is also extensible.
Like HTML, XML uses tags to describe content. However, rather than focusing on the
presentation of content, the tags in XML describe the meaning and hierarchical structure
of data. This functionality allows for the sophisticated data types that are required for
efficient data interchange between different programs and systems. Further, because
XML enables separation of content and presentation, the content, or data, is portable
across heterogeneous systems.
The XML syntax uses matching start and end tags (such as <name> and </name>) to mark
up information. Information delimited by tags is called an element. Every XML
document has a single root element, which is the top-level element that contains all the
other elements. Elements that are contained by other elements are often referred to as
sub-elements. An element can optionally have attributes, structured as name-value pairs,
that are part of the element and are used to further define it.
The following sample XML file describes the contents of an address book:
<?xml version="1.0"?>
<address_book>
  <person gender="f">
    <name>Jane Doe</name>
    <address>
      <street>123 Main St.</street>
      <city>San Francisco</city>
      <state>CA</state>
      <zip>94117</zip>
    </address>
    <phone area_code="415">555-1212</phone>
  </person>
  <person gender="m">
    <name>John Smith</name>
    <phone area_code="510">555-1234</phone>
    <email>johnsmith@somewhere.com</email>
  </person>
</address_book>
The root element of the XML file is address_book. The address book currently contains
two entries in the form of person elements: Jane Doe and John Smith. Jane Doe's entry
includes her address and phone number; John Smith's includes his phone and email
address. Note that the structure of the XML document defines the phone element as
storing the area code using the area_code attribute rather than a sub-element in the body
of the element. Also note that not all sub-elements are required for the person element.
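To make the roles of elements, sub-elements, and attributes concrete, here is a minimal sketch that reads the address book with Python's standard xml.etree.ElementTree module. Python is used only for illustration, and summarize is a hypothetical helper name. The attribute values are quoted in this copy of the sample, as XML requires.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """<?xml version="1.0"?>
<address_book>
  <person gender="f">
    <name>Jane Doe</name>
    <address>
      <street>123 Main St.</street>
      <city>San Francisco</city>
      <state>CA</state>
      <zip>94117</zip>
    </address>
    <phone area_code="415">555-1212</phone>
  </person>
  <person gender="m">
    <name>John Smith</name>
    <phone area_code="510">555-1234</phone>
    <email>johnsmith@somewhere.com</email>
  </person>
</address_book>"""


def summarize(xml_text):
    """Return one dict per person element in the address book."""
    root = ET.fromstring(xml_text)  # root is the address_book element
    people = []
    for person in root.findall("person"):
        phone = person.find("phone")
        people.append({
            "name": person.findtext("name"),          # sub-element text
            "gender": person.get("gender"),           # attribute value
            "area_code": phone.get("area_code") if phone is not None else None,
            "phone": phone.text if phone is not None else None,
            "city": person.findtext("address/city"),  # None when absent
            "email": person.findtext("email"),
        })
    return people


people = summarize(ADDRESS_BOOK)
for entry in people:
    print(entry)
```

Sub-elements that a person entry omits (John Smith has no address, Jane Doe has no email) simply come back as None.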

How Do You Describe an XML Document?
There are two ways to describe an XML document: DTDs and XML Schemas.
Document Type Definitions (DTDs) define the basic requirements for the structure of a
particular XML document. A DTD describes the elements and attributes that are valid in
an XML document, and the contexts in which they are valid. In other words, a DTD
specifies which tags are allowed within certain other tags, and which tags and attributes
are optional.
The following example shows a DTD that describes the preceding address book sample
XML document:
<!DOCTYPE address_book [
  <!ELEMENT address_book (person*)>
  <!ELEMENT person (name, address?, phone?, email?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (street, city, state, zip)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
  <!ATTLIST person gender CDATA #REQUIRED>
  <!ATTLIST phone area_code CDATA #REQUIRED>
]>
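The rules this DTD places on a person element can be spelled out in ordinary code. The sketch below is not a DTD validator (Python's standard ElementTree parser does not validate against DTDs); it is a hand-written check, with the hypothetical name person_errors, that mirrors the declarations for person and phone so the meaning of "required", "optional" (the ? suffix), and element order is visible.

```python
import xml.etree.ElementTree as ET

# Child order allowed by: <!ELEMENT person (name, address?, phone?, email?)>
PERSON_CHILDREN = ["name", "address", "phone", "email"]
REQUIRED_CHILDREN = {"name"}


def person_errors(person):
    """List the ways a person element breaks the DTD's rules (empty if none)."""
    errors = []
    if person.get("gender") is None:
        errors.append("missing required attribute: gender")
    tags = [child.tag for child in person]
    for tag in tags:
        if tag not in PERSON_CHILDREN:
            errors.append("unexpected element: " + tag)
    known = [tag for tag in tags if tag in PERSON_CHILDREN]
    if len(known) != len(set(known)):
        errors.append("an element appears more than once")
    if known != sorted(known, key=PERSON_CHILDREN.index):
        errors.append("elements are out of order")
    for tag in REQUIRED_CHILDREN:
        if tag not in tags:
            errors.append("missing required element: " + tag)
    phone = person.find("phone")
    if phone is not None and phone.get("area_code") is None:
        errors.append("phone is missing required attribute: area_code")
    return errors


good = ET.fromstring(
    '<person gender="m"><name>John Smith</name>'
    '<phone area_code="510">555-1234</phone></person>'
)
bad = ET.fromstring(
    '<person><phone>555-0000</phone><name>No Gender</name></person>'
)
print(person_errors(good))
print(person_errors(bad))
```

A validating parser performs checks of this kind automatically from the DTD, which is why the rules are written once in the DTD rather than repeated in every program.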
Schemas are a more recent development in XML specifications and are intended to supersede
DTDs. They describe XML documents with more flexibility and detail than DTDs do,
and are XML documents themselves, which DTDs are not. The XML Schema specification
is a product of the World Wide Web Consortium (W3C), a W3C Recommendation since
May 2001, and is intended to address many limitations of DTDs. For detailed information
on XML schemas, see http://www.w3.org/TR/xmlschema-0/.
The following example shows a schema that describes the preceding address book sample
XML document:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="address_book" type="bookType"/>
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="person" type="personType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="personType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="addressType" minOccurs="0"/>
      <xsd:element name="phone" type="phoneType" minOccurs="0"/>
      <xsd:element name="email" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="gender" type="xsd:string" use="required"/>
  </xsd:complexType>
  <xsd:complexType name="addressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- phone holds text plus an attribute, so it is a complex type with simple content -->
  <xsd:complexType name="phoneType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="area_code" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
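An XML document can include a DTD as part of the document itself, reference an
external DTD using the DOCTYPE declaration, or not include or reference a DTD at
all. A schema is normally a separate document; an XML document typically points to it
with the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute rather than
with DOCTYPE. The following excerpt from an XML document shows how to reference an
external DTD called address.dtd: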
<?xml version="1.0"?>
<!DOCTYPE address_book SYSTEM "address.dtd">
<address_book>
...
XML documents only need to be accompanied by a DTD or Schema if they need to be
validated by a parser or if they contain complex types. An XML document is considered
valid if 1) it has an associated DTD or Schema, and 2) it complies with the constraints
expressed in the associated DTD or Schema. If, however, an XML document only needs
to be well-formed, then the document does not have to be accompanied by a DTD or
Schema. A document is considered well-formed if it follows all the rules in the W3C
Recommendation for XML 1.0. For the full XML 1.0 specification, see
http://www.w3.org/XML/.
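Well-formedness can be checked without any DTD or Schema, because it depends only on the XML syntax rules. The following minimal sketch uses Python's standard xml.etree.ElementTree parser, which checks well-formedness but does not validate; is_well_formed is a hypothetical helper name and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


samples = {
    "quoted attribute": '<phone area_code="415">555-1212</phone>',
    "unquoted attribute": '<phone area_code=415>555-1212</phone>',
    "missing end tag": '<person><name>Jane Doe</person>',
    "two root elements": '<name>Jane</name><name>John</name>',
}
for label, text in samples.items():
    ok, error = is_well_formed(text)
    print(label, "->", "well-formed" if ok else "not well-formed: " + error)
```

Only the first sample is well-formed. Whether a well-formed document is also valid is a separate question that requires a DTD or Schema and a validating parser.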

Why Use XML?


An industry typically uses data exchange methods that are meaningful and specific to that
industry. With the advent of e-commerce, businesses conduct an increasing number of
relationships with a variety of industries and, therefore, must develop expert knowledge
of the various protocols used by those industries for electronic communication.
The extensibility of XML makes it a very effective tool for standardizing the format of
data interchange among various industries. For example, when message brokers and
workflow engines must coordinate transactions among multiple industries or departments
within an enterprise, they can use XML to combine data from disparate sources into a
format that is understandable by all parties.

TECHNICAL OVERVIEW OF XML


What is XML?

XML is a markup language for documents containing structured information.


Structured information contains both content (words, pictures, etc.) and some indication
of what role that content plays (for example, content in a section heading has a different
meaning from content in a footnote, which means something different than content in a
figure caption or content in a database table, etc.). Almost all documents have some
structure.
A markup language is a mechanism to identify structures in a document. The XML
specification defines a standard way to add markup to documents.

What's a Document?
The number of applications currently being developed that are based on, or make use of,
XML documents is truly amazing (particularly when you consider that XML is not yet a
year old)! For our purposes, the word "document" refers not only to traditional
documents, like this one, but also to the myriad of other XML "data formats". These
include vector graphics, e-commerce transactions, mathematical equations, object meta-
data, server APIs, and a thousand other kinds of structured information.

So XML is Just Like HTML?


No. In HTML, both the tag semantics and the tag set are fixed. An <h1> is always a first
level heading and the tag <ati.product.code> is meaningless. The W3C, in
conjunction with browser vendors and the WWW community, is constantly working to
extend the definition of HTML to allow new tags to keep pace with changing technology
and to bring variations in presentation (stylesheets) to the Web. However, these changes
are always rigidly confined by what the browser vendors have implemented and by the
fact that backward compatibility is paramount. And for people who want to disseminate
information widely, features supported by only the latest releases of Netscape and
Internet Explorer are not useful.
XML specifies neither semantics nor a tag set. In fact XML is really a meta-language for
describing markup languages. In other words, XML provides a facility to define tags and
the structural relationships between them. Since there's no predefined tag set, there can't
be any preconceived semantics. All of the semantics of an XML document will either be
defined by the applications that process them or by stylesheets.

So XML Is Just Like SGML?


No. Well, yes, sort of. XML is defined as an application profile of SGML. SGML is the
Standard Generalized Markup Language defined by ISO 8879. SGML has been the
standard, vendor-independent way to maintain repositories of structured documentation
for more than a decade, but it is not well suited to serving documents over the web (for a
number of technical reasons beyond the scope of this article). Defining XML as an
application profile of SGML means that any fully conformant SGML system will be able
to read XML documents. However, using and understanding XML documents does not
require a system that is capable of understanding the full generality of SGML. XML is,
roughly speaking, a restricted form of SGML.
For technical purists, it's important to note that there may also be subtle differences
between documents as understood by XML systems and those same documents as
understood by SGML systems. In particular, treatment of white space immediately
adjacent to tags may be different.

Why XML?
In order to appreciate XML, it is important to understand why it was created. XML was
created so that richly structured documents could be used over the web. The only viable
alternatives, HTML and SGML, are not practical for this purpose.
HTML, as we've already discussed, comes bound with a set of semantics and does not
provide arbitrary structure.
SGML provides arbitrary structure, but is too difficult to implement just for a web
browser. Full SGML systems solve large, complex problems that justify their expense.
Viewing structured documents sent over the web rarely carries such justification.
This is not to say that XML can be expected to completely replace SGML. While XML is
being designed to deliver structured content over the web, some of the very features it
lacks to make this practical, make SGML a more satisfactory solution for the creation and
long-time storage of complex documents. In many organizations, filtering SGML to
XML will be the standard procedure for web delivery.
XML Development Goals
The XML specification sets out the following goals for XML: [Section 1.1] (In this
article, citations of the form [Section 1.1] are references to the W3C
Recommendation Extensible Markup Language (XML) 1.0. If you are interested in more
technical detail about a particular topic, please consult the specification.)

1. It shall be straightforward to use XML over the Internet. Users must be able to
view XML documents as quickly and easily as HTML documents. In practice,
this will only be possible when XML browsers are as robust and widely available
as HTML browsers, but the principle remains.
2. XML shall support a wide variety of applications. XML should be beneficial to a
wide variety of diverse applications: authoring, browsing, content analysis, etc.
Although the initial focus is on serving structured documents over the web, it is
not meant to narrowly define XML.
3. XML shall be compatible with SGML. Most of the people involved in the XML
effort come from organizations that have a large, in some cases staggering,
amount of material in SGML. XML was designed pragmatically, to be compatible
with existing standards while solving the relatively new problem of sending richly
structured documents over the web.
4. It shall be easy to write programs that process XML documents. The colloquial
way of expressing this goal while the spec was being developed was that it ought
to take about two weeks for a competent computer science graduate student to
build a program that can process XML documents.
5. The number of optional features in XML is to be kept to an absolute minimum,
ideally zero. Optional features inevitably raise compatibility problems when users
want to share documents and sometimes lead to confusion and frustration.
6. XML documents should be human-legible and reasonably clear. If you don't have
an XML browser and you've received a hunk of XML from somewhere, you
ought to be able to look at it in your favorite text editor and actually figure out
what the content means.
7. The XML design should be prepared quickly. Standards efforts are notoriously
slow. XML was needed immediately and was developed as quickly as possible.
8. The design of XML shall be formal and concise. In many ways a corollary to rule
4, it essentially means that XML must be expressed in EBNF and must be
amenable to modern compiler tools and techniques.
There are a number of technical reasons why the SGML grammar cannot be
expressed in EBNF. Writing a proper SGML parser requires handling a variety of
rarely used and difficult to parse language features. XML does not.
9. XML documents shall be easy to create. Although there will eventually be
sophisticated editors to create and edit XML content, they won't appear
immediately. In the interim, it must be possible to create XML documents in other
ways: directly in a text editor, with simple shell and Perl scripts, etc.
10. Terseness in XML markup is of minimal importance. Several SGML language
features were designed to minimize the amount of typing required to manually
key in SGML documents. These features are not supported in XML. From an
abstract point of view, these documents are indistinguishable from their more
fully specified forms, but supporting these features adds a considerable burden to
the SGML parser (or the person writing it, anyway). In addition, most modern
editors offer better facilities to define shortcuts when entering text.

How Is XML Defined?


XML is defined by a number of related specifications:
Extensible Markup Language (XML) 1.0
Defines the syntax of XML. The XML specification is the primary focus of this
article.
XML Pointer Language (XPointer) and XML Linking Language (XLink)
Defines a standard way to represent links between resources. In addition to simple
links, like HTML's <A> tag, XML has mechanisms for links between multiple
resources and links between read-only resources. XPointer describes how to
address a resource, XLink describes how to associate two or more resources.
Extensible Style Language (XSL)
Defines the standard stylesheet language for XML.
As time goes on, additional requirements will be addressed by other specifications.
Currently (Sep, 1998), namespaces (dealing with tags from multiple tag sets), a query
language (finding out what's in a document or a collection of documents), and a schema
language (describing the relationships between tags, DTDs in XML) are all being
actively pursued.

1. XML is for structuring data

Structured data includes things like spreadsheets, address books,
configuration parameters, financial transactions, and technical drawings. XML is
a set of rules (you may also think of them as guidelines or conventions) for
designing text formats that let you structure your data. XML is not a programming
language, and you don't have to be a programmer to use it or learn it. XML
makes it easy for a computer to generate data, read data, and ensure that the
data structure is unambiguous. XML avoids common pitfalls in language design:
it is extensible, platform-independent, and it supports internationalization and
localization. XML is fully Unicode-compliant.

2. XML looks a bit like HTML

Like HTML, XML makes use of tags (words bracketed by '<' and
'>') and attributes (of the form name="value"). While HTML specifies what each
tag and attribute means, and often how the text between them will look in a
browser, XML uses the tags only to delimit pieces of data, and leaves the
interpretation of the data completely to the application that reads it. In other
words, if you see "<p>" in an XML file, do not assume it is a paragraph.
Depending on the context, it may be a price, a parameter, a person, a p... (and
who says it has to be a word with a "p"?).

3. XML is text, but isn't meant to be read

Programs that produce spreadsheets, address books, and other
structured data often store that data on disk, using either a binary or text format.
One advantage of a text format is that it allows people, if necessary, to look at
the data without the program that produced it; in a pinch, you can read a text
format with your favorite text editor. Text formats also allow developers to more
easily debug applications. Like HTML, XML files are text files that people
shouldn't have to read, but may when the need arises. Compared to HTML, the
rules for XML files allow fewer variations. A forgotten tag, or an attribute without
quotes makes an XML file unusable, while in HTML such practice is often
explicitly allowed. The official XML specification forbids applications from trying to
second-guess the creator of a broken XML file; if the file is broken, an application
has to stop right there and report an error.
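The difference in strictness is easy to observe. In the sketch below, Python's standard html.parser module accepts a snippet with an unquoted attribute and no end tag, while the standard XML parser rejects the same text. The snippet and the TagCollector class are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SNIPPET = "<p class=note>Unquoted attribute, no end tag"


class TagCollector(HTMLParser):
    """Record each start tag and its attributes as the HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


# The HTML parser tolerates the sloppy markup and still reports the tag.
collector = TagCollector()
collector.feed(SNIPPET)
collector.close()
print("HTML parser saw:", collector.start_tags)

# The XML parser stops and reports an error instead of guessing.
try:
    ET.fromstring(SNIPPET)
    xml_error = None
except ET.ParseError as exc:
    xml_error = str(exc)
print("XML parser reported:", xml_error)
```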

4. XML is verbose by design

Since XML is a text format and it uses tags to delimit the data, XML
files are nearly always larger than comparable binary formats. That was a
conscious decision by the designers of XML. The advantages of a text format are
evident (see point 3), and the disadvantages can usually be compensated for at a
different level. Disk space is less expensive than it used to be, and compression
programs like zip and gzip can compress files very well and very fast. In addition,
communication protocols such as modem protocols and HTTP/1.1, the core
protocol of the Web, can compress data on the fly, saving bandwidth as
effectively as a binary format.
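The effect of compression on tag-heavy text can be measured directly. This minimal sketch builds a made-up, repetitive address book and compresses it with Python's standard gzip module; the record contents are invented for illustration, and the exact sizes depend on the data.

```python
import gzip

# Build a repetitive XML document; the tag names repeat for every record.
records = "".join(
    '<person gender="f"><name>Person {0}</name>'
    '<phone area_code="415">555-{0:04d}</phone></person>'.format(i)
    for i in range(1000)
)
xml_bytes = ("<address_book>" + records + "</address_book>").encode("utf-8")
compressed = gzip.compress(xml_bytes)

print("raw bytes:      ", len(xml_bytes))
print("gzip bytes:     ", len(compressed))
print("round trip okay:", gzip.decompress(compressed) == xml_bytes)
```

Repeated tag names are exactly the kind of redundancy that general-purpose compressors remove well, which is why the verbosity costs little once the data is compressed.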
5. XML is a family of technologies

XML 1.0 is the specification that defines what "tags" and
"attributes" are. Beyond XML 1.0, "the XML family" is a growing set of modules
that offer useful services to accomplish important and frequently demanded
tasks. XLink describes a standard way to add hyperlinks to an XML file. XPointer
is a syntax in development for pointing to parts of an XML document. An
XPointer is a bit like a URL, but instead of pointing to documents on the Web, it
points to pieces of data inside an XML file. CSS, the style sheet language, is
applicable to XML as it is to HTML. XSL is the advanced language for expressing
style sheets. It is based on XSLT, a transformation language used for
rearranging, adding and deleting tags and attributes. The DOM is a standard set
of function calls for manipulating XML (and HTML) files from a programming
language. XML Schemas 1 and 2 help developers to precisely define the
structures of their own XML-based formats. There are several more modules and
tools available or under development. Keep an eye on W3C's technical reports
page.
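
As an illustration of what "a standard set of function calls" means for the DOM, the sketch below uses Python's standard xml.dom.minidom module, which offers a small DOM implementation. The sample document is an assumption chosen for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<note><to>Tove</to><from>Jani</from></note>")

# Read a value by walking the DOM tree.
to_node = doc.getElementsByTagName("to")[0]
recipient = to_node.firstChild.data

# Add a new element and an attribute through DOM calls.
heading = doc.createElement("heading")
heading.appendChild(doc.createTextNode("Reminder"))
doc.documentElement.appendChild(heading)
doc.documentElement.setAttribute("date", "12/11/99")

result = doc.documentElement.toxml()
print(recipient)
print(result)
```

The same method names (getElementsByTagName, createElement, appendChild, setAttribute) appear in DOM implementations for other languages, which is the point of having a standard.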

6. XML is new, but not that new

Development of XML started in 1996 and it has been a W3C
Recommendation since February 1998, which may make you suspect that this is
rather immature technology. In fact, the technology isn't very new. Before XML
there was SGML, developed in the early '80s, an ISO standard since 1986, and
widely used for large documentation projects. The development of HTML started
in 1990. The designers of XML simply took the best parts of SGML, guided by
the experience with HTML, and produced something that is no less powerful than
SGML, and vastly more regular and simple to use. Some evolutions, however,
are hard to distinguish from revolutions... And it must be said that while SGML is
mostly used for technical documentation and much less for other kinds of data,
with XML it is exactly the opposite.

7. XML leads HTML to XHTML

There is an important XML application that is a document format:
W3C's XHTML, the successor to HTML. XHTML has many of the same elements
as HTML. The syntax has been changed slightly to conform to the rules of XML.
A format that is "XML-based" inherits the syntax from XML and restricts it in
certain ways (e.g., XHTML allows "<p>", but not "<r>"); it also adds meaning to
that syntax (XHTML says that "<p>" stands for "paragraph", and not for "price",
"person", or anything else).

8. XML is modular

XML allows you to define a new document format by combining
and reusing other formats. Since two formats developed independently may have
elements or attributes with the same name, care must be taken when combining
those formats (does "<p>" mean "paragraph" from this format or "person" from
that one?). To eliminate name confusion when combining formats, XML provides
a namespace mechanism. XSL and RDF are good examples of XML-based
formats that use namespaces. XML Schema is designed to mirror this support for
modularity at the level of defining XML document structures, by making it easy to
combine two schemas to produce a third which covers a merged document
structure.
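
The "paragraph or person" ambiguity can be made concrete. In the sketch below, two hypothetical formats both define a "<p>" element, and namespaces keep them apart. The namespace URIs and prefixes are made up for illustration; parsing uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Two hypothetical formats, each with its own <p>, told apart by namespace URI.
combined = """
<report xmlns:doc="http://example.com/ns/document"
        xmlns:hr="http://example.com/ns/people">
  <doc:p>Quarterly summary.</doc:p>
  <hr:p>Anna Smith</hr:p>
</report>
"""

# Prefix-to-URI map used only for lookups in this script.
ns = {
    "doc": "http://example.com/ns/document",
    "hr": "http://example.com/ns/people",
}

root = ET.fromstring(combined)
paragraph = root.find("doc:p", ns).text
person = root.find("hr:p", ns).text
print(paragraph, "|", person)
```

What identifies an element is the namespace URI together with the local name, so the two "p" elements never collide even though they share a name.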

9. XML is the basis for RDF and the Semantic Web

W3C's Resource Description Framework (RDF) is an XML text
format that supports resource description and metadata applications, such as
music playlists, photo collections, and bibliographies. For example, RDF might let
you identify people in a Web photo album using information from a personal
contact list; then your mail client could automatically start a message to those
people stating that their photos are on the Web. Just as HTML integrated
documents, images, menu systems, and forms applications to launch the original
Web, RDF provides tools to integrate even more, to make the Web a little bit
more into a Semantic Web. Just as people need to have agreement on the
meanings of the words they employ in their communication, computers need
mechanisms for agreeing on the meanings of terms in order to communicate
effectively. Formal descriptions of terms in a certain area (shopping or
manufacturing, for example) are called ontologies and are a necessary part of
the Semantic Web. RDF, ontologies, and the representation of meaning so that
computers can help people do work are all topics of the Semantic Web Activity.

10. XML is license-free, platform-independent and well-supported

By choosing XML as the basis for a project, you gain access to a
large and growing community of tools (one of which may already do what you
need!) and engineers experienced in the technology. Opting for XML is a bit like
choosing SQL for databases: you still have to build your own database and your
own programs and procedures that manipulate it, but there are many tools
available and many people who can help you. And since XML is license-free, you
can build your own software around it without paying anybody anything. The
large and growing support means that you are also not tied to a single vendor.
XML isn't always the best solution, but it is always worth considering.

What is XML?

• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML.
• XML was designed to describe data.
• XML tags are not predefined in XML. You must define your own tags.
• XML is self describing.
• XML can use a DTD (Document Type Definition) to formally describe the data.

The main difference between XML and HTML


XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, XML is about describing information.

XML is extensible
The tags used to mark up HTML documents and the structure of HTML documents are predefined. The
author of HTML documents can only use tags that are defined in the HTML standard.
XML allows the author to define his own tags and his own document structure.
XML is a complement to HTML
It is important to understand that XML is not a replacement for HTML. In the future development of the
Web it is most likely that XML will be used to structure and describe the Web data, while HTML will be used
to format and display the same data.

XML in future Web development


We have been participating in XML development since its creation. It has been amazing to see how quickly
the XML standard has been developed, and how quickly a large number of software vendors have adopted
the standard.
We strongly believe that XML will be as important to the future of the Web as HTML has been to the
foundation of the Web. XML is the future for all data transmission and data manipulation over the Web.

XML can keep data separated from your HTML


HTML pages are used to display data. Data is often stored inside HTML pages. With XML this data can now
be stored in a separate XML file. This way you can concentrate on using HTML for formatting and display,
and be sure that changes in the underlying data will not force changes to any of your HTML code.

XML can also store data inside HTML documents


XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using
HTML for formatting and displaying the data.

XML can be used to exchange data


In the real world, computer systems and databases contain data in incompatible formats. One of the most
time consuming challenges for developers has been to exchange data between such systems over the
Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by
different types of applications.
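
As a rough sketch of such an exchange, the code below turns an in-memory record into XML text and reads it back on the "receiving" side. The record contents and variable names are hypothetical; it uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# A record as one system might hold it internally (hypothetical data).
record = {"to": "Tove", "from": "Jani", "heading": "Reminder"}

# Sending side: convert the record to XML text.
note = ET.Element("note")
for name, value in record.items():
    ET.SubElement(note, name).text = value
wire_text = ET.tostring(note, encoding="unicode")

# Receiving side: any XML-aware application can rebuild the data.
received = {child.tag: child.text for child in ET.fromstring(wire_text)}

print(wire_text)
print(received)
```

Neither side needs to know how the other stores its data internally; both only need to agree on the element names.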

XML can be used to store data


XML can also be used to store data in files or in databases. Applications can be written to store and retrieve
information from the store, and generic applications can be used to display the data.

An example XML document:


<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document is the XML declaration, which should always be included. It defines the XML version of
the document. In this case the document conforms to the 1.0 specification of XML:

<?xml version="1.0"?>
The next line defines the first element of the document (the root element):

<note>
The next lines define four child elements of the root (to, from, heading, and body):

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
The last line defines the end of the root element:

</note>
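
The structure walked through above (one root element with four children) can also be inspected programmatically. This is a minimal sketch using Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

note_xml = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(note_xml)

# The root element and its child elements, in document order.
children = [(child.tag, child.text) for child in root]

print("root:", root.tag)
for tag, text in children:
    print(f"  {tag}: {text}")
```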

All XML elements must have a closing tag


In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag like this:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

XML tags are case sensitive


XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>
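
A short sketch of what case sensitivity means in practice, using Python's standard xml.etree.ElementTree. The wrapper element "box" is a made-up name for this example.

```python
import xml.etree.ElementTree as ET

# <Letter> and <letter> are two different elements.
root = ET.fromstring("<box><Letter>A</Letter><letter>b</letter></box>")
upper = root.find("Letter").text
lower = root.find("letter").text

# An end tag whose case differs from the start tag is a syntax error.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(upper, lower, mismatch_accepted)
```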

All XML elements must be properly nested


In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>


In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

All XML documents must have a root tag


All XML documents must contain a single tag pair to define the root element. All other elements must be
nested within the root element. All elements can have sub (children) elements. Sub elements must be in
pairs and correctly nested within their parent element:

<root>
<child>
<subchild>
</subchild>
</child>
</root>

Attribute values must always be quoted


XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must
always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:
<?xml version="1.0"?>
<note date=12/11/99>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<?xml version="1.0"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

XML Attributes
XML attributes are normally used to describe XML elements, or to provide additional information about
elements. You may remember this construct from HTML: <IMG SRC="computer.gif">. In this HTML example
SRC is an attribute of the IMG element. The SRC attribute provides additional information about the
element.
Attributes are always contained within the start tag of an element. Here are some examples:

HTML examples:

<img src="computer.gif">
<a href="demo.asp">
XML examples:

<file type="gif">
<person id="3344">
Most commonly, attributes are used to provide information that is not a part of the content of the
XML document. Put another way, attribute data is often more important to the XML parser than to the
reader. In the example above,
the person id is a counter value that is irrelevant to the reader, but important to software that wants to
manipulate the person element.

Use of Elements vs. Attributes


Take a look at these examples:

Using an Attribute for sex:

<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

Using an Element for sex:

<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In the first example sex is an attribute. In the last example sex is an element. Both examples provide the
same information to the reader.
There are no fixed rules about when to use attributes to describe data, and when to use elements. My
experience, however, is that attributes are handy in HTML, but in XML you should try to avoid them, as long
as the same information can be expressed using elements.
Here is another example, demonstrating how elements can be used instead of attributes. The following three
XML documents contain exactly the same information. A date attribute is used in the first, a date element is
used in the second, and an expanded date element is used in the third:

<?xml version="1.0"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<?xml version="1.0"?>
<note>
<date>12/11/99</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<?xml version="1.0"?>
<note>
<date>
<day>12</day>
<month>11</month>
<year>99</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
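
The three layouts above carry the same date, but a program reads each one differently. The sketch below shows the three access patterns with Python's standard xml.etree.ElementTree; the documents are shortened versions of the ones above.

```python
import xml.etree.ElementTree as ET

with_attribute = ET.fromstring('<note date="12/11/99"><to>Tove</to></note>')
with_element = ET.fromstring("<note><date>12/11/99</date><to>Tove</to></note>")
expanded = ET.fromstring(
    "<note><date><day>12</day><month>11</month><year>99</year></date>"
    "<to>Tove</to></note>"
)

date_a = with_attribute.get("date")     # attribute lookup on the element
date_b = with_element.findtext("date")  # text of a child element
# Expanded form: each part is its own element, so no string splitting is needed.
date_c = "/".join(
    expanded.findtext(f"date/{part}") for part in ("day", "month", "year")
)

print(date_a, date_b, date_c)
```

Only the expanded form lets a program pick out the day, month, or year without parsing the "12/11/99" string itself.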

Avoid using attributes? (I say yes!)


Why should you avoid using attributes? Should you just take my word for it? These are some of the
problems with using attributes:

• attributes cannot contain multiple values (elements can)
• attributes are not expandable (for future changes)
• attributes cannot describe structures (like child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a DTD

If you start using attributes as containers for XML data, you might end up with documents that are both
difficult to maintain and to manipulate. What I'm trying to say is that you should use elements to describe
your data. Use attributes only to provide information that is not relevant to the reader. Please don't end up
like this:
<?xml version="1.0"?>
<note day="12" month="11" year="99"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>
This doesn't look much like XML. Got the point?

An Exception to my Attribute rule


Rules always have exceptions. My rule about not using attributes has one too:
Sometimes I assign ID references to elements in my XML documents. These ID references can be used to
access XML elements in much the same way as the NAME or ID attributes in HTML. This example
demonstrates this:

<?xml version="1.0"?>
<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>

The ID in these examples is just a counter, or a unique identifier, to identify the different notes in the XML
file.
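
One way such an identifier might be used by software is to fetch a single note directly. The sketch below does this with the limited XPath support in Python's standard xml.etree.ElementTree; the documents are shortened versions of the ones above.

```python
import xml.etree.ElementTree as ET

messages = ET.fromstring("""
<messages>
  <note ID="501"><to>Tove</to><from>Jani</from><body>Don't forget me this weekend!</body></note>
  <note ID="502"><to>Jani</to><from>Tove</from><body>I will not!</body></note>
</messages>
""")

# Select the note whose ID attribute equals 502.
reply = messages.find("./note[@ID='502']")

# Or build an index once when many lookups are needed.
notes_by_id = {note.get("ID"): note for note in messages.iter("note")}

print(reply.findtext("body"))
print(sorted(notes_by_id))
```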

"Well Formed" XML documents


A "Well Formed" XML document is a document that conforms to the XML syntax rules that we described in
the previous chapter.
The following is a "Well Formed" XML document:

<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

"Valid" XML documents


A "Valid" XML document is a "Well Formed" XML document which conforms to the rules of a Document
Type Definition (DTD).
The following is the same document as above but with an added reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document
structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external
reference.

Internal DTD
This is an XML document with an internal Document Type Definition:

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The DTD is interpreted like this:
!ELEMENT note (in line 2) defines the element "note" as having four child elements, in this order: "to,from,heading,body".
!ELEMENT to (in line 3) defines the "to" element to be of the type "#PCDATA".
!ELEMENT from (in line 4) defines the "from" element to be of the type "#PCDATA",
and so on.

External DTD
This is the same XML document with an external DTD:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
This is a copy of the file "note.dtd" containing the Document Type Definition:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Why use a DTD?


XML provides an application independent way of sharing data. With a DTD, independent groups of people
can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify
that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.
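
Python's standard library parser does not validate against a DTD, so the sketch below is not DTD validation. It is a hand-written stand-in that checks the one rule the note DTD expresses: a "note" root containing "to", "from", "heading", and "body" in that order, each holding text only. The function name matches_note_structure is hypothetical. A validating parser would perform this check from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to,from,heading,body)> from the DTD above.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_structure(text):
    """Hand-rolled structural check; an assumption-laden stand-in for a DTD."""
    root = ET.fromstring(text)
    if root.tag != "note":
        return False
    children = list(root)
    # Same names in the same order, and no nested elements (text content only).
    return [c.tag for c in children] == EXPECTED_CHILDREN and all(
        len(c) == 0 for c in children
    )

ok = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Hi</body></note>"
missing_heading = "<note><to>Tove</to><from>Jani</from><body>Hi</body></note>"

print(matches_note_structure(ok), matches_note_structure(missing_heading))
```

Note that the second document is well formed but does not follow the agreed structure, which is exactly the difference between "Well Formed" and "Valid".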

USES OF XML (Advantages of XML)

XML has a variety of uses, including the following:

• Web publishing: XML allows you to create interactive pages, allows the
customer to customize those pages, and makes creating e-commerce applications
more intuitive. With XML, you store the data once and then render that content
for different viewers or devices based on style sheet processing using an
Extensible Style Language (XSL)/XSL Transformation (XSLT) processor.
• Web searching and automating Web tasks: XML defines the type of
information contained in a document, making it easier to return useful results
when searching the Web:

For example, using HTML to search for books authored by Tom Brown is likely
to return instances of the term 'brown' outside of the context of author. Using
XML restricts the search to the proper context (for example, the information
contained in the <author> tag) and returns only the desired type of information.
By using XML, Web agents and robots (programs that automate Web searches or
other tasks) are more efficient and produce more useful results.

• General applications: XML provides a standard method to access information,
making it easier for applications and devices of all kinds to use, store, transmit,
and display data.
• e-business applications: XML implementations make electronic data interchange
(EDI) more accessible for information interchange, business-to-business
transactions, and business-to-consumer transactions.
• Metadata applications: XML makes it easier to express metadata in a portable,
reusable format.
• Pervasive computing: XML provides portable and structured information types
for display on pervasive (wireless) computing devices such as personal digital
assistants (PDAs), cellular phones, and others. For example, WML (Wireless
Markup Language) and VoiceXML are currently evolving standards for
describing visual and speech-driven wireless device interfaces.
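
The Web searching item in the list above contrasts a plain text search with a search restricted to the author context. A small sketch of that difference, using a made-up two-book catalog and Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: "Brown" appears once in a title and once as an author.
catalog = ET.fromstring("""
<catalog>
  <book><title>The Brown Bear</title><author>Ann Lee</author></book>
  <book><title>Garden Design</title><author>Tom Brown</author></book>
</catalog>
""")

# Context-free search: matches the word anywhere in the book's text.
text_hits = [
    book.findtext("title")
    for book in catalog.iter("book")
    if "brown" in "".join(book.itertext()).lower()
]

# Context-aware search: looks only inside the <author> element.
author_hits = [
    book.findtext("title")
    for book in catalog.iter("book")
    if "brown" in book.findtext("author").lower()
]

print(text_hits)
print(author_hits)
```

The context-free search returns both books, while the search limited to the author element returns only the book actually written by Tom Brown.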

The building blocks of XML documents
XML documents (and HTML documents) are made up of the following building blocks:
Elements, Tags, Attributes, Entities, PCDATA, and CDATA
This is a brief explanation of each of the building blocks:

Elements
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and
"message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are
"hr", "br" and "img".

Tags
Tags are used to markup elements.
A starting tag like <element_name> marks the beginning of an element, and an ending tag like
</element_name> marks the end of an element.
Examples:
A body element: <body>body text in between</body>.
A message element: <message>some message in between</message>

Attributes
Attributes provide extra information about elements.
Attributes are placed inside the start tag of an element. Attributes come in name/value pairs. The following
"img" element has additional information about a source file:

<img src="computer.gif" />


The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it is closed by a " /".

PCDATA
PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities
will be expanded.

CDATA
CDATA also means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and
entities will not be expanded.
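
One place where this difference is visible inside a document is a CDATA section, written as <![CDATA[ ... ]]>. The sketch below parses the same characters once as ordinary parsed content and once inside a CDATA section, using Python's standard xml.etree.ElementTree. The element name "example" is made up.

```python
import xml.etree.ElementTree as ET

# Parsed character data: <b> becomes a child element, &amp; becomes "&".
parsed = ET.fromstring("<example><b>bold</b> &amp; more</example>")

# CDATA section: the same characters are kept as literal text.
literal = ET.fromstring("<example><![CDATA[<b>bold</b> &amp; more]]></example>")

print(len(parsed), repr(parsed[0].tail))
print(len(literal), repr(literal.text))
```

In the first case the parser finds one child element and expands the entity; in the second it finds no child elements and leaves the text untouched.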

Entities
Entities are variables used to define common text. Entity references are references to entities.
Most of you will know the HTML entity reference "&nbsp;", which is used to insert a non-breaking space in an
HTML document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
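
The expansion of the predefined entities can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for the reverse direction; the element name "m" and the sample text are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing expands the five predefined entity references.
element = ET.fromstring(
    "<m>&lt;to&gt; &amp; &quot;from&quot; &apos;body&apos;</m>"
)
expanded = element.text

# Going the other way: escape() replaces &, < and > by default.
escaped = escape('<to> & "from"')

print(expanded)
print(escaped)
```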
