Lecture 8
presented by
Werner Wild, CEO Evolution Innsbruck, San Francisco, Zurich
Contact: info@evolution.at
Today's Lecture
XML overview (XML: eXtensible Markup Language)
Building documents and DTDs (DTD: Document Type Definition)
Links
http://www.w3.org/XML/
http://xml.apache.org/xerces-j/index.html
Family of technologies
XML, XLink, XPath, XPointer, XSL, XSLT, ...
History
Forerunner: SGML (Standard Generalized Markup Language)
SGML is very powerful and complicated
XML is a subset
XML 1.0 standard released 02/1998 (W3C Recommendation)
XML Benefits
<?xml version="1.0"?>
<mail-addressbook>
  <mail-address category="professor">
    <name>Werner Wild</name>
    <email>info@evolution.at</email>
  </mail-address>
  <mail-address category="assistant">
    <name>Landon Bradshaw</name>
    <email>landon@bradshaw.org</email>
  </mail-address>
</mail-addressbook>
Readable: meaningful tags support understanding
Open: platform-independent data representation
Structured: easy to parse and process
XML Syntax
XML Elements
Form of elements
Enclosed between a start tag (<...>) and an end tag (</...>)
Start and end tag must have the same name
Unlike in HTML, end tags may not be left out
Special form for tags that don't enclose content (empty tags)
HTML: <br> (empty tag, no corresponding end tag)
XML: <br/> (empty tag)
XML Elements
[Figure: a document tree with the nodes <doc>, <tag>, <nop> and <other>, plus an alternative tree representation that emphasizes node ordering. One pictured construct is marked "This is illegal!"]
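The nesting and end-tag rules can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the three sample strings are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested elements and XML-style empty tags are accepted.
nested = "<doc><tag><nop/></tag><other/></doc>"
# Overlapping elements: <tag> is closed while <nop> is still open.
overlapping = "<doc><tag><nop></tag></nop></doc>"
# An HTML-style <br> without an end tag is not allowed in XML.
html_style = "<doc>line one<br>line two</doc>"

for sample in (nested, overlapping, html_style):
    print(is_well_formed(sample), sample)
```

Only the first sample should be accepted; the parser reports the other two as not well-formed.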
XML Elements
String literals
Either "..." or '...'
Don't mix delimiters!
Attributes
If tags are nouns, attributes are adjectives
<mail-address category="professor">
XML Elements
Character references
Represent a displayable character that can't be used literally
Form: &#nnn; (decimal code point) or &#xhhh; (hexadecimal code point)
Entity references
Form: &name;, where name is a legal XML name
Predefined entities
&amp;: & character
&lt;, &gt;: < and > characters
&apos;, &quot;: ' and " characters
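To see how a parser resolves character references and the predefined entities, here is a small sketch with Python's standard library. The sample strings are made up for illustration; escape comes from xml.sax.saxutils and handles &, < and > by default.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Character references (decimal and hexadecimal) and predefined entities.
doc = "<t>&#169; &#xA9; &lt;tag&gt; &amp; &apos;single&apos; &quot;double&quot;</t>"
text = ET.fromstring(doc).text
print(text)

# Going the other way: escape markup characters before embedding text in XML.
escaped = escape("a < b & c > d")
print(escaped)
```

The parser hands the application the plain characters; the references only exist in the serialized document.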
XML Elements
Processing instructions
Form: <?target instruction?>
Information for XML file processors
target is required, instruction is just some string
Usage depends on processors
Comments
Form: <!--...-->
May contain any string except --
XML Elements
CDATA sections
For including text containing markup characters
Form: <![CDATA[...]]>
CDATA sections cannot be nested!
Usage example
<doc>
  <explanation>Next, we will see some XML code.</explanation>
  <![CDATA[
    <?xml version="1.0"?>
    <some-xml>
      This is some XML code. Use with care!
    </some-xml>
  ]]>
  <explanation>That was it.</explanation>
</doc>
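A short sketch of what a parser makes of the usage example, again with Python's xml.etree.ElementTree. The point to observe is that the markup inside the CDATA section arrives as plain text and never becomes an element. Note that ElementTree stores text following an element in that element's tail attribute, which is a detail of this library, not of XML.

```python
import xml.etree.ElementTree as ET

doc = """<doc>
  <explanation>Next, we will see some XML code.</explanation>
  <![CDATA[
    <?xml version="1.0"?>
    <some-xml>This is some XML code. Use with care!</some-xml>
  ]]>
  <explanation>That was it.</explanation>
</doc>"""

root = ET.fromstring(doc)
# Only the two <explanation> elements are children of <doc>.
child_tags = [child.tag for child in root]
# The CDATA content is character data following the first <explanation>.
cdata_text = root[0].tail
print(child_tags)
print(cdata_text.strip())
```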
Prolog (optional)
Signal the beginning of XML data, describe character encoding
Provide additional information for parser/application
Body
Actual data
Epilog (optional)
"A real design error" (Tim Bray, W3C Rec. co-author)
Not covered
<?xml version="1.0"?>
<!DOCTYPE mail-addressbook [
  <!ELEMENT mail-addressbook (mail-address*)>
  <!ELEMENT mail-address (name, email+)>
  <!ATTLIST mail-address category (professor | assistant) #IMPLIED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<mail-addressbook>
  <mail-address category="professor">
    <name>Werner Wild</name>
    <email>info@evolution.at</email>
  </mail-address>
  <mail-address category="assistant">
    <name>Landon Bradshaw</name>
    <email>landon@bradshaw.org</email>
  </mail-address>
</mail-addressbook>
XML declaration
Form:
<?xml version="1.0" encoding="..." standalone="..."?>
Declaration forms
<!DOCTYPE root SYSTEM "sys-id">
<!DOCTYPE root PUBLIC "pub-id" "sys-id">
Parameters
root is the name of the document's root node
sys-id is a URI pointing to the file containing the DTD
[Figure labels, original graphic not recoverable: "who did it", "what is it"]
DTDs Overview
Why vocabularies?
XML documents are snapshots of domain data structures
Used for communication between applications
Fixed vocabulary helps in development
Well-formedness
Correct syntax (no overlapping tags, correct nesting, ...)
Only one root node
No references to external entities (unless a DTD is given)
Validity
Well-formedness, plus:
Document conforms to a given DTD
Writing DTDs
<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT mail-addressbook (mail-address*)>
<!ELEMENT mail-address (name, email+)>
<!ATTLIST mail-address category (professor | assistant) #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
A <mail-address> element consists of a <name> and then of one or more <email> elements
<mail-address> elements have an optional attribute named category, which can have one of the two values "professor" and "assistant"
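A content model such as (name, email+) behaves like a regular expression over the names of the child elements. The sketch below translates the mail-address declarations by hand into Python checks. It is an illustration of the idea only, not a validating parser (Python's standard library does not validate against DTDs); check_mail_address and the two sample elements are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (name, email+): one <name>, then one or more <email>.
MAIL_ADDRESS_MODEL = re.compile(r"name(,email)+")
# Hand translation of the enumeration (professor | assistant).
ALLOWED_CATEGORIES = {"professor", "assistant"}

def check_mail_address(elem: ET.Element) -> bool:
    """Check one <mail-address> against the hand-translated declarations."""
    children = ",".join(child.tag for child in elem)
    category = elem.get("category")  # None when the #IMPLIED attribute is absent
    model_ok = MAIL_ADDRESS_MODEL.fullmatch(children) is not None
    category_ok = category is None or category in ALLOWED_CATEGORIES
    return model_ok and category_ok

good = ET.fromstring(
    '<mail-address category="professor">'
    "<name>Werner Wild</name><email>info@evolution.at</email>"
    "</mail-address>")
# Violates both rules: no <email> child and an undeclared category value.
bad = ET.fromstring(
    '<mail-address category="student"><name>No Email</name></mail-address>')

print(check_mail_address(good), check_mail_address(bad))
```

A real validating parser derives these checks from the DTD itself, which is the benefit of a fixed vocabulary.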
Writing DTDs
Data is not meant to be parsed
Specifying the application that handles the data
Use with care: may be platform-dependent
Defining Entities
General entities
Used within the contents of a document
Define: <!ENTITY copyright "&#xA9; 2002 Werner Wild">
Use: &copyright;
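A general entity declared in the internal DTD subset is expanded by the parser wherever it is referenced. A minimal sketch with Python's xml.etree.ElementTree follows; the root element name notice is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE notice [
  <!ENTITY copyright "&#xA9; 2002 Werner Wild">
]>
<notice>&copyright;</notice>"""

# The parser replaces &copyright; with the entity's replacement text.
expanded = ET.fromstring(doc).text
print(expanded)
```

The character reference inside the entity value is resolved as well, so the application sees the copyright sign followed by the text.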
Defining Entities
Parameter entities
Parsed entities used solely within the DTD
Help keep commonly used constructs in one place
<!ENTITY % attrlist "attr1 CDATA #REQUIRED
                     attr2 CDATA #REQUIRED">
<!ATTLIST tag1 %attrlist;>
<!ATTLIST tag2 %attrlist;>
...and so forth...
Use the percent sign in the references (%attrlist;) as well
Defining Notations
Defining Elements
Improved version
Cardinality operators
Option (may or may not appear, zero or one): ?
Zero or more: *
One or more: +
Defining Elements
William Henry Gates
Carl Philipp Emanuel Bach
Douglas "42" Adams
Wolfgang Amadeus "Vielschreiber" Mozart
<!ELEMENT person-name (first, (middle* | nick), last)>
The | operator is an alternative ("or")
Someone with a nickname:
<person-name>
  <first>Douglas</first>
  <nick>42</nick>
  <last>Adams</last>
</person-name>
Defining Elements
Then again...
Someone may have loads of middle names and nicknames
Go for it, with a repeated group:
<!ELEMENT person-name (first, (middle | nick)*, last)>
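The difference between the two person-name models can be seen by translating each one by hand into a regular expression over the sequence of child-element names. This is a sketch for illustration: the regular expressions, the helper matches and the sample sequences are assumptions, and a validating parser would do this work from the DTD.

```python
import re

# Each child-element name is written followed by one space.
# (first, (middle* | nick), last): any number of middles OR exactly one nick
FIRST_MODEL = re.compile(r"first ((middle )*|nick )last ")
# (first, (middle | nick)*, last): any mix of middles and nicks, in any order
SECOND_MODEL = re.compile(r"first (middle |nick )*last ")

def matches(model: re.Pattern, children: list) -> bool:
    """Check a sequence of child-element names against a translated model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None

douglas = ["first", "nick", "last"]            # Douglas "42" Adams
mozart = ["first", "middle", "nick", "last"]   # Wolfgang Amadeus "Vielschreiber" Mozart

for person in (douglas, mozart):
    print(person, matches(FIRST_MODEL, person), matches(SECOND_MODEL, person))
```

Douglas Adams fits both models, while a name with both a middle name and a nickname only fits the version with the repeated group.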
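first, middle, nick and last contain text
Possibilities: is the text intended to be parsed or not?
Parse: #PCDATA (parsed character data), the only text content an XML DTD can declare for an element
Don't parse: XML has no #CDATA content model; CDATA is an attribute type, and unparsed text in a document goes into a CDATA section
Markup in #PCDATA must be defined in the DTD
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
...and so forth...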
Defining Attributes
Any XML element may have attributes (even empty ones)
Basic structure of an attribute declaration:
<!ATTLIST element-name attribute-name type default>
element-name: name of the element the attribute belongs to
attribute-name: name of the attribute itself
type: attribute type
default: default declaration
Default declarations
#REQUIRED: the attribute must appear
#IMPLIED: optional attribute
#FIXED "default": must have this value, can be left out
"default": if the attribute is not present, default is assumed
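The effect of a default declaration can be observed with Python's low-level xml.parsers.expat module, which by default reports attributes derived from declarations along with those written in the document. This is a sketch under assumptions: the element names addressbook and entry and the attribute country are invented for the example.

```python
import xml.parsers.expat

doc = """<!DOCTYPE addressbook [
  <!ATTLIST entry category CDATA "assistant"
                  country  CDATA #IMPLIED>
]>
<addressbook><entry/><entry category="professor"/></addressbook>"""

seen = []  # attribute dictionaries of every <entry>, in document order

def start_element(name, attrs):
    if name == "entry":
        seen.append(dict(attrs))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(doc, True)
print(seen)
```

The first entry receives the declared default for category, the second keeps its own value, and the #IMPLIED attribute country simply stays absent.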
CDATA
Attribute may contain simple text without markup
No elements allowed; a literal < must be written as &lt;, and only character references and internal entities may be referenced
ID
Attributes of that type are intended to have a unique value within the document
Must be #REQUIRED or #IMPLIED
Value must conform to XML naming rules
IDs can be referenced
IDREF, IDREFS
References to IDs
Use with IDs to model data relationships with unique keys
Values must conform to XML naming rules
IDREFS attributes contain space-separated lists of IDs
<!ELEMENT person (person-name, address, ...)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT customer EMPTY>
<!ATTLIST customer id IDREF #REQUIRED>
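An application typically resolves an IDREF by looking up the element that carries the matching ID, much like following a foreign key. The sketch below does this by hand with Python's xml.etree.ElementTree. The root element shop and the sample data are assumptions, and the flat person-name text is a simplification of the lecture's structure.

```python
import xml.etree.ElementTree as ET

doc = """<shop>
  <person id="p1"><person-name>Douglas Adams</person-name></person>
  <person id="p2"><person-name>William Gates</person-name></person>
  <customer id="p2"/>
</shop>"""

root = ET.fromstring(doc)
# Index every <person> by its ID attribute; IDs are unique within a document.
people = {person.get("id"): person for person in root.iter("person")}
# Follow each customer's IDREF to the person it points at.
customers = [people[c.get("id")].findtext("person-name")
             for c in root.iter("customer")]
print(customers)
```

With a validating parser, a customer whose id matches no person would already be rejected as invalid; here the dictionary lookup would raise a KeyError instead.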
ENTITY, ENTITIES
Entities are constructions that appear several times
Unparsed entities can also be used as attribute values
<!NOTATION gif SYSTEM "/usr/bin/xv">
<!ENTITY myPicture SYSTEM "pic.gif" NDATA gif>
<!ATTLIST elem pic ENTITY #IMPLIED>
Usage:
<elem pic="myPicture">...</elem>
Enumerations
Attribute values may be restricted to a fixed set of values
Values must be valid XML name tokens
NMTOKEN, NMTOKENS
Quite like enumerations, but none are predefined
Values are not part of the grammar
New ones can be added easily without modifying the DTD
Here too, values must be valid XML name tokens
<!ATTLIST mail-address category NMTOKEN #IMPLIED>
DTDs Discussion
DTD syntax
It's SGML, not XML
Shouldn't XML documents be described in XML?
Related Technologies
Namespaces
Help avoid name clashes
Improve vocabulary reuse
<?xml version="1.0"?>
<someRoot xmlns="someRoot.dtd"
          xmlns:otherSpace="otherSpace.dtd">
  <someElement>
    This belongs to someRoot!
  </someElement>
  <otherSpace:elem>
    This belongs to otherSpace!
  </otherSpace:elem>
  <someElement otherSpace:attr="imported attribute"/>
</someRoot>
xmlns="someRoot.dtd" declares the default namespace
someElement was defined in someRoot.dtd
elem was defined in otherSpace.dtd
An attribute from another namespace (otherSpace:attr): yes, this is possible
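A namespace-aware parser resolves the prefixes and reports every name together with its namespace. As a sketch, Python's xml.etree.ElementTree writes such names as {namespace}localname; that notation is a convention of this library, not XML syntax.

```python
import xml.etree.ElementTree as ET

doc = """<someRoot xmlns="someRoot.dtd" xmlns:otherSpace="otherSpace.dtd">
  <someElement>This belongs to someRoot!</someElement>
  <otherSpace:elem>This belongs to otherSpace!</otherSpace:elem>
  <someElement otherSpace:attr="imported attribute"/>
</someRoot>"""

root = ET.fromstring(doc)
# Unprefixed elements fall into the default namespace.
tags = [child.tag for child in root]
# The prefixed attribute is looked up by its namespace-qualified name.
attr = root[2].get("{otherSpace.dtd}attr")
print(root.tag)
print(tags)
print(attr)
```

Two elements with the same local name but different namespaces therefore never clash, because their qualified names differ.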
Related Technologies
XLink
Linking to other resources from within an XML document
Roughly analogous to hyperlinks
XPath
General specification (W3C) for access to document parts
Defines an addressing mechanism
XPointer
Pointing to particular locations in or portions of documents
Wraps XPath: standard mechanism to use addresses
XSL, XSLT
Transforming XML documents
Simply using XML to represent data does not make documents more expressive. They must also be well-structured.
The End
Sources
For the preparation of this lecture a lot of sources were used. My special thanks go to:
Univ. Helsinki
Univ. California San Diego (UCSD)
Univ. Darmstadt
many others