Saturday, March 25, 2006

XML by David Hozner : notes chapter 1

Main features:
1. Easy Data Handling
2. Customizing Markup Language
3. Self Describing Data
4. Structured (well-formed) & Integrated (Valid) data

Well formed: each element must be nested properly within any enclosing element.

Valid : Whether the XML document complies with the associated DTD.
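A quick way to see the well-formedness rule in action is to hand a parser one properly nested document and one improperly nested one. This is only a sketch using Python's standard xml.etree.ElementTree module; the sample documents and the helper name is_well_formed are made up for illustration. Note that this parser checks well-formedness only, it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents.
good = "<note><to>Reader</to><body>Hello</body></note>"
bad = "<note><to>Reader</body></to></note>"  # closing tags are improperly nested

print(is_well_formed(good))
print(is_well_formed(bad))
```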

DTD : Document Type Definition. DTDs can be stored in a separate file or in the document itself using a <!DOCTYPE> element.
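As a rough illustration of a DTD stored in the document itself, the sketch below parses a small document whose DTD sits inside a <!DOCTYPE> declaration. The "note" document is a made-up example, and Python's xml.dom.minidom is used only to read the declaration back; it does not validate the document against the DTD.

```python
from xml.dom import minidom

# Hypothetical document with an internal DTD.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to><body>Hello</body></note>"""

doc = minidom.parseString(DOCUMENT)

# The DOCTYPE names the root element that the DTD describes.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName
print(doctype_name, root_name)
```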

XML processing instructions start with "<?" and close with "?>".
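To show what a parser does with such an instruction, here is a small sketch that reads one back with Python's xml.dom.minidom. The xml-stylesheet instruction and the file name style.css are assumptions chosen for the example.

```python
from xml.dom import Node, minidom

# Hypothetical document: one processing instruction followed by the root element.
DOCUMENT = '<?xml-stylesheet type="text/css" href="style.css"?><note>Hello</note>'

doc = minidom.parseString(DOCUMENT)

# Collect (target, data) for every processing instruction at document level.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The target is the name right after "<?", and the data is everything between the target and "?>".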

XML Tags : A tag name starts with a letter or '_' and may contain letters, digits, underscores, dots (.) or hyphens (-), but no spaces.
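The naming rule can be written as a regular expression. The sketch below is a simplified, ASCII-only reading of the rule as stated in these notes (the XML specification itself also permits colons and many non-ASCII letters); the function name is_valid_tag_name is made up.

```python
import re

# Simplified, ASCII-only version of the naming rule from the notes:
# first character is a letter or underscore, the rest may also be
# digits, dots or hyphens. Spaces never match.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_tag_name(name):
    """Return True if `name` satisfies the simplified naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["chapter", "_note", "item-2.a", "2nd", "my tag"]:
    print(candidate, is_valid_tag_name(candidate))
```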

Most browsers check for well-formedness but few check for validity.

The most powerful use of XML is to parse the XML document and break it down into its component parts. These component parts can then be handled in a customized way.
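As an illustration of breaking a document into parts and handling them, the sketch below uses Python's standard xml.etree.ElementTree module. The library/book document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOCUMENT = """<library>
  <book year="2000"><title>First Title</title></book>
  <book year="2003"><title>Second Title</title></book>
</library>"""

root = ET.fromstring(DOCUMENT)

# Break the document into parts: one (year, title) pair per <book>.
titles_by_year = {
    book.get("year"): book.findtext("title") for book in root.iter("book")
}

# "Customized handling": here, simply print each title in upper case.
for year, title in titles_by_year.items():
    print(year, title.upper())
```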

CSS : Cascading Style Sheet
XSL : Extensible Stylesheet Language
DOM : Document Object Model
CML : Chemical Markup Language (customized markup language using XML)
XHTML : Extensible Hyper Text Markup Language

The .xul extension is used for XML-based User Interface Language (XUL) documents.

XML Parsers:

XML4J : XML for Java by IBM Alphaworks. Conforms well to W3C standards. It is written in Java so it connects well to Java code. Download from

SAX : Simple API for XML

SXP : Silfide XML parser. It is a complete XML API in Java.

The Microsoft parser used in Internet Explorer is implemented as a COM component and can be found at

Java Standard Extension for XML : a Java package for XML.
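SAX, listed above, is event-driven: the parser calls back into your code as it meets each element instead of building a tree. The parsers in this list are Java libraries; the sketch below uses Python's xml.sax module, which follows the same SAX interface, only to show the callback style. The ElementCounter class and the sample document are made up.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1

# Hypothetical document.
DOCUMENT = b"<library><book/><book/><magazine/></library>"

handler = ElementCounter()
xml.sax.parseString(DOCUMENT, handler)
print(handler.counts)
```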


CSS : it can set format & placement of elements.

XSL : it can reorder elements in a document, change them entirely, display some but hide others, select style based not just on elements but also element attributes, select style based on element location and so on.

XSL consists of two parts:
XSL Transformations (XSLT)
XSL Formatting Objects (XSL-FO)

XLINK : more powerful than simple hyperlinks; these can be bi-directional as well as multi-directional, and sophisticated enough to point to the nearest mirror site from which a resource can be fetched.

XPOINTERS : point to a part of a document. They are smart enough to point to a specific element, to the nth occurrence of an element, to the first child element of any element, and so on.
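The following is not XPointer itself, only a sketch of the same kind of addressing (nth occurrence of an element, first child of an element) using the limited XPath subset in Python's xml.etree.ElementTree. The book/chapter document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOCUMENT = """<book>
  <chapter><title>One</title><para>a</para></chapter>
  <chapter><title>Two</title><para>b</para></chapter>
  <chapter><title>Three</title><para>c</para></chapter>
</book>"""

root = ET.fromstring(DOCUMENT)

# The 2nd occurrence of <chapter> under the root (positions start at 1).
second_chapter = root.find("chapter[2]")

# The first child element of that chapter (plain indexing starts at 0).
first_child = second_chapter[0]
print(first_child.tag, first_child.text)
```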


ASCII : 1 byte per character (128 codes in 7 bits; 256 with 8-bit extensions) : encoded unchanged in UTF-8
Unicode : 2 bytes/64K codes : UTF-16 or UCS-2 or ISO-10646-UCS-2
UCS (Universal Character Set) : 4 bytes/2 billion codes : UCS-4 or ISO-10646-UCS-4
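To make the byte counts concrete, the sketch below encodes an ASCII letter and π in three encodings and reports the sizes. Python's "utf-32-be" codec is used here as a stand-in for UCS-4, which is an assumption of the example; the "-be" variants are chosen so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes `ch` occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

# "A" is plain ASCII; "\u03c0" is the Greek letter pi.
for ch in ["A", "\u03c0"]:
    print(repr(ch), encoded_sizes(ch))
```

An ASCII character stays one byte in UTF-8, while π needs two bytes there, so UTF-8 is a variable-width encoding rather than a fixed one-byte one.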

One can write a document in a local language and then use a translator to convert it into Unicode.

To insert a particular symbol, write its Unicode value after "&#" and end with ";". For example, to insert π, whose Unicode value is 03c0 in hexadecimal, use &#x03c0;
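A short check that a parser really turns such a character reference into the symbol. This sketch uses Python's xml.etree.ElementTree; the element name "symbol" is made up. The decimal form of the same reference is included for comparison, since hexadecimal 03c0 is 960 in decimal.

```python
import xml.etree.ElementTree as ET

# Hexadecimal and decimal character references for the same code point.
hex_form = ET.fromstring("<symbol>&#x03c0;</symbol>").text
dec_form = ET.fromstring("<symbol>&#960;</symbol>").text

# Both should come back as the single character pi.
print(hex_form == dec_form == "\u03c0")
```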

List of character sets at the Internet Assigned Numbers Authority (IANA)

Java program to convert a text file to Unicode escapes (found in the Java SDK) :
C:\> native2ascii file.txt file.uni
It can also handle other encodings besides the default one (via its -encoding option).
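The kind of output this conversion produces can be imitated in a few lines. The sketch below is not the Java tool, only a rough Python imitation that replaces each non-ASCII character with a \uXXXX escape; the function name to_unicode_escapes is made up, and characters outside the Basic Multilingual Plane are not handled.

```python
def to_unicode_escapes(text):
    """Replace every non-ASCII character with a \\uXXXX escape (BMP only)."""
    return "".join(
        ch if ord(ch) < 128 else "\\u%04x" % ord(ch) for ch in text
    )

# "2\u03c0r" is the three-character string: digit 2, pi, letter r.
escaped = to_unicode_escapes("2\u03c0r")
print(escaped)
```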

XML Applications :

Channel Definition Format (.cdf) : use a .cdf file to add a site to the user's Favorites folder and subscribe to the channel, checking back periodically for updates.

SMIL : Synchronized Multimedia Integration Language : It is a W3C standard at
SMIL applet in Java at

SMIL has become a core part of the RealNetworks streaming software and Apple QuickTime.
