Saturday 24 January 2015

XML Basics -1

What is the best way to exchange data between different sources without worrying about how the receiver will use it? And what is the best way to create documents with the right content without worrying about how they will be displayed on
the web, while still being able to display them with all the flexibility one could want?

The answer is XML and its related technologies, viz. XSL, DTD, DOM, SAX and Schemas.

XML stands for eXtensible Markup Language. In contrast to HTML, which describes visual presentation, XML
describes data in an easily readable format without any indication of how the data is to be displayed. It is a
database-neutral and device-neutral format. Because XML is truly extensible rather than a fixed set of elements like
HTML, its use will eventually eliminate the need for browser developers and middleware tools to add special
HTML tags (extensions).

Listing 1 is an example of a simple XML document.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Employee.xsl"?>
<Employees>
  <Empl id="1">
    <FirstName>Chuck</FirstName>
    <LastName>White</LastName>
    <Dept>Finance</Dept>
  </Empl>
</Employees>
Listing 1: Employee.xml


XSL
As explained earlier, XML focuses on defining the data, so we need a separate mechanism to define how this
data should be displayed in browsers, cell phones or other devices. This is exactly what XSL (eXtensible Stylesheet
Language) does: it defines the rules for interpreting the elements of the XML document.
At its most basic, XSL provides a capability similar to a "mail merge". The style sheet contains a template of the
desired result structure and identifies data in the source document to insert into this template. This model for
merging data and templates is referred to as the template-driven model, and it works well on regular and repetitive data.
Listing 2 is an example of an XSL document for the XML document shown in Listing 1.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <h1>Employee Details</h1>
        <xsl:for-each select="Employees/Empl">
          <b>Empl #
            <xsl:value-of select="@id" /> </b>
          <i>First Name :
            <xsl:value-of select="FirstName" /> </i>
          <i>Last Name :
            <xsl:value-of select="LastName" /> </i>
          <i>Dept :
            <xsl:value-of select="Dept" /> </i>
        </xsl:for-each>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
Listing 2: Employee.xsl
Figure 1 shows the way the browser interprets the Employee.xml document when combined with the Employee.xsl
document.
Figure 1: Output in browser when Employee.xml is called.
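
The template-driven "mail merge" idea can also be sketched outside XSL. The Python below is not XSLT and is not what the browser runs; it is only an analogy that fills a fixed template once per Empl element, much as the xsl:for-each loop in Listing 2 does. The names ROW_TEMPLATE and render are made up for this illustration, and the sample data is copied from Listing 1.

```python
import xml.etree.ElementTree as ET
from html import escape

# Sample data modelled on Listing 1.
XML = """<Employees>
  <Empl id="1">
    <FirstName>Chuck</FirstName>
    <LastName>White</LastName>
    <Dept>Finance</Dept>
  </Empl>
</Employees>"""

# Hypothetical template: one fragment per employee, with named slots.
ROW_TEMPLATE = (
    "<b>Empl # {id}</b> "
    "<i>First Name : {first}</i> "
    "<i>Last Name : {last}</i> "
    "<i>Dept : {dept}</i>"
)

def render(xml_text):
    """Merge every <Empl> element into the template and wrap the result in HTML."""
    root = ET.fromstring(xml_text)  # root is the <Employees> element
    rows = []
    for empl in root.findall("Empl"):
        rows.append(ROW_TEMPLATE.format(
            id=escape(empl.get("id", "")),
            first=escape(empl.findtext("FirstName", "").strip()),
            last=escape(empl.findtext("LastName", "").strip()),
            dept=escape(empl.findtext("Dept", "").strip()),
        ))
    body = "<h1>Employee Details</h1>" + "".join(rows)
    return "<HTML><BODY>" + body + "</BODY></HTML>"

if __name__ == "__main__":
    print(render(XML))
```

The data and the template stay separate, so the same XML could be merged into a different template for a different device.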
Design, Develop, and Deploying Your Applications


DTD

A DTD (Document Type Definition) is a set of rules, or grammar, that we define to construct our own XML language (also called
a "vocabulary"). In other words, a DTD provides the rules that define the elements and structure of our new
language.
This is comparable to defining table structures in Oracle for a new system. Just as we define the columns of a table,
determine their datatypes and decide whether each column allows nulls, the DTD defines the
structure of the XML document. Listing 3 is an example of a basic DTD. The detailed syntax of DTD is covered
later in the paper.
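<!-- Employees holds one or more Empl elements (cardinality assumed) -->
<!ELEMENT Employees (Empl+)>
<!-- Each Empl holds these three children, in this order -->
<!ELEMENT Empl (FirstName, LastName, Dept)>
<!-- The id attribute appears in Listing 1; treating it as required is an assumption -->
<!ATTLIST Empl id CDATA #REQUIRED>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>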
Listing 3: Employee DTD


DOM
The Document Object Model (DOM) is a simple, hierarchical naming system that makes all of the objects in the
page, such as text, images and forms, accessible to us. It is merely a set of plans that allows us to reconstruct the
document to a greater or lesser extent.
By definition, a complete model is one that allows us to reconstruct the whole document down to the smallest detail.
An incomplete DOM is anything less than that.
For the reader's information, the W3C DOM recognizes seventeen types of objects for XML: Attribute,
CDATASection, Comment, DOMImplementation, Data, Document, DocumentType, DocumentFragment, Element,
Entity, EntityReference, NamedNodeMap, Node, NodeList, Notation, ProcessingInstruction, Text.
For a detailed description of these object types, the reader is encouraged to visit the W3C web site at
http://www.w3.org/TR/WD-DOM/object-index.html.
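
As a rough illustration of the hierarchical access described above, here is a minimal sketch using Python's standard xml.dom.minidom module. The helper names element_text and read_employees are made up for this example, and the sample document is modelled on Listing 1.

```python
from xml.dom import minidom

# Sample document modelled on Listing 1.
XML = """<?xml version="1.0"?>
<Employees>
  <Empl id="1">
    <FirstName>Chuck</FirstName>
    <LastName>White</LastName>
    <Dept>Finance</Dept>
  </Empl>
</Employees>"""

def element_text(parent, tag):
    """Return the stripped text of the first <tag> element found under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()

def read_employees(xml_text):
    """Build the whole DOM tree in memory, then return one dict per <Empl>."""
    doc = minidom.parseString(xml_text)
    employees = []
    for empl in doc.getElementsByTagName("Empl"):
        employees.append({
            "id": empl.getAttribute("id"),
            "first": element_text(empl, "FirstName"),
            "last": element_text(empl, "LastName"),
            "dept": element_text(empl, "Dept"),
        })
    return employees

if __name__ == "__main__":
    print(read_employees(XML))
```

Because the entire tree is held in memory, any node can be reached, changed or moved at any time, at the cost of memory for large documents.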


SAX
Simple API for XML (SAX) is one of the two basic APIs for manipulating XML; the other is DOM. It is used primarily on the server
side because it does not store the entire document in memory and processes it very fast. However,
SAX should be used mainly for reading XML documents or making simple changes to their content. Large-scale
manipulations, such as re-ordering the chapters of a book, can be done with SAX but become extremely
complicated.
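
To make the streaming style concrete, here is a minimal sketch using Python's standard xml.sax module. The class name EmployeeHandler and the function read_employees are made up for this example, and the sample document is modelled on Listing 1.

```python
import xml.sax

# Sample document modelled on Listing 1.
XML = b"""<?xml version="1.0"?>
<Employees>
  <Empl id="1">
    <FirstName>Chuck</FirstName>
    <LastName>White</LastName>
    <Dept>Finance</Dept>
  </Empl>
</Employees>"""

class EmployeeHandler(xml.sax.ContentHandler):
    """Collects one dict per <Empl> while the parser streams through the input."""

    FIELDS = ("FirstName", "LastName", "Dept")

    def __init__(self):
        super().__init__()
        self.employees = []
        self._current = None   # dict for the <Empl> being read, if any
        self._field = None     # name of the child element being read, if any
        self._chunks = []      # text pieces received for that child

    def startElement(self, name, attrs):
        if name == "Empl":
            self._current = {"id": attrs.get("id")}
        elif name in self.FIELDS and self._current is not None:
            self._field = name
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._chunks).strip()
            self._field = None
        elif name == "Empl":
            self.employees.append(self._current)
            self._current = None

def read_employees(xml_bytes):
    """Parse the bytes with SAX and return the collected employee dicts."""
    handler = EmployeeHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.employees

if __name__ == "__main__":
    print(read_employees(XML))
```

The handler only ever holds the employee currently being read, which is why SAX suits large documents, and the manual state tracking hints at why restructuring a document this way gets complicated.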


SCHEMA
A schema is a mechanism by which rules can be defined to govern the structure and content relationships within a document.
XML Schema Structures specifies the XML Schema definition language, which offers facilities for describing the
structure and constraining the contents of XML 1.0 documents. The schema language, which is itself represented in
XML 1.0 and uses namespaces, substantially reconstructs and considerably extends the capabilities found in XML 1.0
document type definitions (DTDs). This specification depends on XML Schema Part 2: Datatypes.
XML Schema Datatypes is part 2 of the specification of the XML Schema language. It defines facilities for defining
datatypes. The datatype language, which is itself represented in XML 1.0, provides a superset of the capabilities found
in XML 1.0 document type definitions (DTDs) for specifying datatypes on elements and attributes.

NAMESPACES
With XML namespaces, developers can qualify element names uniquely on the Web and thus avoid conflicts between
elements with the same name. The association of a Uniform Resource Identifier (URI) with a namespace serves purely to
keep two elements with the same name unambiguous; it does not matter what the URI points to.
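
A small sketch of the idea, using Python's standard xml.etree.ElementTree module. The two URIs, the prefixes bk and hr, and the function name titles are all hypothetical; nothing needs to exist at those addresses for the example to work.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <title> element. The URIs are hypothetical
# and are used only as unique names, never fetched.
XML = """<catalog xmlns:bk="http://example.com/ns/book"
         xmlns:hr="http://example.com/ns/staff">
  <bk:title>XML Basics</bk:title>
  <hr:title>Finance Manager</hr:title>
</catalog>"""

# Prefix-to-URI map used for lookups; the prefixes here are local to this code.
NS = {
    "bk": "http://example.com/ns/book",
    "hr": "http://example.com/ns/staff",
}

def titles(xml_text):
    """Return the two same-named elements, told apart by their namespaces."""
    root = ET.fromstring(xml_text)
    return {
        "book": root.find("bk:title", NS).text,
        "job": root.find("hr:title", NS).text,
    }

if __name__ == "__main__":
    print(titles(XML))
```

Internally the parser stores each name together with its URI, for example {http://example.com/ns/book}title, so the two title elements never collide.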

WELL-FORMED VS. VALID XML DOCUMENTS
Well-formed documents are those that conform to the basic rules of XML, such as a) the document must have exactly one
root element, and b) every element must have both a start tag and an end tag.
Valid documents are not only well-formed but have also been validated against a DTD (or a schema). A parser usually
performs the validation.
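
The well-formedness rules can be checked with any XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name is_well_formed is made up for this example. Note that this parser is non-validating, so it checks well-formedness only and does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    # One root element, every start tag closed: well-formed.
    print(is_well_formed("<Employees><Empl id='1'/></Employees>"))
    # <Empl> is never closed before </Employees>: not well-formed.
    print(is_well_formed("<Employees><Empl></Employees>"))
    # Two root elements: not well-formed.
    print(is_well_formed("<Empl/><Empl/>"))
```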
