3. What is XML?
• Stands for Extensible Markup Language
• First draft was published in 1996
• Published as a W3C Recommendation on Feb
10, 1998
• XML is derived as a subset of SGML (Standard
Generalized Markup Language)
4. XML’s goals
• Before XML
– Data formats were proprietary
• Goals:
– To make data more interchangeable
– To be readable by both humans and machines
6. Advantages
• A clear separation between data and
presentation
• Easy extensibility of XML files
• Hierarchical Data Representation
• Interoperability
8. XML In Practice
• Configuration Files
• Web Services
• Web Content (XHTML)
• Document Management
• Database Systems
• Image Representation
• Business Interoperability
11. Well-Formed XML - XML Prolog
• Optional
• Must come first
• version
• 1.0 (default) or 1.1
• encoding
• UTF-8 (default) or
another variety of Unicode
• standalone
• no (default) or yes
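A minimal sketch of how the prolog shows up to a parser, assuming Python's standard `xml.dom.minidom`. The `<config>` element is a hypothetical placeholder, not something from the slides.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical documents; <config> is only a placeholder element.
with_prolog = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><config/>'
without_prolog = b"<config/>"
misplaced_prolog = b' <?xml version="1.0"?><config/>'  # a space comes first


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


doc = minidom.parseString(with_prolog)
# minidom records the three prolog fields on the document object.
prolog = (doc.version, doc.encoding, doc.standalone)

print(prolog)
print(parses(without_prolog))    # the prolog is optional
print(parses(misplaced_prolog))  # the prolog must come first
```

The leading space in `misplaced_prolog` is enough to make the parser reject the document, which is the "must come first" rule in action.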
17. Well-Formed XML – Elements
• Basic building blocks
• Can be used to show
individual or repetitive
items of data
• 2 ways to define
<element>
content
</element>
<element />
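The two forms can be compared with a short sketch, assuming Python's standard `xml.etree.ElementTree`. It suggests that an empty start/end pair and the self-closing form are treated the same way by the parser.

```python
import xml.etree.ElementTree as ET

# Form 1: start tag, content, end tag.
with_content = ET.fromstring("<element>content</element>")

# Form 2: self-closing tag, equivalent to an empty start/end pair.
self_closing = ET.fromstring("<element />")
explicit_empty = ET.fromstring("<element></element>")

print(with_content.text)   # the character content of the element
print(self_closing.text)   # no content at all
print(ET.tostring(self_closing) == ET.tostring(explicit_empty))
```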
18. Well-Formed XML – Elements
• All elements must be
nested underneath the
root element
• You can’t have the end tag
of an element before the
end tag of one nested
below it
<myElement>
<elementA>
<elementB>
</elementA>
</elementB>
</myElement>
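The nesting rule can be checked mechanically. This is a sketch assuming Python's standard `xml.etree.ElementTree`; `is_well_formed` is a helper name introduced here for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The overlapping example from the slide: elementA closes before elementB.
overlapping = (
    "<myElement><elementA><elementB></elementA></elementB></myElement>"
)
# The same elements with the end tags in the right order.
nested = "<myElement><elementA><elementB></elementB></elementA></myElement>"
# Two top-level elements, so there is no single root.
two_roots = "<elementA></elementA><elementB></elementB>"

for sample in (overlapping, nested, two_roots):
    print(is_well_formed(sample))
```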
20. Well-Formed XML – Naming
Specifications
• Can begin with either an
underscore or an
uppercase or lowercase
letter from the Unicode
character set
• Subsequent characters can
also be a dash (-), a period (.), or a digit
• Case-sensitive
• The start and end tags
must match exactly
• Cannot contain spaces
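One way to try the naming rules is to let a parser decide. The sketch below assumes Python's standard `xml.etree.ElementTree`; `is_valid_element_name` is a hypothetical helper, and the sample names are made up.

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name: str) -> bool:
    """Wrap the name in a start/end tag pair and see whether it parses."""
    try:
        root = ET.fromstring(f"<{name}></{name}>")
    except ET.ParseError:
        return False
    # Guard against input that parses as a name plus something else.
    return root.tag == name


def tags_match(start: str, end: str) -> bool:
    """Check whether a start tag name and an end tag name pair up."""
    try:
        ET.fromstring(f"<{start}></{end}>")
        return True
    except ET.ParseError:
        return False


for candidate in ("_config", "item-1", "1item", "my element"):
    print(candidate, is_valid_element_name(candidate))

print(tags_match("Item", "item"))  # names are case-sensitive
```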
23. Well-Formed XML – Exercise
• <list><title>The first list</title><item>An item</list>
• <item>An item</item><item>Another item</item>
• <para>Bathing a cat is a <emph>relatively</emph>
easy task as long as the cat is willing.</para>
• <bibl><title>How to Bathe a Cat<author></title>Merlin
Bauer<author></bibl>
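To check answers to the exercise, each fragment can be handed to a parser. This is a sketch assuming Python's standard `xml.etree.ElementTree`; it only reports whether a fragment parses, not why.

```python
import xml.etree.ElementTree as ET

# The four fragments from the exercise, in order.
fragments = [
    "<list><title>The first list</title><item>An item</list>",
    "<item>An item</item><item>Another item</item>",
    "<para>Bathing a cat is a <emph>relatively</emph>\n"
    "easy task as long as the cat is willing.</para>",
    "<bibl><title>How to Bathe a Cat<author></title>Merlin\n"
    "Bauer<author></bibl>",
]


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = [is_well_formed(fragment) for fragment in fragments]
print(results)
```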
25. Well-Formed XML - Attributes
• name-value pairs
associated with an
element
26. Well-Formed XML – Attributes - Rules
• consist of a name and a
value separated by an
equals sign
• The name follows the
same rules as element
names
• The value must be in
quotes
• There must be a value part
• Attribute names must be
unique per element
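The attribute rules can be exercised with a few small documents. The sketch assumes Python's standard `xml.etree.ElementTree`; the `<user>` element and its attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def attributes_of(text: str):
    """Return the root element's attributes, or None if parsing fails."""
    try:
        return dict(ET.fromstring(text).attrib)
    except ET.ParseError:
        return None


valid = '<user id="42" role="admin" />'
single_quotes = "<user id='42' />"    # either quote style is allowed
unquoted = "<user id=42 />"           # the value must be in quotes
no_value = "<user id />"              # there must be a value part
duplicate = '<user id="1" id="2" />'  # names must be unique per element

for sample in (valid, single_quotes, unquoted, no_value, duplicate):
    print(attributes_of(sample))
```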
30. Well-Formed XML – Character content
- Restrictions
• Ampersand (&)
• Left angle bracket (<)
31. Well-Formed XML – Entity and
Character References
• There are two ways of inserting characters into a
document that cannot be used directly
– Entity references
• Start with an ampersand (&)
and finish with a semicolon (;)
• There are five built-in
entity references in XML
– Character references
• Begin with &# and end with a semicolon (;)
• Example: the Greek letter omega (Ω) as a reference would
be &#x3A9; in hexadecimal or &#937; in decimal
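A short sketch of both kinds of reference, assuming Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`. The element name `<t>` is a placeholder.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five built-in entity references: & < > ' "
builtins = ET.fromstring("<t>&amp; &lt; &gt; &apos; &quot;</t>").text

# Character references for omega (U+03A9): hexadecimal and decimal.
omega_hex = ET.fromstring("<t>&#x3A9;</t>").text
omega_dec = ET.fromstring("<t>&#937;</t>").text

# Going the other way: escape() replaces &, < and > before text is
# placed into a document.
escaped = escape("1 kilometer < 1 mile & 1 pint < 1 liter")

print(builtins)
print(omega_hex, omega_dec)
print(escaped)
```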
36. Well-Formed XML – Elements Versus
Attributes
Attributes
• There is only one piece of
data
• Names cannot be repeated
• Make the file size smaller
– Good for sending across a network
Elements
• The data is not a simple
type
• Items may need to be
repeated
• Items can be ordered
• A large amount of content
that is just text
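As an illustration of the trade-off, the sketch below stores a single simple value as an attribute and repeated items as elements. It assumes Python's standard `xml.etree.ElementTree`; the `<employee>` format and all values in it are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: one id per employee (attribute), but possibly
# several phone numbers in a meaningful order (elements).
employee = ET.fromstring(
    '<employee id="1001">'
    "<name>Alice Example</name>"
    "<phone>555-0100</phone>"
    "<phone>555-0101</phone>"
    "</employee>"
)

employee_id = employee.get("id")                        # single value
phones = [p.text for p in employee.findall("phone")]    # repeated, ordered

print(employee_id)
print(phones)
```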
39. Well-Formed XML – Processing
Instructions
• Used to communicate with the application
that is consuming the XML
– It is not used directly by the XML parser at all
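A sketch of how an application might read a processing instruction, assuming Python's standard `xml.dom.minidom`. The `xml-stylesheet` target is a common real-world example; the file name `style.xsl` is hypothetical.

```python
from xml.dom import minidom

source = '<?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>'
doc = minidom.parseString(source)

# The parser passes each instruction through as a (target, data) pair;
# interpreting the data is left to the consuming application.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```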
40. Well-Formed XML – CDATA
• These are used as a way to avoid repetitive
escaping of characters
• Starts with <![CDATA[ and ends with ]]>
• Example: you want the following data in your document
1 kilometer < 1 mile
1 pint < 1 liter
1 pound < 1 kilogram
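The example data can be wrapped in a CDATA section or escaped character by character. The sketch below assumes Python's standard `xml.etree.ElementTree`; the `<comparison>` element name is hypothetical.

```python
import xml.etree.ElementTree as ET

raw = "1 kilometer < 1 mile\n1 pint < 1 liter\n1 pound < 1 kilogram"

# Option 1: one CDATA section around the whole text.
with_cdata = f"<comparison><![CDATA[{raw}]]></comparison>"

# Option 2: escape every left angle bracket individually.
with_escaping = "<comparison>" + raw.replace("<", "&lt;") + "</comparison>"

# Option 3 (not well-formed): the raw text with no protection.
unprotected = f"<comparison>{raw}</comparison>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escaping).text

try:
    ET.fromstring(unprotected)
    unprotected_ok = True
except ET.ParseError:
    unprotected_ok = False

print(cdata_text == raw, escaped_text == raw, unprotected_ok)
```

Both protected forms give the application the same text; CDATA just avoids repeating `&lt;`.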
49. XML Namespaces – Why do you need
namespaces?
• You won’t always be using your own XML formats
entirely within your own systems
50. XML Namespaces – How do you
choose a namespace?
• In Java, they are called packages
• In C#, they are called namespaces
– System.Windows.Forms.Timer
– System.Timers.Timer
– System.Threading.Timer
51. XML Namespaces – How do you
choose a namespace?
• You can choose virtually any string of
characters to make sure your element’s full
name is unique
• The W3C recommends
– URIs
52. URLs, URIs, and URNs
• A URL (Uniform Resource Locator) tells you the
how and where of something
– [Scheme]://[Domain]:[Port]/[Path]?[QueryString]#[FragmentId]
– http://www.wrox.com/remtitle.cgi?isgn=0470114878
• A URN (Uniform Resource Name) is simply a
unique name
– urn:[namespace identifier]:[namespace specific string]
– urn:isbn:9780470114872
• A URI (Uniform Resource Identifier) is either a URL or
a URN
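The two example identifiers can be taken apart with Python's standard `urllib.parse.urlsplit`. This is only a sketch: the URN is split by hand on its first colon, since `urlsplit` treats everything after `urn:` as a path.

```python
from urllib.parse import urlsplit

# The URL from the slide: scheme, domain, path and query string.
url = urlsplit("http://www.wrox.com/remtitle.cgi?isgn=0470114878")

# The URN from the slide: urn:[namespace identifier]:[specific string].
urn = urlsplit("urn:isbn:9780470114872")
namespace_id, _, specific_string = urn.path.partition(":")

print(url.scheme, url.netloc, url.path, url.query)
print(urn.scheme, namespace_id, specific_string)
```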
53. XML Namespaces – How to declare a
namespace?
• If you want all elements to be under the
namespace
– Declare a default namespace
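A sketch of a default namespace declaration, assuming Python's standard `xml.etree.ElementTree`, which reports names as `{namespace-uri}local-name`. The namespace URI and element names come from elsewhere in this deck; the `firstName` attribute and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

HR_NS = "http://wrox.com/namespaces/applications/hr/config"

# xmlns="..." on the root applies to it and to every descendant element.
source = (
    f'<applicationUsers xmlns="{HR_NS}">'
    '<user firstName="Alice" />'
    "</applicationUsers>"
)
root = ET.fromstring(source)
user = root[0]

print(root.tag)
print(user.tag)
# Unprefixed attributes are not placed in the default namespace.
print(list(user.attrib))
```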
54. XML Namespaces – How to declare a
namespace?
• If you want specific elements to be under the
namespace
– Declare a namespace explicitly
– Choose prefix to represent namespace
• Some prefixes are reserved, such as xml, xmlns, and any
other combinations beginning with the characters xml
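A sketch of an explicit (prefixed) declaration, assuming Python's standard `xml.etree.ElementTree`. The `hr` prefix and namespace URI appear elsewhere in this deck; the mix of prefixed and unprefixed `<user>` elements is made up for illustration.

```python
import xml.etree.ElementTree as ET

HR_NS = "http://wrox.com/namespaces/applications/hr/config"

# Declaring xmlns:hr only binds the prefix; an element joins the
# namespace only when its tag actually carries the prefix.
source = (
    f'<hr:applicationUsers xmlns:hr="{HR_NS}">'
    "<user />"
    "<hr:user />"
    "</hr:applicationUsers>"
)
root = ET.fromstring(source)
child_tags = [child.tag for child in root]
print(child_tags)

# Using a prefix that was never declared is an error.
try:
    ET.fromstring("<hr:applicationUsers />")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```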
55. XML Namespaces – How to declare a
namespace?
[Figure: a prefixed element name labeled with its Qualified Name (QName) and its Local Name]
57. XML Namespaces – Declaring more
than one namespace
• <applicationUsers> element belongs to the
http://wrox.com/namespaces/applications/hr/config
namespace
• <user> elements belong to the
http://wrox.com/namespaces/general/entities
namespace
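A sketch of a document that uses both namespaces, assuming Python's standard `xml.etree.ElementTree`. The prefixes `hr` and `ent` and the attribute values are assumptions made for the example; only the element names and URIs come from the slide.

```python
import xml.etree.ElementTree as ET

HR_NS = "http://wrox.com/namespaces/applications/hr/config"
ENT_NS = "http://wrox.com/namespaces/general/entities"

source = f"""
<hr:applicationUsers xmlns:hr="{HR_NS}" xmlns:ent="{ENT_NS}">
  <ent:user firstName="Alice" />
  <ent:user firstName="Bob" />
</hr:applicationUsers>
"""
root = ET.fromstring(source)

# The prefixes in this mapping are local to the query; only the URIs
# have to match the ones used in the document.
prefixes = {"hr": HR_NS, "ent": ENT_NS}
users = root.findall("ent:user", prefixes)
first_names = [user.get("firstName") for user in users]

print(root.tag)
print(first_names)
```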
61. XML Namespaces – Real world
• XML Schemas
– Defining the structure of a document
• Combination documents
– Merging documents from more than one source
• Versioning
– Differentiating between different versions of an
XML format
63. XML Namespaces – Versioning
• Differentiating between different versions of an XML
format
• Go back to employees.xml
– Namespace is
http://wrox.com/namespaces/general/employee
– Newer version:
http://wrox.com/namespaces/general/employee/v2
64. XML Namespaces – Versioning
How do I want the application to
handle the two different versions?
65. XML Namespaces – Versioning
• Version one of the application opens a version
one file
• Version one of the application opens a version
two file
• Version two of the application opens a version
one file
• Version two of the application opens a version
two file
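The four combinations can be sketched as a namespace check on the root element. This assumes Python's standard `xml.etree.ElementTree` and one possible policy, chosen purely for illustration: version one rejects files it does not recognize, while version two accepts both. The `<employees>` root element and the `can_open` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

V1_NS = "http://wrox.com/namespaces/general/employee"
V2_NS = "http://wrox.com/namespaces/general/employee/v2"

# Assumed policy: which file namespaces each application version accepts.
SUPPORTED = {1: {V1_NS}, 2: {V1_NS, V2_NS}}


def namespace_of(tag: str) -> str:
    """Extract the URI from ElementTree's '{uri}local' tag notation."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def can_open(app_version: int, xml_text: str) -> bool:
    """Decide whether this application version accepts the file."""
    root = ET.fromstring(xml_text)
    return namespace_of(root.tag) in SUPPORTED[app_version]


v1_file = f'<employees xmlns="{V1_NS}" />'
v2_file = f'<employees xmlns="{V2_NS}" />'

for app in (1, 2):
    for label, content in (("v1 file", v1_file), ("v2 file", v2_file)):
        print(f"app v{app} opens {label}:", can_open(app, content))
```

A different policy, such as ignoring unrecognized elements instead of rejecting the file, would need a different check.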
67. XML Namespaces – When to use and
not use namespaces
When namespaces are needed
• When there’s no choice
• When you need
interoperability
• When you need validation
When namespaces are not
needed
• When you have the need to
store or exchange data for
relatively small documents
that will be seen only by a
limited number of systems
68. XML Namespaces – Common
namespaces
• The XML Namespace
http://www.w3.org/XML/1998/namespace
– Attributes:
• xml:lang
• xml:space
• xml:base
• xml:id
• xml:Father
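A sketch showing that the `xml` prefix works without a declaration, assuming Python's standard `xml.etree.ElementTree`. The `<para>` element and attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# No xmlns:xml declaration is needed; the prefix is pre-bound.
para = ET.fromstring('<para xml:lang="en" xml:space="preserve"> Hi </para>')

lang = para.get("{" + XML_NS + "}lang")
space = para.get("{" + XML_NS + "}space")

print(lang, space)
```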
69. XML Namespaces – Common
namespaces
• The XMLNS Namespace
http://www.w3.org/2000/xmlns/
• The XML Schema Namespace
http://www.w3.org/2001/XMLSchema
• The XSLT Namespace (xsl or xslt)
http://www.w3.org/1999/XSL/Transform
• The SOAP Namespace (soap, soap12)
http://schemas.xmlsoap.org/soap/envelope/ (SOAP 1.1)
http://www.w3.org/2003/05/soap-envelope (SOAP 1.2)
• The WSDL Namespace (wsdl)
http://schemas.xmlsoap.org/wsdl/ (WSDL 1.1)
http://www.w3.org/ns/wsdl (WSDL 2.0)
70. XML Namespaces – Common
namespaces
• The Atom Namespace
http://www.w3.org/2005/Atom
• The MathML Namespace
http://www.w3.org/1998/Math/MathML
• The Docbook Namespace
http://docbook.org/ns/docbook
Use for:
- Representing low-level data: configuration
- Adding metadata to documents: <i>, <b>
- Making it much easier to pass data between different components
- This means that the same underlying data can be used in multiple presentation scenarios. It also means that when moving data, across a network for example, bandwidth is not wasted by having to carry redundant information concerned only with the look and feel.
NameFolderBusiness, < word 2003
- When you’re designing an XML format for your own data, so that you can be sure that any standard XML parser can handle your document
- When you are designing a system that will accept XML input from an external source, so you’ll be sure that the data you receive is legitimate XML
- W3C’s XML Recommendation
For a human reader this isn’t a problem. If a program is asked to find the employee’s title, for example for a report showing the title, first name, and last name, there could be a conflict because it can’t choose the correct title without further help.
One of XML’s main purposes is to share data across systems and organizations.
One thing to note about namespaces is that the declarations themselves look very much like attributes.
Some prefixes are reserved, such as xml, xmlns, and any other combinations beginning with the characters xml.
This just means that you have a namespace URI that is identified by a prefix of hr; so far none of the elements or attributes are grouped in that namespace. To associate the elements with the namespace you have to add the prefix to the elements’ tags.
The reason for this is that attributes are always associated with an element; they can’t stand alone. Therefore, if the element itself is in a namespace, the attribute is already uniquely identifiable and there’s really no need for it to be in a namespace.
Remember that the namespace declaration must come either on the element that uses it or on one higher in the tree, an ancestor as it’s often called.
Picture 3: How Exactly Does Scope Work?
If version one of the software opens a version two file, would you expect it to be able to read it or not? Will it just ignore any elements it does not recognize and process the rest as normal, or just reject the file out of hand?
If, however, you want the applications to be able to cope with both the earlier and the later formats, it’s important that the two namespaces are the same. If this isn’t the case, the systems would need to know the namespaces of all possible future XML formats that could be accepted.
When there’s no choice: if you choose to use a format designed by someone else to represent your data, the chances are that the format insists on the elements being in a namespace.
When you need interoperability: “Do I need to share this data with other systems, particularly those developed externally?”
When you need validation: XML Schema
The prefix xml is bound to the URI http://www.w3.org/XML/1998/namespace and this is hard-coded into all XML parsers, so you don’t need to declare it yourself.
➤ xml:space: You met this in Chapter 2. It is used so the author of the document can specify whether whitespace is significant or not. It takes the values preserve or default.
➤ xml:base: This is used in a similar way to the base attribute in HTML. It enables you to specify a base URL from which any relative URLs in the document will be resolved.
➤ xml:id: This specifies that the value of this attribute is a unique identifier within the document.
➤ xml:Father: Although rarely seen in practice, its existence proves that the W3C’s XML committee is actually human. It refers to Jon Bosak, a leading light in the XML community who chaired the original XML working group. It could be used, for example, when specifying a document’s author, such as <document author="xml:Father" />
XMLNS: As you’ve seen throughout this chapter, the xmlns prefix is used to declare a prefixed namespace in an XML document.
XML Schema: used in schema documents describing the legitimate structure of a particular XML format.
XSLT: primarily used to convert XML into a different format, either differently formatted XML or perhaps HTML or just plain text.
SOAP: an XML format designed to enable method calls between a client and a web service.
WSDL: used to describe a web service in such a way that clients can programmatically connect to a server, know what methods are available, and format their method calls appropriately.
Atom: used for publishing information (such as newsfeeds); it has also been adopted by Microsoft for use in OData.
MathML: used to describe mathematical notations such as equations and their content and structure.
DocBook: normally used to mark up such things as technical publications and software and hardware manuals.