SlideShare una empresa de Scribd logo
1 de 115
Descargar para leer sin conexión
How to Use XML Parsing to
Enhance
Electronic Commnuication

              Ted Leung
      Chairman, ASF XML PMC
  Principal, Sauria Associates, LLC
          twl@sauria.com
Thank You
    ASF
n

    xerces-j-dev
n

    xerces-j-user
n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Overview
    Focus on server side XML
n

    Not document processing
n

    Benefits to servers / e-business
n

        Model-View separation for *ML
    n

        Ubiquitous format for data exchange
    n

             Hope for network effects from data availability
         n
xml.apache.org
    Xerces XML parser
n
        Java
    n

        C++
    n

        Perl
    n

    Xalan XSLT processor
n
        Java
    n

        C++
    n

    FOP
n

    Cocoon
n

    Batik
n

    SOAP/Axis
n
Xerces-J
    Why Java?
n

        Unicode
    n

        Code motion
    n

        Server side cross platform support
    n


    C++ version has lagged Java version
n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Basic XML Concepts
    Well formedness
n

    Validity
n

        DTD’s
    n

        Schemas
    n


    Entities
n
Example: RSS
<?xml version=quot;1.0quot; encoding=quot;ISO-8859-1quot;?>

<!DOCTYPE rss PUBLIC quot;-//Netscape Communications//DTD RSS .91//ENquot;
            quot;http://my.netscape.com/publish/formats/rss-0.91.dtdquot;>

<rss version=quot;0.91quot;>

 <channel>
  <title>freshmeat.net</title>
  <link>http://freshmeat.net</link>
  <description>the one-stop-shop for all your Linux software
needs</description>
  <language>en</language>
  <rating>(PICS-1.1 quot;http://www.classify.org/safesurf/quot; 1 r (SS~~000
1))</rating>
  <copyright>Copyright 1999, Freshmeat.net</copyright>
  <pubDate>Thu, 23 Aug 1999 07:00:00 GMT</pubDate>
  <lastBuildDate>Thu, 23 Aug 1999 16:20:26 GMT</lastBuildDate>
  <docs>http://www.blahblah.org/fm.cdf</docs>
RSS 2
<image>
 <title>freshmeat.net</title>
 <url>http://freshmeat.net/images/fm.mini.jpg</url>
 <link>http://freshmeat.net</link>
 <width>88</width>
 <height>31</height>
 <description>This is the Freshmeat image stupid</description>
</image>

<item>
 <title>kdbg 1.0beta2</title>
 <link>http://www.freshmeat.net/news/1999/08/23/935449823.html</link>
 <description>This is the Freshmeat image stupid</description>
</item>

<item>
 <title>HTML-Tree 1.7</title>
 <link>http://www.freshmeat.net/news/1999/08/23/935449856.html</link>
 <description>This is the Freshmeat image stupid</description>
</item>
RSS 3
 <textinput>
  <title>quick finder</title>
  <description>Use the text input below to search freshmeat</description>
  <name>query</name>
  <link>http://core.freshmeat.net/search.php3</link>
 </textinput>

 <skipHours>
  <hour>2</hour>
 </skipHours>

 <skipDays>
  <day>1</day>
 </skipDays>

 </channel>
</rss>
RSS DTD
<!ELEMENT rss (channel)>
<!ATTLIST rss
          version     CDATA #REQUIRED> <!-- must be quot;0.91quot;> -->

<!ELEMENT channel (title | description | link | language | item+ |
                   rating?| image? | textinput? | copyright? | pubDate? |
                   lastBuildDate? | docs? | managingEditor? |
                   webMaster? | skipHours? | skipDays?)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT language (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT lastBuildDate (#PCDATA)>
<!ELEMENT docs (#PCDATA)>
<!ELEMENT managingEditor (#PCDATA)>
<!ELEMENT webMaster (#PCDATA)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT skipHours (hour+)>
<!ELEMENT skipDays (day+)>
RSS DTD 2
<!ELEMENT item (title | link | description)*>
<!ELEMENT textinput (title | description | name | link)*>
<!ELEMENT name (#PCDATA)>

<!ELEMENT image (title | url | link | width? | height? |
          description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
XML Application Tasks
    Convert XML into application object
n

    Convert application object to XML
n

    Read or write XML into a database
n

    XSLT conversion to XHTML or HTML
n
RSS Objects
      Logical View

         RSSChannel             RSSItem



       +getTitle()         +getTitle()
       +getDescription()   +getDescription()
       +getLink()          +getLink()
       +getItems()
       +getImage()
       +getTextInput()




          RSS Image

                             RSSTextInput
       +getTitle()
       +getDescription()
       +getLink()          +getTitle()
       +getURL()           +getDescription()
       +getHeight()        +getLInk()
       +getWidth()         +getName()
RSSItem 1
public class RSSItem implements java.io.Serializable {

 private String title;
 private String description;
 private String link;

 public String getTitle () {
   return title;
 }

 public void setTitle (String value) {
   String oldValue = title;
   title = value;
 }

 public String getLink () {
   return link;
 }

 public void setLink (String value) {
   String oldValue = link;
   link = value;
 }
RSSItem 2
public String getDescription () {
    return description;
  }

    public void setDescription (String value) {
      String oldValue = description;
      description = value;
    }
}
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Parser API’s
    What is the job of a parser API?
n

        Make the structure and contents of a
    n

        document available
        Report errors
    n
SAX API
    Event callback style
n

    Representation of elements & atts
n

        must build own stack
    n


    SAX Processing Pipeline model
n

    Development model
n

        Reasonably open
    n

        xml-dev
    n

        http://www.megginson.com/SAX/SAX2
    n
ContentHandler
    The basic SAX2 handler for XML
n

    Callbacks for
n

        Start of document
    n

        End of document
    n

        Start of element
    n

        End of element
    n

        Character data
    n

        Ignorable whitespace
    n
Handler Strategies
    Single Monolithic
n

        Requires the handler to know entirely about the
    n

        grammar
    One per application object and multiplexer
n

        Each application object has its own
    n

        DocumentHandler
        The Multiplexer handler is registered with the
    n

        parser
        Multiplexer is responsible for swapping SAX
    n

        handlers as the context changes.
SAX 2 RSS Handler
public void startElement(String namespaceURI, String localName,
                         String qName, Attributes atts)
                         throws SAXException {
      currentText = new StringBuffer();
      textStack.push(currentText);
      if (localName.equals(quot;rssquot;)) {
          elementStack.push(localName);
      } else if (localName.equals(quot;channelquot;)) {
          elementStack.push(localName);
          currentChannel = new RSSChannel();
          currentItems = new Vector();
          currentChannel.setItems(currentItems);
          currentImage = new RSSImage();
          currentChannel.setImage(currentImage);
          currentTextInput = new RSSTextInput();
          currentChannel.setTextInput(currentTextInput);
          currentChannel.setSkipHours(new Vector());
          currentChannel.setSkipDays(new Vector());
      } else if (localName.equals(quot;titlequot;)) {
      } else if (localName.equals(quot;descriptionquot;)) {
      } else if (localName.equals(quot;linkquot;)) {
      } else if (localName.equals(quot;languagequot;)) {
SAX 2 RSS Handler 2
    } else if (localName.equals(quot;itemquot;)) {
        elementStack.push(localName);
        currentItem = new RSSItem();
    } else if (localName.equals(quot;ratingquot;)) {
    } else if (localName.equals(quot;imagequot;)) {
        elementStack.push(localName);
    } else if (localName.equals(quot;textinputquot;)) {
        elementStack.push(localName);
    } else if (localName.equals(quot;copyrightquot;)) {
    } else if (localName.equals(quot;pubDatequot;)) {
    } else if (localName.equals(quot;lastBuildDatequot;)) {
    } else if (localName.equals(quot;docsquot;)) {
    } else if (localName.equals(quot;managingEditorquot;)) {
    } else if (localName.equals(quot;webMasterquot;)) {
    } else if (localName.equals(quot;hourquot;)) {
    } else if (localName.equals(quot;dayquot;)) {
    } else if (localName.equals(quot;skipHoursquot;)) {
        elementStack.push(localName);
    } else if (localName.equals(quot;skipDaysquot;)) {
        elementStack.push(localName);
    } else {}
}
SAX 2 RSS Handler 3
public void endElement(String namespaceURI, String localName,
                       String qName) throws SAXException {
       try {
           String stackValue = (String) elementStack.peek();
           String text = ((StringBuffer)textStack.pop()).toString();
           if (localName.equals(quot;rssquot;)) {
               elementStack.pop();
           } else if (localName.equals(quot;channelquot;)) {
               elementStack.pop();
           } else if (localName.equals(quot;titlequot;)) {
               if (stackValue.equals(quot;channelquot;)) {
                   currentChannel.setTitle(text);
               } else if (stackValue.equals(quot;imagequot;)) {
                   currentImage.setTitle(text);
               } else if (stackValue.equals(quot;itemquot;)) {
                   currentItem.setTitle(text);
               } else if (stackValue.equals(quot;textinputquot;)) {
                   currentTextInput.setTitle(text);
               } else {}
SAX 2 RSS Handler 4
     } else if (localName.equals(quot;descriptionquot;)) {
         if (stackValue.equals(quot;channelquot;)) {
             currentChannel.setDescription(text);
         } else if (stackValue.equals(quot;imagequot;)) {
             currentImage.setDescription(text);
         } else if (stackValue.equals(quot;itemquot;)) {
             currentItem.setDescription(text);
         } else if (stackValue.equals(quot;textinputquot;)) {
             currentTextInput.setDescription(text);
         } else {}
     } else if (localName.equals(quot;linkquot;)) {
         if (stackValue.equals(quot;channelquot;)) {
             currentChannel.setLink(text);
         } else if (stackValue.equals(quot;imagequot;)) {
             currentImage.setLink(text);
         } else if (stackValue.equals(quot;itemquot;)) {
             currentItem.setLink(text);
         } else if (stackValue.equals(quot;textinputquot;)) {
             currentTextInput.setLink(text);
         } else {}
SAX 2 RSS Handler 5
     } else if (localName.equals(quot;languagequot;)) {
         currentChannel.setLanguage(text);
     } else if (localName.equals(quot;itemquot;)) {
         currentItems.add(currentItem);
         elementStack.pop();
     } else if (localName.equals(quot;ratingquot;)) {
         currentChannel.setRating(text);
     } else if (localName.equals(quot;imagequot;)) {
         currentChannel.setImage(currentImage);
         elementStack.pop());
     } else if (localName.equals(quot;heightquot;)) {
         currentImage.setHeight(text);
     } else if (localName.equals(quot;widthquot;)) {
         currentImage.setWidth(text);
     } else if (localName.equals(quot;urlquot;)) {
         currentImage.setURL(text);
     } else if (localName.equals(quot;textinputquot;)) {
         currentChannel.setTextInput(currentTextInput);
         elementStack.pop());
     } else if (localName.equals(quot;namequot;)) {
         currentTextInput.setName(text);
SAX 2 RSS Handler 6
       } else if (localName.equals(quot;copyrightquot;)) {
           currentChannel.setCopyright(text);
       } else if (localName.equals(quot;pubDatequot;)) {
           currentChannel.setPubDate(text);
       } else if (localName.equals(quot;lastBuildDatequot;)) {
           currentChannel.setLastBuildDate(text);
       } else if (localName.equals(quot;docsquot;)) {
           currentChannel.setDocs(text);
       } else if (localName.equals(quot;managingEditorquot;)) {
           currentChannel.setManagingEditor(text);
       } else if (localName.equals(quot;webMasterquot;)) {
           currentChannel.setWebMaster(text);
       } else if (localName.equals(quot;hourquot;)) {
           currentHour = text;
       } else if (localName.equals(quot;dayquot;)) {
           currentDay = text;
       } else if (localName.equals(quot;skipHoursquot;)) {
           currentChannel.getSkipHours().add(currentHour);
           elementStack.pop();
       } else if (localName.equals(quot;skipDaysquot;)) {
           currentChannel.getSkipDays().add(currentDay);
           elementStack.pop();
       } else {}
   } catch (Exception e) {
       e.printStackTrace();
SAX 2 RSS Handler 7
public void characters(char[] ch,int start,int length) throws SAXException {
    currentText.append(ch, start, length);
}
SAX 2 Basic Event Handling
    Basic Event Handling
n

        attribute handling in startElement
    n

        characters() for content
    n

        character data handling in endElement
    n

             Multiple characters calls
         n
SAX 2 RSS ErrorHandler
public class SAX2RSSErrorHandler implements ErrorHandler {
  public void warning(SAXParseException ex) throws SAXException
{
    System.err.println(quot;[Warning] ”+getLocationString(ex)+quot;:”+
      ex.getMessage());
  }

  public void error(SAXParseException ex) throws SAXException {
    System.err.println(quot;[Error] quot;+getLocationString(ex)+quot;:”+
      ex.getMessage());
  }

  public void fatalError(SAXParseException ex) throws
SAXException {
    System.err.println(quot;[Fatal Error]quot;+getLocationString(ex)+quot;:”
      +ex.getMessage());
    throw ex;
  }
SAX 2 RSS ErrorHandler 2
private String getLocationString(SAXParseException ex) {
  StringBuffer str = new StringBuffer();
  String systemId = ex.getSystemId();
  if (systemId != null) {
    int index = systemId.lastIndexOf('/');
    if (index != -1)
      systemId = systemId.substring(index + 1);
      str.append(systemId);
    }
    str.append(':');
    str.append(ex.getLineNumber());
    str.append(':');
    str.append(ex.getColumnNumber());
    return str.toString();
  }
}
SAX 2 Error Handling
    Error Handling
n

        warning
    n

        error – parser may recover
    n

        fatalError – XML 1.0 spec errors
    n


    Locators
n
SAX 2 RSS Driver
public static void main(String args[]) {
  XMLReader r = new SAXParser();
  ContentHandler h = new SAX2RSSHandler();
  r.setContentHandler(h);
  ErrorHandler eh = new SAX2RSSErrorHandler();
  r.setErrorHandler(eh);
  try {
    r.parse(quot;file:///Work/fm0.91_full.rdfquot;);
  } catch (SAXException se) {
    System.out.println(quot;SAX Error quot;+se.getMessage());
    se.printStackTrace();
  } catch (IOException e) {
    System.out.println(quot;I/O Error quot;+e.getMessage());
    e.printStackTrace();
  } catch (Exception e) {
    System.out.println(quot;Error quot;+e.getMessage());
    e.printStackTrace();
  }
  System.out.println(((SAX2RSSHandler) h).getChannel().toString());
}
Entity Handling
    Entity / Catalog handling
n

        Catalog handling is mostly for document
    n

        processing (SGML) applications
        Pluggable entity handling is important for
    n

        server applications.
             Provides access point to DBMS, LDAP, etc.
         n


    predefined character entities
n

        amp, lt, gt, apos, quot
    n
Entity Handler
    <!DOCTYPE rss PUBLIC quot;-//Netscape Communications//DTD RSS91//ENquot;
n
    quot;http://my.netscape.com/publish/formats/rss-0.91.dtdquot;>




    This will do a network fetch!
n

public class SAX1RSSEntityHandler implements EntityResolver {
  public SAX1RSSEntityHandler() {
  }
  public InputSource resolveEntity(String publicId,String systemId)
   throws SAXException, IOException {
    if (publicId.equals(quot;-//Netscape Communications//DTD RSS
   91//ENquot;)) {
      FileReader r = new FileReader(quot;/usr/share/xml/rss91.dtdquot;);
      return new InputSource(r);
    }
    return null;
  }
}
EntityDriver
public static void main(String args[]) {
  XMLReader r = new SAXParser();
  ContentHandler h = new SAX2RSSHandler();
  r.setContentHandler(h);
  ErrorHandler eh = new SAX2RSSErrorHandler();
  r.setErrorHandler(eh);
  EntityResolver er = new SAX2RSSEntityResolver();
  r.setEntityResolver(er);
  try {
    r.parse(quot;file:///Work/fm0.91_full.rdfquot;);
  } catch (SAXException se) {
    System.out.println(quot;SAX Error quot;+se.getMessage());
    se.printStackTrace();
  } catch (IOException e) {
    System.out.println(quot;I/O Error quot;+e.getMessage());
    e.printStackTrace();
  } catch (Exception e) {
    System.out.println(quot;Error quot;+e.getMessage());
    e.printStackTrace();
  }
  System.out.println(((SAX2RSSHandler) h).getChannel().toString());
}
DefaultHandler
    The kitchen sink
n

    Lets you do ContentHandler,
n

    ErrorHandler, EntityResolver and DTD
    Handler
    A good way to go if you are only
n

    overriding a small number of methods
Locators
public class LocatorSample extends DefaultHandler {
  Locator fLocator = null;

    public void setDocumentLocator(Locator l) {
        fLocator = l;
    }

  public void startElement(String namespaceURI,
         String localName, String qName, Attributes atts) throws
SAXException {
    System.out.println(localName+quot; @ quot;+fLocator.getLineNumber()+quot;,
quot;+fLocator.getColumnNumber());
    }

    public static void main(String args[]) {
      XMLReader r = new SAXParser();
      ContentHandler h = new LocatorSample();
      h.setDocumentLocator(((SAXParser) r).getLocator());
      r.setContentHandler(h);

        try {
          r.parse(quot;file:///Work/fm0.91_full.rdfquot;);
        } catch (Exception e) {}
    }
}
SAX 2 Extension Handlers
    LexicalHandler
n

        Entity boundaries
    n

        DTD bounaries
    n

        CDATA sections
    n

        Comments
    n


    DeclHandler
n

        ElementDecls and AttributeDecls
    n

        Internal and External Entity Decls
    n

        NO parameter entities
    n


    Compatibility Adapters
n
SAX 2 Configurable
    Uses strings (URI’s) as lookup keys
n

    Features - booleans
n

        validation
    n

        namespaces
    n


    Properties - Objects
n

        Extensibility by returning an object that
    n

        implements additional function
Configurable Example
  public static void main(String[] argv) throws Exception {
    // construct parser; set features
    XMLReader parser = new SAXParser();
    try {
      parser.setFeature(quot;http://xml.org/sax/features/namespacesquot;,
 true);
      parser.setFeature(quot;http://xml.org/sax/features/validationquot;,
 true);
     parser.setProperty(“http://xml.org/sax/properties/declaration-
 handler”, dh)
    } catch (SAXException e) {
      e.printStackTrace(System.err);
    }
 }
Xerces and Configurable
    Standard way to set all parser
n

    configuration settings.
    Applies to both SAX and DOM Parsers
n
Xerces Features
    Inclusion of general entities
n

    Inclusion of parameter entities
n

    Dynamic validation
n

    Extra warnings
n

    Allow use of java encoding names
n

    Lazy evaluating DOM
n

    DOM EntityReference creation
n

    Inclusion of igorable whitespace in DOM
n

    Schema validation [also full checking]
n

    Control of Non-Validating behavior
n
        Defaulting attributes & attribute type-checking
    n
Xerces Properties
    Name of DOM implementation class
n

    DOM node currently being parsed
n

    Values for
n

        noNamespaceSchemaLocation
    n

        SchemaLocation
    n
Validation
    When to use validation
n

        System boundaries
    n

        Xerces 2 pluggable futures
    n


    non-validating != wf checking
n

        Not required to read external entities
    n


    validation dial settings
n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
DOM API
    Object representation of elements &
n

    atts
    Development Model
n

        W3C Recommendations and process
    n


    http:///www.w3.org/DOM
n
DOM Instance (XML)
<?xml version=quot;1.0quot; encoding=quot;US-ASCIIquot;?>
<a>
 <b>in b</b>
 <c>
  text in c
  <d/>
 </c>
</a>
DOM Tree
                   Document




                    Element
                       a




 Text    Element   Text                        Element             Text
  [lf]      b       [lf]                          c                 [lf]




          Text                   Text                    Element           Text
          in b             [lf]text in c[lf]                d               [lf]
DOM Architecture
   DocumentType
                                                                 Node



                                                          +getNodeName()
                                                          +getNodeValue()
                                                                                                   ProcessingInstruction
                                                          +getChildNodes()
                                                          +getFirstChild()
     Document
                                                          +getLastChild()
                                                          +getNextSibling()
                                                          +getPreviousSibling()
                                                          +insertBefore()
                                                          +appendChild()
                                                          +replaceChild()
                                                          +removeChild()




       Element                     Attr



+getAttribute()
+setAttribute()
+removeAttribute()
+getTagName()                             CharacterData               EntityReference    Entity            Notation
+getAttributeNode()
+setAttributeNode()
+removeAttributeNode()
+normalize()



                                                                                        NodeList        NamedNodeMap
                         Comment              Text                   CDataSection
DOM RSS
public static void main(String args[]) {
  DOMParser p = new DOMParser();

    try {
      p.parse(quot;file:///Work/fm0.91_full.rdfquot;);
      System.out.println(dom2Channel(p.getDocument()).toString());
    } catch (SAXException se) {
      System.out.println(quot;Error during parsing quot;+se.getMessage());
      se.printStackTrace();
    } catch (IOException e) {
      System.out.println(quot;I/O Error during parsing “+e.getMessage());
      e.printStackTrace();
    }
}
DOM2Channel
 public static RSSChannel dom2Channel(Document d) {
    RSSChannel c = new RSSChannel();
    Element channel = null;
    NodeList nl = d.getElementsByTagName(quot;channelquot;);
    channel = (Element) nl.item(0);
    for (Node n = channel.getFirstChild(); n != null;
         n = n.getNextSibling()) {
      if (n.getNodeType() != Node.ELEMENT_NODE)
        continue;
      Element e = (Element) n;
      e.normalize();
      String text = e.getFirstChild().getNodeValue();
      if (e.getTagName().equals(quot;titlequot;)) {
        c.setTitle(text);
      } else if (e.getTagName().equals(quot;descriptionquot;)) {
        c.setDescription(text);
      } else if (e.getTagName().equals(quot;linkquot;)) {
        c.setLink(text);
      } else if (e.getTagName().equals(quot;languagequot;)) {

...
DOM2Channel 2
…

        }
        nl = channel.getElementsByTagName(quot;itemquot;);
        c.setItems(dom2Items(nl));
        nl = channel.getElementsByTagName(quot;imagequot;);
        c.setImage(dom2Image(nl.item(0)));
        nl = channel.getElementsByTagName(quot;textinputquot;);
        c.setTextInput(dom2TextInput(nl.item(0)));
        nl = channel.getElementsByTagName(quot;skipHoursquot;);
        c.setSkipHours(dom2SkipHours(nl.item(0)));
        nl = channel.getElementsByTagName(quot;skipDaysquot;);
        c.setSkipDays(dom2SkipDays(nl.item(0)));
        return c;
    }
DOM Node Traversal Methods
    Two paradigms
n

    Directly from the Node
n

        Node#getFirstChild
    n

        Node#getNextSibling
    n


    From the Nodelist for a Node
n

        Node#getChildren
    n

        NodeList#item(int i)
    n
DOM2Items
  public static Vector dom2Items(NodeList nl) {
    Vector result = new Vector();    Node item = null;
    for (int x = 0; x < nl.getLength(); x++) {
      item = nl.item(x);
      if (item.getNodeType() != Node.ELEMENT_NODE) continue;
      RSSItem i = new RSSItem();
      for (Node n = item.getFirstChild(); n != null; n =
n.getNextSibling()) {
        if (n.getNodeType() != Node.ELEMENT_NODE) continue;
        Element e = (Element) n;
        e.normalize();
        String text = e.getFirstChild().getNodeValue();
        if (e.getTagName().equals(quot;titlequot;)) {
          i.setTitle(text);
        } else if (e.getTagName().equals(quot;descriptionquot;)) {
          i.setDescription(text);
        } else if (e.getTagName().equals(quot;linkquot;)) {
          i.setLink(text);
        }}
      result.add(i);
    }
    return result;
DOM Level 1
    Attr
n

        API for result as objects
    n

        API for result as strings
    n


    No DTD support
n

    Need Import - copying between DOMs
n

    is disallowed
DOM Level 2
    W3C Recommendation
n

    Core / Namespaces
n

        Import
    n


    Traversal
n

        Views on DOM Trees
    n


    Events
n

    Range
n

    Stylesheets
n
DOM L2 Traversal 1
public void parse() {
   DOMParser p = new DOMParser();
   try {
     p.parse(quot;file:///Work//fm0.91_full.rdfquot;);
     Document d = p.getDocument();

         TreeWalker treeWalker = ((DocumentTraversal)d).createTreeWalker(
                       d,
                       NodeFilter.SHOW_ALL,
                       new DOMFilter(),
                       true);
         for (Node n = treeWalker.getRoot(); n != null; n=treeWalker.nextNode()
{
             System.out.println(n.getNodeName());
          }
        } catch (IOException e) { e.printStackTrace(); }
    }

    public static void main(String args[]) {
        DOMRSSTraversal t = new DOMRSSTraversal();
        t.parse();
    }
DOM L2 Traversal 2
public class DOMFilter implements NodeFilter {
     public short acceptNode(Node n) {
         if (n.getNodeName().length() > 3)
             return FILTER_ACCEPT;
         else
             return FILTER_REJECT;
     }
 }
DOM L3
    W3C Working Draft
n

    Improve round trip fidelity
n

    Abstract Schemas / Grammar Access
n

    Load & Save
n

    XPath access
n
Xerces DOM extensions
    Feature controls insertion of ignorable
n

    whitespace
    Feature controls creation of
n

    EntityReference Nodes
    User data object provided on
n

    implementation class
    org.apache.xml.dom.NodeImpl
DOM Entity Reference Nodes 1
<?xml version=quot;1.0quot; encoding=quot;US-ASCIIquot;?>
<!DOCTYPE a [
<!ENTITY boilerplate quot;insert this herequot;>
]>
<a>
<b>in b</b>
<c>
 text in c but &boilerplate;
 <d/>
</c>
</a>
DOM Entity Reference Nodes 2
      Document Type Node
n

                                            Document




                                            Element
Document Type
                                               a




    Entity         Text           Element     Text     Element   Text
                    [lf]             b         [lf]       c       [lf]




     Text
insert this here           Text
                           in b
DOM Entity Reference Nodes 3
    Entity Reference Node
n

                                                         Element
                                                            c




           Text          Entity Reference        Text        Element   Text
     [lf]text in c but                            [lf]          d       [lf]




                                    Text
                              insert this here
Xerces Lazy DOM
    Delays object creation
n

        a node is accessed - its object is created
    n

        all siblings are also created.
    n


    Reduces time to complete parsing
n

    Reduces memory usage
n

    Good if you only access part of the
n

    DOM
Tricky DOM stuff
    Entity and EntityReference nodes –
n

    depends on parser settings
    Ignorable whitespace
n

    Subclassing the DOM
n

        Use the Document factory class
    n
DOM vs SAX
    DOM
n

        creates a document object tree
    n

        tree must the be reparsed & converted to
    n

        BO’s
        Could subclass BO’s from DOM classes
    n


    SAX
n

        event handler based API
    n

        stack management
    n

        no intermediate data structures generated
    n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
JAXP
    Java API for XML Parsing
n

    Sun JSR-061
n

    Version 1.1
n

    Abstractions for SAX, DOM, XSLT
n

    Support in Xerces
n
SAX in JAXP
public static void main(String args[]) {
  XMLReader r = null;
  SAXParserFactory spf = SAXParserFactory.newInstance();
  try {
    r = spf.newSAXParser().getXMLReader();
  } catch (Exception e) {
    System.exit(1);
  }
  ContentHandler h = new SAX2RSSHandler();
  r.setContentHandler(h)
  ErrorHandler eh = new SAX2RSSErrorHandler();
  r.setErrorHandler(eh);
  try {
    r.parse(quot;file:///Work/fm0.91_full.rdfquot;);
  } catch (Exception e) {
    System.out.println(quot;Error during parsing quot;+e.getMessage());
    e.printStackTrace();
  }
  System.out.println(((SAX2RSSHandler)
h).getChannel().toString());
}
DOM in JAXP
public static void main(String args[]) {
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = null;
  try {
    db = dbf.newDocumentBuilder();
  } catch (ParserConfigurationException pce) {
    System.exit(1);
  }
  try {
    Document doc = db.parse(quot;file:///Work/fm0.91_full.rdfquot;);
  } catch (SAXException se) {
  } catch (IOException ioe) {
  }
  System.out.println(dom2Channel(doc).toString());
}
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Namespaces
    Purpose
n
        Syntactic discrimination only
    n

        They don’t point to anything
    n

        They don’t imply schema composition/combination
    n

    Universal Names
n
        URI + Local Name
    n

        {http://www.mydomain.com}Element
    Declarations
n
        xmlns “attribute”
    n

    Prefixes
n

    Scoping
n

    Validation and DTDs
n
Namespaces
    Purpose
n
        Syntactic discrimination only
    n

        They don’t point to anything
    n

        They don’t imply schema composition/combination
    n

    Universal Names
n
        URI + Local Name
    n

        {http://www.mydomain.com}Element
    Declarations
n
        xmlns “attribute”
    n

    Prefixes
n

    Scoping
n

    Validation and DTDs
n
Namespace Example
<xsl:stylesheet version=quot;1.0“
                xmlns:xsl=quot;http://www.w3.org/1999/XSL/Transform“
                xmlns:html=quot;http://www.w3.org/TR/xhtml1/strictquot;>
 <xsl:template match=quot;/quot;>
   <html:html>
     <html:head>
       <html:title>Title</html:title>
    </html:head>
    <html:body>
      <xsl:apply-templates/>
    </html:body>
 </xsl:template>
...
</xsl:stylesheet>
Default Namespace Example
<xsl:stylesheet version=quot;1.0“
                xmlns:xsl=quot;http://www.w3.org/1999/XSL/Transform“
                xmlns=quot;http://www.w3.org/TR/xhtml1/strictquot;>
 <xsl:template match=quot;/quot;>
   <html>
     <head>
       <title>Title</title>
    </head>
    <body>
      <xsl:apply-templates/>
    </body>
 </xsl:template>
...
</xsl:stylesheet>
SAX2 ContentHandler
public class NamespaceDriver extends DefaultHandler implements ContentHandler {

    public void startPrefixMapping(String prefix,String uri) throws SAXException {
      System.out.println(quot;startMapping quot;+prefix+quot;, uri = quot;+uri);
    }

    public void endPrefixMapping(String prefix) throws SAXException {
      System.out.println(quot;endMapping quot;+prefix);
    }

    public void startElement(String namespaceURI,String localName,String
                             qName,Attributes atts) throws SAXException {
      System.out.println(quot;Start: nsURI=quot;+namespaceURI+quot; localName= quot;+localName+
                         quot; qName = quot;+qName);
      printAttributes(atts);
    }

    public void endElement(String namespaceURI,String localName,
                           String qName) throws SAXException {
      System.out.println(quot;End: nsURI=quot;+namespaceURI+quot; localName= quot;+localName+
                         quot; qName = quot;+qName);
    }
}
SAX2 ContentHandler 2
public void printAttributes(Attributes atts) {
    for (int i = 0; i < atts.getLength(); i++) {
      System.out.println(quot;tAttribute quot;+i+quot; URI=quot;+atts.getURI(i)+
                         quot; LocalName=quot;+atts.getLocalName(i)+
                         quot; Type=quot;+atts.getType(i)+quot; Value=quot;+atts.getValue(i));
    }
  }
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    Performance
n

    JDOM/DOM4J
n

    Xerces Architecture
n
XML Schema
    Richer grammar specification
n

    W3C Recommendation
n

    Structures
n

        XML Instance Syntax
    n

        Namespaces
    n

        Weird content models
    n

    Datatypes
n

        Lots
    n

    Support in Xerces 1.4
n
RSS Schema
<?xml version=quot;1.0quot; encoding=quot;UTF-8quot;?>
<!DOCTYPE xs:schema
          PUBLIC quot;-//W3C//DTD XMLSCHEMA 200102//EN“
          quot;XMLSchema.dtdquot; >

<xs:schema
 targetNamespace=“http://my.netscape.com/publish/formats/rss-0.91”
 xmlns=quot;http://my.netscape.com/publish/formats/rss-0.91quot;>
 <xs:element name=quot;rssquot;>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref=quot;channelquot;/>
   </xs:sequence>
   <xs:attribute name=quot;versionquot; type=quot;xs:stringquot; fixed=quot;0.91quot;/>
  </xs:complexType>
 </xs:element>
RSS Schema 2
<xs:element name=quot;channelquot;>
   <xs:complexType>
    <xs:choice minOccurs=quot;0quot; maxOccurs=quot;unbounded“>
      <xs:element ref=quot;titlequot;/>
      <xs:element ref=quot;descriptionquot;/>
      <xs:element ref=quot;linkquot;/>
      <xs:element ref=quot;languagequot;/>
      <xs:element ref=quot;itemquot; minOccurs=quot;1quot; maxOccurs=quot;unboundedquot;/>
      <xs:element ref=quot;ratingquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;imagequot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;textinputquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;copyrightquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;pubDatequot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;lastBuildDatequot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;docsquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;managingEditorquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;webMasterquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;skipHoursquot; minOccurs=quot;0quot;/>
      <xs:element ref=quot;skipDaysquot; minOccurs=quot;0quot;/>
     </xs:choice>
   </xs:complexType>
  </xs:element>
RSS Schema 3
…
 <xs:element   name=quot;urlquot; type=quot;xs:anyURIquot;/>
 <xs:element   name=quot;namequot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;ratingquot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;languagequot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;widthquot; type=quot;xs:integerquot;/>
 <xs:element   name=quot;heightquot; type=quot;xs:integerquot;/>
 <xs:element   name=quot;copyrightquot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;pubDatequot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;lastBuildDatequot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;docsquot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;managingEditorquot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;webMasterquot; type=quot;xs:stringquot;/>
 <xs:element   name=quot;hourquot; type=quot;xs:integerquot;/>
 <xs:element   name=quot;dayquot; type=quot;xs:integerquot;/>
</xs:schema>
RelaxNG
    Alternative to XML Schema
n

    Satisifies the big 3 goals
n

        XML Instance Syntax
    n

        Datatyping
    n

        Namespace Support
    n


    More simple and orthogonal than XML
n
    Schema
    OASIS TC
n

    Jing
n

        http://www.thaiopensource.com/jing
    n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Grammar Access
    No DOM standard
n

    SAX 2
n

    Common Applications
n

        Editors
    n

        Stub generators
    n


    Also experimental API in Xerces
n
DTD Access Example
public class DTDDumper implements DeclHandler {
  public void elementDecl (String name, String model) throws SAXException {
    System.out.println(quot;Element quot;+name+quot; has this content model: quot;+model);
  }

  public void attributeDecl (String eName, String aName, String type,
                             String valueDefault, String value) throws
SAXException {
    System.out.print(quot;Attribute quot;+aName+quot; on Element quot;+eName+quot; has type
quot;+type);
    if (valueDefault != null)
      System.out.print(quot;, it has a default type of quot;+valueDefault);
    if (value != null)
      System.out.print(quot;, and a default value of quot;+value);
    System.out.println();
  }
DTD Access Example 2
 public void internalEntityDecl (String name, String value) throws
SAXException {
    String peText = name.startsWith(quot;%quot;) ? quot;Parameter quot; : quot;quot;;
    System.out.println(quot;Internal quot;+peText+quot;Entity quot;+name+quot; has value:
quot;+value);
  }

  public void externalEntityDecl (String name, String publicId, String
systemId) throws SAXException {
    String peText = name.startsWith(quot;%quot;) ? quot;Parameter quot; : quot;quot;;
    System.out.print(quot;External quot;+peText+quot;Entity quot;+name);
    System.out.print(quot;is available at quot;+systemId);
    if (publicId != null)
      System.out.print(quot; which is known as quot;+publicId);
    System.out.println();
  }
DTD Access Driver
public static void main(String args[]) {
  try {
    XMLReader r = new SAXParser();
    r.setProperty(quot;http://xml.org/sax/properties/declaration-handlerquot;,
                  new DTDDumper());
      r.parse(args[0]);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
Grammar Access in Schema
    In XML Schema or RelaxNG
n

    Easy, just parse an XML document
n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Round Tripping
    Use SAX2 - DOM can’t do it until L3
n

    XML Decl
n

    Comments (SAX2)
n

    PE Decls
n
“Serializing”
    Several choices
n

        XMLSerializer
    n

        TextSerializer
    n

        HTMLSerializer
    n

             XHTMLSerializer
         n


    Work as either:
n

        SAX DocumentHandlers
    n

        DOM serializers
    n
Serializer Example
DOMParser p = new DOMParser();
p.parse(quot;file:///twl/Work/ApacheCon/fm0.91_full.rdfquot;);
Document d = p.getDocument();
// XML
OutputFormat format = new OutputFormat(quot;xmlquot;, quot;UTF-8quot;, true);
XMLSerializer serializer = new XMLSerializer(System.out, format);
serializer.serialize(d);
// XHTML
format = new OutputFormat(quot;xhtmlquot;, quot;UTF-8quot;, true);
serializer = new XHTMLSerializer(System.out, format);
serializer.serialize(d);
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Grammar Design
    Attributes vs Elements
n

        performance tradeoff vs programming
    n

        tradeoff
        At most 1 attribute, many subelements
    n


    ID & IDREF
n

    NMTOKEN vs CDATA
n

    CDATASECTIONs
n
        <![CDATA[ My data & stuff    ]]>
    n


    Ignorable Whitespace
n
Grammar Design 2
    Everything is Strings
n

    Data modelling
n

        Relationships as elements
    n

        modelling inheritance
    n


    Plan to use schema datatypes
n

        avoid weird types
    n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
JDOM
    JDOM Goals
n

        Lightweight
    n

        Java Oriented
    n

             Uses Collections
         n


        Provides input and output (serialization)
    n


    Class not interface based
n
JDOM Input
public static void printElements(Element e, OutputStream out)
       throws IOException, JDOMException {
  out.write((quot;n===== quot; + e.getName() + quot;: nquot;).getBytes());
  out.flush();
  for (Iterator i=e.getChildren().iterator(); i.hasNext(); ) {
    Element child = (Element)i.next();
    printElements(child, out);
    }
  out.flush();
}

public static void main(String args[]) {
  SAXBuilder b = new
                 SAXBuilder(quot;org.apache.xerces.parsers.SAXParserquot;);
  try {
    Document d = b.build(quot;file:///Work/fm0.91_full.rdfquot;);
    printElements(d.getRootElement(),System.out);
  } catch (Exception e) {
  }
}
JDOM Output
public static void main(String args[]) {
  SAXBuilder builder = new
             SAXBuilder(quot;org.apache.xerces.parsers.SAXParserquot;);
  try {
    Document d = builder.build(quot;file:///Work/fm0.91_full.rdfquot;);
    XMLOutputter o = new XMLOutputter(quot;n> quot;); // indent
    o.output(d, System.out);
  } catch (Exception e) {
  }
}
DOM4J
    DOM4J History
n

        JDOM Fork
    n


    DOM4J Goals
n

        Java2 Support
    n

            Collections
        n


        Integrated XPath support
    n

        Interface based
    n

        Schema data type support
    n

            From schema file, not PSVI
        n
DOM4J XPath
public static void main(String args[]) {
  try {
    SAXReader r = new SAXReader(quot;org.apache.xerces.parsers.SAXParserquot;);
    Document d = r.read(quot;file:///Work/ASF/ApacheCon/fm0.91_full.rdfquot;);
    String xpath = quot;//titlequot;;
    List list = d.selectNodes( xpath );
    System.out.println( quot;Found: quot; + list.size() + quot; node(s)quot; );
    System.out.println( quot;Results:quot; );
    XMLWriter writer = new XMLWriter();
    for ( Iterator iter = list.iterator(); iter.hasNext(); ) {
      Object object = iter.next();
      writer.write( object );
      writer.println();
    }
    writer.flush();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    JDOM/DOM4J
n

    Performance
n

    Xerces Architecture
n
Sosnoski Benchmarks
    http://www.sosnoski.com/opensrc/xmlbench/index.html
n


    Results
n

        Xerces-J DOM is better than JDOM/DOM4J
    n

        in almost all cases – both in time and
        space.
        One exception is serialization
    n

        Xerces-J Deferred DOM proves an excellent
    n

        technique where applicable.
Performance Tips
    skip the DOM
n

    reuse parser instances (code)
n

        reset() method
    n

    defaulting is slow
n

    external anythings are slower, entities
n
    are slower
    only validate if you have to, validate
n
    once for all time
    use fast encoding (US-ASCII)
n
Outline
    Overview
n

    Basic XML Concepts
n

    SAX Parsing
n

    DOM Parsing
n

    JAXP
n

    Namespaces
n

    XML Schema
n

    Grammar Access
n

    Round Tripping
n

    Grammar Design
n

    Performance
n

    JDOM/DOM4J
n

    Xerces Architecture
n
Xerces Architecture
    Use SAX based infrastructure where
n

    possible to avoid reinventing the wheel
    Modular / Framework design
n

    Prefer composition for system
n

    components
Xerces1 Architecture Diagram
 SAXParser           DOMParser        DOM Implementations




Error Handling       XMLParser         XMLDTDScanner        DTDValidator




                 XMLDocumentScanner    XSchemaValidator




                    Entity Handling




                       Readers
Concurrency
    “Thread safety”
n

        Xerces allows multiple parser instances,
    n

        one per thread
        Xerces does not allow multiple threads to
    n

        enter a single parser instance.

    org.apache.xerces.framework.XMLParser#parseSome()
n
Things we don’t do
    HTML parsing (yet)
n

        java Tidy
    n

        http://www3.sympatico.ca/ac.quick/jtidy.ht
        ml
    Grammar caching (yet)
n

    Parse several documents in a stream
n
Xerces 2
    Alpha
n

    XNI
n

    Remove deferred transcoding
n

    Next steps
n

        Conformance
    n

        Schema
    n

        Grammar Caching
    n

        DOM L3
    n

    Need contributors!
n
Xerces 2 Architecture Diagram
    Courtesy Andy Clark, IBM TRL
n



                 Component Manager

Symbol   Grammar Datatype                          Entity    Error
                            Scanner   Validator
 Table     Pool  Factory                          Manager   Reporter




 Regular Components             Configurable Components
Thank You!
    http://xml.apache.org
n

    twl@apache.org
n

Más contenido relacionado

La actualidad más candente

Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Holden Karau
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
RMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorRMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorClément OUDOT
 
Relax NG, a Schema Language for XML
Relax NG, a Schema Language for XMLRelax NG, a Schema Language for XML
Relax NG, a Schema Language for XMLOverdue Books LLC
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingGerger
 
LarKC Tutorial at ISWC 2009 - Architecture
LarKC Tutorial at ISWC 2009 - ArchitectureLarKC Tutorial at ISWC 2009 - Architecture
LarKC Tutorial at ISWC 2009 - ArchitectureLarKC
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Anyscale
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCClément OUDOT
 
Iswc 2009 LarKC Tutorial: Architecture
Iswc 2009 LarKC Tutorial: ArchitectureIswc 2009 LarKC Tutorial: Architecture
Iswc 2009 LarKC Tutorial: ArchitectureMichael Witbrock
 
Beyond shuffling global big data tech conference 2015 sj
Beyond shuffling   global big data tech conference 2015 sjBeyond shuffling   global big data tech conference 2015 sj
Beyond shuffling global big data tech conference 2015 sjHolden Karau
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?Miklos Christine
 
Synchronize OpenLDAP with Active Directory with LSC project
Synchronize OpenLDAP with Active Directory with LSC projectSynchronize OpenLDAP with Active Directory with LSC project
Synchronize OpenLDAP with Active Directory with LSC projectClément OUDOT
 

La actualidad más candente (20)

Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
RMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization ConnectorRMLL 2014 - LDAP Synchronization Connector
RMLL 2014 - LDAP Synchronization Connector
 
Apache Spark & Streaming
Apache Spark & StreamingApache Spark & Streaming
Apache Spark & Streaming
 
Clean code
Clean codeClean code
Clean code
 
Relax NG, a Schema Language for XML
Relax NG, a Schema Language for XMLRelax NG, a Schema Language for XML
Relax NG, a Schema Language for XML
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
PPT
PPTPPT
PPT
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
 
LarKC Tutorial at ISWC 2009 - Architecture
LarKC Tutorial at ISWC 2009 - ArchitectureLarKC Tutorial at ISWC 2009 - Architecture
LarKC Tutorial at ISWC 2009 - Architecture
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
 
Iswc 2009 LarKC Tutorial: Architecture
Iswc 2009 LarKC Tutorial: ArchitectureIswc 2009 LarKC Tutorial: Architecture
Iswc 2009 LarKC Tutorial: Architecture
 
Beyond shuffling global big data tech conference 2015 sj
Beyond shuffling   global big data tech conference 2015 sjBeyond shuffling   global big data tech conference 2015 sj
Beyond shuffling global big data tech conference 2015 sj
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
Synchronize OpenLDAP with Active Directory with LSC project
Synchronize OpenLDAP with Active Directory with LSC projectSynchronize OpenLDAP with Active Directory with LSC project
Synchronize OpenLDAP with Active Directory with LSC project
 

Similar a IQPC Canada XML 2001: How to Use XML Parsing to Enhance Electronic Communication

Similar a IQPC Canada XML 2001: How to Use XML Parsing to Enhance Electronic Communication (20)

Processing XML with Java
Processing XML with JavaProcessing XML with Java
Processing XML with Java
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
 
REST dojo Comet
REST dojo CometREST dojo Comet
REST dojo Comet
 
Javazone 2010-lift-framework-public
Javazone 2010-lift-framework-publicJavazone 2010-lift-framework-public
Javazone 2010-lift-framework-public
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
2310 b 12
2310 b 122310 b 12
2310 b 12
 
Java XML Parsing
Java XML ParsingJava XML Parsing
Java XML Parsing
 
Wso2 Scenarios Esb Webinar July 1st
Wso2 Scenarios Esb Webinar July 1stWso2 Scenarios Esb Webinar July 1st
Wso2 Scenarios Esb Webinar July 1st
 
Javascript2839
Javascript2839Javascript2839
Javascript2839
 
Json
JsonJson
Json
 
Pxb For Yapc2008
Pxb For Yapc2008Pxb For Yapc2008
Pxb For Yapc2008
 
Introducing Struts 2
Introducing Struts 2Introducing Struts 2
Introducing Struts 2
 
Struts2
Struts2Struts2
Struts2
 
Fast SOA with Apache Synapse
Fast SOA with Apache SynapseFast SOA with Apache Synapse
Fast SOA with Apache Synapse
 
Ridingapachecamel
RidingapachecamelRidingapachecamel
Ridingapachecamel
 
5 xml parsing
5   xml parsing5   xml parsing
5 xml parsing
 
Xml & Java
Xml & JavaXml & Java
Xml & Java
 
Ajax
AjaxAjax
Ajax
 
Dax Declarative Api For Xml
Dax   Declarative Api For XmlDax   Declarative Api For Xml
Dax Declarative Api For Xml
 
Metadata Extraction and Content Transformation
Metadata Extraction and Content TransformationMetadata Extraction and Content Transformation
Metadata Extraction and Content Transformation
 

Más de Ted Leung

DjangoCon 2009 Keynote
DjangoCon 2009 KeynoteDjangoCon 2009 Keynote
DjangoCon 2009 KeynoteTed Leung
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency ConstructsTed Leung
 
Seeding The Cloud
Seeding The CloudSeeding The Cloud
Seeding The CloudTed Leung
 
Programming Languages For The Cloud
Programming Languages For The CloudProgramming Languages For The Cloud
Programming Languages For The CloudTed Leung
 
MySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLMySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLTed Leung
 
PyCon US 2009: Challenges and Opportunities for Python
PyCon US 2009: Challenges and Opportunities for PythonPyCon US 2009: Challenges and Opportunities for Python
PyCon US 2009: Challenges and Opportunities for PythonTed Leung
 
Northwest Python Day 2009
Northwest Python Day 2009Northwest Python Day 2009
Northwest Python Day 2009Ted Leung
 
PyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic LanguagesPyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic LanguagesTed Leung
 
OSCON 2008: Open Source Community Antipatterns
OSCON 2008: Open Source Community AntipatternsOSCON 2008: Open Source Community Antipatterns
OSCON 2008: Open Source Community AntipatternsTed Leung
 
OSCON 2007: Open Design, Not By Committee
OSCON 2007: Open Design, Not By CommitteeOSCON 2007: Open Design, Not By Committee
OSCON 2007: Open Design, Not By CommitteeTed Leung
 
Ignite The Web 2007
Ignite The Web 2007Ignite The Web 2007
Ignite The Web 2007Ted Leung
 
ApacheCon US 2007: Open Source Community Antipatterns
ApacheCon US 2007:  Open Source Community AntipatternsApacheCon US 2007:  Open Source Community Antipatterns
ApacheCon US 2007: Open Source Community AntipatternsTed Leung
 
OSCON 2005: Build Your Own Chandler Parcel
OSCON 2005: Build Your Own Chandler ParcelOSCON 2005: Build Your Own Chandler Parcel
OSCON 2005: Build Your Own Chandler ParcelTed Leung
 
PyCon 2005 PyBlosxom
PyCon 2005 PyBlosxomPyCon 2005 PyBlosxom
PyCon 2005 PyBlosxomTed Leung
 
SeaJUG March 2004 - Groovy
SeaJUG March 2004 - GroovySeaJUG March 2004 - Groovy
SeaJUG March 2004 - GroovyTed Leung
 
OSCON 2004: XML and Apache
OSCON 2004: XML and ApacheOSCON 2004: XML and Apache
OSCON 2004: XML and ApacheTed Leung
 
OSCON 2004: A Developer's Tour of Chandler
OSCON 2004: A Developer's Tour of ChandlerOSCON 2004: A Developer's Tour of Chandler
OSCON 2004: A Developer's Tour of ChandlerTed Leung
 
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJ
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJSeaJUG Dec 2001: Aspect-Oriented Programming with AspectJ
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJTed Leung
 
IQPC Canada XML 2001: How to develop Syntax and XML Schema
IQPC Canada XML 2001: How to develop Syntax and XML SchemaIQPC Canada XML 2001: How to develop Syntax and XML Schema
IQPC Canada XML 2001: How to develop Syntax and XML SchemaTed Leung
 
SD Forum 1999 XML Lessons Learned
SD Forum 1999 XML Lessons LearnedSD Forum 1999 XML Lessons Learned
SD Forum 1999 XML Lessons LearnedTed Leung
 

Más de Ted Leung (20)

DjangoCon 2009 Keynote
DjangoCon 2009 KeynoteDjangoCon 2009 Keynote
DjangoCon 2009 Keynote
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency Constructs
 
Seeding The Cloud
Seeding The CloudSeeding The Cloud
Seeding The Cloud
 
Programming Languages For The Cloud
Programming Languages For The CloudProgramming Languages For The Cloud
Programming Languages For The Cloud
 
MySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLMySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQL
 
PyCon US 2009: Challenges and Opportunities for Python
PyCon US 2009: Challenges and Opportunities for PythonPyCon US 2009: Challenges and Opportunities for Python
PyCon US 2009: Challenges and Opportunities for Python
 
Northwest Python Day 2009
Northwest Python Day 2009Northwest Python Day 2009
Northwest Python Day 2009
 
PyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic LanguagesPyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic Languages
 
OSCON 2008: Open Source Community Antipatterns
OSCON 2008: Open Source Community AntipatternsOSCON 2008: Open Source Community Antipatterns
OSCON 2008: Open Source Community Antipatterns
 
OSCON 2007: Open Design, Not By Committee
OSCON 2007: Open Design, Not By CommitteeOSCON 2007: Open Design, Not By Committee
OSCON 2007: Open Design, Not By Committee
 
Ignite The Web 2007
Ignite The Web 2007Ignite The Web 2007
Ignite The Web 2007
 
ApacheCon US 2007: Open Source Community Antipatterns
ApacheCon US 2007:  Open Source Community AntipatternsApacheCon US 2007:  Open Source Community Antipatterns
ApacheCon US 2007: Open Source Community Antipatterns
 
OSCON 2005: Build Your Own Chandler Parcel
OSCON 2005: Build Your Own Chandler ParcelOSCON 2005: Build Your Own Chandler Parcel
OSCON 2005: Build Your Own Chandler Parcel
 
PyCon 2005 PyBlosxom
PyCon 2005 PyBlosxomPyCon 2005 PyBlosxom
PyCon 2005 PyBlosxom
 
SeaJUG March 2004 - Groovy
SeaJUG March 2004 - GroovySeaJUG March 2004 - Groovy
SeaJUG March 2004 - Groovy
 
OSCON 2004: XML and Apache
OSCON 2004: XML and ApacheOSCON 2004: XML and Apache
OSCON 2004: XML and Apache
 
OSCON 2004: A Developer's Tour of Chandler
OSCON 2004: A Developer's Tour of ChandlerOSCON 2004: A Developer's Tour of Chandler
OSCON 2004: A Developer's Tour of Chandler
 
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJ
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJSeaJUG Dec 2001: Aspect-Oriented Programming with AspectJ
SeaJUG Dec 2001: Aspect-Oriented Programming with AspectJ
 
IQPC Canada XML 2001: How to develop Syntax and XML Schema
IQPC Canada XML 2001: How to develop Syntax and XML SchemaIQPC Canada XML 2001: How to develop Syntax and XML Schema
IQPC Canada XML 2001: How to develop Syntax and XML Schema
 
SD Forum 1999 XML Lessons Learned
SD Forum 1999 XML Lessons LearnedSD Forum 1999 XML Lessons Learned
SD Forum 1999 XML Lessons Learned
 

Último

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Último (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

IQPC Canada XML 2001: How to Use XML Parsing to Enhance Electronic Communication

  • 1. How to Use XML Parsing to Enhance Electronic Commnuication Ted Leung Chairman, ASF XML PMC Principal, Sauria Associates, LLC twl@sauria.com
  • 2. Thank You ASF n xerces-j-dev n xerces-j-user n
  • 3. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 4. Overview Focus on server side XML n Not document processing n Benefits to servers / e-business n Model-View separation for *ML n Ubiquitous format for data exchange n Hope for network effects from data availability n
  • 5. xml.apache.org Xerces XML parser n Java n C++ n Perl n Xalan XSLT processor n Java n C++ n FOP n Cocoon n Batik n SOAP/Axis n
  • 6. Xerces-J Why Java? n Unicode n Code motion n Server side cross platform support n C++ version has lagged Java version n
  • 7. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 8. Basic XML Concepts Well formedness n Validity n DTD’s n Schemas n Entities n
  • 9. Example: RSS <?xml version=quot;1.0quot; encoding=quot;ISO-8859-1quot;?> <!DOCTYPE rss PUBLIC quot;-//Netscape Communications//DTD RSS .91//ENquot; quot;http://my.netscape.com/publish/formats/rss-0.91.dtdquot;> <rss version=quot;0.91quot;> <channel> <title>freshmeat.net</title> <link>http://freshmeat.net</link> <description>the one-stop-shop for all your Linux software needs</description> <language>en</language> <rating>(PICS-1.1 quot;http://www.classify.org/safesurf/quot; 1 r (SS~~000 1))</rating> <copyright>Copyright 1999, Freshmeat.net</copyright> <pubDate>Thu, 23 Aug 1999 07:00:00 GMT</pubDate> <lastBuildDate>Thu, 23 Aug 1999 16:20:26 GMT</lastBuildDate> <docs>http://www.blahblah.org/fm.cdf</docs>
  • 10. RSS 2 <image> <title>freshmeat.net</title> <url>http://freshmeat.net/images/fm.mini.jpg</url> <link>http://freshmeat.net</link> <width>88</width> <height>31</height> <description>This is the Freshmeat image stupid</description> </image> <item> <title>kdbg 1.0beta2</title> <link>http://www.freshmeat.net/news/1999/08/23/935449823.html</link> <description>This is the Freshmeat image stupid</description> </item> <item> <title>HTML-Tree 1.7</title> <link>http://www.freshmeat.net/news/1999/08/23/935449856.html</link> <description>This is the Freshmeat image stupid</description> </item>
  • 11. RSS 3 <textinput> <title>quick finder</title> <description>Use the text input below to search freshmeat</description> <name>query</name> <link>http://core.freshmeat.net/search.php3</link> </textinput> <skipHours> <hour>2</hour> </skipHours> <skipDays> <day>1</day> </skipDays> </channel> </rss>
  • 12. RSS DTD <!ELEMENT rss (channel)> <!ATTLIST rss version CDATA #REQUIRED> <!-- must be quot;0.91quot;> --> <!ELEMENT channel (title | description | link | language | item+ | rating?| image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*> <!ELEMENT title (#PCDATA)> <!ELEMENT description (#PCDATA)> <!ELEMENT link (#PCDATA)> <!ELEMENT language (#PCDATA)> <!ELEMENT rating (#PCDATA)> <!ELEMENT copyright (#PCDATA)> <!ELEMENT pubDate (#PCDATA)> <!ELEMENT lastBuildDate (#PCDATA)> <!ELEMENT docs (#PCDATA)> <!ELEMENT managingEditor (#PCDATA)> <!ELEMENT webMaster (#PCDATA)> <!ELEMENT hour (#PCDATA)> <!ELEMENT day (#PCDATA)> <!ELEMENT skipHours (hour+)> <!ELEMENT skipDays (day+)>
  • 13. RSS DTD 2 <!ELEMENT item (title | link | description)*> <!ELEMENT textinput (title | description | name | link)*> <!ELEMENT name (#PCDATA)> <!ELEMENT image (title | url | link | width? | height? | description?)*> <!ELEMENT url (#PCDATA)> <!ELEMENT width (#PCDATA)> <!ELEMENT height (#PCDATA)>
  • 14. XML Application Tasks Convert XML into application object n Convert application object to XML n Read or write XML into a database n XSLT conversion to XHTML or HTML n
  • 15. RSS Objects Logical View RSSChannel RSSItem +getTitle() +getTitle() +getDescription() +getDescription() +getLink() +getLink() +getItems() +getImage() +getTextInput() RSS Image RSSTextInput +getTitle() +getDescription() +getLink() +getTitle() +getURL() +getDescription() +getHeight() +getLInk() +getWidth() +getName()
  • 16. RSSItem 1 public class RSSItem implements java.io.Serializable { private String title; private String description; private String link; public String getTitle () { return title; } public void setTitle (String value) { String oldValue = title; title = value; } public String getLink () { return link; } public void setLink (String value) { String oldValue = link; link = value; }
  • 17. RSSItem 2 public String getDescription () { return description; } public void setDescription (String value) { String oldValue = description; description = value; } }
  • 18. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 19. Parser API’s What is the job of a parser API? n Make the structure and contents of a n document available Report errors n
  • 20. SAX API Event callback style n Representation of elements & atts n must build own stack n SAX Processing Pipeline model n Development model n Reasonably open n xml-dev n http://www.megginson.com/SAX/SAX2 n
  • 21. ContentHandler The basic SAX2 handler for XML n Callbacks for n Start of document n End of document n Start of element n End of element n Character data n Ignorable whitespace n
  • 22. Handler Strategies Single Monolithic n Requires the handler to know entirely about the n grammar One per application object and multiplexer n Each application object has its own n DocumentHandler The Multiplexer handler is registered with the n parser Multiplexer is responsible for swapping SAX n handlers as the context changes.
  • 23. SAX 2 RSS Handler public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException { currentText = new StringBuffer(); textStack.push(currentText); if (localName.equals(quot;rssquot;)) { elementStack.push(localName); } else if (localName.equals(quot;channelquot;)) { elementStack.push(localName); currentChannel = new RSSChannel(); currentItems = new Vector(); currentChannel.setItems(currentItems); currentImage = new RSSImage(); currentChannel.setImage(currentImage); currentTextInput = new RSSTextInput(); currentChannel.setTextInput(currentTextInput); currentChannel.setSkipHours(new Vector()); currentChannel.setSkipDays(new Vector()); } else if (localName.equals(quot;titlequot;)) { } else if (localName.equals(quot;descriptionquot;)) { } else if (localName.equals(quot;linkquot;)) { } else if (localName.equals(quot;languagequot;)) {
  • 24. SAX 2 RSS Handler 2 } else if (localName.equals(quot;itemquot;)) { elementStack.push(localName); currentItem = new RSSItem(); } else if (localName.equals(quot;ratingquot;)) { } else if (localName.equals(quot;imagequot;)) { elementStack.push(localName); } else if (localName.equals(quot;textinputquot;)) { elementStack.push(localName); } else if (localName.equals(quot;copyrightquot;)) { } else if (localName.equals(quot;pubDatequot;)) { } else if (localName.equals(quot;lastBuildDatequot;)) { } else if (localName.equals(quot;docsquot;)) { } else if (localName.equals(quot;managingEditorquot;)) { } else if (localName.equals(quot;webMasterquot;)) { } else if (localName.equals(quot;hourquot;)) { } else if (localName.equals(quot;dayquot;)) { } else if (localName.equals(quot;skipHoursquot;)) { elementStack.push(localName); } else if (localName.equals(quot;skipDaysquot;)) { elementStack.push(localName); } else {} }
  • 25. SAX 2 RSS Handler 3 public void endElement(String namespaceURI, String localName, String qName) throws SAXException { try { String stackValue = (String) elementStack.peek(); String text = ((StringBuffer)textStack.pop()).toString(); if (localName.equals(quot;rssquot;)) { elementStack.pop(); } else if (localName.equals(quot;channelquot;)) { elementStack.pop(); } else if (localName.equals(quot;titlequot;)) { if (stackValue.equals(quot;channelquot;)) { currentChannel.setTitle(text); } else if (stackValue.equals(quot;imagequot;)) { currentImage.setTitle(text); } else if (stackValue.equals(quot;itemquot;)) { currentItem.setTitle(text); } else if (stackValue.equals(quot;textinputquot;)) { currentTextInput.setTitle(text); } else {}
  • 26. SAX 2 RSS Handler 4 } else if (localName.equals(quot;descriptionquot;)) { if (stackValue.equals(quot;channelquot;)) { currentChannel.setDescription(text); } else if (stackValue.equals(quot;imagequot;)) { currentImage.setDescription(text); } else if (stackValue.equals(quot;itemquot;)) { currentItem.setDescription(text); } else if (stackValue.equals(quot;textinputquot;)) { currentTextInput.setDescription(text); } else {} } else if (localName.equals(quot;linkquot;)) { if (stackValue.equals(quot;channelquot;)) { currentChannel.setLink(text); } else if (stackValue.equals(quot;imagequot;)) { currentImage.setLink(text); } else if (stackValue.equals(quot;itemquot;)) { currentItem.setLink(text); } else if (stackValue.equals(quot;textinputquot;)) { currentTextInput.setLink(text); } else {}
  • 27. SAX 2 RSS Handler 5 } else if (localName.equals(quot;languagequot;)) { currentChannel.setLanguage(text); } else if (localName.equals(quot;itemquot;)) { currentItems.add(currentItem); elementStack.pop(); } else if (localName.equals(quot;ratingquot;)) { currentChannel.setRating(text); } else if (localName.equals(quot;imagequot;)) { currentChannel.setImage(currentImage); elementStack.pop()); } else if (localName.equals(quot;heightquot;)) { currentImage.setHeight(text); } else if (localName.equals(quot;widthquot;)) { currentImage.setWidth(text); } else if (localName.equals(quot;urlquot;)) { currentImage.setURL(text); } else if (localName.equals(quot;textinputquot;)) { currentChannel.setTextInput(currentTextInput); elementStack.pop()); } else if (localName.equals(quot;namequot;)) { currentTextInput.setName(text);
  • 28. SAX 2 RSS Handler 6 } else if (localName.equals(quot;copyrightquot;)) { currentChannel.setCopyright(text); } else if (localName.equals(quot;pubDatequot;)) { currentChannel.setPubDate(text); } else if (localName.equals(quot;lastBuildDatequot;)) { currentChannel.setLastBuildDate(text); } else if (localName.equals(quot;docsquot;)) { currentChannel.setDocs(text); } else if (localName.equals(quot;managingEditorquot;)) { currentChannel.setManagingEditor(text); } else if (localName.equals(quot;webMasterquot;)) { currentChannel.setWebMaster(text); } else if (localName.equals(quot;hourquot;)) { currentHour = text; } else if (localName.equals(quot;dayquot;)) { currentDay = text; } else if (localName.equals(quot;skipHoursquot;)) { currentChannel.getSkipHours().add(currentHour); elementStack.pop(); } else if (localName.equals(quot;skipDaysquot;)) { currentChannel.getSkipDays().add(currentDay); elementStack.pop(); } else {} } catch (Exception e) { e.printStackTrace();
  • 29. SAX 2 RSS Handler 7 public void characters(char[] ch,int start,int length) throws SAXException { currentText.append(ch, start, length); }
  • 30. SAX 2 Basic Event Handling Basic Event Handling n attribute handling in startElement n characters() for content n character data handling in endElement n Multiple characters calls n
  • 31. SAX 2 RSS ErrorHandler public class SAX2RSSErrorHandler implements ErrorHandler { public void warning(SAXParseException ex) throws SAXException { System.err.println(quot;[Warning] ”+getLocationString(ex)+quot;:”+ ex.getMessage()); } public void error(SAXParseException ex) throws SAXException { System.err.println(quot;[Error] quot;+getLocationString(ex)+quot;:”+ ex.getMessage()); } public void fatalError(SAXParseException ex) throws SAXException { System.err.println(quot;[Fatal Error]quot;+getLocationString(ex)+quot;:” +ex.getMessage()); throw ex; }
  • 32. SAX 2 RSS ErrorHandler 2 private String getLocationString(SAXParseException ex) { StringBuffer str = new StringBuffer(); String systemId = ex.getSystemId(); if (systemId != null) { int index = systemId.lastIndexOf('/'); if (index != -1) systemId = systemId.substring(index + 1); str.append(systemId); } str.append(':'); str.append(ex.getLineNumber()); str.append(':'); str.append(ex.getColumnNumber()); return str.toString(); } }
  • 33. SAX 2 Error Handling Error Handling n warning n error – parser may recover n fatalError – XML 1.0 spec errors n Locators n
  • 34. SAX 2 RSS Driver public static void main(String args[]) { XMLReader r = new SAXParser(); ContentHandler h = new SAX2RSSHandler(); r.setContentHandler(h); ErrorHandler eh = new SAX2RSSErrorHandler(); r.setErrorHandler(eh); try { r.parse(quot;file:///Work/fm0.91_full.rdfquot;); } catch (SAXException se) { System.out.println(quot;SAX Error quot;+se.getMessage()); se.printStackTrace(); } catch (IOException e) { System.out.println(quot;I/O Error quot;+e.getMessage()); e.printStackTrace(); } catch (Exception e) { System.out.println(quot;Error quot;+e.getMessage()); e.printStackTrace(); } System.out.println(((SAX2RSSHandler) h).getChannel().toString()); }
  • 35. Entity Handling Entity / Catalog handling n Catalog handling is mostly for document n processing (SGML) applications Pluggable entity handling is important for n server applications. Provides access point to DBMS, LDAP, etc. n predefined character entities n amp, lt, gt, apos, quot n
  • 36. Entity Handler <!DOCTYPE rss PUBLIC quot;-//Netscape Communications//DTD RSS91//ENquot; n quot;http://my.netscape.com/publish/formats/rss-0.91.dtdquot;> This will do a network fetch! n public class SAX1RSSEntityHandler implements EntityResolver { public SAX1RSSEntityHandler() { } public InputSource resolveEntity(String publicId,String systemId) throws SAXException, IOException { if (publicId.equals(quot;-//Netscape Communications//DTD RSS 91//ENquot;)) { FileReader r = new FileReader(quot;/usr/share/xml/rss91.dtdquot;); return new InputSource(r); } return null; } }
  • 37. EntityDriver public static void main(String args[]) { XMLReader r = new SAXParser(); ContentHandler h = new SAX2RSSHandler(); r.setContentHandler(h); ErrorHandler eh = new SAX2RSSErrorHandler(); r.setErrorHandler(eh); EntityResolver er = new SAX2RSSEntityResolver(); r.setEntityResolver(er); try { r.parse(quot;file:///Work/fm0.91_full.rdfquot;); } catch (SAXException se) { System.out.println(quot;SAX Error quot;+se.getMessage()); se.printStackTrace(); } catch (IOException e) { System.out.println(quot;I/O Error quot;+e.getMessage()); e.printStackTrace(); } catch (Exception e) { System.out.println(quot;Error quot;+e.getMessage()); e.printStackTrace(); } System.out.println(((SAX2RSSHandler) h).getChannel().toString()); }
  • 38. DefaultHandler The kitchen sink n Lets you do ContentHandler, n ErrorHandler, EntityResolver and DTD Handler A good way to go if you are only n overriding a small number of methods
  • 39. Locators public class LocatorSample extends DefaultHandler { Locator fLocator = null; public void setDocumentLocator(Locator l) { fLocator = l; } public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException { System.out.println(localName+quot; @ quot;+fLocator.getLineNumber()+quot;, quot;+fLocator.getColumnNumber()); } public static void main(String args[]) { XMLReader r = new SAXParser(); ContentHandler h = new LocatorSample(); h.setDocumentLocator(((SAXParser) r).getLocator()); r.setContentHandler(h); try { r.parse(quot;file:///Work/fm0.91_full.rdfquot;); } catch (Exception e) {} } }
  • 40. SAX 2 Extension Handlers LexicalHandler n Entity boundaries n DTD bounaries n CDATA sections n Comments n DeclHandler n ElementDecls and AttributeDecls n Internal and External Entity Decls n NO parameter entities n Compatibility Adapters n
  • 41. SAX 2 Configurable Uses strings (URI’s) as lookup keys n Features - booleans n validation n namespaces n Properties - Objects n Extensibility by returning an object that n implements additional function
  • 42. Configurable Example public static void main(String[] argv) throws Exception { // construct parser; set features XMLReader parser = new SAXParser(); try { parser.setFeature(quot;http://xml.org/sax/features/namespacesquot;, true); parser.setFeature(quot;http://xml.org/sax/features/validationquot;, true); parser.setProperty(“http://xml.org/sax/properties/declaration- handler”, dh) } catch (SAXException e) { e.printStackTrace(System.err); } }
  • 43. Xerces and Configurable Standard way to set all parser n configuration settings. Applies to both SAX and DOM Parsers n
  • 44. Xerces Features Inclusion of general entities n Inclusion of parameter entities n Dynamic validation n Extra warnings n Allow use of java encoding names n Lazy evaluating DOM n DOM EntityReference creation n Inclusion of igorable whitespace in DOM n Schema validation [also full checking] n Control of Non-Validating behavior n Defaulting attributes & attribute type-checking n
  • 45. Xerces Properties Name of DOM implementation class n DOM node currently being parsed n Values for n noNamespaceSchemaLocation n SchemaLocation n
  • 46. Validation When to use validation n System boundaries n Xerces 2 pluggable futures n non-validating != wf checking n Not required to read external entities n validation dial settings n
  • 47. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 48. DOM API Object representation of elements & n atts Development Model n W3C Recommendations and process n http:///www.w3.org/DOM n
  • 49. DOM Instance (XML) <?xml version=quot;1.0quot; encoding=quot;US-ASCIIquot;?> <a> <b>in b</b> <c> text in c <d/> </c> </a>
  • 50. DOM Tree Document Element a Text Element Text Element Text [lf] b [lf] c [lf] Text Text Element Text in b [lf]text in c[lf] d [lf]
  • 51. DOM Architecture DocumentType Node +getNodeName() +getNodeValue() ProcessingInstruction +getChildNodes() +getFirstChild() Document +getLastChild() +getNextSibling() +getPreviousSibling() +insertBefore() +appendChild() +replaceChild() +removeChild() Element Attr +getAttribute() +setAttribute() +removeAttribute() +getTagName() CharacterData EntityReference Entity Notation +getAttributeNode() +setAttributeNode() +removeAttributeNode() +normalize() NodeList NamedNodeMap Comment Text CDataSection
  • 52. DOM RSS public static void main(String args[]) { DOMParser p = new DOMParser(); try { p.parse(quot;file:///Work/fm0.91_full.rdfquot;); System.out.println(dom2Channel(p.getDocument()).toString()); } catch (SAXException se) { System.out.println(quot;Error during parsing quot;+se.getMessage()); se.printStackTrace(); } catch (IOException e) { System.out.println(quot;I/O Error during parsing “+e.getMessage()); e.printStackTrace(); } }
  • 53. DOM2Channel public static RSSChannel dom2Channel(Document d) { RSSChannel c = new RSSChannel(); Element channel = null; NodeList nl = d.getElementsByTagName(quot;channelquot;); channel = (Element) nl.item(0); for (Node n = channel.getFirstChild(); n != null; n = n.getNextSibling()) { if (n.getNodeType() != Node.ELEMENT_NODE) continue; Element e = (Element) n; e.normalize(); String text = e.getFirstChild().getNodeValue(); if (e.getTagName().equals(quot;titlequot;)) { c.setTitle(text); } else if (e.getTagName().equals(quot;descriptionquot;)) { c.setDescription(text); } else if (e.getTagName().equals(quot;linkquot;)) { c.setLink(text); } else if (e.getTagName().equals(quot;languagequot;)) { ...
  • 54. DOM2Channel 2 … } nl = channel.getElementsByTagName(quot;itemquot;); c.setItems(dom2Items(nl)); nl = channel.getElementsByTagName(quot;imagequot;); c.setImage(dom2Image(nl.item(0))); nl = channel.getElementsByTagName(quot;textinputquot;); c.setTextInput(dom2TextInput(nl.item(0))); nl = channel.getElementsByTagName(quot;skipHoursquot;); c.setSkipHours(dom2SkipHours(nl.item(0))); nl = channel.getElementsByTagName(quot;skipDaysquot;); c.setSkipDays(dom2SkipDays(nl.item(0))); return c; }
  • 55. DOM Node Traversal Methods Two paradigms n Directly from the Node n Node#getFirstChild n Node#getNextSibling n From the Nodelist for a Node n Node#getChildren n NodeList#item(int i) n
  • 56. DOM2Items public static Vector dom2Items(NodeList nl) { Vector result = new Vector(); Node item = null; for (int x = 0; x < nl.getLength(); x++) { item = nl.item(x); if (item.getNodeType() != Node.ELEMENT_NODE) continue; RSSItem i = new RSSItem(); for (Node n = item.getFirstChild(); n != null; n = n.getNextSibling()) { if (n.getNodeType() != Node.ELEMENT_NODE) continue; Element e = (Element) n; e.normalize(); String text = e.getFirstChild().getNodeValue(); if (e.getTagName().equals(quot;titlequot;)) { i.setTitle(text); } else if (e.getTagName().equals(quot;descriptionquot;)) { i.setDescription(text); } else if (e.getTagName().equals(quot;linkquot;)) { i.setLink(text); }} result.add(i); } return result;
  • 57. DOM Level 1 Attr n API for result as objects n API for result as strings n No DTD support n Need Import - copying between DOMs n is disallowed
  • 58. DOM Level 2 W3C Recommendation n Core / Namespaces n Import n Traversal n Views on DOM Trees n Events n Range n Stylesheets n
  • 59. DOM L2 Traversal 1 public void parse() { DOMParser p = new DOMParser(); try { p.parse(quot;file:///Work//fm0.91_full.rdfquot;); Document d = p.getDocument(); TreeWalker treeWalker = ((DocumentTraversal)d).createTreeWalker( d, NodeFilter.SHOW_ALL, new DOMFilter(), true); for (Node n = treeWalker.getRoot(); n != null; n=treeWalker.nextNode() { System.out.println(n.getNodeName()); } } catch (IOException e) { e.printStackTrace(); } } public static void main(String args[]) { DOMRSSTraversal t = new DOMRSSTraversal(); t.parse(); }
  • 60. DOM L2 Traversal 2 public class DOMFilter implements NodeFilter { public short acceptNode(Node n) { if (n.getNodeName().length() > 3) return FILTER_ACCEPT; else return FILTER_REJECT; } }
  • 61. DOM L3 W3C Working Draft n Improve round trip fidelity n Abstract Schemas / Grammar Access n Load & Save n XPath access n
  • 62. Xerces DOM extensions Feature controls insertion of ignorable n whitespace Feature controls creation of n EntityReference Nodes User data object provided on n implementation class org.apache.xml.dom.NodeImpl
  • 63. DOM Entity Reference Nodes 1 <?xml version=quot;1.0quot; encoding=quot;US-ASCIIquot;?> <!DOCTYPE a [ <!ENTITY boilerplate quot;insert this herequot;> ]> <a> <b>in b</b> <c> text in c but &boilerplate; <d/> </c> </a>
  • 64. DOM Entity Reference Nodes 2 Document Type Node n Document Element Document Type a Entity Text Element Text Element Text [lf] b [lf] c [lf] Text insert this here Text in b
  • 65. DOM Entity Reference Nodes 3 Entity Reference Node n Element c Text Entity Reference Text Element Text [lf]text in c but [lf] d [lf] Text insert this here
  • 66. Xerces Lazy DOM Delays object creation n a node is accessed - its object is created n all siblings are also created. n Reduces time to complete parsing n Reduces memory usage n Good if you only access part of the n DOM
  • 67. Tricky DOM stuff Entity and EntityReference nodes – n depends on parser settings Ignorable whitespace n Subclassing the DOM n Use the Document factory class n
  • 68. DOM vs SAX DOM n creates a document object tree n tree must the be reparsed & converted to n BO’s Could subclass BO’s from DOM classes n SAX n event handler based API n stack management n no intermediate data structures generated n
  • 69. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 70. JAXP Java API for XML Parsing n Sun JSR-061 n Version 1.1 n Abstractions for SAX, DOM, XSLT n Support in Xerces n
  • 71. SAX in JAXP public static void main(String args[]) { XMLReader r = null; SAXParserFactory spf = SAXParserFactory.newInstance(); try { r = spf.newSAXParser().getXMLReader(); } catch (Exception e) { System.exit(1); } ContentHandler h = new SAX2RSSHandler(); r.setContentHandler(h) ErrorHandler eh = new SAX2RSSErrorHandler(); r.setErrorHandler(eh); try { r.parse(quot;file:///Work/fm0.91_full.rdfquot;); } catch (Exception e) { System.out.println(quot;Error during parsing quot;+e.getMessage()); e.printStackTrace(); } System.out.println(((SAX2RSSHandler) h).getChannel().toString()); }
  • 72. DOM in JAXP public static void main(String args[]) { DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); DocumentBuilder db = null; try { db = dbf.newDocumentBuilder(); } catch (ParserConfigurationException pce) { System.exit(1); } try { Document doc = db.parse(quot;file:///Work/fm0.91_full.rdfquot;); } catch (SAXException se) { } catch (IOException ioe) { } System.out.println(dom2Channel(doc).toString()); }
  • 73. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 74. Namespaces Purpose n Syntactic discrimination only n They don’t point to anything n They don’t imply schema composition/combination n Universal Names n URI + Local Name n {http://www.mydomain.com}Element Declarations n xmlns “attribute” n Prefixes n Scoping n Validation and DTDs n
  • 75. Namespaces Purpose n Syntactic discrimination only n They don’t point to anything n They don’t imply schema composition/combination n Universal Names n URI + Local Name n {http://www.mydomain.com}Element Declarations n xmlns “attribute” n Prefixes n Scoping n Validation and DTDs n
  • 76. Namespace Example <xsl:stylesheet version=quot;1.0“ xmlns:xsl=quot;http://www.w3.org/1999/XSL/Transform“ xmlns:html=quot;http://www.w3.org/TR/xhtml1/strictquot;> <xsl:template match=quot;/quot;> <html:html> <html:head> <html:title>Title</html:title> </html:head> <html:body> <xsl:apply-templates/> </html:body> </xsl:template> ... </xsl:stylesheet>
  • 77. Default Namespace Example <xsl:stylesheet version=quot;1.0“ xmlns:xsl=quot;http://www.w3.org/1999/XSL/Transform“ xmlns=quot;http://www.w3.org/TR/xhtml1/strictquot;> <xsl:template match=quot;/quot;> <html> <head> <title>Title</title> </head> <body> <xsl:apply-templates/> </body> </xsl:template> ... </xsl:stylesheet>
  • 78. SAX2 ContentHandler public class NamespaceDriver extends DefaultHandler implements ContentHandler { public void startPrefixMapping(String prefix,String uri) throws SAXException { System.out.println(quot;startMapping quot;+prefix+quot;, uri = quot;+uri); } public void endPrefixMapping(String prefix) throws SAXException { System.out.println(quot;endMapping quot;+prefix); } public void startElement(String namespaceURI,String localName,String qName,Attributes atts) throws SAXException { System.out.println(quot;Start: nsURI=quot;+namespaceURI+quot; localName= quot;+localName+ quot; qName = quot;+qName); printAttributes(atts); } public void endElement(String namespaceURI,String localName, String qName) throws SAXException { System.out.println(quot;End: nsURI=quot;+namespaceURI+quot; localName= quot;+localName+ quot; qName = quot;+qName); } }
  • 79. SAX2 ContentHandler 2 public void printAttributes(Attributes atts) { for (int i = 0; i < atts.getLength(); i++) { System.out.println(quot;tAttribute quot;+i+quot; URI=quot;+atts.getURI(i)+ quot; LocalName=quot;+atts.getLocalName(i)+ quot; Type=quot;+atts.getType(i)+quot; Value=quot;+atts.getValue(i)); } }
  • 80. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n Performance n JDOM/DOM4J n Xerces Architecture n
  • 81. XML Schema Richer grammar specification n W3C Recommendation n Structures n XML Instance Syntax n Namespaces n Weird content models n Datatypes n Lots n Support in Xerces 1.4 n
  • 82. RSS Schema <?xml version=quot;1.0quot; encoding=quot;UTF-8quot;?> <!DOCTYPE xs:schema PUBLIC quot;-//W3C//DTD XMLSCHEMA 200102//EN“ quot;XMLSchema.dtdquot; > <xs:schema targetNamespace=“http://my.netscape.com/publish/formats/rss-0.91” xmlns=quot;http://my.netscape.com/publish/formats/rss-0.91quot;> <xs:element name=quot;rssquot;> <xs:complexType> <xs:sequence> <xs:element ref=quot;channelquot;/> </xs:sequence> <xs:attribute name=quot;versionquot; type=quot;xs:stringquot; fixed=quot;0.91quot;/> </xs:complexType> </xs:element>
  • 83. RSS Schema 2 <xs:element name=quot;channelquot;> <xs:complexType> <xs:choice minOccurs=quot;0quot; maxOccurs=quot;unbounded“> <xs:element ref=quot;titlequot;/> <xs:element ref=quot;descriptionquot;/> <xs:element ref=quot;linkquot;/> <xs:element ref=quot;languagequot;/> <xs:element ref=quot;itemquot; minOccurs=quot;1quot; maxOccurs=quot;unboundedquot;/> <xs:element ref=quot;ratingquot; minOccurs=quot;0quot;/> <xs:element ref=quot;imagequot; minOccurs=quot;0quot;/> <xs:element ref=quot;textinputquot; minOccurs=quot;0quot;/> <xs:element ref=quot;copyrightquot; minOccurs=quot;0quot;/> <xs:element ref=quot;pubDatequot; minOccurs=quot;0quot;/> <xs:element ref=quot;lastBuildDatequot; minOccurs=quot;0quot;/> <xs:element ref=quot;docsquot; minOccurs=quot;0quot;/> <xs:element ref=quot;managingEditorquot; minOccurs=quot;0quot;/> <xs:element ref=quot;webMasterquot; minOccurs=quot;0quot;/> <xs:element ref=quot;skipHoursquot; minOccurs=quot;0quot;/> <xs:element ref=quot;skipDaysquot; minOccurs=quot;0quot;/> </xs:choice> </xs:complexType> </xs:element>
  • 84. RSS Schema 3 … <xs:element name=quot;urlquot; type=quot;xs:anyURIquot;/> <xs:element name=quot;namequot; type=quot;xs:stringquot;/> <xs:element name=quot;ratingquot; type=quot;xs:stringquot;/> <xs:element name=quot;languagequot; type=quot;xs:stringquot;/> <xs:element name=quot;widthquot; type=quot;xs:integerquot;/> <xs:element name=quot;heightquot; type=quot;xs:integerquot;/> <xs:element name=quot;copyrightquot; type=quot;xs:stringquot;/> <xs:element name=quot;pubDatequot; type=quot;xs:stringquot;/> <xs:element name=quot;lastBuildDatequot; type=quot;xs:stringquot;/> <xs:element name=quot;docsquot; type=quot;xs:stringquot;/> <xs:element name=quot;managingEditorquot; type=quot;xs:stringquot;/> <xs:element name=quot;webMasterquot; type=quot;xs:stringquot;/> <xs:element name=quot;hourquot; type=quot;xs:integerquot;/> <xs:element name=quot;dayquot; type=quot;xs:integerquot;/> </xs:schema>
  • 85. RelaxNG Alternative to XML Schema n Satisifies the big 3 goals n XML Instance Syntax n Datatyping n Namespace Support n More simple and orthogonal than XML n Schema OASIS TC n Jing n http://www.thaiopensource.com/jing n
  • 86. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 87. Grammar Access No DOM standard n SAX 2 n Common Applications n Editors n Stub generators n Also experimental API in Xerces n
  • 88. DTD Access Example public class DTDDumper implements DeclHandler { public void elementDecl (String name, String model) throws SAXException { System.out.println(quot;Element quot;+name+quot; has this content model: quot;+model); } public void attributeDecl (String eName, String aName, String type, String valueDefault, String value) throws SAXException { System.out.print(quot;Attribute quot;+aName+quot; on Element quot;+eName+quot; has type quot;+type); if (valueDefault != null) System.out.print(quot;, it has a default type of quot;+valueDefault); if (value != null) System.out.print(quot;, and a default value of quot;+value); System.out.println(); }
  • 89. DTD Access Example 2 public void internalEntityDecl (String name, String value) throws SAXException { String peText = name.startsWith(quot;%quot;) ? quot;Parameter quot; : quot;quot;; System.out.println(quot;Internal quot;+peText+quot;Entity quot;+name+quot; has value: quot;+value); } public void externalEntityDecl (String name, String publicId, String systemId) throws SAXException { String peText = name.startsWith(quot;%quot;) ? quot;Parameter quot; : quot;quot;; System.out.print(quot;External quot;+peText+quot;Entity quot;+name); System.out.print(quot;is available at quot;+systemId); if (publicId != null) System.out.print(quot; which is known as quot;+publicId); System.out.println(); }
  • 90. DTD Access Driver public static void main(String args[]) { try { XMLReader r = new SAXParser(); r.setProperty(quot;http://xml.org/sax/properties/declaration-handlerquot;, new DTDDumper()); r.parse(args[0]); } catch (Exception e) { e.printStackTrace(); } } }
  • 91. Grammar Access in Schema In XML Schema or RelaxNG n Easy, just parse an XML document n
  • 92. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 93. Round Tripping Use SAX2 - DOM can’t do it until L3 n XML Decl n Comments (SAX2) n PE Decls n
  • 94. “Serializing” Several choices n XMLSerializer n TextSerializer n HTMLSerializer n XHTMLSerializer n Work as either: n SAX DocumentHandlers n DOM serializers n
  • 95. Serializer Example DOMParser p = new DOMParser(); p.parse(quot;file:///twl/Work/ApacheCon/fm0.91_full.rdfquot;); Document d = p.getDocument(); // XML OutputFormat format = new OutputFormat(quot;xmlquot;, quot;UTF-8quot;, true); XMLSerializer serializer = new XMLSerializer(System.out, format); serializer.serialize(d); // XHTML format = new OutputFormat(quot;xhtmlquot;, quot;UTF-8quot;, true); serializer = new XHTMLSerializer(System.out, format); serializer.serialize(d);
  • 96. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 97. Grammar Design Attributes vs Elements n performance tradeoff vs programming n tradeoff At most 1 attribute, many subelements n ID & IDREF n NMTOKEN vs CDATA n CDATASECTIONs n <![CDATA[ My data & stuff ]]> n Ignorable Whitespace n
  • 98. Grammar Design 2 Everything is Strings n Data modelling n Relationships as elements n modelling inheritance n Plan to use schema datatypes n avoid weird types n
  • 99. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 100. JDOM JDOM Goals n Lightweight n Java Oriented n Uses Collections n Provides input and output (serialization) n Class not interface based n
  • 101. JDOM Input public static void printElements(Element e, OutputStream out) throws IOException, JDOMException { out.write((quot;n===== quot; + e.getName() + quot;: nquot;).getBytes()); out.flush(); for (Iterator i=e.getChildren().iterator(); i.hasNext(); ) { Element child = (Element)i.next(); printElements(child, out); } out.flush(); } public static void main(String args[]) { SAXBuilder b = new SAXBuilder(quot;org.apache.xerces.parsers.SAXParserquot;); try { Document d = b.build(quot;file:///Work/fm0.91_full.rdfquot;); printElements(d.getRootElement(),System.out); } catch (Exception e) { } }
  • 102. JDOM Output public static void main(String args[]) { SAXBuilder builder = new SAXBuilder(quot;org.apache.xerces.parsers.SAXParserquot;); try { Document d = builder.build(quot;file:///Work/fm0.91_full.rdfquot;); XMLOutputter o = new XMLOutputter(quot;n> quot;); // indent o.output(d, System.out); } catch (Exception e) { } }
  • 103. DOM4J DOM4J History n JDOM Fork n DOM4J Goals n Java2 Support n Collections n Integrated XPath support n Interface based n Schema data type support n From schema file, not PSVI n
  • 104. DOM4J XPath public static void main(String args[]) { try { SAXReader r = new SAXReader(quot;org.apache.xerces.parsers.SAXParserquot;); Document d = r.read(quot;file:///Work/ASF/ApacheCon/fm0.91_full.rdfquot;); String xpath = quot;//titlequot;; List list = d.selectNodes( xpath ); System.out.println( quot;Found: quot; + list.size() + quot; node(s)quot; ); System.out.println( quot;Results:quot; ); XMLWriter writer = new XMLWriter(); for ( Iterator iter = list.iterator(); iter.hasNext(); ) { Object object = iter.next(); writer.write( object ); writer.println(); } writer.flush(); } catch (Exception e) { e.printStackTrace(); } }
  • 105. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n JDOM/DOM4J n Performance n Xerces Architecture n
  • 106. Sosnoski Benchmarks http://www.sosnoski.com/opensrc/xmlbench/index.html n Results n Xerces-J DOM is better than JDOM/DOM4J n in almost all cases – both in time and space. One exception is serialization n Xerces-J Deferred DOM proves an excellent n technique where applicable.
  • 107. Performance Tips skip the DOM n reuse parser instances (code) n reset() method n defaulting is slow n external anythings are slower, entities n are slower only validate if you have to, validate n once for all time use fast encoding (US-ASCII) n
  • 108. Outline Overview n Basic XML Concepts n SAX Parsing n DOM Parsing n JAXP n Namespaces n XML Schema n Grammar Access n Round Tripping n Grammar Design n Performance n JDOM/DOM4J n Xerces Architecture n
  • 109. Xerces Architecture Use SAX based infrastructure where n possible to avoid reinventing the wheel Modular / Framework design n Prefer composition for system n components
  • 110. Xerces1 Architecture Diagram SAXParser DOMParser DOM Implementations Error Handling XMLParser XMLDTDScanner DTDValidator XMLDocumentScanner XSchemaValidator Entity Handling Readers
  • 111. Concurrency “Thread safety” n Xerces allows multiple parser instances, n one per thread Xerces does not allow multiple threads to n enter a single parser instance. org.apache.xerces.framework.XMLParser#parseSome() n
  • 112. Things we don’t do HTML parsing (yet) n java Tidy n http://www3.sympatico.ca/ac.quick/jtidy.ht ml Grammar caching (yet) n Parse several documents in a stream n
  • 113. Xerces 2 Alpha n XNI n Remove deferred transcoding n Next steps n Conformance n Schema n Grammar Caching n DOM L3 n Need contributors! n
  • 114. Xerces 2 Architecture Diagram Courtesy Andy Clark, IBM TRL n Component Manager Symbol Grammar Datatype Entity Error Scanner Validator Table Pool Factory Manager Reporter Regular Components Configurable Components
  • 115. Thank You! http://xml.apache.org n twl@apache.org n