
DOM (Document Object Model)

DOM (Document Object Model), Cheng-Chia Chen. What is DOM? DOM (Document Object Model) is a tree-based data model of XML documents and an API for XML document processing that works across multiple languages: it is language neutral and defined in terms of CORBA IDL.


Presentation Transcript


  1. DOM (Document Object Model), Cheng-Chia Chen

  2. What is DOM? • DOM (Document Object Model) • A tree-based data model of XML documents • An API for XML document processing • Usable across multiple languages • Language neutral • Defined in terms of CORBA IDL • Language-specific bindings supplied for ECMAScript, Java, …

  3. Document Object Model • Defines how XML and HTML documents are represented as objects in programs • W3C Standard • Defined in IDL; thus language independent • HTML as well as XML • Writing as well as reading • Covers everything except internal and external DTD subsets

  4. Trees • An XML document can be represented as a tree. • It has a root. • It has nodes. • It is amenable to recursive processing.

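  5. DOM (Document Object Model) • What is the tree view of this document?
      <?xml version="1.0" encoding="UTF-8"?>
      <TABLE>
        <TBODY>
          <TR>
            <TD>紅樓夢</TD>
            <TD>曹雪芹</TD>
          </TR>
          <TR>
            <TD>三國演義</TD>
            <TD>羅貫中</TD>
          </TR>
        </TBODY>
      </TABLE>
    (The cells list two classic Chinese novels and their authors: 紅樓夢, Dream of the Red Chamber, by 曹雪芹, Cao Xueqin; 三國演義, Romance of the Three Kingdoms, by 羅貫中, Luo Guanzhong.)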

  6. Tree view (DOM view) of an XML Document [Figure: a tree whose root is the document node; element nodes TABLE, TBODY, TR, and TD form the inner levels; text nodes 紅樓夢, 曹雪芹, 三國演義, and 羅貫中 are the leaves.]

  7. DOM Evolution • DOM Level 0 • DOM Level 1: a W3C Standard • DOM Level 2: a W3C Standard • DOM Level 3: W3C Standards: • Document Object Model (DOM) Level 3 Core Specification • Document Object Model (DOM) Level 3 Load and Save Specification • Document Object Model (DOM) Level 3 Validation Specification • DOM Level 3: W3C Working Group Notes: • Document Object Model (DOM) Level 3 XPath Specification Version 1.0 • Document Object Model (DOM) Level 3 Views and Formatting Specification • Document Object Model (DOM) Level 3 Events Specification Version 1.0

  8. DOM Implementations for Java • Apache XML Project's Xerces/Crimson parsers: • Xerces2: http://xml.apache.org/xerces2-j/index.html • Xerces 1: http://xml.apache.org/xerces-j/index.html (hibernated) • Crimson: http://xml.apache.org/crimson/ (hibernated; the default implementation in Java 1.4) • Sun's Java API for XML: http://java.sun.com/products/xml • Oracle: http://technet.oracle.com/tech/xml • GNU JAXP: http://www.gnu.org/software/classpathx/jaxp/jaxp.html

  9. Modules • Eight Modules: • Core: org.w3c.dom • HTML: org.w3c.dom.html • Views: org.w3c.dom.views • StyleSheets: org.w3c.dom.stylesheets • CSS: org.w3c.dom.css • Events: org.w3c.dom.events • Traversal: org.w3c.dom.traversal • Range: org.w3c.dom.range • Only the core and traversal modules really apply to XML. The other six are for HTML.

  10. DOM Trees • Entire document is represented as a tree. • A tree contains nodes. • Some nodes may contain other nodes (depending on node type). • Each document node contains: • zero or one doctype nodes • one root element node • zero or more comment and processing instruction nodes

  11. 17 interfaces in org.w3c.dom: Attr, CDATASection, CharacterData, Comment, Document, DocumentFragment, DocumentType, DOMImplementation, Element, Entity, EntityReference, NamedNodeMap, Node, NodeList, Notation, ProcessingInstruction, Text • Plus one exception: DOMException • Plus a bunch of HTML stuff in org.w3c.dom.html and other packages

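  12. The DOM Interface Hierarchy
    • Fundamental interfaces: DOMImplementation, DOMException, Node, NodeList, NamedNodeMap, Document, DocumentFragment, Element, Attr, CharacterData, Text, Comment
    • Extended interfaces: DocumentType, CDATASection, Notation, Entity, EntityReference, ProcessingInstruction
    • Inheritance: Document, DocumentFragment, Element, Attr, CharacterData, DocumentType, Notation, Entity, EntityReference, and ProcessingInstruction extend Node; Text and Comment extend CharacterData; CDATASection extends Text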

  13. Steps to use DOM • Create a parser using library-specific code • Use the parser to parse the document and return a DOM org.w3c.dom.Document object • The entire document is stored in memory • Use DOM methods and interfaces to extract data from this object

  14. Parsing documents with a (Xerces) DOMParser: example
      import org.apache.xerces.parsers.*;
      import org.w3c.dom.*;
      import org.xml.sax.*;
      import java.io.*;

      public class DOMParserMaker {
        public static void main(String[] args) {
          DOMParser parser = new DOMParser();
          for (int i = 0; i < args.length; i++) {
            try {
              // Read the entire document into memory
              parser.parse(args[i]);
              Document d = parser.getDocument();
              // work with the document...
            } catch (SAXException e) {
              System.err.println(e);
            } catch (IOException e) {
              System.err.println(e);
            }
          }
        }
      }

  15. Parsing process using JAXP • javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory • Configure the factory • The factory's newDocumentBuilder() method creates a DocumentBuilder • Configure the builder • The builder parses the document and returns a DOM org.w3c.dom.Document object. • The entire document is stored in memory. • DOM methods and interfaces are used to extract data from this object

  16. JAXP's DOM pluggability mechanism

  17. Parsing documents with a JAXP DocumentBuilder
      import javax.xml.parsers.*;
      import org.w3c.dom.*;
      import org.xml.sax.*;
      import java.io.*;

      public class JAXPParserMaker {
        public static void main(String[] args) {
          try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder parser = builderFactory.newDocumentBuilder();
            for (int i = 0; i < args.length; i++) {
              try {
                // Read the entire document into memory
                Document d = parser.parse(args[i]);
                // work with the document...
              } catch (SAXException e) {
                System.err.println(e);
              } catch (IOException e) {
                System.err.println(e);
              }
            } // end for
          } catch (ParserConfigurationException e) {
            System.err.println("You need to install a JAXP aware parser.");
          }
        }
      }

  18. The Node Interface
      package org.w3c.dom;

      public interface Node {
        // NodeType
        public static final short ELEMENT_NODE                = 1;
        public static final short ATTRIBUTE_NODE              = 2;
        public static final short TEXT_NODE                   = 3;
        public static final short CDATA_SECTION_NODE          = 4;
        public static final short ENTITY_REFERENCE_NODE       = 5;
        public static final short ENTITY_NODE                 = 6;
        public static final short PROCESSING_INSTRUCTION_NODE = 7;
        public static final short COMMENT_NODE                = 8;
        public static final short DOCUMENT_NODE               = 9;
        public static final short DOCUMENT_TYPE_NODE          = 10;
        public static final short DOCUMENT_FRAGMENT_NODE      = 11;
        public static final short NOTATION_NODE               = 12;

  19. The Node interface • Node properties
      public String getNodeName();
      public String getNodeValue() throws DOMException;
      public void setNodeValue(String value) throws DOMException;
      public short getNodeType();
      public String getNamespaceURI();
      public String getPrefix();
      public void setPrefix(String prefix) throws DOMException;
      public String getLocalName();

  20. The Node interface • Tree navigation
      public Node getParentNode();
      public NodeList getChildNodes();
      public Node getFirstChild();
      public Node getLastChild();
      public Node getPreviousSibling();
      public Node getNextSibling();
      public NamedNodeMap getAttributes();
      public Document getOwnerDocument();
      public boolean hasChildNodes();
      public boolean hasAttributes();

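  21. Node navigation [Figure: from this node, parentNode leads up to its parent; previousSibling and nextSibling lead to the adjacent nodes at the same level; firstChild, lastChild, and childNodes lead down to its children.]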
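The navigation properties above can be exercised with the JDK's built-in JAXP parser. This is a minimal sketch, not code from the slides: the class name `SiblingWalk` and the sample XML are hypothetical. It walks `firstChild`/`nextSibling` instead of indexing `childNodes`, and shows that whitespace between tags becomes Text nodes that a navigation loop must skip.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Walks firstChild -> nextSibling and keeps only element children, in document order.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<TR>\n  <TD>a</TD>\n  <TD>b</TD>\n</TR>");
        Element tr = doc.getDocumentElement();
        System.out.println(childElementNames(tr));          // [TD, TD]
        // childNodes also counts the whitespace-only Text nodes between the tags.
        System.out.println(tr.getChildNodes().getLength()); // 5
        Node firstTd = tr.getFirstChild().getNextSibling();  // skip the leading whitespace Text node
        System.out.println(firstTd.getParentNode().isSameNode(tr)); // true
    }
}
```

The explicit node-type check is the usual way to ignore the whitespace, comment, and processing-instruction nodes that sit among the element children.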

  22. The Node interface • Tree modification
      public Node insertBefore(Node newNode, Node refNode) throws DOMException;
      public Node replaceChild(Node newNode, Node refNode) throws DOMException;
      public Node removeChild(Node node) throws DOMException;
      public Node appendChild(Node newNode) throws DOMException;

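  23. Node manipulation [Figure: this.appendChild(newNode) adds newNode after this node's lastChild; this.insertBefore(newNode, refNode) inserts newNode just before the child refNode; this.replaceChild(newNode, refNode) puts newNode in refNode's place among this node's childNodes.]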
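A small sketch of the four modification methods applied in sequence, using the JDK's built-in DOM. The class name `ChildEdits` and the `<list>` document are hypothetical. Note that the standard Java signature names the second `replaceChild` argument `oldChild`; it is the node being replaced, and `replaceChild` returns it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildEdits {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Joins the nodeNames of parent's children with commas, e.g. "a,b,c".
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(kids.item(i).getNodeName());
        }
        return sb.toString();
    }

    // Applies the four tree-modification methods in turn and reports the final child list.
    static String demo() {
        Document doc = parse("<list><a/><b/></list>");
        Element list = doc.getDocumentElement();
        Node b = list.getLastChild();

        list.appendChild(doc.createElement("c"));                     // a,b,c
        list.insertBefore(doc.createElement("x"), b);                 // a,x,b,c
        Node replaced = list.replaceChild(doc.createElement("y"), b); // a,x,y,c (returns b)
        list.removeChild(list.getFirstChild());                       // x,y,c
        return childNames(list) + " (replaced " + replaced.getNodeName() + ")";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // x,y,c (replaced b)
    }
}
```

If the node passed to `appendChild` or `insertBefore` is already in the tree, it is moved rather than copied.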

  24. The Node interface • Utilities
      public Node cloneNode(boolean deep);
      public void normalize();
      public boolean isSupported(String feature, String version);
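A sketch of `cloneNode` and `normalize` on the JDK's DOM; the class name `NormalizeDemo` and the sample content are hypothetical. Appending Text nodes one by one leaves several adjacent Text siblings; `normalize()` merges them, while a deep clone taken earlier keeps its own, separate copies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NormalizeDemo {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static String demo() {
        Document doc = parse("<p>Hello</p>");
        Element p = doc.getDocumentElement();
        p.appendChild(doc.createTextNode(", "));
        p.appendChild(doc.createTextNode("world"));
        int before = p.getChildNodes().getLength();  // 3 adjacent Text nodes

        Node copy = p.cloneNode(true);               // deep copy: the element plus its 3 Text children
        p.normalize();                               // merges adjacent Text nodes in place
        int after = p.getChildNodes().getLength();   // 1

        return before + " -> " + after + ": " + p.getFirstChild().getNodeValue()
                + " (clone still has " + copy.getChildNodes().getLength() + ")";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 -> 1: Hello, world (clone still has 3)
    }
}
```

With `cloneNode(false)` only the element itself (with its attributes) is copied, without any children.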

  25. The NodeList Interface
      package org.w3c.dom;

      public interface NodeList {
        public Node item(int index);
        public int getLength();
      }

  26. The NamedNodeMap interface
      public interface NamedNodeMap {
        public Node getNamedItem(String name);  // by nodeName
        public Node setNamedItem(Node arg) throws DOMException;
            // inserts arg, replacing any node whose nodeName equals arg.getNodeName()
        public Node removeNamedItem(String name) throws DOMException;
        public Node item(int index);
        public int getLength();
        // Introduced in DOM Level 2:
        public Node getNamedItemNS(String namespaceURI, String localName);
        public Node setNamedItemNS(Node arg) throws DOMException;
        public Node removeNamedItemNS(String namespaceURI, String localName) throws DOMException;
      }
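A minimal sketch of reading an element's attributes through `NamedNodeMap`, using the JDK parser; the class name `AttributeMap` and the `<img>` sample are hypothetical. Because a `NamedNodeMap` promises no particular ordering, the sketch copies the attributes into a sorted `TreeMap` before printing.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeMap {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Copies every attribute (nodeName -> nodeValue) into a map sorted by name.
    static Map<String, String> attributes(Element e) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element img = parse("<img src=\"logo.png\" alt=\"Logo\" width=\"80\"/>").getDocumentElement();
        System.out.println(attributes(img)); // {alt=Logo, src=logo.png, width=80}

        NamedNodeMap attrs = img.getAttributes();
        System.out.println(attrs.getNamedItem("alt").getNodeValue()); // Logo
        attrs.removeNamedItem("width");    // the map is live, so the element loses the attribute too
        System.out.println(img.hasAttribute("width")); // false
    }
}
```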

  27. NodeReporter
      import javax.xml.parsers.*;
      import org.w3c.dom.*;
      import org.xml.sax.*;
      import java.io.*;

      public class NodeReporter {
        public static void main(String[] args) {
          try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = builderFactory.newDocumentBuilder();
            NodeReporter iterator = new NodeReporter();
            for (int i = 0; i < args.length; i++) {
              try {
                // Read the entire document into memory
                Document doc = parser.parse(args[i]);
                iterator.followNode(doc);
              } catch (SAXException ex) {
                System.err.println(args[i] + " is not well-formed.");
              } catch (IOException ex) {
                System.err.println(ex);
              }
            }
          } catch (ParserConfigurationException ex) {
            System.err.println("You need to install a JAXP aware parser.");
          }
        } // end main

  28.   // note use of recursion
        public void followNode(Node node) {
          processNode(node);
          if (node.hasChildNodes()) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
              followNode(children.item(i));
            }
          }
        }

        public void processNode(Node node) {
          String name = node.getNodeName();
          String type = typeName[node.getNodeType()];
          System.out.println("Type " + type + ": " + name);
        }

  29. Type2TypeName
        // Indexed by the node type code (1..12); index 0 is a placeholder
        public String[] typeName = new String[] {
          "UnknownType", "Element", "Attribute", "Text", "CDATA Section",
          "Entity Reference", "Entity", "Processing Instruction", "Comment",
          "Document", "Document Type Declaration", "Document Fragment", "Notation"
        };
      }

  30. Values of nodeName, nodeValue, and attributes in a Node (Interface: nodeName; nodeValue; attributes)
    • Attr: name of attribute; value of attribute; null
    • CDATASection: #cdata-section; content of the CDATA section; null
    • Comment: #comment; content of the comment; null
    • Document: #document; null; null
    • DocumentFragment: #document-fragment; null; null
    • DocumentType: document type name; null; null
    • Element: tag name; null; NamedNodeMap
    • Entity: entity name; null; null
    • EntityReference: name of entity referenced; null; null
    • Notation: notation name; null; null
    • ProcessingInstruction: target; entire content excluding the target; null
    • Text: #text; content of the text node; null

  31. The Document Node • The root node representing the entire document; not the same as the root element • Contains: • one element node • zero or more processing instruction nodes • zero or more comment nodes • zero or one document type nodes
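To see the difference between the Document node and the root element, this sketch lists the Document's direct children by node type code and name. It assumes the JDK's built-in parser (which keeps comments by default); the class name `DocumentNode` and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DocumentNode {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Lists "nodeType:nodeName" for each direct child of the Document node.
    static String topLevel(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.getNodeType()).append(':').append(n.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<!--top--><?app mode=\"x\"?><root/>");
        System.out.println(doc.getNodeName());                      // #document
        System.out.println(doc.getDocumentElement().getNodeName()); // root
        System.out.println(topLevel(doc));                          // 8:#comment 7:app 1:root
        System.out.println(doc.getDocumentElement().getParentNode().isSameNode(doc)); // true
    }
}
```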

  32. The Document Interface
      package org.w3c.dom;

      public interface Document extends Node {
        public DocumentType getDoctype();
        public DOMImplementation getImplementation();
        public Element getDocumentElement();
        public NodeList getElementsByTagName(String tagname);
        public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
        public Element getElementById(String elementId);

  33. The Document Interface
        // Factory methods
        public Element createElement(String tagName) throws DOMException;
        public Element createElementNS(String namespaceURI, String qName) throws DOMException;
        public DocumentFragment createDocumentFragment();
        public Text createTextNode(String data);
        public Comment createComment(String data);
        public CDATASection createCDATASection(String data) throws DOMException;
        public ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
        public Attr createAttribute(String name) throws DOMException;
        public Attr createAttributeNS(String namespaceURI, String qName) throws DOMException;
        public EntityReference createEntityReference(String name) throws DOMException;
        public Node importNode(Node importedNode, boolean deep) throws DOMException;
      }
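The factory methods are the only way to create new nodes, since every node belongs to one Document. This sketch builds a tiny document from scratch and serializes it. Assumptions: the class name `BuildDocument` and the content are hypothetical, and serialization uses the JAXP identity `Transformer` (from `javax.xml.transform`) because DOM Level 1/2 Core itself has no save method.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BuildDocument {

    // Creates an empty Document and fills it using only the Document factory methods.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("weblogs");
        doc.appendChild(root);                                   // becomes the document element
        root.appendChild(doc.createComment(" built with DOM factory methods "));

        Element log = doc.createElement("log");
        log.setAttribute("id", "1");
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("MozillaZine"));
        log.appendChild(name);
        root.appendChild(log);
        return doc;
    }

    // Serializes a node with the JAXP identity Transformer, omitting the XML declaration.
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build()));
    }
}
```

A node created by one Document cannot be inserted into another; copying it across requires `importNode`, otherwise the insertion fails with a `DOMException` (WRONG_DOCUMENT_ERR).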

  34. Element Nodes • Represents a complete element including its • start-tag, • end-tag, and • content • Content may contain: • Element nodes • ProcessingInstruction nodes • Comment nodes • Text nodes • CDATASection nodes • EntityReference nodes

  35. The Element Interface
      public String getTagName();  // same as getNodeName()
      public NodeList getElementsByTagName(String name);
      public NodeList getElementsByTagNameNS(String uri, String localName);
      public String getAttribute(String name);
      public String getAttributeNS(String uri, String localName);
      public void setAttribute(String name, String value) throws DOMException;
      public void setAttributeNS(String uri, String qName, String value) throws DOMException;
      public void removeAttribute(String name) throws DOMException;
      public void removeAttributeNS(String uri, String localName) throws DOMException;
      public Attr getAttributeNode(String name);
      public Attr getAttributeNodeNS(String namespaceURI, String localName);
      public Attr setAttributeNode(Attr newAttr) throws DOMException;
      public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
      public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
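A short sketch of the common Element calls on the JDK's DOM; the class name `ElementBasics` and the sample `<log>` element are hypothetical. Two details worth noticing: `getAttribute` returns an empty string (not `null`) for a missing attribute, and `getElementsByTagName` searches all descendants, not only direct children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ElementBasics {

    // Parses a small sample and returns its root element; checked exceptions are rethrown unchecked.
    static Element sample() {
        String xml = "<log lang=\"en\"><url>http://example.com/a</url>"
                + "<entry><url>http://example.com/b</url></entry></log>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element log = sample();
        System.out.println(log.getTagName());                      // log
        System.out.println(log.getAttribute("lang"));              // en
        System.out.println(log.getAttribute("missing").isEmpty()); // true: "" rather than null
        System.out.println(log.hasAttribute("missing"));           // false

        log.setAttribute("lang", "zh");                            // replaces the existing value
        System.out.println(log.getAttribute("lang"));              // zh
        log.removeAttribute("lang");
        System.out.println(log.hasAttribute("lang"));              // false

        // Two <url> elements are found, although only one is a direct child.
        System.out.println(log.getElementsByTagName("url").getLength()); // 2
    }
}
```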

  36. Example application • UserLand's RSS-based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml • or locally, xml/rsslogs.xml
      <?xml version="1.0"?>
      <!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> -->
      <weblogs>
        <log>
          <name>MozillaZine</name>
          <url>http://www.mozillazine.org</url>
          <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
          <ownerName>Jason Kersey</ownerName>
          <ownerEmail>kerz@en.com</ownerEmail>
          <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
          <imageUrl></imageUrl>
          <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
        </log>
        …
      </weblogs>

  37. DOM Design • Want to find all URLs in the logs • The character data of each url element needs to be read. Everything else can be ignored. • The getElementsByTagName() method in Document gives us a quick list of all the url elements.

  38. The program: WeblogsDOM.java
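The WeblogsDOM.java listing itself is not reproduced in this transcript. Following the design described for it, a minimal sketch might look like the class below; the name `WeblogUrls`, the inline `SAMPLE` (trimmed from the example document), and the command-line file argument are assumptions, not the original program. It uses `getElementsByTagName("url")` and concatenates the Text/CDATA children of each match.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WeblogUrls {

    // Hypothetical inline sample, trimmed from the weblogs example.
    static final String SAMPLE = "<weblogs><log><name>MozillaZine</name>"
            + "<url>http://www.mozillazine.org</url>"
            + "<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>"
            + "</log></weblogs>";

    // Parses any InputSource; checked parser exceptions are rethrown unchecked.
    static Document parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Collects the character data of every <url> element in document order.
    // The name match is exact, so <changesUrl> elements are not included.
    static List<String> urls(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList urlElements = doc.getElementsByTagName("url");
        for (int i = 0; i < urlElements.getLength(); i++) {
            result.add(textOf(urlElements.item(i)).trim());
        }
        return result;
    }

    // Concatenates the Text and CDATA children of a node (DOM Level 3 adds getTextContent() for this).
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputSource source = args.length > 0
                ? new InputSource(new File(args[0]).toURI().toString()) // e.g. a local copy of the logs file
                : new InputSource(new StringReader(SAMPLE));
        for (String url : urls(parse(source))) {
            System.out.println(url);
        }
    }
}
```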

  39. CharacterData interface • Represents things that are basically text holders • Super interface of Text, Comment, and CDATASection

  40. The CharacterData Interface
      package org.w3c.dom;

      public interface CharacterData extends Node {
        // content retrieval
        public String getData() throws DOMException;
        public int getLength();
        public String substringData(int offset, int count) throws DOMException;
        // content modification
        public void setData(String data) throws DOMException;
        public void appendData(String arg) throws DOMException;
        public void insertData(int offset, String arg) throws DOMException;
        public void deleteData(int offset, int count) throws DOMException;
        public void replaceData(int offset, int count, String arg) throws DOMException;
      }
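Because Text, Comment, and CDATASection all extend CharacterData, one editing routine works on any of them. A minimal sketch with the JDK's DOM; the class name `CharacterDataEdits` and the strings are hypothetical. Offsets count UTF-16 code units, which for this ASCII text equal characters.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

public class CharacterDataEdits {

    // Creates an empty Document to act as the node factory.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Edits any text holder in place; expects the initial data "DOM Level 2".
    static String edit(CharacterData cd) {
        cd.appendData(" Core");                            // "DOM Level 2 Core"
        cd.insertData(0, "W3C ");                          // "W3C DOM Level 2 Core"
        cd.replaceData(cd.getData().indexOf('2'), 1, "3"); // "W3C DOM Level 3 Core"
        cd.deleteData(0, 4);                               // "DOM Level 3 Core"
        return cd.getData();
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Text text = doc.createTextNode("DOM Level 2");
        Comment comment = doc.createComment("DOM Level 2");
        System.out.println(edit(text));               // DOM Level 3 Core
        System.out.println(edit(comment));            // DOM Level 3 Core
        System.out.println(text.substringData(4, 5)); // Level
        System.out.println(text.getLength());         // 16
    }
}
```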

  41. Text Nodes • Represents the text content of an element or attribute • Contains only pure text, no markup • Parsers will return a single maximal text node for each contiguous run of pure text • Editing may change this

  42. The Text Interface package org.w3c.dom; public interface Text extends CharacterData { public Text splitText(int offset) throws DOMException; }
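`splitText` is one way editing breaks the "one maximal text node per run" property that parsers provide. A sketch on the JDK's DOM (class name `SplitTextDemo` and content hypothetical): the original node keeps the text before the offset, the returned node gets the rest as its next sibling, and `normalize()` merges them again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class SplitTextDemo {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static String demo() {
        Element p = parse("<p>Hello, world</p>").getDocumentElement();
        Text first = (Text) p.getFirstChild();
        Text second = first.splitText(5);             // first keeps "Hello", second gets ", world"

        StringBuilder sb = new StringBuilder();
        sb.append(p.getChildNodes().getLength())      // 2 sibling Text nodes
          .append(" [").append(first.getData())
          .append('|').append(second.getData()).append("] ");
        p.normalize();                                // merges them back into one Text node
        sb.append(p.getChildNodes().getLength());     // 1
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 [Hello|, world] 1
    }
}
```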

  43. CDATA section Nodes • Represents a CDATA section like this example from a hypothetical SVG tutorial:
      <p>You can use a default <code>xmlns</code> attribute to avoid having to add the svg prefix to all your elements:</p>
      <![CDATA[
      <svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="10cm">
        <ellipse rx="110" ry="130" />
        <rect x="4cm" y="1cm" width="3cm" height="6cm" />
      </svg>
      ]]>
    • No children

  44. The CDATASection Interface
      package org.w3c.dom;

      // no additional methods other than those from Text
      public interface CDATASection extends Text {
      }
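Whether CDATA sections survive as separate nodes depends on parser configuration. This sketch (class name `CdataDemo` and sample content hypothetical) relies on the JAXP `DocumentBuilderFactory.setCoalescing` switch: with its default of `false` the section arrives as a `CDATASection` node whose markup is not parsed; with `true` the parser turns it into an ordinary Text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataDemo {

    // Parses xml; with coalescing=true CDATA sections become (merged) Text nodes.
    static Document parse(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<code><![CDATA[<svg xmlns=\"http://www.w3.org/2000/svg\"/> & more]]></code>";

        Node kept = parse(xml, false).getDocumentElement().getFirstChild();
        System.out.println(kept.getNodeType() == Node.CDATA_SECTION_NODE); // true
        System.out.println(kept.getNodeValue());  // the raw markup, not parsed into elements

        Node merged = parse(xml, true).getDocumentElement().getFirstChild();
        System.out.println(merged.getNodeType() == Node.TEXT_NODE);        // true
    }
}
```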

  45. DocumentType Nodes • Represents a document type declaration • Has no children

  46. The DocumentType Interface
      package org.w3c.dom;

      public interface DocumentType extends Node {
        public String getName();
        public NamedNodeMap getEntities();
        public NamedNodeMap getNotations();
        public String getPublicId();
        public String getSystemId();
        public String getInternalSubset();
      }

  47. Example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"> • name = "html" • publicId = "-//W3C//DTD XHTML 1.0 Strict//EN" • systemId = "DTD/xhtml1-strict.dtd"
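Reading these values back through `Document.getDoctype()` can be sketched as follows. To keep it runnable offline, the sketch assumes an `EntityResolver` that answers every external entity (including the external DTD subset) with an empty document, so `DTD/xhtml1-strict.dtd` is never fetched; the class name `DoctypeDemo` is hypothetical. The system ID is printed without a claimed value, since parsers may report it as written or resolved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeDemo {

    static final String XHTML =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"DTD/xhtml1-strict.dtd\">"
            + "<html/>";

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption for an offline demo: every external entity resolves to an empty document.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        DocumentType type = parse(XHTML).getDoctype();
        System.out.println(type.getName());     // html
        System.out.println(type.getPublicId()); // -//W3C//DTD XHTML 1.0 Strict//EN
        System.out.println(type.getSystemId());
    }
}
```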

  48. Attr Nodes • Represents an attribute • Contains: • Text nodes • Entity reference nodes

  49. The Attr Interface
      package org.w3c.dom;

      public interface Attr extends Node {
        public String getName();
        public boolean getSpecified();  // false => value was defaulted from the DTD, not written in the document
        public String getValue();
        public void setValue(String value) throws DOMException;
        public Element getOwnerElement();
        // namespaceURI, prefix, localName inherited from Node
      }
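The `specified` flag is easiest to see with an attribute default declared in the internal DTD subset, which even a non-validating parser must apply. A sketch using the JDK parser; the class name `AttrDemo` and the `<doc>` sample are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttrDemo {

    // The internal subset declares a default for "lang"; the document itself only writes "id".
    static final String XML =
            "<!DOCTYPE doc [ <!ATTLIST doc lang CDATA \"en\"> ]><doc id=\"d1\"/>";

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();

        Attr id = root.getAttributeNode("id");
        System.out.println(id.getName() + "=" + id.getValue() + " specified=" + id.getSpecified());
        // id=d1 specified=true

        Attr lang = root.getAttributeNode("lang");  // supplied by the DTD default
        System.out.println(lang.getName() + "=" + lang.getValue() + " specified=" + lang.getSpecified());
        // lang=en specified=false

        System.out.println(id.getOwnerElement().isSameNode(root)); // true
        System.out.println(id.getParentNode());                     // null: attributes are not children
    }
}
```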

  50. ProcessingInstruction Nodes • Represents a processing instruction like <?robots index="yes" follow="no"?> • No children
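A sketch of reading that processing instruction with the JDK parser; the class name `PiDemo` and the `<page/>` root are hypothetical. The DOM splits a PI into a target and the remaining data; pseudo-attributes such as `index="yes"` are not parsed into Attr nodes, so an application must parse the data string itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiDemo {

    // Parses an XML string; checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<?robots index=\"yes\" follow=\"no\"?><page/>");
        ProcessingInstruction pi = (ProcessingInstruction) doc.getFirstChild();
        System.out.println(pi.getTarget()); // robots
        System.out.println(pi.getData());   // index="yes" follow="no"
        // nodeName mirrors the target and nodeValue mirrors the data
        System.out.println(pi.getNodeName().equals(pi.getTarget())
                && pi.getNodeValue().equals(pi.getData())); // true
    }
}
```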
