
CSE 121/131 Programming Spring 2001 Lecture Notes 7 © 2000-2001 A. Sahuguet & V. Tannen


Presentation Transcript


  1. CSE 121/131 Programming, Spring 2001, Lecture Notes 7. © 2000-2001 A. Sahuguet & V. Tannen

  2. Data on the Web, today: HTML
      . . .
      <a name="primary">
      <H2> Primary Faculty </H2>
      <DL>
      <DT> <BR>
      <A href="http://www.cis.upenn.edu/~alur/info.html">
      <IMG SRC="images/resdesc.gif" ALIGN=right ALT="resdesc"></A>
      <A href="http://www.cis.upenn.edu/~alur/home.html">
      <IMG SRC="images/home.gif" ALIGN=right ALT="Home"></A>
      <B>Rajeev Alur</B><BR>
      Associate Professor, Computer and Information Science
      <DD> Formal support for design and analysis of reactive, real-time, and hybrid systems. Hardware verification; Software engineering; Control of distributed multi-agent systems; Logic and concurrency theory; Distributed computing.
      . . .

  3. Data on the Web, tomorrow: XML
      . . .
      <primary>
        <name>
          <first>Rajeev</first>
          <last>Alur</last>
        </name>
        <title>Associate Professor</title>
        <department>Computer and Information Science</department>
        <bio>http://www.cis.upenn.edu/~alur/info.html</bio>
        <homepage>http://www.cis.upenn.edu/~alur/home.html</homepage>
        <interest>Formal support for design and analysis of reactive, real-time, and hybrid systems. Hardware verification; Software engineering; Control of distributed multi-agent systems; Logic and concurrency theory; Distributed computing.</interest>
      </primary>
      . . .

  4. What is XML?
      • Like HTML, XML is a “document markup language”, i.e., a way to enrich text with tags and attributes.
      • HTML’s markup is about visual presentation. As a result, it is difficult for a program to manipulate the data in an HTML page.
      • XML’s markup is about the meaning of the information. This makes it easier for programs to manipulate XML.
      • Still, what we saw on the previous slide is an external format. Internally, XML is represented as trees.
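To make the "XML is represented as trees" point concrete, here is a minimal sketch that parses an abridged copy of the faculty record from slide 3 and prints one line per tree node, indented by depth. It uses the JAXP parser bundled with the JDK rather than any parser named in these notes; the class name `TreeView` and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TreeView {
    // Abridged version of the record on slide 3 (assumption: only three fields kept).
    static final String XML =
        "<primary><name><first>Rajeev</first><last>Alur</last></name>"
      + "<title>Associate Professor</title></primary>";

    // Append one line per node, indented two spaces per level of depth.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append('"').append(node.getNodeValue()).append("\"\n");
        } else {
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parse the external (textual) format and render the internal tree.
    static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render(XML));
    }
}
```

Elements become inner nodes and the character data becomes leaf text nodes, which is the structure the DOM slides later in these notes navigate.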

  5. How XML overcomes some HTML limitations
      • Using XML, content providers can separate form and content.
      [Figure labels: XML Content; XSL (Stylesheets); HTML; HTML (Web-TV); Wireless Markup Language]
      http://www.wapforum.org/docs/technical/wml-30-apr-98.pdf
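The separation of form and content can be sketched with the XSLT engine that ships with the JDK (`javax.xml.transform`). The two stylesheets below are hypothetical and written only for this illustration: one renders the faculty record as an HTML fragment, the other as a single short text line of the kind a small display might show. The content string is never changed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetDemo {
    // Content only: an abridged faculty record with no layout information.
    static final String CONTENT =
        "<primary><name><first>Rajeev</first><last>Alur</last></name>"
      + "<title>Associate Professor</title></primary>";

    static final String XSL_HEADER =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";

    // Hypothetical stylesheet for a desktop browser: emits an HTML fragment.
    static final String AS_HTML = XSL_HEADER
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/primary\">"
      + "<p><b><xsl:value-of select=\"name/first\"/><xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"name/last\"/></b>, "
      + "<xsl:value-of select=\"title\"/></p>"
      + "</xsl:template></xsl:stylesheet>";

    // Hypothetical stylesheet for a small display: emits one line of plain text.
    static final String AS_TEXT = XSL_HEADER
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/primary\">"
      + "<xsl:value-of select=\"name/last\"/>: <xsl:value-of select=\"title\"/>"
      + "</xsl:template></xsl:stylesheet>";

    // Apply one stylesheet to one XML document and return the result as a string.
    static String apply(String stylesheet, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(apply(AS_HTML, CONTENT));
        System.out.println(apply(AS_TEXT, CONTENT));
    }
}
```

A real WML deck would need its own stylesheet and output vocabulary; the text variant here only stands in for "a different presentation of the same content".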

  6. Wireless Applications
      • Hand-held devices have some constraints
          • small display
          • narrowband network connection
          • limited memory and computational resources
      • HTML is not suitable to deliver information to them -> Need for a Wireless Markup Language (WML)
      • What WML offers
          • specific layout
          • new metaphor (deck, cards)
          • state management
          • binary XML format to make data more concise
      The same metaphor can be used for e-forms in various domains: interactive kiosks, medical forms, etc.

  7. Manipulating XML documents
      • Manipulation
          • parsing: reading, checking syntax, transforming into an internal format
          • navigating
          • modifying
      • Fortunately, XML comes with a standard API that offers all these features: the Document Object Model (DOM)
      API: Application Programming Interface

  8. DOM
      • “DOM provides a programmatic access to the content, structure and style of XML documents and allows languages such as Java to extract information from documents containing specific tags as if they were objects.” [Ardent’s white paper on XML]
      • Platform-neutral API designed by the W3C using CORBA/IDL
      • Mappings to various programming languages (Java, C++, Perl, etc.)
      • DOM is supported by all the major players
      • DOM makes the processing of XML documents independent of the parser and of the internal representation

  9. DOM overview
      • What DOM is doing
      <TABLE>
        <TBODY>
          <TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>
          <TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>
        </TBODY>
      </TABLE>

  10. The DOM API (overview)
      Interfaces: Node, NodeList, Attr, CharacterData, Document, Element, Entity, Comment, Text, CDATASection
      interface Document: createAttribute(…), createCDATASection(…), createComment(…), createElement(…), createTextNode(…)
      interface Node: appendChild(…), getAttributes(…), getChildNodes(…)
      interface Element: getAttribute(name), getAttributeNode(name), getElementsByTagName(name)
      The full API can be found at http://www.w3c.org/DOM
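As a small illustration of how these interfaces fit together, the sketch below builds the first row of the table from the previous slide purely through DOM calls and then queries it back. It assumes the JAXP `DocumentBuilderFactory` from the JDK as the way to obtain an empty `Document`; `setAttribute` is not listed on the slide but is part of the standard `Element` interface, and the `id` value is made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomBuild {
    // Build <TABLE><TR id="r1"><TD>Shady Grove</TD><TD>Aeolian</TD></TR></TABLE>.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        // Document acts as the factory for every node in the tree.
        Element table = doc.createElement("TABLE");
        doc.appendChild(table);

        Element row = doc.createElement("TR");
        row.setAttribute("id", "r1"); // hypothetical attribute, for getAttribute below
        table.appendChild(row);

        for (String text : new String[] {"Shady Grove", "Aeolian"}) {
            Element cell = doc.createElement("TD");
            cell.appendChild(doc.createTextNode(text)); // text lives in a child Text node
            row.appendChild(cell);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        NodeList cells = doc.getElementsByTagName("TD");
        for (int i = 0; i < cells.getLength(); i++) {
            System.out.println(cells.item(i).getFirstChild().getNodeValue());
        }
        Element row = (Element) doc.getElementsByTagName("TR").item(0);
        System.out.println("row id = " + row.getAttribute("id"));
    }
}
```

Note that the cell text is not a property of the `TD` element itself: it is a separate `Text` child, which is why the loop goes through `getFirstChild()`.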

  11. DOM in action
      • We take an HTML page from the IBM Patent server and we XML-ize it.
      • From it, we want to extract some specific information, such as the names of the inventors.
      • 4 ways to do it
          • Java DOM
          • Java XQL
          • Perl
          • XML-QL (will return an XML document)

  12. The Patent Example (converted using W4F)

  13. DOM with Java
      import com.ibm.xml.parser.*;
      import org.w3c.dom.*;
      import java.io.*;
      public class Test {
          public static void main(String args[]) throws Exception {
              // Parse the file named on the command line into a DOM tree.
              Parser parser = new Parser( args[0] );
              Document doc = parser.readStream( new FileInputStream( args[0] ));
              // Collect every <Inventor> element in the document.
              NodeList nodes = doc.getElementsByTagName("Inventor");
              int n = nodes.getLength();
              for(int i=0; i<n; i++) {
                  Element node = (Element) nodes.item(i);
                  // Print the First_Name attribute of each inventor.
                  String firstName = node.getAttribute("First_Name");
                  System.out.println(firstName);
              }
          }
      }
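The slide relies on IBM's `com.ibm.xml.parser` classes. For readers without that library, the same extraction can be sketched with the JAXP parser included in the JDK; only the parsing step differs, the `org.w3c.dom` calls are identical. The sample document is invented for this sketch: the notes only tell us that `Inventor` elements carry a `First_Name` attribute, so the `Patent` root and the attribute values are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InventorNames {
    // Made-up stand-in for the XML-ized patent page.
    static final String SAMPLE =
        "<Patent>"
      + "<Inventor First_Name=\"Ada\"/>"
      + "<Inventor First_Name=\"Alan\"/>"
      + "</Patent>";

    // Return the First_Name attribute of every <Inventor>, in document order.
    static List<String> firstNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("Inventor");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element inventor = (Element) nodes.item(i);
            names.add(inventor.getAttribute("First_Name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : firstNames(SAMPLE)) {
            System.out.println(name);
        }
    }
}
```

Because the application code touches only DOM interfaces, swapping the parser does not affect the extraction loop, which is the parser independence claimed on the DOM slide.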

  14. DOM with Java and XQL (GMD, IBM)
      import de.gmd.ipsi.xql.*;
      import org.w3c.dom.*;
      import com.ibm.xml.parser.*;
      import java.io.*;
      public class XQLTest {
          public static void main(String args[]) throws Exception {
              // Parse the file named on the command line into a DOM tree.
              Parser parser = new Parser( args[0] );
              Document doc = parser.readStream( new FileInputStream( args[0] ));
              // Select every <Inventor> element, at any depth, with an XQL query.
              XQLResult r = XQL.execute("//Inventor", doc );
              for(int i=0; i<r.getLength(); i++) {
                  Element inventor = (Element) r.getItem(i);
                  // Print the First_Name attribute of each inventor.
                  String firstName = inventor.getAttribute("First_Name");
                  System.out.println(firstName);
              }
          }
      }
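XQL path expressions such as `//Inventor` closely resemble XPath, which the JDK supports through `javax.xml.xpath`. The sketch below swaps in XPath for XQL (it is not the GMD engine used on the slide) and goes one step further by selecting the attribute directly in the query. The sample document and class name are again invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InventorQuery {
    // Made-up stand-in for the XML-ized patent page.
    static final String SAMPLE =
        "<Patent>"
      + "<Inventor First_Name=\"Ada\"/>"
      + "<Inventor First_Name=\"Alan\"/>"
      + "</Patent>";

    // Evaluate a declarative path query instead of looping over elements by hand.
    static List<String> firstNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The result is a list of attribute nodes, one per matching <Inventor>.
        NodeList attrs = (NodeList) xpath.evaluate(
            "//Inventor/@First_Name", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeValue());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstNames(SAMPLE));
    }
}
```

The query language moves the navigation logic out of the Java loop and into a string, which is convenient but also means mistakes in the path are only detected at run time.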

  15. DOM with Perl
      • Extracting the names of the inventors from the IBM Patent database.
      #!/usr/bin/perl
      # Include the Perl package.
      use XML::DOM;
      # Instantiate a new parser and parse the source file.
      my $parser = new XML::DOM::Parser;
      my $doc = $parser->parsefile ("patent.xml");
      # Get the list of nodes that correspond to <Inventor>.
      my $nodes = $doc->getElementsByTagName ("Inventor");
      my $n = $nodes->getLength;
      # For each node, extract the First_Name attribute and print it.
      for (my $i = 0; $i < $n; $i++) {
          my $node = $nodes->item ($i);
          my $first_name = $node->getAttribute ("First_Name");
          print $first_name, "\n";
      }

  16. SAX, a low-level alternative to DOM
      • SAX
          • Simple API for XML
          • supported by most XML parsers
          • event-driven parser
      • Instead of reading the entire file into memory and building a tree, SAX reads a stream of tokens and triggers events
          • startDocument
          • startElement
          • endElement
          • endDocument
      • The programmer has to write a document handler that captures these events and does something with the tokens.
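The event stream can be observed directly with a handler that does nothing but record the callbacks it receives. This sketch uses the SAX2 `DefaultHandler` base class and the JAXP `SAXParserFactory` from the JDK, which is a newer interface than the SAX1 `DocumentHandler` discussed in these notes; the class name `EventLog` is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventLog extends DefaultHandler {
    // Events in the order the parser fired them.
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parse the given XML text and return the recorded event sequence.
    static List<String> trace(String xml) throws Exception {
        EventLog log = new EventLog();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), log);
        return log.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<a><b/></a>")) {
            System.out.println(event);
        }
    }
}
```

No tree is ever built: once a callback returns, the parser keeps no record of that token, so any state the application needs (here, the list of events) has to be kept by the handler itself.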

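  17. An Example of SAX
      import java.io.PrintWriter;
      import org.xml.sax.*;
      public class OutputHandler implements DocumentHandler {
          private PrintWriter pw;
          public OutputHandler() { this.pw = new PrintWriter( System.out ); }
          public OutputHandler(PrintWriter pw) { this.pw = pw; }
          // Flushes pending output; the handler has no printable state of its own.
          public String toString() { pw.flush(); return ""; }
          // Character data: print exactly the slice [start, start + length).
          public void characters(char[] ch, int start, int length) {
              pw.print(new String(ch, start, length));
          }
          public void startDocument() { pw.println("<?xml version=\"1.0\"?>"); }
          public void endDocument() { pw.println("<!-- end of document -->"); }
          // Opening tag: echo the element name and all of its attributes.
          public void startElement(String name, AttributeList atts) {
              pw.print("<" + name);
              if (atts != null)
                  for(int i = 0; i < atts.getLength(); ++i)
                      pw.print(" " + atts.getName(i) + "=\"" + atts.getValue(i) + "\"");
              pw.println(">");
          }
          public void endElement(String name) { pw.println("</" + name + ">"); }
          // Remaining DocumentHandler callbacks, not needed here: left empty.
          public void ignorableWhitespace(char[] ch, int start, int length) { }
          public void processingInstruction(String target, String data) { }
          public void setDocumentLocator(Locator locator) { }
      }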

  18. SAX vs DOM
      • SAX
          • does not store the document in memory (great for stream-based processing)
          • navigation in the document is clumsy
          • does not allow updating an XML document
      • DOM
          • permits updates
          • offers the DOM API for navigation/construction
          • requires the entire document to be stored in main memory

  19. The Missing Link
      [Figure: XML (input) → Application → XML (output)]
      • There is only a “gentlemen’s agreement” between the application and its XML environment.
      • Why do we need to go beyond that?
          • performance
          • static guarantees (helps to identify and control failures)
      • How do we create a tight contract between the application and its XML environment?

  20. XML Binding
      • Requirements
          • high-level specification for XML (e.g. DTD, XML-Schemas, UML, etc.)
          • a mapping to your favorite programming language (e.g. Java)
          • a compiler that will generate code (“stubs” that define an API)
      (Same paradigm as CORBA/IDL or ODMG/ODL)
      Sun’s Proposal: <http://www.javasoft.com/xml/white-papers.html>
      [Figure labels: XML spec.; compiler; stubs]

  21. Generic (DOM/SAX) vs Domain Specific API
      Generic API: generic parsing; getElement(“order”), getAttribute(“date”); generic marshalling; only runtime checks
      Domain specific API: domain specific parsing; get_order(), get_date(); domain specific marshalling; both static and runtime checks
      • Instead of a generic API (e.g. SAX, DOM), the application will use a domain specific API generated from the specification.
      • Issues
          • accurately mapping XML “types” to a programming language
          • static checks vs runtime checks (some features from the specification cannot be checked statically)
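The difference between the two styles can be sketched by hand. The `Order` class below is a hypothetical, hand-written stand-in for the kind of stub a binding compiler might generate from a specification of an `order` element with a `date` attribute; it is not the output of any tool named in these notes, and the ISO date format is an assumption.

```java
import java.io.StringReader;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class OrderBinding {
    // Hypothetical stub: one class per element, one typed getter per attribute.
    static final class Order {
        private final LocalDate date;

        private Order(LocalDate date) {
            this.date = date;
        }

        LocalDate getDate() {
            return date;
        }

        // Domain specific unmarshalling: all run-time validation happens once, here.
        static Order unmarshal(Element e) {
            if (!"order".equals(e.getTagName())) {
                throw new IllegalArgumentException("expected <order>, got <" + e.getTagName() + ">");
            }
            return new Order(LocalDate.parse(e.getAttribute("date")));
        }
    }

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<order date=\"2001-03-15\"/>");

        // Generic API: a misspelled attribute name still compiles and
        // silently yields the empty string at run time.
        String raw = root.getAttribute("dat");
        System.out.println("generic, misspelled name: '" + raw + "'");

        // Domain specific API: a misspelled getter would not compile,
        // and the result is already a typed value rather than a string.
        Order order = Order.unmarshal(root);
        System.out.println("domain specific, year: " + order.getDate().getYear());
    }
}
```

The remaining run-time check (is the date well formed, is the element really an `order`) illustrates the last bullet above: some properties of the specification can only be verified when the document is actually read.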

  22. XML programming
      • Resources
          • Java and XML, Brett McLaughlin, Mike Loukides
      • XML parsers (DOM/SAX)
          • Apache http://xml.apache.org/xerces-j/index.html
          • Oracle http://technet.us.oracle.com/tech/xml/
          • Sun Project X http://java.sun.com/xml/
          • Microsoft http://msdn.microsoft.com/xml/default.asp
      • XML-binding frameworks
          • Oracle ClassGenerator http://technet.us.oracle.com/tech/xml/classgen/index.htm
          • Castor http://castor.exolab.org/
