
No More Pain for XML’s Gain XJ: Facilitating XML Processing in Java

Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar, Itay Maman. 236826 Seminar lecture, 15 June 2005.



  1. No More Pain for XML’s Gain. XJ: Facilitating XML Processing in Java. Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar, Itay Maman. 236826 Seminar lecture, 15 June 2005

  2. The basic premise • XML is getting increasingly popular • XML manipulation is now a common programming task • The lead question: • Do modern OO languages sufficiently support XML?

  3. Introduction: Schema file (file: technioncatalog.xsd)
     <?xml version="1.0" encoding="UTF-8"?>
     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="catalog">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="course" maxOccurs="unbounded">
               <xs:complexType>
                 <xs:sequence>
                   <xs:element name="points" type="xs:int"/>
                   <xs:element name="number" type="xs:int"/>
                   <xs:element name="name" type="xs:string"/>
                   <xs:element name="teacher" type="xs:string"/>
                 </xs:sequence>
               </xs:complexType>
             </xs:element>
           </xs:sequence>
         </xs:complexType>
       </xs:element>
     </xs:schema>

  4. Introduction: XML document (file: short.xml)
     <?xml version="1.0" encoding="UTF-8"?>
     <catalog>
       <course>
         <points>3</points>
         <number>234319</number>
         <name>Programming Languages</name>
         <teacher>Ron Pinter</teacher>
       </course>
       <course>
         <points>3</points>
         <number>234141</number>
         <name>Combinatorics for CS</name>
         <teacher>Ran El-Yaniv</teacher>
       </course>
     </catalog>
     Desired output: “Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”

  5. Introduction: The XJ program
     import java.io.*;
     import technioncatalog.*;
     public class Demo1 {
       public static void main(String[] args) throws Throwable {
         // Load the document as a typed catalog value
         catalog cat = new catalog(new File("short.xml"));
         // Select the second <course> element
         catalog.course c = cat [| /course[2] |];
         printCourse(c);
       }
       private static void printCourse(catalog.course c) {
         String name = c [| /name |];
         String teacher = c [| /teacher |];
         int points = c [| /points |];
         int id = c [| /number |];
         System.out.println(name + " (" + id + ") by " + teacher + ", " + points + " credit points");
       }
     }

  6. Traditional XML processing (DOM, XPath APIs) • The types of the XML objects (Node, Document) do not reflect the schema
     public static void main(String[] args) throws Throwable {
       DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
       DocumentBuilder db = dbf.newDocumentBuilder();
       Document doc = db.parse(new java.io.File("short.xml"));
       XPath xp = XPathFactory.newInstance().newXPath();
       // Cast to the standard org.w3c.dom.NodeList rather than an implementation-specific class
       NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
       printCourse(nodes.item(1)); // item(1) is the second <course>
     }
     • XPath is a plain string. It may be: • Syntactically incorrect • Incompatible with the document
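
The two failure modes of a string-based XPath query can be reproduced with the standard javax.xml.xpath API. The following is a minimal sketch, not part of the original slides: the class name StringXPathDemo, its helper methods, and the misspelled query "//cuorse" are illustrative assumptions. A malformed query is rejected only when it is compiled at run time, and a query that does not match the document's structure silently returns an empty result.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StringXPathDemo {
    // A one-course catalog, inlined so the example is self-contained.
    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    // Returns true if the XPath processor rejects the query string.
    // The rejection happens at run time; javac never looks inside the string.
    static boolean isRejected(String query) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            xp.compile(query);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    // Evaluates the query against XML and returns the number of matching nodes.
    static int countMatches(String query) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(
            query, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isRejected("//course["));  // syntactically incorrect
        System.out.println(countMatches("//course")); // matches the document
        System.out.println(countMatches("//cuorse")); // incompatible with the document
    }
}
```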

  7. Traditional XML processing (DOM APIs)
     private static void printCourse(Node n) {
       NodeList nodes = n.getChildNodes();
       System.out.println(nodes.item(5).getTextContent()
         + " (" + nodes.item(3).getTextContent()
         + ") by " + nodes.item(7).getTextContent()
         + ", " + nodes.item(1).getTextContent() + " credit points");
     }
     • Assumption: 3rd child is the course number • Assumption: 2nd child has no child elements • Assumption: Four child nodes must exist • These assumptions will not hold if the schema is changed • => run-time errors • Problems remain, even if we identify nodes by name • Possible Schema changes: • Allowing a new optional <students> sub-element • Changing the order of the sub-elements • What about reading the numeric value of an element?
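
For comparison, here is a sketch of the "identify nodes by name" variant mentioned above, written against the standard DOM API. The class NamedChildDemo and its helpers are hypothetical names introduced for illustration. Lookup by name survives reordering and whitespace, but a missing element is still detected only at run time, and numeric values must be converted by hand with Integer.parseInt.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamedChildDemo {
    // Parses the XML text and returns its first <course> element.
    static Element firstCourse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagName("course").item(0);
    }

    // Returns the trimmed text of the first descendant element with the given
    // tag name. A missing element surfaces only as a run-time exception.
    static String childText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        if (matches.getLength() == 0) {
            throw new IllegalStateException("missing <" + name + ">");
        }
        return matches.item(0).getTextContent().trim();
    }

    // Formats a course; the int conversions are manual and unchecked by the compiler.
    static String describe(Element course) {
        int points = Integer.parseInt(childText(course, "points"));
        int number = Integer.parseInt(childText(course, "number"));
        return childText(course, "name") + " (" + number + ") by "
            + childText(course, "teacher") + ", " + points + " credit points";
    }

    public static void main(String[] args) throws Exception {
        // Children deliberately reordered and separated by whitespace.
        String xml = "<catalog>\n  <course>\n    <teacher>Ran El-Yaniv</teacher>\n"
            + "    <name>Combinatorics for CS</name>\n    <number>234141</number>\n"
            + "    <points>3</points>\n  </course>\n</catalog>";
        System.out.println(describe(firstCourse(xml)));
    }
}
```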

  8. No easy solution • Similar problems occur when: • XML elements are created by the program • Other libraries are used for reading/writing XML documents • Such as: Xalan, SAX • The developer wraps several complex operations within a single function/method/class • These are inherent problems of the language

  9. Shaping the future • What XML-related facilities do we want? • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class-values

  10. Has the future arrived yet? • Significant effort has gone into integrating XML into modern programming languages • XJ • Scala • Cω • Xtatic • … • We will overview the constructs offered by XJ • A superset of Java • Available at: http://www.research.ibm.com/xj

  11. XJ’s Type system

  12. XJ’s Type system • Hierarchy of classes • A common root class: XMLObject • Automatic import: package com.ibm.xj.* • Genericity: Sequence<T>, XMLCursor<T> • XMLCursor<T> is an iterator over a Sequence<T>

  13. Integration with Schema • The rationale: • An OO program is a collection of class definitions • A Schema file is a collection of type definitions • => let’s integrate these definitions • Any Schema type is also an XJ type • The XJ compiler generates a “logical class” for each such type • Schema file == package name • Using a schema == import schema_file_name;

  14. XML literal in XJ code • Invalid XML content triggers a compile-time error • Resulting elements are typed! • Curly braces allow “escaping” back into XJ
     import technioncatalog.*;
     public class Demo2 {
       public static void main(String[] args) throws Throwable {
         String x = "Algorithms 1";
         int y = 234247;
         catalog cat = buildCatalog(new catalog.course(
           <course><points>3</points>
           <number>{y}</number><name>{x}</name>
           <teacher>Shlomo Moran</teacher></course>));
       }
       private static catalog buildCatalog(catalog.course c) {
         return new catalog(<catalog>{c}</catalog>);
       }
     }

  15. An ill-typed program
     ...
     // Wrong <course> element
     course c = new course(<course>
       <teacher>Shlomo Moran</teacher></course>);
     buildCatalog(c);
     XMLObject x = new course.teacher(
       <teacher>Shlomo Moran</teacher>);
     // An XMLObject cannot be passed as a course element
     buildCatalog(x);
     ...
     private static catalog buildCatalog(catalog.course c) {
       return new catalog(<catalog>{c}</catalog>);
     }

  16. Embedding XPath Queries in XJ • Syntax: XmlValue[| XPathQuery |] • Requires: a context-provider: • An XML element over which the XPath query is invoked • (see the cat variable in the sample) • Escaping: use a ‘$’ prefix
     course doSomething(catalog cat, int courseNum) {
       return cat [| /course[./number = $courseNum] |];
     }

  17. XPath Semantics • Problem: resulting type is sometimes not so clear • Two options • Sequence<T> • If the compiler determines that all result elements are of type T • Sequence<XMLObject> • (Otherwise) • Automatic conversion from a singleton sequence • Static check of XPath queries • If result is always empty => compile-time error • (The compiler cannot catch all cases)

  18. Implicit coercions • An atomic XML value can be seamlessly converted into a corresponding Java value • xsd:double => double • xsd:boolean => boolean • xsd:string => java.lang.String • … • This reduces the verbosity of XML-related code:
     import technioncatalog.*;
     import technioncatalog.catalog.*;
     public static String getTeacher(course c) {
       return c [| /teacher |];
     }
     Conversion chain: Sequence<teacher> ► teacher ► String

  19. Updates: Assignment to Query Result
     public static void changePoint(catalog.course c, int p) {
       c [| /points |] = p;
     }
     • An XPath expression returns a reference to an existing element • (No copying is involved) • Consistent with Java’s semantics for objects • Thus, it can be assigned to • An XPath expression is a legal lvalue • Bulk assignment • Occurs when the XPath expression denotes a sequence • Bulk assignment operator := allows multiple assignments • Double the credit points of each course:
     cat [| //points |] *:= 2;
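
To see what the one-line bulk assignment saves, here is a rough plain-Java counterpart using the standard DOM and XPath APIs. It is a sketch for comparison only: the class BulkUpdateDemo and the method doublePoints are hypothetical names, and the code makes no claim about how XJ implements the operator. The query is an unchecked string and every value is converted to and from text by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BulkUpdateDemo {
    // Doubles the text value of every <points> element in place and
    // returns the updated values in document order.
    static List<Integer> doublePoints(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList points = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//points", doc, XPathConstants.NODESET);
        List<Integer> updated = new ArrayList<>();
        for (int i = 0; i < points.getLength(); i++) {
            Node p = points.item(i);
            int doubled = Integer.parseInt(p.getTextContent().trim()) * 2;
            p.setTextContent(Integer.toString(doubled)); // mutate the existing node
            updated.add(doubled);
        }
        return updated;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><course><points>3</points></course>"
            + "<course><points>4</points></course></catalog>";
        System.out.println(doublePoints(xml));
    }
}
```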

  20. Tree structure update • Class XMLObject also defines methods, such as: • insertAfter() • insertBefore() • insertAsFirst() • detach()
     public static void addCourse(catalog cat) {
       course c = new course(<course><points>4</points>
         <number>234111</number><name>Introduction to CS</name>
         <teacher>Roy Friedman</teacher></course>);
       cat.insertAsLast(c);
     }
     Which object is being modified?

  21. Problems: Type Consistency • Definitions • An XML update operation, u, is a mapping over XML values • u: T1 -> T2 • An update is consistent if T1 = T2 • Ideally, a compile-time error should be triggered for each inconsistent update in the program • Unfortunately, this cannot be guaranteed • The solution: an additional run-time check • Why do we want the two types to be equal? Can you think of an example?

  22. Problems: Covariant subtyping (1/2) • Covariance: change of type in signature is in the same direction as that of the inheritance
     class X { }
     class A { public void m(X x) { } }
     class X1 extends X { }
     // A1.m() is “spoiled”: it accepts only X1 objects
     class A1 extends A { public void m(X1 x) { } }
     ...
     A a = new A1();
     a.m(new X());
     Which method should be invoked: A.m() or A1.m()? • Java favors type-safety: A method with covariant arguments is considered to be an overloading rather than overriding • Same approach is taken by C++, C# • But, covariance is allowed for arrays • Array assignments may fail at run-time
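
Both points can be checked in plain Java. The sketch below reuses the class names from the slide, nested inside a hypothetical wrapper class CovarianceDemo; the methods return strings (an addition for illustration) so that the selected method is observable. A1.m(X1) overloads rather than overrides A.m(X), so a call through a reference of static type A always reaches A.m(X). Array covariance, by contrast, compiles and is rejected only at run time with ArrayStoreException.

```java
public class CovarianceDemo {
    static class X { }
    static class X1 extends X { }

    static class A {
        public String m(X x) { return "A.m(X)"; }
    }

    static class A1 extends A {
        // Different parameter type: this is an overload, not an override.
        public String m(X1 x) { return "A1.m(X1)"; }
    }

    // Overload resolution uses the static type A, which only declares m(X).
    static String callWithX() {
        A a = new A1();
        return a.m(new X());
    }

    // Even with an X1 argument, A1 does not override m(X), so A.m(X) runs.
    static String callWithX1() {
        A a = new A1();
        return a.m(new X1());
    }

    // Arrays are covariant: the assignment compiles, but the store is
    // checked at run time and fails.
    static boolean arrayStoreFails() {
        X[] xs = new X1[1];
        try {
            xs[0] = new X();
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithX());
        System.out.println(callWithX1());
        System.out.println(arrayStoreFails());
    }
}
```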

  23. Problems: Covariant subtyping (2/2) (Now let us get back to our technioncatalog schema…) • A <course> value is also spoiled • It requires unique children: <points>, <name>, etc. • But, it also has an unspoiled super-class: XMLObject • All updates to XMLObject are legal at compile-time • The following code compiles successfully:
     public static void trick(course c) {
       XMLObject x = c;
       points p = new points(<points>4</points>);
       x.appendAsLast(p); // the run-time error occurs here
     }
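
The same situation can be imitated with the standard DOM and javax.xml.validation APIs: an untyped tree accepts a second <points> child without complaint, and the violation shows up only when the tree is validated against the schema at run time. This is a sketch under stated assumptions, not XJ's actual checking mechanism; the class RuntimeCheckDemo and its methods are hypothetical, and the schema is the technioncatalog.xsd from slide 3, inlined as a string.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RuntimeCheckDemo {
    // The catalog schema from slide 3 (single quotes avoid Java escaping).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='points' type='xs:int'/>"
        + "<xs:element name='number' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='teacher' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    // Parses XML into a namespace-aware DOM tree, as schema validation expects.
    static Document parse() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Run-time check: does the tree still conform to the schema?
    static boolean isValid(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Mirrors trick(): append a second <points> child. The DOM API accepts
    // the update because Element carries no schema information.
    static Document appendSecondPoints() throws Exception {
        Document doc = parse();
        Element course = (Element) doc.getElementsByTagName("course").item(0);
        Element extra = doc.createElementNS(null, "points");
        extra.setTextContent("4");
        course.appendChild(extra);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(parse()));
        System.out.println(isValid(appendSecondPoints()));
    }
}
```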

  24. Shaping the future (revisited) • Language constructs seen so far • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class-values

  25. XPath expression as first-class-values • What is a first-class-value? • A value that can be used “naturally” in the program • Passed as an argument • Stored in a variable/field • Returned from a method • Created • In XJ, XPath expressions do not meet these conditions • The main obstacle: The XPath part of the expression cannot be separated from its context provider

  26. XPath expression as first-class-values (cont’d) • Let’s speculate on XPath as an FCV… • (Following code IS NOT a legal XJ program)
     private static Sequence<teacher> teachers;
     static Sequence<teacher> find(XPath<catalog,teacher> q) {
       catalog c = new catalog(new File("file1.xml"));
       return q.evaluate(c);
     }
     static void main(String[] args) {
       Sequence<teacher> all = find(<catalog>[| //teacher |]);
       Sequence<teacher> few = find(
         <catalog>[| //course[number = 234319]/teacher |] );
     }

  27. XPath expression as first-class-values (cont’d) • Operators on XPath values • Composition • Conjunction • Disjunction • These operators will allow the developer to easily create a rich array of safe XPath values • The compiler must keep track of the type of each such value • Basically an XPath value is a function T -> R, where both T,R are subclasses of XMLObject • When two XPath values are composed, the result type is deduced from the types of the operands
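
The "function T -> R" view can be sketched in ordinary Java generics. Everything below is hypothetical and is not XJ: Query<T, R> stands in for the speculative XPath<T, R> type, plain Java classes stand in for schema-derived types, and disjunction is modeled as concatenating the two operands' results. The point is only that the compiler deduces the result type of a composed query from its operands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TypedQueryDemo {
    // Stand-ins for schema-derived types.
    static final class Course {
        final String name;
        final String teacher;
        Course(String name, String teacher) { this.name = name; this.teacher = teacher; }
    }

    static final class Catalog {
        final List<Course> courses;
        Catalog(List<Course> courses) { this.courses = courses; }
    }

    // A query maps a context of type T to a sequence of results of type R.
    interface Query<T, R> {
        List<R> evaluate(T context);

        // Composition: Query<T,R> then Query<R,S> yields Query<T,S>.
        default <S> Query<T, S> then(Query<R, S> next) {
            return ctx -> {
                List<S> out = new ArrayList<>();
                for (R r : evaluate(ctx)) out.addAll(next.evaluate(r));
                return out;
            };
        }

        // Disjunction: results of this query followed by results of the other.
        default Query<T, R> or(Query<T, R> other) {
            return ctx -> {
                List<R> out = new ArrayList<>(evaluate(ctx));
                out.addAll(other.evaluate(ctx));
                return out;
            };
        }
    }

    static final Query<Catalog, Course> COURSES = cat -> cat.courses;
    static final Query<Course, String> TEACHER = c -> Collections.singletonList(c.teacher);
    static final Query<Course, String> NAME = c -> Collections.singletonList(c.name);

    static Catalog sample() {
        return new Catalog(Arrays.asList(
            new Course("Programming Languages", "Ron Pinter"),
            new Course("Combinatorics for CS", "Ran El-Yaniv")));
    }

    // The composed type Query<Catalog, String> is inferred by the compiler.
    static List<String> teachers(Catalog cat) {
        return COURSES.then(TEACHER).evaluate(cat);
    }

    static List<String> teachersAndNames(Catalog cat) {
        return COURSES.then(TEACHER.or(NAME)).evaluate(cat);
    }

    public static void main(String[] args) {
        System.out.println(teachers(sample()));
        System.out.println(teachersAndNames(sample()));
    }
}
```

Composing TEACHER with COURSES in the wrong order, as in TEACHER.then(COURSES), is rejected by javac, which is the kind of static safety the slide asks for.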

  28. Scala: Composition of XML elements • In Scala, types can be defined in a DTD file • A DTD can be translated into Scala classes via the dtd2scala utility • Scala offers two options for composition of XML elements: • Using XML notation (similar to XJ) • Using case-class construction notation:
     import Data._; // import generated definitions
     import scala.xml._; // for creating PCDATA nodes
     object Main with Application {
       val x = course(teacher(Text("Ran El-Yaniv")),
                      points(Text("3")),
                      name(Text("Combinatorics for CS")),
                      number(Text("234141")));
       Console.println(x);
     }

  29. Typed, named methods/fields • Usually, values aggregated by a Java object are accessed by fields/methods • Can we access XML sub-elements this way? • (Following code IS NOT a legal XJ program)
     import technioncatalog.*;
     void printTeachers(catalog cat) {
       for (int i = 0; i < cat.courses.length; ++i) {
         catalog.course c = cat.courses[i];
         System.out.println(c.teacher);
       }
     }

  30. Typed, named methods/fields(cont’d) • Some of the difficulties: • Sub-elements are not always named • Schema supports optional types: <xsd:choice> • How can Java express an “optional” field? • Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types • Missing features: virtual fields, inheritance without polymorphism • Other features can be found in Functional languages • E.g.: Variant types, immutability, structural conformance • But, their popularity lags behind

  31. Summary • XJ is a Java extension that has built-in support for XML • Type safety: Many things are checked at compile time • Ease of use • OO languages are not powerful enough (in terms of typing) • Some type information is lost in the transition Schema -> Java

  32. - The End-
