
Declaratively Producing Data Mash-ups

Sudarshan Murthy (Applied Research, Wipro Technologies) and David Maier (Department of Computer Science, Portland State University). http://www.sixml.org


Presentation Transcript


  1. Declaratively Producing Data Mash-ups • Sudarshan Murthy (Applied Research, Wipro Technologies) and David Maier (Department of Computer Science, Portland State University) • http://www.sixml.org

  2. Mash-ups • Web applications that combine information from multiple sources [Wikipedia] • A mash-up does not need to be a web app • Data that includes or transcludes content from multiple sources • In either case, a source is likely only a fragment • This work is about data mash-ups • In this talk, a mash-up is an XML document

  3. Portland State University Campus Map • 45 markers, 53 landmarks • Marker: Balloon on map • Landmark: Building, department, … • Information from 188 fragments in 58 web pages • Fragments selected manually • http://sparce.cs.pdx.edu/cmap/

  4. Portland Metro Food Markets • 154 markers, 154 landmarks • 154 fragments harvested programmatically from 4 MS Word documents • Developed for the Oregon Department of Agriculture • http://sparce.cs.pdx.edu/Declaratively Producing Data Mash-ups/oda-1.1/

  5. An HTML Review Report

  6. Problem Areas • Development • Getting data from heterogeneous fragments • Might use a DBMS, yet still have to code operators such as sort, join, and aggregate for external data • Execution • When to get external data, and how much to get? • Design: Expressing that • A part comes from an external fragment • A part is data (such as a page number) that cannot be “selected” in the source

  7. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  8. Superimposed Information (SI) • SI is new data and structure overlaid on existing base information • Mark: A reference to an external fragment • Benefits • Multiple, simultaneous organizations • Make new connections among base fragments • Preserve context • Heterogeneous sources: Word, Excel, PDF, HTML, …

  9. The Mash-up Production Process [Figure: three stages, each drawing on services, DBMSs, and documents] • Collect and Classify: Collect marks, add new data and structure; produces the condensed mash-up • Extract and Combine: Extract data from marks and combine with added data; produces the reconstituted mash-up • Transform: Format reconstituted data for display and other purposes; produces the formatted mash-up

  10. SI, Bi-level Information, Mash-ups • A condensed mash-up is SI • Links mash-up parts to external fragments • Relates to mash-up design: Sixml • A reconstituted mash-up and a formatted mash-up are both bi-level information • SI plus reconstituted parts • Relates to runtime mash-up manipulation and execution: Sixml DOM and Sixml Navigator

  11. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  12. Sixml • A mash-up specification language • SI represented as XML; Sixml is XML • A condensed mash-up is encoded as a Sixml document • A mark association is encoded as an XML element of a type we define • Associate marks with six kinds of content • Validated using standard schema constructs • Uniform and comprehensible serialization

  13. Sixml Mark Associations
      <Comment excerpt="" xmlns:sixml="http://schema.sixml.org">
        Contradicts prior work
        <sixml:EMark> <sixml:Descriptor>…</sixml:Descriptor> </sixml:EMark>
      </Comment>
      <Comment excerpt="" xmlns:sixml="http://schema.sixml.org">
        <sixml:TMark> Contradicts prior work <sixml:Descriptor>…</sixml:Descriptor> </sixml:TMark>
        <sixml:AMark target="excerpt" sixml:valueSource="true"> <sixml:Descriptor>…</sixml:Descriptor> </sixml:AMark>
        <sixml:EMark> <sixml:Descriptor>…</sixml:Descriptor> </sixml:EMark>
      </Comment>
      <Comment excerpt=""> Contradicts prior work </Comment>
      <Comment excerpt="" xmlns:sixml="http://schema.sixml.org">
        Contradicts prior work
        <sixml:AMark target="excerpt"> <sixml:Descriptor>…</sixml:Descriptor> </sixml:AMark>
        <sixml:EMark> <sixml:Descriptor>…</sixml:Descriptor> </sixml:EMark>
      </Comment>
      <Comment excerpt="" xmlns:sixml="http://schema.sixml.org">
        <sixml:TMark> Contradicts prior work <sixml:Descriptor>…</sixml:Descriptor> </sixml:TMark>
        <sixml:AMark target="excerpt"> <sixml:Descriptor>…</sixml:Descriptor> </sixml:AMark>
        <sixml:EMark> <sixml:Descriptor>…</sixml:Descriptor> </sixml:EMark>
      </Comment>
      • By default the text excerpt is assigned at run time, but it is possible to declare that the value should be something other than the excerpt • The mark association names shown here are the same as the type names, but custom names are possible (with both static and dynamic typing)

  14. Sixml Mark Descriptors
      <Comment excerpt="" xmlns:sixml="…" xmlns:xsi="…">
        <sixml:TMark>
          Contradicts prior work
          <sixml:Descriptor xsi:type="sixml:XPointer">
            <pointer>http://www.w3.org/#element(/1/2)</pointer>
          </sixml:Descriptor>
        </sixml:TMark>
        <sixml:AMark target="excerpt" sixml:valueSource="true">
          <sixml:Descriptor>…</sixml:Descriptor>
        </sixml:AMark>
        <sixml:EMark>
          <sixml:Descriptor xsi:type="sixml:SPARCE">
            <Agent>OfficeAgents.MSWord</Agent>
            <Doc location="c:\abc.doc" />
            <Subdoc startChar="45" endChar="53" />
          </sixml:Descriptor>
        </sixml:EMark>
      </Comment>
      • Any internal structure is OK • An implementation specific to an xsi:type interprets the structure
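
The idea that an xsi:type-specific implementation interprets each descriptor can be sketched in Python. This is only an illustration, not the authors' implementation (which the talk says is written in C#). The handler functions, the `HANDLERS` registry, and `describe_descriptors` are hypothetical names; the sixml namespace URI is taken from slide 13, and the xsi namespace is the standard XML Schema instance namespace, which the slide elides.

```python
import xml.etree.ElementTree as ET

SIXML_NS = "http://schema.sixml.org"                    # from slide 13
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"    # standard xsi namespace (elided on the slide)

# Condensed comment with two typed descriptors (raw string keeps the backslash in the path).
DOC = r"""
<Comment excerpt="" xmlns:sixml="http://schema.sixml.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sixml:TMark>Contradicts prior work
    <sixml:Descriptor xsi:type="sixml:XPointer">
      <pointer>http://www.w3.org/#element(/1/2)</pointer>
    </sixml:Descriptor>
  </sixml:TMark>
  <sixml:EMark>
    <sixml:Descriptor xsi:type="sixml:SPARCE">
      <Agent>OfficeAgents.MSWord</Agent>
      <Doc location="c:\abc.doc" />
      <Subdoc startChar="45" endChar="53" />
    </sixml:Descriptor>
  </sixml:EMark>
</Comment>
"""


def describe_xpointer(desc):
    """Hypothetical handler: read the pointer of an XPointer descriptor."""
    return "XPointer -> " + desc.findtext("pointer").strip()


def describe_sparce(desc):
    """Hypothetical handler: read agent, document, and character range."""
    agent = desc.findtext("Agent")
    location = desc.find("Doc").get("location")
    sub = desc.find("Subdoc")
    return f"SPARCE -> {agent} {location} [{sub.get('startChar')}..{sub.get('endChar')}]"


# Simplification: dispatch on the literal prefixed name instead of resolving the QName.
HANDLERS = {"sixml:XPointer": describe_xpointer, "sixml:SPARCE": describe_sparce}


def describe_descriptors(xml_text):
    """Return one summary string per descriptor, in document order."""
    root = ET.fromstring(xml_text)
    summaries = []
    for desc in root.iter(f"{{{SIXML_NS}}}Descriptor"):
        kind = desc.get(f"{{{XSI_NS}}}type")
        handler = HANDLERS.get(kind)
        summaries.append(handler(desc) if handler else f"no handler for {kind}")
    return summaries
```

Calling `describe_descriptors(DOC)` yields one summary per descriptor; a descriptor whose type has no registered handler is reported rather than raising, which mirrors the slide's point that the internal structure is opaque to everything except the type-specific implementation.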

  15. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  16. Sixml DOM • Extends W3C XML DOM to easily manipulate Sixml documents • Using the plain DOM on Sixml documents can be tedious and inefficient • Automatic and lazy reconstitution • Detects mark associations and interprets attributes such as sixml:valueSource • Developer uses only the DOM interface • Access to descriptors and “context” of external fragments
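
Lazy reconstitution, as named on this slide, can be illustrated with a small Python sketch: the external fragment is fetched only when its value is first read, and the result is cached. The class `LazyValue`, the `fake_resolver` function, and the descriptor string are all hypothetical stand-ins; the slides do not show how Sixml DOM implements this.

```python
class LazyValue:
    """Fetches a fragment's text on first access, then serves the cached copy."""

    def __init__(self, descriptor, resolver):
        self._descriptor = descriptor   # whatever identifies the external fragment
        self._resolver = resolver       # callable that actually retrieves the text
        self._value = None
        self._loaded = False

    @property
    def value(self):
        if not self._loaded:            # first access: go to the external source
            self._value = self._resolver(self._descriptor)
            self._loaded = True
        return self._value              # later accesses: no external call


calls = []                              # records every retrieval, for illustration


def fake_resolver(descriptor):
    """Stand-in for a real fragment retrieval (assumption: returns plain text)."""
    calls.append(descriptor)
    return f"text of {descriptor}"


excerpt = LazyValue("hypothetical-descriptor", fake_resolver)
```

Nothing is retrieved when `excerpt` is created; reading `excerpt.value` triggers one call to the resolver, and reading it again reuses the cached text.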

  17. Run-time Representation
      <Comment excerpt="" xmlns:sixml="http://schema.sixml.org">
        <sixml:TMark> Contradicts prior work <sixml:Descriptor>…</sixml:Descriptor> </sixml:TMark>
        <sixml:AMark target="excerpt" sixml:valueSource="true"> <sixml:Descriptor>…</sixml:Descriptor> </sixml:AMark>
        <sixml:EMark> <sixml:Descriptor>…</sixml:Descriptor> </sixml:EMark>
      </Comment>
      [Figure: DOM tree for this document]

  18. Generating a Sixml DOM Tree [Figure: Sixml DOM tree, with callouts “Value reconstituted” and “Descriptor is not a child”] • A mark association is “attached” to its target, but is not a child • The DOM interface suffices to access the reconstituted mash-up

  19. Context Information • Information retrieved from the context of an external fragment • An xsi:type-specific implementation determines (statically or dynamically) what is in context
      <sixml:Context>
        <Content> <Text>provide ... system</Text> </Content>
        <Presentation> <FontName>Times New Roman</FontName> <FontSize>11</FontSize> </Presentation>
        <Placement> <Page>3</Page> </Placement>
      </sixml:Context>

  20. Programming with Sixml DOM
      1  procedure WriteComment(SixmlElement c)
      2    XmlElement ctxt = c.markAssociations[0].Context
      3    XmlNode page = ctxt.getElementsByTagName("Page")[0]
      4    Writeln("Page: ", page.firstChild.nodeValue)
      5    Writeln("Excerpt: ", c.getAttribute("excerpt"))
      6    Writeln("Comment: ", c.firstChild.nodeValue)
      • Only Lines 1 and 2 use the Sixml DOM interface • Lines 2–4 get the page number; Line 5 gets the reconstituted excerpt; and Line 6 gets the comment text
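
For readers without the Sixml DOM, the procedure above can be approximated with Python's standard `xml.dom.minidom`. This is a sketch under stated assumptions: the plain DOM has no `markAssociations` member, so the context element is passed in explicitly, and the excerpt is assumed to be already reconstituted (its value here reuses the text from slide 19 purely for illustration). The function name `write_comment` and the two sample documents are hypothetical.

```python
from xml.dom import minidom

# Stand-ins for what Sixml DOM would supply: a reconstituted comment and its context.
COMMENT = minidom.parseString(
    '<Comment excerpt="provide ... system">Contradicts prior work</Comment>'
).documentElement
CONTEXT = minidom.parseString(
    "<Context><Placement><Page>3</Page></Placement></Context>"
).documentElement


def write_comment(comment, context):
    """Return the three report lines that the slide's WriteComment prints."""
    page = context.getElementsByTagName("Page")[0]           # slide Line 3
    return [
        "Page: " + page.firstChild.nodeValue,                # slide Line 4
        "Excerpt: " + comment.getAttribute("excerpt"),       # slide Line 5
        "Comment: " + comment.firstChild.nodeValue.strip(),  # slide Line 6
    ]


lines = write_comment(COMMENT, CONTEXT)
for line in lines:
    print(line)
```

Everything after the context lookup uses only standard DOM calls, which is the point the slide makes about Lines 3 to 6.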

  21. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  22. Sixml Navigator • Alternative to the traditional path navigator • Extends XDM so that Sixml documents can be declaratively queried using existing languages and query processors • Also applies to XPath 1.0 and XSLT 1.0 • Performs automatic and lazy reconstitution

  23. XDM Extensions • Allow child elements for any kind of node with which a mark may be associated • Make a mark association a child of its target node • Represent a mark descriptor and context as children of a mark association • These extensions allow reuse of existing query languages and processors

  24. An Extended-XDM Tree [Figure: extended-XDM tree]

  25. Queries over Bi-level Information • With Comment as the current node, get the comment text: ./text() • Get the excerpt of the commented region: ./@excerpt • Get the page number of the commented region: ./sixml:EMark/sixml:Context/Placement/Page
      <sixml:Context>
        <Placement> <Page>3</Page> </Placement>
      </sixml:Context>
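
The three queries above can be tried against a hand-built stand-in for an extended-XDM tree, in which the mark association and its context appear as ordinary children of `Comment`. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath: the `./text()` and `./@excerpt` steps are expressed through `.text` and `.get()`, while the page-number path is used as written on the slide. The sample document and its excerpt value are assumptions made for illustration; a real Sixml Navigator would produce this view on demand rather than from a materialized file.

```python
import xml.etree.ElementTree as ET

NS = {"sixml": "http://schema.sixml.org"}  # namespace URI from slide 13

# Hand-built approximation of the extended-XDM view of one comment.
EXTENDED = """
<Comment excerpt="provide ... system" xmlns:sixml="http://schema.sixml.org">Contradicts prior work<sixml:EMark>
    <sixml:Context>
      <Placement><Page>3</Page></Placement>
    </sixml:Context>
  </sixml:EMark>
</Comment>
"""

comment = ET.fromstring(EXTENDED)

# ./text(): ElementTree exposes the leading text node as .text
comment_text = comment.text.strip()

# ./@excerpt: attribute access goes through .get()
excerpt = comment.get("excerpt")

# ./sixml:EMark/sixml:Context/Placement/Page: supported path syntax, used as on the slide
page = comment.findtext("./sixml:EMark/sixml:Context/Placement/Page", namespaces=NS)

print(comment_text, excerpt, page, sep=" | ")
```

A full XPath 1.0 processor would accept all three expressions unchanged, which is the reuse the XDM extensions are meant to enable.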

  26. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  27. Implementation and Usage • Element types for Sixml mark associations defined in XML Schema • Sixml DOM and Sixml Navigator implemented in C# on the .NET Framework • Sixml DOM implemented both by extending DOM and by revising DOM • Three implementations of Sixml DOM: 2 extensions (MS and Mono), 1 revision (Mono) • Sixml, Sixml DOM, and Sixml Navigator used in mash-ups for several applications

  28. Experimental Data • 8 mash-ups • 4 each from 2 apps; different scale factors • File size: 200 KB to 26.1 MB • #Docs referenced: 18 to 426 • #Mark associations: 1.9K to over 311K • 3 traditional XML documents • File size: 484 KB to 113.7 MB • Tree depth: 4, 8, 16

  29. Evaluation Summary • Sixml DOM • Saves time over DOM when accessing mark associations • When accessing SI, savings decrease as the amount of SI increases • It is better to use DOM to access large traditional XML documents • Sixml Navigator • Saves time over traditional navigator for both mark associations and SI

  30. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  31. Summary • A mash-up has three forms: condensed, reconstituted, and formatted • Sixml, Sixml DOM, and Sixml Navigator support the three forms, respectively • Sixml makes it easier to specify mash-ups; Sixml DOM and Navigator provide a more efficient means of manipulating mash-ups • The XML Schema instance documents and the source code are on www.sixml.org

  32. Outline • Introduction • The conceptual approach • Sixml: Condensed mash-ups • Sixml DOM: Reconstituted mash-ups • Sixml Navigator: Formatted mash-ups • Evaluation • Summary • Discussion

  33. Our Mash-up Framework [Figure: components: Client Application; XSLT and XQuery Processors; XPath Processor; Sixml; Sixml DOM; Sixml Navigator; SPARCE; Bulk Accessor; Cloaker] • SPARCE: Reference and retrieve fragments of arbitrary types • Bulk Accessor: Efficiently retrieve large numbers of fragments • Cloaker: Hide data to improve query expression and execution

  34. Bi-level Query Processors • Sixml Navigator uses Sixml DOM internally; it does not construct extended-XDM trees • Existing query processors use the Sixml Navigator instead of the traditional navigator

  35. Mark Creation [Figure: Clipboard, Superimposed Application, Mark Manager, Superimposed Info, Descriptors Repository; labels S1 and M4]
      <Mark ID="M4">
        <Agent>AcrobatAgents.PDFAgent</Agent>
        <Class>AcrobatPDFTextMark</Class>
        <Address>2|395|439</Address>
        …
        <ContainerID>D6</ContainerID>
      </Mark>

  36. Activation and Context Retrieval [Figure: Context Manager, Base Application, Superimposed Application, Base Info, Mark Manager, Superimposed Info, Descriptors Repository; labels S1 and M4]
      <Mark ID="M4">
        <Agent>AcrobatAgents.PDFAgent</Agent>
        <Class>AcrobatPDFTextMark</Class>
        <Address>2|395|439</Address>
        …
        <ContainerID>D6</ContainerID>
      </Mark>

  37. About Context [Figure: context of a PDF mark and of a PowerPoint mark] • Context information is modeled as a hierarchical property set
