
Integrating Heterogeneous in situ Information using SPARCE

Sudarshan Murthy. CSE 606 INI: Fall 2003. This work is supported by US NSF Grant IIS 0086002.


Presentation Transcript


  1. Integrating Heterogeneous in situ Information using SPARCE. Sudarshan Murthy. CSE 606 INI: Fall 2003. This work is supported by US NSF Grant IIS 0086002.

  2. Integrating Heterogeneous in situ Information using SPARCE

  3. Observations
     • People often superimpose new interpretations onto existing information (from heterogeneous sources)
     • They excerpt information and create annotations
     • They integrate existing information and new interpretations
     • Prepare many arrangements of the same information
     • Organize using appropriate models and schemas (possibly different from any of the sources)

  4. Goal
     Facilitate integration of heterogeneous in situ information of varying granularity, with minimal mediation, using superimposed information to enhance base information, given one superimposed information model and schema (possibly different from any base information model and schema).

  5. Benefits
     • Likely discover information not completely contained in base sources
     • Information cannot always be obtained by a query distributed over base sources
     • Exploit human expertise
     • Annotations and relationships created by humans can be valuable
     • Minimize volume of base data mediated
     • We only retrieve selected information

  6. Outline
     • Goal
     • Background
     • Superimposed information management, SPARCE
     • Information integration example
     • Proposal
     • Future work
     • Conclusion

  7. What is Superimposed Information?
     Data placed over existing information sources to help organize, access, connect, and reuse information elements in those sources. [Maier 1999, Delcambre 2001]

  8. Marks
     • A mark is a reference to a base-layer element [Delcambre 2001]
     • Several mark implementations exist
     • The addressing scheme usually depends on the base type
     • A PDF mark uses a page number plus starting and ending word indexes; an MS Word mark uses starting and ending character indexes
     • Marks provide a uniform interface across base-layer types and access protocols
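The two addressing schemes on this slide can be sketched as follows. This is a minimal illustration, not SPARCE's actual API: the class names, field names, and the address string format are all assumptions made for the example; only the choice of page/word indexes for PDF and character indexes for MS Word comes from the slide.

```python
from dataclasses import dataclass
from typing import Protocol


class Mark(Protocol):
    """Hypothetical uniform interface shared by every mark type."""

    mark_id: str

    def address(self) -> str:
        """Return a base-type-specific address as a string."""
        ...


@dataclass(frozen=True)
class PdfMark:
    """PDF addressing: page number plus starting and ending word indexes."""

    mark_id: str
    document: str
    page: int
    start_word: int
    end_word: int

    def address(self) -> str:
        # The address format is invented for this sketch.
        return f"{self.document}#page={self.page};words={self.start_word}-{self.end_word}"


@dataclass(frozen=True)
class WordMark:
    """MS Word addressing: starting and ending character indexes."""

    mark_id: str
    document: str
    start_char: int
    end_char: int

    def address(self) -> str:
        # The address format is invented for this sketch.
        return f"{self.document}#chars={self.start_char}-{self.end_char}"


def describe(mark: Mark) -> str:
    """Works for any mark type because it relies only on the shared interface."""
    return f"{mark.mark_id} -> {mark.address()}"


if __name__ == "__main__":
    # Document names and index values are placeholders.
    for m in (PdfMark("m1", "paper.pdf", 3, 10, 42), WordMark("m2", "notes.doc", 120, 180)):
        print(describe(m))
```

A caller such as `describe` never inspects the base type, which is the point of the uniform interface: adding a new base type means adding a new mark class, not changing callers.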

  9. Excerpts and Contexts
     • Excerpt is the content of a marked region
     • Type of an excerpt varies: text, graphics, …
     • Context is information related to a mark
     • Context element is one piece of context
     • Section heading, containing paragraph text, and font name are examples
     • Many kinds of context elements exist
     • Content, Presentation, Location, Topology, …
     • Context definition varies across and within base types

  10. Example Context

  11. Superimposed Applications
      • These are applications that manipulate superimposed information
      • They associate marks and context elements with superimposed information elements
      • They are free to choose display and data models based on their needs
      • A user can activate a mark to navigate to the base layer, or examine context without expressly navigating to the base layer

  12. SPARCE
      • SPARCE: Superimposed Pluggable Architecture for Contexts and Excerpts
      • Middleware for superimposed information management
      • Address base information regardless of its type, location, and access protocol
      • Retrieve excerpts and contexts
      • Use the same programmatic interface to work with any base type
      • View excerpts and contexts side by side with superimposed information
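One way to picture "the same programmatic interface for any base type" is a registry of pluggable, per-base-type agents behind a single lookup function. The sketch below is an assumption-laden illustration, not SPARCE's real interface: `register_agent`, `get_context`, and the agent signature are hypothetical names, and the returned values are placeholders rather than data from a real document.

```python
from typing import Callable, Dict

# Hypothetical agent signature: takes a mark address, returns context elements by name.
ContextAgent = Callable[[str], Dict[str, str]]

_agents: Dict[str, ContextAgent] = {}


def register_agent(base_type: str, agent: ContextAgent) -> None:
    """Plug in the agent that knows how to read one base type."""
    _agents[base_type] = agent


def get_context(base_type: str, address: str) -> Dict[str, str]:
    """Single entry point used by callers, whatever the base type is."""
    try:
        agent = _agents[base_type]
    except KeyError:
        raise LookupError(f"no context agent registered for {base_type!r}") from None
    return agent(address)


def placeholder_pdf_agent(address: str) -> Dict[str, str]:
    """Stand-in agent; a real one would ask the base application for these values."""
    return {
        "excerpt": f"<excerpt at {address}>",
        "section_heading": "<section heading>",
        "font_name": "<font name>",
    }


if __name__ == "__main__":
    register_agent("pdf", placeholder_pdf_agent)
    print(get_context("pdf", "paper.pdf#page=3;words=10-42"))
```

Supporting another base type under this design is one more `register_agent` call; code that calls `get_context` does not change.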

  13. Overview
      [Architecture diagram. Labels: Superimposed Layer, Base Layer, SPARCE, SA 1, SA 2, Acrobat, Word, XML, Marks, Relations. A mark is shown serialized as: <mark ID="…"> <type>…</type> <address>…</address> … </mark>]

  14. Information Integration Example

  15. Setup
      • SPARCE extended for information integration
      • XML serialization introduced
      • Pluggable context transformers infrastructure added
      • A query interface developed
      • RIDPad extended for information integration
      • Annotations, XML serialization (and DOM) added
      • Information models supported
      • Object model (COM)
      • XML (DOM and serialized)
      • Example uses XML

  16. Input
      • Five items (two groups)
      • An item contains a label and a comment
      • Five base documents (all PDF; heterogeneous?)
      • Granularity of marks varies

  17. Generating XML Data (1)

  18. Generating XML Data (2)

  19. XML Data
      Generated XML for the two groups [figure callouts: Mark, Context]

  20. Querying*
      For each item, get the text content from the context (of its mark)
      *Currently supports XSLT and XPath; XQuery coming soon
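The query on this slide can be approximated with the XPath subset in Python's standard `xml.etree.ElementTree`. This is only a sketch: the sample document is invented, the element names `Group`, `Item`, `Mark`, and `Context` are taken from a later slide, and treating the context as plain element text is an assumption, since the slides do not show the real context markup.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the real generated XML is not shown in the slides.
SAMPLE = """
<Group name="G1">
  <Item name="CLIO"><Mark id="m1"><Context>first context text</Context></Mark></Item>
  <Item name="Other"><Mark id="m2"><Context>second context text</Context></Mark></Item>
</Group>
"""


def text_by_item(xml_text: str) -> dict:
    """For each Item, collect the text of the Context under its Mark."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for item in root.iterfind(".//Item"):
        # Relative path from the item: Item -> Mark -> Context.
        texts = [(ctx.text or "").strip() for ctx in item.iterfind("./Mark/Context")]
        result[item.get("name")] = " ".join(texts)
    return result


if __name__ == "__main__":
    for name, text in text_by_item(SAMPLE).items():
        print(f"{name}: {text}")
```

`ElementTree` implements only a limited XPath subset, so a query that needs predicates on text or full XPath functions would call for a fuller XPath or XSLT processor.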

  21. This system isn’t very smart.

  22. Preserve the Layers
      <Item name='CLIO'> <Mark id='…'> <Context …> </Context> </Mark> </Item>

      <Group name='…'> <Item name='…'> </Item> </Group>
      -------------------------------------
      <Mark id='…'>…</Mark> <Mark id='…'>…</Mark>
      -------------------------------------
      <Context …>…</Context> <Context …>…</Context>

  23. Why Preserve the Layers
      • The information sources are different
      • SI: Superimposed application
      • Marks: SPARCE
      • Contexts: Base applications (via context agents)
      • A hierarchy is inefficient and unnecessary
      • Mark and context information is replicated
      • Context can be large (broad)
      • Joins can provide the same result
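The claim that joins can provide the same result as a nested hierarchy can be illustrated with three flat collections, one per layer, joined on a mark identifier. The record shapes, the key name `mark_id`, and all values below are assumptions made for the sketch; the slides do not specify how the layers reference one another.

```python
# Superimposed layer: items refer to marks by id (hypothetical key name).
items = [
    {"item": "CLIO", "mark_id": "m1"},
    {"item": "Other", "mark_id": "m2"},
]

# Mark layer: each mark is stored once, keyed by id (placeholder addresses).
marks = {
    "m1": {"address": "doc1.pdf#page=1"},
    "m2": {"address": "doc2.pdf#page=4"},
}

# Context layer: each context is stored once, keyed by the mark it belongs to.
contexts = {
    "m1": {"text": "context for m1"},
    "m2": {"text": "context for m2"},
}


def join_layers(items, marks, contexts):
    """Inner join of the three layers on mark id, rebuilding the nested view on demand."""
    rows = []
    for item in items:
        mark = marks.get(item["mark_id"])
        context = contexts.get(item["mark_id"])
        if mark is None or context is None:
            continue  # drop items whose mark or context is missing
        rows.append({**item, **mark, **context})
    return rows


if __name__ == "__main__":
    for row in join_layers(items, marks, contexts):
        print(row)
```

Because marks and contexts are stored once and looked up by id, two items that share a mark do not duplicate its context, which is the replication problem the hierarchy has.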

  24. Start with the Query
      • Figure out what information is in scope
      • Only some superimposed information elements might qualify
      • Only some marks might qualify
      • Only some context elements might be needed
      • Minimize the amount of information retrieved
      • Push “selects” down and distribute “selects”
      • Helped by preserving the layers
      • Enables parallel and distributed query execution
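A rough sketch of pushing selects down: filter the superimposed layer first, then retrieve context only for the marks that survive, and only the context elements the query asks for. The function names, record shapes, and the `fetch_context` callback are hypothetical; the proposed bi-level query system is described in the slides as future work, so this shows the idea rather than an existing implementation.

```python
def run_query(items, item_predicate, fetch_context, wanted_elements):
    """Evaluate selects layer by layer so that little base data is retrieved."""
    # 1. Select at the superimposed layer: only some items qualify.
    selected = [item for item in items if item_predicate(item)]
    # 2. Only the marks of qualifying items are in scope.
    mark_ids = {item["mark_id"] for item in selected}
    # 3. Ask the base layer only for the needed context elements of those marks.
    context_by_mark = {mid: fetch_context(mid, wanted_elements) for mid in mark_ids}
    return [{"item": item["item"], **context_by_mark[item["mark_id"]]} for item in selected]


if __name__ == "__main__":
    calls = []  # records which marks were actually retrieved

    def fake_fetch(mark_id, wanted):
        # Stand-in for a base-layer retrieval; values are placeholders.
        calls.append(mark_id)
        return {name: f"<{name} of {mark_id}>" for name in wanted}

    items = [
        {"item": "CLIO", "group": "G1", "mark_id": "m1"},
        {"item": "Other", "group": "G2", "mark_id": "m2"},
    ]
    result = run_query(items, lambda it: it["group"] == "G1", fake_fetch, ["text"])
    print(result)
    print("marks retrieved:", calls)
```

Step 3 makes one independent call per mark, so those calls could in principle be issued in parallel or sent to different base sources.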

  25. Exploit Relationships
      • Relationships in superimposed layer can have many benefits
      • Improve recall (for user)
      • Alternative execution plans (for query processor)
      • XML has no native support for relationships
      • Can be implemented using XPointer, XLink, etc.

  26. Future Work
      • Test proposal
      • Bi-level query system
      • Develop example queries
      • Model the system, build it, test it
      • Support other information models
      • RDF should be easy, relational might not be
      • Support for new models can be added without affecting existing implementations
      • Sun’s “No Recompile” guarantee for superimposed applications*
      *Some restrictions may apply

  27. Conclusion
      • Enhancing base information with superimposed information makes possible new queries over base information
      • Heterogeneous in situ base information can be integrated and queried using SPARCE
      • The naïve XML implementation makes a good straw man
      • If this stuff holds water, a bi-level query system may be in my future

  28. Questions? Ask me about a demo.
