
An Extension to XML Schema for Structured Data Processing

Presented by: Jacky Ma

Date: 10 April 2002


Presentation Outline

  • The Problems

  • Research Objectives

  • The Schema Extension: MMX

  • MMX Query System

  • Discussion

  • Conclusion


The Problems

  • Mapping XML data into relational tables

    • Not natural to XML structure

    • Efficient, but may not be an effective method

  • Legacy application-specific structured data

    • Similar modeling but proprietary implementation

    • Not interoperable, and difficult to maintain

    • Lack of modular design, and thus difficult to combine into more complex data structures

  • Meta-data can serve a wide range of needs, yet XML Schema is currently used solely for physical data validation


Research Objectives

  • To facilitate more effective searching and storing of XML contents by making use of meta-data (XML Schema)

  • To propose a data-oriented model that allows different storage mechanisms, processing models, and query models on XML contents


Our Approach – MMX

  • Use meta-data to map XML data into structured data objects

  • Define the structured data models “conceptually” and link the models to XML document structure “syntactically”

  • Meta-data is defined as an extension of XML Schema

  • The extension is called MMX (Multi Model XML)


Program Driven vs. Data Driven

  • Program Driven: information for processing is hard-coded in the program

  • Data Driven: processing instructions are hard-coded in the data?!

[Diagram labels: Raw Data; Structured Data (XML); Data with Modeling Information; Data with Program Codes. “MMX!” appears between Program Driven and Data Driven.]




Schema Extension

  • The extended schema is associated with a namespace

  • The extended schema goes within a schema element, like <tree:element> in the example

    • <tree:element> specifies a single structure object instance

    • Name association for elements and attributes

    • Class hierarchies:

      • <tree:element> -> <tree:internal> -> <tree:leafNode>

      • finally to the structure specified in <tree:leafNodeValue>

    • Additional properties in <rootNodeAttr>, <internalNodeAttr> and <leafNodeAttr>

  • The schema writer has to know the structure model specification, while the XML writer only needs to know the given schema


Modeling

  • For an instance of “MMX data object”

    • As an encapsulated information object only accessible from the root, thus as a “single tree node”

    • As a mapping from root node, query method and query parameters to the value at leaf nodes

    • Leaf nodes may contain any valid XML content, as long as defined in the Schema

      • I.e. may contain another “MMX data object”

    • A query is modeled as a 3-tuple:

      • [accessing-node, query-method, query-parameters]

      • Accessing-node is specified by XPath

      • Query-method is specified as a string value

      • Query-parameters are multi-dimensional, depending on the current model
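
As a rough illustration of the query tuple, the sketch below models it as a small immutable Java class. The class name MmxQuery, its accessors, and the choice of numeric (double) parameters are assumptions made for this example and are not taken from the MMX implementation; the XPath "/map/tree" in the usage note is likewise hypothetical.

```java
import java.util.Arrays;

// Hypothetical value class for the query tuple:
// [accessing-node, query-method, query-parameters].
final class MmxQuery {
    private final String accessingNode; // XPath of the node the query is issued to
    private final String queryMethod;   // method name, given as a plain string
    private final double[] parameters;  // length depends on the model being queried

    MmxQuery(String accessingNode, String queryMethod, double... parameters) {
        this.accessingNode = accessingNode;
        this.queryMethod = queryMethod;
        this.parameters = parameters.clone(); // defensive copy keeps the tuple immutable
    }

    String getAccessingNode() { return accessingNode; }

    String getQueryMethod() { return queryMethod; }

    double[] getParameters() { return parameters.clone(); }

    @Override
    public String toString() {
        return "[" + accessingNode + ", \"" + queryMethod + "\", "
                + Arrays.toString(parameters) + "]";
    }
}
```

A query with two coordinates could then be built as new MmxQuery("/map/tree", "spatial-search", 3.0, 5.0). Numeric parameters are only one option; a model that takes names or ranges would need a different parameter type.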


Modeling (2)

Tree(1) is accessible from point A. A query (e.g. [A, “spatial-search”, (3, 5)], assuming Tree(1) accepts spatial-search with two coordinates) may return point B as the answer, either as the XPath of B or as the XML subtree of B.

From point B, the user may drill down the tree by issuing another query on Tree(2).

[Diagram labels: A; Tree(1); B; Tree(2); XML Elements]


Query with and without MMX

  • From the original XML data, we could not assume the semantics of the data:

    • We can ONLY do XML-based query such as XPath

    • We can do the spatial query ONLY IF we can map the data into an R-Tree

  • After mapping the data into an R-Tree

    • Spatial Queries

      • Give me the point at (2,7)

      • Give me the point nearest to (4,4)

    • Nearest Neighbor Search

      • Give me the point nearest to “Franklin”
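
The three example queries can be sketched in Java as follows. This is a minimal sketch that scans a plain list instead of using an R-Tree, so it shows only what the queries return, not how a spatial index would answer them efficiently. The class and method names (PointIndex, pointAt, nearestTo) are hypothetical, and so is any sample data other than the name “Franklin”.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force stand-in for a spatial index. A real R-Tree would answer the
// same queries without visiting every stored point.
final class PointIndex {
    static final class NamedPoint {
        final String name;
        final double x;
        final double y;

        NamedPoint(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    private final List<NamedPoint> points = new ArrayList<NamedPoint>();

    void add(String name, double x, double y) {
        points.add(new NamedPoint(name, x, y));
    }

    // "Give me the point at (x, y)": exact-match lookup, null if absent.
    NamedPoint pointAt(double x, double y) {
        for (NamedPoint p : points) {
            if (p.x == x && p.y == y) {
                return p;
            }
        }
        return null;
    }

    // "Give me the point nearest to (x, y)".
    NamedPoint nearestTo(double x, double y) {
        return nearest(x, y, null);
    }

    // "Give me the point nearest to <name>": nearest neighbour of a stored
    // point, excluding that point itself. Returns null if the name is unknown.
    NamedPoint nearestTo(String name) {
        for (NamedPoint p : points) {
            if (p.name.equals(name)) {
                return nearest(p.x, p.y, p);
            }
        }
        return null;
    }

    // Linear scan that keeps the candidate with the smallest squared distance.
    private NamedPoint nearest(double x, double y, NamedPoint exclude) {
        NamedPoint best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (NamedPoint p : points) {
            if (p == exclude) {
                continue;
            }
            double dx = p.x - x;
            double dy = p.y - y;
            double dist = dx * dx + dy * dy;
            if (dist < bestDist) {
                bestDist = dist;
                best = p;
            }
        }
        return best;
    }
}
```

With plain XPath none of these three calls can be expressed directly, because the XML document itself carries no notion of coordinates or distance.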


Processing

  • Users might not know the “type” of the node (and do not need to know it). They are interested in what they can do with it

  • Users retrieve the list of possible operations by issuing a LIST-OPERATION method to the root element of an MMX object

  • Possible operations may include queries, updates, and other model-specific operations


MMX Query System

  • To show that the schema, modeling, and processing of the MMX extension are workable

  • To illustrate how it assists in querying XML data

  • To serve as a platform for testing the implementation of arbitrary structured models

  • Implemented with JDK 1.4


System Design

[Diagram components: Clients; XML; DOM; MMXDocument; Node Data; Schema; MMX Element; ParseSchema; FetchClasses; AbstractMMX Element; VP-Tree; X-Tree; R-Tree; R-TreeSchema. Edge labels: Extends class; (Partly) Defines; Maps.]

  • The abstract class defines the common interface that has to be implemented in each MMX Element, such as LIST-OPERATION, QUERY, BUILD, etc.
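
The abstract class on this slide could look roughly like the Java sketch below. Only the three operation names (LIST-OPERATION, QUERY, BUILD) come from the slide. The method signatures, the class name AbstractMmxElement, and the toy SortedListElement model are assumptions made for illustration, and the toy model is deliberately not one of the tree structures named in the diagram.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical common interface that every model implementation extends.
abstract class AbstractMmxElement {
    // LIST-OPERATION: names of the operations this model supports.
    abstract List<String> listOperations();

    // BUILD: load raw values (for example, read from XML leaf nodes).
    abstract void build(double[] values);

    // QUERY: run a named operation; the result type depends on the model.
    abstract Object query(String method, double... parameters);
}

// Toy one-dimensional model, used only to keep the sketch short.
final class SortedListElement extends AbstractMmxElement {
    private double[] sorted = new double[0];

    @Override
    List<String> listOperations() {
        return Arrays.asList("min", "max", "contains");
    }

    @Override
    void build(double[] values) {
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    @Override
    Object query(String method, double... parameters) {
        if (method.equals("min")) {
            return sorted.length == 0 ? null : Double.valueOf(sorted[0]);
        }
        if (method.equals("max")) {
            return sorted.length == 0 ? null : Double.valueOf(sorted[sorted.length - 1]);
        }
        if (method.equals("contains")) {
            // binarySearch returns a non-negative index only when the key is present.
            return Boolean.valueOf(Arrays.binarySearch(sorted, parameters[0]) >= 0);
        }
        throw new IllegalArgumentException("Unsupported operation: " + method);
    }
}
```

A client holding only an AbstractMmxElement reference can call listOperations() first and then pick a method name to pass to query(), without knowing which concrete model sits behind the element.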


Discussion - Pros

  • Compatible with the relational approach, and supersedes it

  • Modular design promotes reusability and maintainability

  • XML “flattens” the legacy structured data, making it text-editable and easy to transport and process by different systems


Discussion - Cons

  • There is no generic syntax to precisely describe all kinds of structure models

  • The size of the XML file is often larger than that of the legacy data file

  • Each structure model needs additional implementation effort

  • The schema specification quickly grows longer as the number of supported models increases


Conclusion

  • Propose a representation to encapsulate data structures

  • Describe XML data with the Schema conceptually as well as syntactically

  • Map legacy structure models into Schema, and map XML data to the structure models by the Schema

  • Structured data repository with increased interoperability, reusability, and transportability


