New (Applications of) Compiler Techniques for Data Grids

Gagan Agrawal


Outline

  • Automatic Data Virtualization

    • SQL Implementation

    • XML/XQuery

  • Automatic Wrapper Generation

    • Data Integration in Bioinformatics

  • Compiling XML Query Language XQuery

    • Issues with streaming data


Data Virtualization

[Diagram labels: dataset, Data Virtualization, Data Service, "An abstract view of data"]

  • Scientific data being shared on the Web/Grids

  • Low-level layouts

  • Need for efficient storage and processing


Our Approach: Automatic Data Virtualization

  • Automatically create data services

    • A new application of compiler technology

  • A meta-data descriptor describes the layout of data in a repository

  • An abstract view is exposed to the users

  • Two implementations:

    • Relational/SQL-based (HPDC 2004, LCPC 2004)

    • XML/XQuery-based (ICS 2003, LCPC 2003)
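
The idea of driving a data service from a meta-data descriptor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the descriptor format (`DESCRIPTOR`), the field names, and `make_reader` are hypothetical and are not the descriptor language or the generated code from the presentation.

```python
import struct
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical descriptor: (field name, struct format code) for each field
# of a fixed-size binary record. Not the presentation's descriptor syntax.
DESCRIPTOR: List[Tuple[str, str]] = [("time", "i"), ("x", "f"), ("pressure", "f")]


def make_reader(descriptor: List[Tuple[str, str]]) -> Callable[[bytes], Iterator[Dict[str, float]]]:
    """Build a reader that exposes raw bytes as rows of named fields."""
    names = [name for name, _ in descriptor]
    # "<" selects little-endian layout with no padding between fields.
    record = struct.Struct("<" + "".join(code for _, code in descriptor))

    def read(buf: bytes) -> Iterator[Dict[str, float]]:
        # iter_unpack walks the buffer one fixed-size record at a time.
        for values in record.iter_unpack(buf):
            yield dict(zip(names, values))

    return read


# Two records in the low-level layout, standing in for a file in a repository.
raw = struct.pack("<iff", 1, 0.5, 101.25) + struct.pack("<iff", 2, 1.5, 99.5)
rows = list(make_reader(DESCRIPTOR)(raw))
print(rows)
```

The user of `rows` sees named fields only; the byte layout appears nowhere outside the descriptor.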


SQL/Relational Implementation

SELECT <Data Elements>
FROM <Dataset Name>
WHERE …
AND Filter(<Data Element>);
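
As a rough illustration of the query form above, the following Python sketch evaluates a selection with both an ordinary predicate and a user-defined filter over a virtual table. The function `select`, the sample `dataset`, and `speed_filter` are hypothetical names chosen for this example; the actual data service is generated by a compiler and is not shown in the slides.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, float]


def select(rows: Iterable[Row],
           columns: List[str],
           where: Callable[[Row], bool],
           user_filter: Callable[[Row], bool]) -> Iterator[Tuple[float, ...]]:
    """Yield the requested columns of every row that passes both predicates."""
    for row in rows:
        if where(row) and user_filter(row):
            yield tuple(row[c] for c in columns)


# Hypothetical virtual table; a real service would read these rows from
# low-level files rather than from an in-memory list.
dataset: List[Row] = [
    {"time": 1, "x": 0.0, "speed": 2.5},
    {"time": 2, "x": 1.0, "speed": 7.5},
    {"time": 2, "x": 2.0, "speed": 9.0},
]


def speed_filter(row: Row) -> bool:
    """Stand-in for the user-defined Filter(...) in the query."""
    return row["speed"] > 5.0


# SELECT x, speed FROM dataset WHERE time = 2 AND Filter(speed);
result = list(select(dataset, ["x", "speed"], lambda r: r["time"] == 2, speed_filter))
print(result)  # [(1.0, 7.5), (2.0, 9.0)]
```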


XML/XQuery Implementation

[Diagram labels: XQuery, XML, "???", and the underlying formats HDF5, NetCDF, TEXT, RMDB]


Approach / Contributions

  • Use of XML Schemas to provide high-level abstractions on complex datasets

  • Using XQuery with these Schemas to specify processing

  • Issues in Translation

    • High-level to low-level code

    • Data-centric transformations for locality in low-level codes

    • Issues specific to XQuery

      • Recognizing recursive reductions

      • Type inferencing and translation
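
One way to read "recognizing recursive reductions": a reduction written as a recursive function can be replaced by a loop with an accumulator once the compiler sees that the recursion only combines elements with an associative operator. The Python sketch below shows that equivalence on a sum. It is an illustration only, not the recognition algorithm from the cited papers, and both function names are hypothetical.

```python
from typing import Sequence


def recursive_sum(values: Sequence[int]) -> int:
    """Reduction in the recursive style typical of a functional language."""
    if not values:
        return 0
    return values[0] + recursive_sum(values[1:])


def iterative_sum(values: Sequence[int]) -> int:
    """The loop a compiler could emit after recognizing the reduction."""
    total = 0
    for v in values:
        # Reordering the additions is valid because + is associative.
        total += v
    return total


data = [3, 1, 4, 1, 5]
print(recursive_sum(data), iterative_sum(data))  # 14 14
```

The loop form visits each element once and needs no call stack, which is what makes it a better fit for large datasets.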


Wrappers

  • Goal: to provide the integration system transparent access to data sources

  • Challenges

    • Development cost

    • Performance

      • Scripting languages can be slow

    • Updates

      • Data formats can change frequently


Our Approach

  • Machine-interpretable metadata

  • A layout descriptor associated with each dataset

  • Wrappers generated on the fly

    • Applied to several bioinformatics examples


Layout Descriptor

DATASET "FASTAData" {
  DATATYPE {FASTA}
  DATASPACE LINESIZE=80 {
    LOOP ENTRY 1:EOF:1 {
      ">" ID
      " " DESCRIPTION
      < "\n" SEQ >
      "\n" | EOF }
  }
  DATA {osu/fasta}
}

Callouts on the slide: dataset name ("FASTAData"), schema name (FASTA), file layout (the DATASPACE block), file location (osu/fasta).

Sample file, with ID, DESCRIPTION, and SEQ marking the fields of the first entry:

>Example1 envelope protein        (ID, DESCRIPTION)
ELRLRYCAPAGFALLKCNDA              (SEQ)
DYDGFKTNCSNVSVVHCTNL              (SEQ)
MNTTVTTGLLLNGSYSENRT              (SEQ)
QIWQKHRTSNDSALILLNKH              (SEQ)
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDT…
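
To make the layout concrete, here is a hand-written Python sketch of the kind of wrapper such a descriptor implies: each entry starts with ">", followed by an ID, a space, a DESCRIPTION, and then sequence lines up to the next entry or end of file. The names `FastaEntry` and `parse_fasta` are hypothetical, the second sample entry is made up, and the presentation's wrappers are generated automatically rather than written by hand like this.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class FastaEntry:
    id: str
    description: str
    seq: str


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaEntry]:
    """Yield one FastaEntry per '>' header, joining its sequence lines."""
    entry_id: Optional[str] = None
    description = ""
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if entry_id is not None:
                yield FastaEntry(entry_id, description, "".join(seq_parts))
            # ">" ID " " DESCRIPTION: split the header at the first space.
            entry_id, _, description = line[1:].partition(" ")
            seq_parts = []
        elif line and entry_id is not None:
            seq_parts.append(line)
    # End of input closes the last entry.
    if entry_id is not None:
        yield FastaEntry(entry_id, description, "".join(seq_parts))


# First entry uses two lines from the slide; the second entry is made up.
sample = """>Example1 envelope protein
ELRLRYCAPAGFALLKCNDA
DYDGFKTNCSNVSVVHCTNL
>Toy2 made-up entry
ACDEFGHIK
"""
entries = list(parse_fasta(sample.splitlines()))
for e in entries:
    print(e.id, "|", e.description, "|", e.seq)
```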


XQuery on Streaming Data

  • Infinite data streams

    • All processing must be single pass

  • Interesting Compiler Questions:

    • How can code be transformed to execute in a single pass?

    • How can we tell that it can be executed correctly in a single pass?

  • Addressed this problem for XML streams and the XML query language XQuery

  • Appears in VLDB 2005
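
The single-pass constraint can be illustrated with Python's standard incremental XML parser. The sketch below computes an average while seeing each element exactly once and keeping only two running values. It is an illustration of single-pass evaluation in general, not the compiler analysis from the VLDB 2005 paper; the function name, the tag `r`, and the sample document are assumptions, and a finite byte string stands in for an unbounded stream.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO


def single_pass_average(stream: BinaryIO, tag: str) -> float:
    """Average the numeric text of every <tag> element in one pass."""
    count = 0
    total = 0.0
    # iterparse reports each element as soon as its end tag has been read,
    # so no element is ever visited twice.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += float(elem.text)
            count += 1
            # Drop the element's text; the parent still keeps an empty shell,
            # which a production version would also have to release.
            elem.clear()
    return total / count if count else 0.0


xml_bytes = b"<readings><r>1.0</r><r>2.0</r><r>6.0</r></readings>"
avg = single_pass_average(io.BytesIO(xml_bytes), "r")
print(avg)  # 3.0
```

An average works in one pass because it reduces to a running sum and count. A query that needs an earlier element after seeing a later one would require buffering or a rewrite, which is the question the slide raises.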