New applications of compiler techniques for data grids
New (Applications of) Compiler Techniques for Data Grids

Gagan Agrawal


Outline

  • Automatic Data Virtualization

    • SQL Implementation

    • XML/XQuery

  • Automatic Wrapper Generation

    • Data Integration in Bioinformatics

  • Compiling XML Query Language XQuery

    • Issues with streaming data


Data Virtualization

[Figure: An abstract view of data. A dataset is exposed through Data Virtualization as a Data Service]

-- Scientific data is being shared on the Web and on Grids

-- Such data is stored in low-level layouts

-- Storage and processing must be efficient


Our Approach: Automatic Data Virtualization

  • Automatically create data services

    • A new application of compiler technology

  • A metadata descriptor describes the layout of data in a repository

  • An abstract view is exposed to the users

  • Two implementations:

    • Relational/SQL-based (HPDC 2004, LCPC 2004)

    • XML/XQuery based (ICS 2003, LCPC 2003)
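A minimal sketch of the core idea, assuming a descriptor that is simply a list of (field name, `struct` format code) pairs for fixed-size binary records: the record reader is built from the descriptor instead of being written by hand for each layout. The descriptor format and the name `make_reader` are hypothetical; the systems described here use their own descriptor languages and generate compiled code.

```python
import io
import struct
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical descriptor: one (field name, struct format code) pair per field.
Descriptor = List[Tuple[str, str]]
Record = Dict[str, object]


def make_reader(descriptor: Descriptor) -> Callable[[bytes], Iterator[Record]]:
    """Build a record reader from a layout descriptor instead of hand-coding it."""
    names = [name for name, _ in descriptor]
    # "<" = little-endian, standard sizes, no alignment padding between fields.
    fmt = "<" + "".join(code for _, code in descriptor)

    def read(buf: bytes) -> Iterator[Record]:
        # iter_unpack walks the buffer one fixed-size record at a time.
        for values in struct.iter_unpack(fmt, buf):
            yield dict(zip(names, values))

    return read


if __name__ == "__main__":
    layout = [("x", "i"), ("y", "i"), ("temp", "f")]
    raw = io.BytesIO(struct.pack("<iif", 0, 1, 2.5) + struct.pack("<iif", 3, 4, -1.0))
    for rec in make_reader(layout)(raw.getvalue()):
        print(rec)
```

Users then see named fields rather than byte offsets; changing the layout means editing the descriptor, not the reader.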


SQL/Relational Implementation

SELECT <Data Elements>
FROM <Dataset Name>
WHERE …
AND Filter(<Data Element>);
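To illustrate only the shape of the query template above, here is a sketch using Python's built-in `sqlite3` module, with `Filter` registered as a user-defined function through `Connection.create_function`. The table `readings`, its columns, and the threshold predicate are hypothetical. The slides describe querying data in place through a layout descriptor, whereas this sketch loads it into a table purely for demonstration.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, int, int, float]


def keep_value(value: float) -> int:
    """Hypothetical user filter: keep values above a fixed threshold."""
    return 1 if value > 10.0 else 0


def run_query(rows: List[Row], step: int) -> List[Tuple[int, int, float]]:
    conn = sqlite3.connect(":memory:")
    try:
        # Expose the Python predicate to SQL as Filter(...), taking one argument.
        conn.create_function("Filter", 1, keep_value)
        conn.execute("CREATE TABLE readings (x INTEGER, y INTEGER, step INTEGER, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
        # SELECT <data elements> FROM <dataset> WHERE ... AND Filter(<data element>)
        cur = conn.execute(
            "SELECT x, y, value FROM readings "
            "WHERE step = ? AND Filter(value) ORDER BY x, y",
            (step,),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

For example, with rows `(0, 0, 1, 5.0)`, `(0, 1, 1, 12.5)`, `(1, 0, 1, 20.0)` and `(1, 1, 2, 30.0)`, `run_query(rows, 1)` keeps only the two step-1 rows whose value passes `Filter`.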


XML/XQuery Implementation

[Figure: XQuery over an XML view, with the link between the XML view and data stored as HDF5, NetCDF, TEXT, or RMDB marked "???"]
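A sketch of what an XML view over a low-level layout can look like, assuming a 2-D array stored row-major as a flat list. The element names `grid` and `point` and the helpers `xml_view` and `row_values` are hypothetical. The slides point to translating high-level queries into low-level code, so a real system would likely not materialize the XML at all; it is built here only to make the mapping visible, and the query uses the limited XPath subset in Python's `xml.etree.ElementTree`, not XQuery.

```python
import xml.etree.ElementTree as ET
from typing import List


def xml_view(flat: List[float], rows: int, cols: int) -> ET.Element:
    """Expose a row-major 2-D array as <grid><point x=".." y="..">v</point>...</grid>."""
    root = ET.Element("grid")
    for i in range(rows):
        for j in range(cols):
            # Element (i, j) lives at offset i * cols + j in the row-major layout.
            point = ET.SubElement(root, "point", x=str(i), y=str(j))
            point.text = str(flat[i * cols + j])
    return root


def row_values(view: ET.Element, x: int) -> List[float]:
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    return [float(p.text) for p in view.findall(f"point[@x='{x}']")]
```

The user reasons in terms of `point` elements with coordinates; only the view builder knows about the flat storage order.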


Approach / Contributions

  • Use of XML Schemas to provide high-level abstractions over complex datasets

  • Use of XQuery with these schemas to specify processing

  • Issues in Translation

    • High-level to low-level code

    • Data-centric transformations for locality in low-level codes

    • Issues specific to XQuery

      • Recognizing recursive reductions

      • Type inferencing and translation
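One reading of "recognizing recursive reductions": a recursive function that combines the first element of a sequence with the recursive result on the rest, using an associative operator with an identity, computes a reduction and can be replaced by a loop. A compiler is then free to reorder that loop, for example for locality. The sketch shows both forms for a maximum; the function names are hypothetical and this illustrates the transformation, not the authors' compiler.

```python
from typing import Sequence


def max_recursive(seq: Sequence[float], i: int = 0) -> float:
    """Recursive form, in the style a loop-free query function is written."""
    if i == len(seq):
        return float("-inf")  # identity element of max
    return max(seq[i], max_recursive(seq, i + 1))


def max_loop(seq: Sequence[float]) -> float:
    """Equivalent loop: same identity, same combining operator."""
    acc = float("-inf")
    for value in seq:
        acc = max(acc, value)
    return acc
```

The recursive version folds from the right and the loop from the left; associativity makes them agree, and commutativity is what additionally allows elements to be visited in any order, such as the order they are stored on disk.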


Wrappers

  • Goal: provide the integration system with transparent access to data sources

  • Challenges

    • Development cost

    • Performance

      • Scripting languages can be slow

    • Updates

      • Data formats can change frequently


Our Approach

  • Machine-interpretable metadata

  • A layout descriptor associated with each dataset

  • Wrappers generated on the fly

    • Applied to several bioinformatics examples


Layout Descriptor

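Annotated descriptor for a FASTA dataset. "FASTAData" is the dataset name, FASTA is the schema name, the DATASPACE block is the file layout, and DATA gives the file location:

    DATASET "FASTAData" {
      DATATYPE {FASTA}
      DATASPACE LINESIZE=80 {
        LOOP ENTRY 1:EOF:1 {
          ">" ID
          " " DESCRIPTION
          < "\n" SEQ >
          "\n" | EOF
        }
      }
      DATA {osu/fasta}
    }

A sample file in this layout. Each entry yields an ID (Example1), a DESCRIPTION (envelope protein), and a SEQ made of the lines that follow:

    >Example1 envelope protein
    ELRLRYCAPAGFALLKCNDA
    DYDGFKTNCSNVSVVHCTNL
    MNTTVTTGLLLNGSYSENRT
    QIWQKHRTSNDSALILLNKH
    >Example2 synthetic peptide
    HITREPLKHIPKERYRGTNDT…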
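The slides state that wrappers are generated from descriptors like the one above. The hand-written Python sketch below shows the kind of reader this particular descriptor specifies: each entry starts with ">", the text up to the first space is ID, the rest of that line is DESCRIPTION, and the lines up to the next ">" or end of file form SEQ. The names `FastaEntry` and `parse_fasta` are hypothetical, and the sketch ignores LINESIZE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FastaEntry:
    id: str           # text after ">" up to the first space
    description: str  # rest of the header line
    seq: str          # sequence lines joined without newlines


def _entry(header: str, seq_lines: List[str]) -> FastaEntry:
    ident, _, desc = header.partition(" ")
    return FastaEntry(ident, desc, "".join(seq_lines))


def parse_fasta(text: str) -> List[FastaEntry]:
    entries: List[FastaEntry] = []
    header: Optional[str] = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new ENTRY begins; close the previous one, if any.
            if header is not None:
                entries.append(_entry(header, seq_lines))
            header, seq_lines = line[1:], []
        elif line.strip() and header is not None:
            seq_lines.append(line.strip())
    if header is not None:  # the last entry ends at EOF
        entries.append(_entry(header, seq_lines))
    return entries
```

Run on the sample above, the first entry has ID `Example1`, DESCRIPTION `envelope protein`, and an 80-character SEQ. A generated wrapper would encode the same rules, but derive them from the descriptor, so a format change only requires editing the descriptor.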


XQuery on Streaming Data

  • Infinite data streams

    • All processing must be done in a single pass

  • Interesting compiler questions:

    • How can code be transformed to execute in a single pass?

    • How can we tell whether it can be executed correctly in a single pass?

  • We addressed this problem for XML streams and the XML query language XQuery

  • Appears in VLDB 2005
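A minimal single-pass sketch using `xml.etree.ElementTree.iterparse` from Python's standard library. It computes an average over an XML stream with a running sum and count, and clears each record once it has been processed. The element names `r` and `value` and the function `stream_average` are hypothetical. A query such as "return the values above the average" does not fit this pattern as written, because the average is only known at the end of the stream; deciding which queries can be rewritten into one pass is the question raised on this slide.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO


def stream_average(source: BinaryIO, record_tag: str = "r", value_tag: str = "value") -> float:
    total = 0.0
    count = 0
    # "end" events fire once an element and all of its children have been parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == value_tag:
            total += float(elem.text)
            count += 1
        elif elem.tag == record_tag:
            elem.clear()  # drop the finished record's children and text
    return total / count if count else 0.0


if __name__ == "__main__":
    doc = b"<stream><r><value>1.5</value></r><r><value>2.5</value></r><r><value>5</value></r></stream>"
    print(stream_average(io.BytesIO(doc)))
```

Clearing records keeps their contents from accumulating, although the root element still holds references to the emptied records; a long-running consumer would also detach them from the root.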

