New applications of compiler techniques for data grids

New (Applications of) Compiler Techniques for Data Grids, by Gagan Agrawal. Outline: automatic data virtualization (SQL implementation, XML/XQuery); automatic wrapper generation (data integration in bioinformatics); compiling the XML query language XQuery (issues with streaming data).

New (Applications of) Compiler Techniques for Data Grids

Gagan Agrawal


Outline

  • Automatic Data Virtualization

    • SQL Implementation

    • XML/XQuery

  • Automatic Wrapper Generation

    • Data Integration in Bioinformatics

  • Compiling the XML Query Language XQuery

    • Issues with streaming data


Data Virtualization

An abstract view of data

[Figure labels: dataset, Data Virtualization, Data Service]

  • Scientific data being shared on the Web/Grids

  • Low-level layouts

  • Need for efficient storage and processing


Our Approach: Automatic Data Virtualization

  • Automatically create data services

    • A new application of compiler technology

  • A meta-data descriptor describes the layout of data in a repository

  • An abstract view is exposed to the users

  • Two implementations:

    • Relational/SQL-based (HPDC 2004, LCPC 2004)

    • XML/XQuery-based (ICS 2003, LCPC 2003)


SQL/Relational Implementation

SELECT <Data Elements>
FROM <Dataset Name>
WHERE …
AND Filter(<Data Element>);
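The query form above can be tried out with a small sketch. This is not the authors' implementation: it uses Python's built-in sqlite3 module, and the dataset name (IparsData), its columns, its rows, and the function name SoilFilter (standing in for the slide's Filter) are all hypothetical.

```python
import sqlite3


def soil_filter(soil):
    """Hypothetical user-defined filter: keep rows whose soil value exceeds 0.5."""
    return 1 if soil > 0.5 else 0


def query_with_filter():
    # Hypothetical dataset exposed as a relational table (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IparsData (rid INTEGER, time INTEGER, soil REAL)")
    conn.executemany(
        "INSERT INTO IparsData VALUES (?, ?, ?)",
        [(1, 10, 0.2), (1, 20, 0.8), (2, 10, 0.9), (2, 20, 0.4)],
    )
    # Register the Python function so the SQL text can call it by name.
    conn.create_function("SoilFilter", 1, soil_filter)
    rows = conn.execute(
        "SELECT rid, time, soil FROM IparsData "
        "WHERE time >= 10 AND SoilFilter(soil) "
        "ORDER BY rid, time"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(query_with_filter())
```

Here the rows live in an in-memory table only for illustration; in the approach described on the slides, the same kind of query would be answered against the low-level data layout.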


XML/XQuery Implementation

[Figure labels: XQuery, ???, XML; storage formats HDF5, NetCDF, TEXT, RMDB]


Approach / Contributions

  • Use of XML Schemas to provide high-level abstractions on complex datasets

  • Using XQuery with these Schemas to specify processing

  • Issues in Translation

    • High-level to low-level code

    • Data-centric transformations for locality in low-level codes

    • Issues specific to XQuery

      • Recognizing recursive reductions

      • Type inferencing and translation
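One way to picture a data-centric transformation is the sketch below. It is an illustration only, not the compiler's actual output: the chunked dataset, the record format, and both function names are assumptions. The first version iterates over output elements and rescans every chunk for each one; the second iterates over chunks in storage order and updates all outputs as it goes.

```python
from collections import defaultdict

# Hypothetical chunked dataset: each chunk is a list of (group, value) records.
CHUNKS = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0), ("c", 4.0)],
    [("b", 5.0), ("a", 6.0)],
]


def output_centric(chunks, groups):
    """For each output element, rescan every chunk (many chunk reads)."""
    chunk_reads = 0
    totals = {}
    for group in groups:
        total = 0.0
        for chunk in chunks:
            chunk_reads += 1
            total += sum(value for key, value in chunk if key == group)
        totals[group] = total
    return totals, chunk_reads


def data_centric(chunks):
    """Read each chunk once and update every output it contributes to."""
    chunk_reads = 0
    totals = defaultdict(float)
    for chunk in chunks:
        chunk_reads += 1
        for key, value in chunk:
            totals[key] += value
    return dict(totals), chunk_reads


if __name__ == "__main__":
    print(output_centric(CHUNKS, ["a", "b", "c"]))
    print(data_centric(CHUNKS))
```

Both versions compute the same per-group sums; the data-centric one touches each chunk once, which is the locality benefit the bullet refers to.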


Wrappers

  • Goal: give the integration system transparent access to data sources

  • Challenges

    • Development cost

    • Performance

      • Scripting languages can be slow

    • Updates

      • Data formats can change frequently


Our Approach

  • Machine-interpretable metadata

  • A layout descriptor associated with each dataset

  • Wrappers generated on the fly

    • Applied to several bioinformatics examples


Layout Descriptor

DATASET “FASTAData” {
  DATATYPE {FASTA}
  DATASPACE LINESIZE=80 {
    LOOP ENTRY 1:EOF:1 {
      “>” ID
      “ “ DESCRIPTION
      < “\n” SEQ >
      “\n” | EOF }
  }
  DATA {osu/fasta}
}

Slide annotations on the descriptor: dataset name (DATASET), schema name (DATATYPE), file layout (DATASPACE), file location (DATA).

Sample file, with the header fields labeled ID and DESCRIPTION and each sequence line labeled SEQ:

>Example1 envelope protein
ELRLRYCAPAGFALLKCNDA
DYDGFKTNCSNVSVVHCTNL
MNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKH
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDT…
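The layout above can be read as: a ">" marker, an ID, a space, a DESCRIPTION, then one or more SEQ lines. The following is a hand-written sketch of what a wrapper for that layout might do. It is not the generated wrapper from the talk; the names Entry, make_entry, and read_fasta are hypothetical, and the second sample sequence uses only the prefix visible on the slide.

```python
import io
from typing import Iterator, List, NamedTuple


class Entry(NamedTuple):
    id: str
    description: str
    seq: str


def make_entry(header: str, seq_lines: List[str]) -> Entry:
    # The header is "ID DESCRIPTION"; split at the first space.
    ident, _, description = header.partition(" ")
    return Entry(ident, description, "".join(seq_lines))


def read_fasta(stream) -> Iterator[Entry]:
    """Yield one Entry per '>' header, joining the SEQ lines that follow it."""
    header = None
    seq_lines: List[str] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield make_entry(header, seq_lines)
            header, seq_lines = line[1:], []
        elif line and header is not None:
            seq_lines.append(line)
    if header is not None:
        yield make_entry(header, seq_lines)


# Sample data from the slide; the second sequence is truncated there.
SAMPLE = """>Example1 envelope protein
ELRLRYCAPAGFALLKCNDA
DYDGFKTNCSNVSVVHCTNL
MNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKH
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDT
"""

if __name__ == "__main__":
    for entry in read_fasta(io.StringIO(SAMPLE)):
        print(entry.id, "|", entry.description, "|", len(entry.seq))
```

Writing such a parser by hand for every format is the development and update cost listed under the wrapper challenges; the descriptor is meant to carry the same information in machine-interpretable form.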


XQuery on Streaming Data

  • Infinite data streams

    • All processing must be single pass

  • Interesting compiler questions:

    • How can code be transformed to execute in a single pass?

    • How can we tell whether it can be executed correctly in a single pass?

  • We addressed these problems for XML streams and the XML query language XQuery

  • Appears in VLDB 2005
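As a rough illustration of the single-pass constraint, the sketch below computes a running average over an XML stream with Python's incremental parser. It does not reproduce the compiler analysis from the talk; the element name "reading" and the sample stream are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def running_average(stream, tag="reading"):
    """Yield the average of all values seen so far, visiting each element once."""
    count = 0
    total = 0.0
    # iterparse reports each element as its end tag is read, so no
    # element needs to be revisited after it has been processed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += float(elem.text)
            count += 1
            elem.clear()  # drop the element's content; only two scalars are kept
            yield total / count


# Hypothetical sample stream (finite here so the example terminates).
SAMPLE = b"<stream><reading>2</reading><reading>4</reading><reading>9</reading></stream>"

if __name__ == "__main__":
    print(list(running_average(io.BytesIO(SAMPLE))))
```

An average works in one pass because its state is just a count and a sum. A query that needs to see all the data before producing any output, such as a sort, cannot be handled this way on an infinite stream.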

