New (Applications of) Compiler Techniques for Data Grids

Gagan Agrawal

Outline
  • Automatic Data Virtualization
    • SQL Implementation
    • XML/XQuery
  • Automatic Wrapper Generation
    • Data Integration in Bioinformatics
  • Compiling XML Query Language XQuery
    • Issues with streaming data
Data Virtualization
  • An abstract view of data
  • [Figure: a dataset, data virtualization, and a data service]
  • Scientific data being shared on Web/Grids
  • Low-level layouts
  • Need for efficient storage and processing

Our Approach: Automatic Data Virtualization
  • Automatically create data services
    • A new application of compiler technology
  • A meta-data descriptor describes the layout of data in a repository
  • An abstract view is exposed to the users
  • Two implementations:
    • Relational/SQL-based (HPDC 2004, LCPC 2004)
    • XML/XQuery-based (ICS 2003, LCPC 2003)
SQL/Relational Implementation

SELECT <Data Elements>
FROM <Dataset Name>
WHERE ….
AND Filter(<Data Element>);
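The query template above can be pictured with a small Python sketch. It assumes a hypothetical flat binary file of fixed-size records; the names `RECORD_FORMAT`, `FIELDS`, `scan`, and `select` are illustrative and do not come from the talk. The talk describes creating the data service automatically, whereas this sketch simply interprets a hand-written descriptor at run time.

```python
import struct
from typing import Callable, Dict, Iterator, List, Sequence

# Hypothetical layout descriptor: each record is three little-endian doubles.
RECORD_FORMAT = "<3d"
FIELDS = ["x", "y", "value"]
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

Row = Dict[str, float]


def scan(raw: bytes) -> Iterator[Row]:
    """Expose the low-level byte layout as a stream of named rows."""
    for offset in range(0, len(raw) - RECORD_SIZE + 1, RECORD_SIZE):
        values = struct.unpack_from(RECORD_FORMAT, raw, offset)
        yield dict(zip(FIELDS, values))


def select(
    raw: bytes,
    columns: Sequence[str],
    where: Callable[[Row], bool],
    user_filter: Callable[[Row], bool] = lambda row: True,
) -> List[tuple]:
    """SELECT columns FROM raw WHERE where(row) AND user_filter(row)."""
    return [
        tuple(row[c] for c in columns)
        for row in scan(raw)
        if where(row) and user_filter(row)
    ]


def demo() -> List[tuple]:
    # Build a tiny in-memory "dataset" in the assumed layout.
    records = [(0.0, 0.0, 1.5), (1.0, 2.0, 9.0), (3.0, 1.0, 4.0)]
    raw = b"".join(struct.pack(RECORD_FORMAT, *r) for r in records)
    return select(
        raw,
        ["x", "value"],
        where=lambda r: r["x"] >= 1.0,
        user_filter=lambda r: r["value"] > 5.0,
    )
```

Calling `demo()` plays the role of a query such as `SELECT x, value FROM dataset WHERE x >= 1 AND Filter(value)`; the user sees only named columns, never the byte offsets.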

XML/XQuery Implementation

[Figure: XQuery, XML, and "???" shown above the storage formats HDF5, NetCDF, TEXT, and RMDB]


Approach / Contributions
  • Use of XML Schemas to provide high-level abstractions on complex datasets
  • Using XQuery with these Schemas to specify processing
  • Issues in Translation
    • High-level to low-level code
    • Data-centric transformations for locality in low-level codes
    • Issues specific to XQuery
      • Recognizing recursive reductions
      • Type inferencing and translation
Wrappers
  • Goal: to provide the integration system transparent access to data sources
  • Challenges
    • Development cost
    • Performance
      • Scripting languages can be slow
    • Updates
      • Data formats can change frequently
Our Approach
  • Machine-interpretable metadata
  • A layout descriptor associated with each dataset
  • Wrappers generated on the fly
    • Applied to several bioinformatics examples
Layout Descriptor

DATASET "FASTAData" {
  DATATYPE {FASTA}
  DATASPACE LINESIZE=80 {
    LOOP ENTRY 1:EOF:1 {
      ">" ID
      " " DESCRIPTION
      < "\n" SEQ >
      "\n" | EOF }
  }
  DATA {osu/fasta}
}

Slide annotations: DATASET gives the dataset name, DATATYPE the schema name, DATASPACE the file layout, and DATA the file location.

Example file:

>Example1 envelope protein
ELRLRYCAPAGFALLKCNDA
DYDGFKTNCSNVSVVHCTNL
MNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKH
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDT…

In each entry, the text after ">" is the ID, the rest of the header line is the DESCRIPTION, and each following line is a SEQ.

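To make the descriptor concrete, here is a hand-written Python sketch of the kind of wrapper that could correspond to it. This is an assumption about what such a wrapper does, not the code the system generates; the function name `parse_fasta` is hypothetical, and the `LINESIZE=80` setting is ignored.

```python
from typing import Any, Dict, Iterable, Iterator

# Sample taken from the slide; the second sequence is truncated there.
SAMPLE = """>Example1 envelope protein
ELRLRYCAPAGFALLKCNDA
DYDGFKTNCSNVSVVHCTNL
MNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKH
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDT…
"""


def parse_fasta(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one entry per '>' header: ID, DESCRIPTION, and its SEQ lines."""
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if entry is not None:
                yield entry  # the previous entry is complete
            # ">" ID " " DESCRIPTION: split the header at the first space.
            ident, _, description = line[1:].partition(" ")
            entry = {"ID": ident, "DESCRIPTION": description, "SEQ": []}
        elif line and entry is not None:
            # < "\n" SEQ >: every non-empty line up to the next header.
            entry["SEQ"].append(line)
    if entry is not None:
        yield entry  # the last entry is terminated by EOF


entries = list(parse_fasta(SAMPLE.splitlines()))
```

Each dictionary in `entries` carries the three fields named in the descriptor, so an integration system could query them without knowing the file format.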

XQuery on Streaming Data
  • Infinite data streams
    • All processing must be single pass
  • Interesting compiler questions:
    • How can code be transformed to execute in a single pass?
    • How can we tell that it can be executed correctly in a single pass?
  • This problem was addressed for XML streams and the XML query language XQuery
  • Appears in VLDB 2005
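
As a rough illustration of what single-pass evaluation means, the Python sketch below computes a running average over an XML stream, keeping only a count and a sum as state. It is not XQuery and not the compiler analysis from the talk; the element name `r` and the function `running_averages` are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def running_averages(source: BinaryIO, tag: str) -> Iterator[float]:
    """Yield the average of all <tag> values seen so far, in one pass."""
    count = 0
    total = 0.0
    # iterparse delivers each element as soon as its end tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += float(elem.text)
            count += 1
            yield total / count
        # Free the element's contents. A long-running stream would also
        # need to detach finished children from the root element.
        elem.clear()


stream = io.BytesIO(b"<readings><r>2</r><r>4</r><r>9</r></readings>")
averages = list(running_averages(stream, "r"))
```

The aggregate is expressible in one pass because it needs only constant state per element. A query that must revisit earlier elements, such as a join of the stream with itself, would not fit this pattern without buffering.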