
Here are my Data Files. Here are my Queries. Where are my Results?

Stratos Idreos*, Ioannis Alagiannis‡, Ryan Johnson§, Anastasia Ailamaki‡. §University of Toronto, ‡École Polytechnique Fédérale de Lausanne, *CWI, Amsterdam.


Presentation Transcript


  1. Here are my Data Files. Here are my Queries. Where are my Results? Stratos Idreos*, Ioannis Alagiannis‡, Ryan Johnson§, Anastasia Ailamaki‡. §University of Toronto, ‡École Polytechnique Fédérale de Lausanne, *CWI, Amsterdam

  2. CERN ($20B physics experiment) • Last year: 35PB! • Experiments, simulation, user data… • All stored in flat files • Database only stores metadata • Custom solutions & scripts • Almost never a DBMS Why???

  3. Why don't people use a DBMS? • Requirements analysis • Define a schema • Load the data • Iterate to convergence • Tune the system • Evolving requirements => no convergence

  4. Data import & tuning • Massage data • Load tuples • DBMS owns the data now • Flat files • Why wait? • Why complete load? • Database • Which format? • Hire DB expert? • Not worth the startup cost

  5. Avoiding up-front overheads • Flat file • Flat files an integral part of the system • Hot data • Query over flat files • Adaptive loads • Tuning in background • DBMS actions driven by workload

  6. Adaptive loading • Flat file • Metadata • Column load • Loaded columns: a2, a3 • Partial load • Full load • Metadata • Loaded parts: a2, a3 • Storage
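
The column-load idea on this slide can be illustrated with a minimal sketch. It assumes a comma-separated flat file of integers with a known column order; the class `AdaptiveTable` and its members are hypothetical names chosen for illustration and are not taken from the presentation. Only the column-load case is shown, not partial loads.

```python
import csv
import io


class AdaptiveTable:
    """Loads columns from a delimited flat file only when a query first needs them."""

    def __init__(self, open_file, column_names):
        self._open_file = open_file        # callable returning a fresh text stream
        self._names = list(column_names)   # assumed schema: column order in the file
        self.loaded = {}                   # metadata and storage: column name -> values
        self.file_scans = 0                # number of times the flat file was read

    def columns(self, *needed):
        """Return the requested columns, loading any that are not yet stored."""
        missing = [name for name in needed if name not in self.loaded]
        if missing:
            self._load(missing)
        return [self.loaded[name] for name in needed]

    def _load(self, missing):
        """Scan the flat file once and keep only the missing columns."""
        positions = [(name, self._names.index(name)) for name in missing]
        fresh = {name: [] for name in missing}
        with self._open_file() as stream:
            for row in csv.reader(stream):
                for name, pos in positions:
                    fresh[name].append(int(row[pos]))
        self.loaded.update(fresh)
        self.file_scans += 1


# Hypothetical three-column flat file for table R.
FLAT_FILE = "1,10,100\n2,20,200\n3,30,300\n"
table = AdaptiveTable(lambda: io.StringIO(FLAT_FILE), ["a1", "a2", "a3"])

a2, a3 = table.columns("a2", "a3")   # first query touches a2 and a3: one file scan
(a2_again,) = table.columns("a2")    # already loaded: no file access
(a1,) = table.columns("a1")          # a1 is loaded only when a query asks for it
```

In this sketch the `loaded` dictionary plays the role of both the "Loaded columns" metadata and the storage; a real system would presumably keep these separate.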

  7. Dynamic file adaptation • New flat files • a) Parse only needed columns • b) New flat file per attribute • Original flat file • Analyze non-tokenized attributes
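
As a rough sketch of step b), the following splits a comma-separated flat file into one new flat file per requested attribute. The function `split_columns`, the `.col` suffix, and the file `R.csv` are hypothetical and only serve the illustration. Note that `csv.reader` still tokenizes each full row, so this does not show the "parse only needed columns" optimization itself.

```python
import csv
import os
import tempfile


def split_columns(path, column_names, needed, out_dir):
    """Write one new flat file per needed attribute; return {attribute: new path}."""
    positions = {name: column_names.index(name) for name in needed}
    out_paths = {name: os.path.join(out_dir, name + ".col") for name in needed}
    outputs = {name: open(p, "w", newline="") for name, p in out_paths.items()}
    try:
        with open(path, newline="") as source:
            for row in csv.reader(source):
                # Only the needed attributes are copied out; the rest are skipped.
                for name, pos in positions.items():
                    outputs[name].write(row[pos] + "\n")
    finally:
        for handle in outputs.values():
            handle.close()
    return out_paths


# Build a small original flat file in a temporary directory.
work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "R.csv")
with open(original, "w", newline="") as f:
    f.write("1,10,100\n2,20,200\n3,30,300\n")

# A query that needs only a2 triggers a new flat file for a2 alone.
per_attribute = split_columns(original, ["a1", "a2", "a3"], ["a2"], work_dir)
with open(per_attribute["a2"]) as f:
    a2_values = [int(line) for line in f]
```

Later queries on `a2` could then read the small per-attribute file instead of the original flat file.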

  8. Adaptive loading in practice • Q1: loading cost + first query • Constant performance for all queries • Q11: load from FF (flat file) • Filtering on-the-fly • Q1: half the cost • On-the-fly load • Cache data • Query: select sum(a1), avg(a2) from R where a1<v1 and a2<v2 • Amortize loading cost over the query sequence
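
The query on this slide can be run directly over a flat file with the loaded data cached for later queries. The sketch below assumes a two-column, comma-separated file of integers; `run_query`, `cache`, and `stats` are hypothetical names. It caches whole columns on the first query and filters in memory, which is a simplification of the on-the-fly filtering variant named on the slide.

```python
import csv
import io

# Hypothetical flat file for table R with columns a1, a2.
FLAT_FILE = "1,10\n2,20\n3,30\n4,40\n"

cache = {}                   # column name -> values kept from an earlier query
stats = {"file_scans": 0}    # counts how often the flat file is actually read


def run_query(v1, v2):
    """Evaluate: select sum(a1), avg(a2) from R where a1 < v1 and a2 < v2."""
    if "a1" not in cache:
        # First query pays the loading cost: one scan of the flat file.
        a1, a2 = [], []
        for row in csv.reader(io.StringIO(FLAT_FILE)):
            a1.append(int(row[0]))
            a2.append(int(row[1]))
        cache["a1"], cache["a2"] = a1, a2
        stats["file_scans"] += 1

    a1_sum, a2_sum, count = 0, 0, 0
    for x, y in zip(cache["a1"], cache["a2"]):
        if x < v1 and y < v2:
            a1_sum += x
            a2_sum += y
            count += 1
    if count == 0:
        return None, None    # SQL aggregates over an empty set yield NULL
    return a1_sum, a2_sum / count


first = run_query(4, 40)     # loads from the flat file, then answers
second = run_query(3, 100)   # answered from the cache, no file access
```

Every query after the first reuses the cached columns, which is how the loading cost gets amortized over the query sequence.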

  9. Towards a fully autonomous system • Give me your queries • Give me your data as is • Get your results! • Adaptive load • Adaptive data store • Adaptive kernel • Invisible DBMS (supports SQL + your tools) • grep, awk • Challenge: make this invisible
