MauveDB: Model-based User Views

lance

Presentation Transcript


  1. MauveDB: Model-based User Views

  2. Problem
     • Databases are unusable for scientific data
       • Data are incomplete, imprecise, and erroneous
       • Data need to be filtered/synthesized using models
     • Scientists use databases in the most rudimentary ways
       • As a backing store for raw data
       • Running few or no queries
     • User-defined functions are inadequate
       • Static models, insufficient for many applications
       • Let’s discuss this later?

  3. Approach
     • Define user views based on a model syntax
       • Extends the traditional SQL view model
       • User views provide access to synthesized data
     • Data independence
       • Present a stable view of the system
         • When sites don’t report data (missing values)
         • When the network changes
       • Report data at different locations than where it was sampled
     • View maintenance
       • Whether or not to materialize the view

  4. Processing Scientific Data
     • Without model-based views
       • Export to Matlab, then apply models
       • Use custom, programmatic querying tools
         • Can’t use SQL
       • Getting data back into the database is awkward and inefficient
     • With model-based views
       • Models update themselves as the data change
       • Standard SQL queries run against the synthesized data
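The "models update themselves" point can be illustrated with a model that keeps running sums (sufficient statistics), so each new reading refreshes the fit without rescanning the raw table. This is a minimal sketch assuming a one-dimensional linear model fitted by ordinary least squares; the class name `IncrementalLinearFit` is hypothetical and not taken from MauveDB.

```python
class IncrementalLinearFit:
    """Ordinary least-squares fit of y = a + b*x, updated one reading at a time.

    Hypothetical helper: only running sums are stored, so inserting a new
    reading never requires rescanning the base table.
    """

    def __init__(self):
        self.n = 0
        self.sx = 0.0   # sum of x
        self.sy = 0.0   # sum of y
        self.sxx = 0.0  # sum of x*x
        self.sxy = 0.0  # sum of x*y

    def insert(self, x, y):
        # Called whenever a new raw reading lands in the base table.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        # Closed-form OLS solution of the normal equations.
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x


fit = IncrementalLinearFit()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    fit.insert(x, y)
print(fit.coefficients())  # (1.0, 2.0)
```

Each `insert` is O(1), which is the property that lets a view stay current as tuples arrive; models without such compact statistics would need a different maintenance strategy.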

  5. Example
     • Benefits
       • Network changes are transparent
       • Spatial or temporal biases are removed (e.g., for aggregates)
     • What about model errors?
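To make the aggregate-bias point concrete, this sketch compares a plain average of raw readings with an average taken over an evenly spaced virtual grid. The sensor layout and values are invented for illustration, and nearest-neighbor lookup is an assumed stand-in for whatever model the view actually uses.

```python
def nearest_value(readings, x):
    """Return the reading of the site closest to position x (nearest neighbor)."""
    _, val = min(readings, key=lambda r: abs(r[0] - x))
    return val

# Hypothetical 1-D deployment: four sensors crowded near a warm spot,
# one sensor covering the cool end of the field. Tuples are (position, temp).
readings = [(1.0, 10.0), (8.5, 20.0), (9.0, 20.0), (9.5, 20.0), (10.0, 20.0)]

# Aggregate over raw tuples: dominated by the densely instrumented region.
raw_mean = sum(v for _, v in readings) / len(readings)

# Aggregate over the model evaluated on an evenly spaced virtual grid.
grid = [0.5 + i for i in range(10)]  # 0.5, 1.5, ..., 9.5
grid_mean = sum(nearest_value(readings, g) for g in grid) / len(grid)

print(raw_mean, grid_mean)  # 18.0 15.0
```

With four of five sensors in the warm end, the raw mean overweights that region; the grid mean gives each part of the field equal weight.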

  6. Architecture

  7. View Creation: Regression
     • Select a virtual grid on which data are reported
       • Using MATLAB-style syntax
     • Fit a separate model at each time T
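The two steps on this slide (a virtual grid, and one model per time T) can be sketched outside the database. The MATLAB-style grid syntax is not reproduced; the one-dimensional linear model `temp = a + b*x` is an assumption chosen for brevity, and `ols_1d` and `regression_view` are hypothetical names rather than MauveDB functions.

```python
from collections import defaultdict

def ols_1d(points):
    """Least-squares fit of temp = a + b*x over (x, temp) pairs; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x positions per time step")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b

def regression_view(raw_rows, grid):
    """Produce (time, x, temp) rows: one model per time T, evaluated on the grid."""
    by_time = defaultdict(list)
    for t, x, temp in raw_rows:
        by_time[t].append((x, temp))
    view = []
    for t in sorted(by_time):
        a, b = ols_1d(by_time[t])  # a separate fit for each time step
        view.extend((t, g, a + b * g) for g in grid)
    return view

# Raw tuples are (time, sensor position, temperature).
raw = [(1, 0.0, 10.0), (1, 4.0, 18.0), (2, 1.0, 12.0), (2, 3.0, 12.0)]
for row in regression_view(raw, grid=[0.0, 2.0, 4.0]):
    print(row)
```

At time 1 the fitted line is 10 + 2x, so the grid rows read 10.0, 14.0, 18.0; at time 2 the slope is zero and every grid point reads 12.0. Note that the grid point x = 2.0 was never sampled at time 1, which is the "report data at different locations than sampled" idea.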

  8. View Creation: Interpolation • Interpolate missing values from nearby sites
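The slide does not say which interpolation method is used. Inverse-distance weighting (IDW) is shown here purely as an assumed example of estimating a site's missing value from the sites that did report; `idw_interpolate` and its `power` parameter are hypothetical.

```python
import math

def idw_interpolate(sites, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) readings."""
    num = 0.0
    den = 0.0
    for x, y, value in sites:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a reporting site
        w = 1.0 / d ** power  # closer sites get larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no reporting sites")
    return num / den

# Site B did not report at this time step; estimate it from sites A and C.
reported = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]  # sites A and C
site_b = (1.0, 0.0)
print(idw_interpolate(reported, site_b))  # 15.0
```

Because B is equidistant from A and C, it receives their plain average; a point closer to A would be pulled toward A's reading.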

  9. Case Study 1: Temp Regression

  10. Case Study 2: Temp Interpolation

  11. The AS Clause
     • The AS clause specifies each model
       • AS FIT
       • AS INTERPOLATE
     • Probably needs extended syntax for model methods
       • INTERPOLATE with splines, nearest neighbor, regression
     • User views are only as flexible as the models pre-programmed into the syntax
       • How does this compare with UDFs and table-valued functions?
       • Is this the appropriate level for this kind of customization?
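One way to read the "only as flexible as the models pre-programmed into the syntax" concern is as a fixed dispatch table from AS keywords to model implementations. This sketch assumes one-dimensional readings; `MODEL_REGISTRY`, `register_model`, and `evaluate_as_clause` are hypothetical names, and `register_model` illustrates the kind of UDF-style extension hook the slide asks about, not a MauveDB feature.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]                    # (position, reading)
Model = Callable[[List[Point], float], float]  # (readings, query position) -> estimate

def fit_linear(points: List[Point], x: float) -> float:
    """AS FIT: least-squares line through the readings, evaluated at x."""
    n = len(points)
    sx = sum(p for p, _ in points)
    sy = sum(v for _, v in points)
    sxx = sum(p * p for p, _ in points)
    sxy = sum(p * v for p, v in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("FIT needs at least two distinct positions")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a + b * x

def interpolate_nearest(points: List[Point], x: float) -> float:
    """AS INTERPOLATE (nearest-neighbor variant): reading of the closest site."""
    return min(points, key=lambda p: abs(p[0] - x))[1]

# The set of AS methods is fixed up front; anything else is rejected.
MODEL_REGISTRY: Dict[str, Model] = {
    "FIT": fit_linear,
    "INTERPOLATE": interpolate_nearest,
}

def register_model(name: str, model: Model) -> None:
    """Hypothetical UDF-style hook for adding a new AS method."""
    MODEL_REGISTRY[name.upper()] = model

def evaluate_as_clause(method: str, points: List[Point], x: float) -> float:
    model = MODEL_REGISTRY.get(method.upper())
    if model is None:
        raise ValueError(f"unsupported AS method: {method!r}")
    return model(points, x)

readings = [(0.0, 10.0), (4.0, 18.0)]
print(evaluate_as_clause("FIT", readings, 2.0))          # 14.0
print(evaluate_as_clause("INTERPOLATE", readings, 1.0))  # 10.0
register_model("MEAN", lambda pts, _x: sum(v for _, v in pts) / len(pts))
print(evaluate_as_clause("mean", readings, 2.0))         # 14.0
```

Without a hook like `register_model`, every new method (splines, say) requires changing the system itself, which is the flexibility question the slide raises.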

  12. View Maintenance
     • Options
       • Logical: build results for each query
       • Materialized: pre-compute all results for each model
       • Partial/Cached: store results generated by queries
     • Model-based: models often have fixed costs
       • Building basis functions, matrix inversions, linear solutions
     • Tradeoff between query latency and overhead
     • Is implementing model logic at such a low level reasonable?
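The three maintenance options trade query latency against overhead. This toy class makes the fixed model-building cost visible by counting how often it runs under each strategy. `ModelView`, `build_model`, and `on_base_update` are hypothetical names, and discarding all state on every base-data change is a simplifying assumption; a real system could maintain results incrementally.

```python
class ModelView:
    """Toy model-based view over a fixed grid of query points.

    build_model() stands in for the fixed-cost step (basis functions, matrix
    inversion, ...) and returns a cheap callable that evaluates one point.
    """

    STRATEGIES = ("logical", "materialized", "cached")

    def __init__(self, build_model, grid, strategy):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        self.build_model = build_model
        self.grid = list(grid)
        self.strategy = strategy
        self.builds = 0  # number of times the fixed-cost step has run
        self._model = None
        self._cache = {}
        if strategy == "materialized":
            self._materialize()

    def _build(self):
        self.builds += 1
        return self.build_model()

    def _materialize(self):
        # Pre-compute every grid point once per model build.
        model = self._build()
        self._cache = {g: model(g) for g in self.grid}

    def query(self, x):
        if self.strategy == "logical":
            # Rebuild and evaluate the model on every query: no stored state.
            return self._build()(x)
        if self.strategy == "materialized":
            # Only grid points are stored; other positions raise KeyError.
            return self._cache[x]
        # "cached": build the model lazily, keep only points that were queried.
        if x not in self._cache:
            if self._model is None:
                self._model = self._build()
            self._cache[x] = self._model(x)
        return self._cache[x]

    def on_base_update(self):
        """Base data changed: discard stale results (re-materialize if eager)."""
        self._model = None
        self._cache = {}
        if self.strategy == "materialized":
            self._materialize()


def build():
    # Hypothetical fitted model: temp = 10 + 2x.
    return lambda x: 10.0 + 2.0 * x

views = {s: ModelView(build, [0.0, 1.0, 2.0], s) for s in ModelView.STRATEGIES}
for view in views.values():
    answers = [view.query(1.0) for _ in range(3)]
print({s: v.builds for s, v in views.items()})
# {'logical': 3, 'materialized': 1, 'cached': 1}
```

Three identical queries rebuild the model three times under the logical strategy but only once under the other two; the materialized view pays for every grid point up front, while the cached view computes only the points that queries touch.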

  13. Outcomes/Opinions • Is MauveDB the technology that will make scientists use databases?
