
Why Computer Scientists Don’t Use Databases



Presentation Transcript


  1. Why Computer Scientists Don’t Use Databases
     Sam Madden
     [Labels on the title slide: Biologists, Physicists, Basically Everyone Except My Bank]

  2. The Answer
     • Benefit(DBMS) ≠ Suckiness(DBMS)
     • Disadvantages of DBMSs are growing
       • Impoverished data manipulation language
       • Lack of modern cleaning and modeling tools
     • Advantages of DBMSs are shrinking
       • Large data sets? Transactions? High-level languages?
     • DBMS setup & boundary crossings painful
       • Especially if you have to do it multiple times!
     [Slide labels: MATLAB, Python, awk; Regression, Graphical models, Interpolation]

  3. Example
     • CarTel: ~1M GPS points per day from a fleet of 40 cabs on Boston streets
     • Pipeline
       • Raw data in DBMS
       • Trajectories with MATLAB
       • Queries with SQL
       • Route Planning with C++
       • Visualization on Google Maps
     • Database isn’t the most valuable tool in this picture
     • Import/Export is non-trivial
     • Database is least flexible tool, requires most maintenance
     [Diagram labels: IMPORT, EXPORT, IMPORT, EXPORT, EXPORT]
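
The import/export steps around the DBMS in this pipeline can be illustrated with a minimal sketch of one round trip. This is not CarTel's code: the `gps` and `trajectory_stats` tables, the sample rows, and the per-cab point count standing in for the external trajectory tool are all assumptions made up for illustration, and SQLite is used only because it ships with Python.

```python
import csv
import io
import sqlite3

# Hypothetical schema and sample rows, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps (cab_id INTEGER, ts INTEGER, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO gps VALUES (?, ?, ?, ?)",
    [(1, 0, 42.360, -71.058), (1, 10, 42.361, -71.057), (2, 0, 42.350, -71.060)],
)

# EXPORT: serialize rows to CSV text so an external tool could read them.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["cab_id", "ts", "lat", "lon"])
writer.writerows(conn.execute("SELECT cab_id, ts, lat, lon FROM gps ORDER BY cab_id, ts"))

# External step (stand-in for the trajectory tool): count points per cab.
buf.seek(0)
points_per_cab = {}
for row in csv.DictReader(buf):
    cab = int(row["cab_id"])  # column types are lost in CSV and must be re-parsed
    points_per_cab[cab] = points_per_cab.get(cab, 0) + 1

# IMPORT: load the derived result back into the database for SQL queries.
conn.execute("CREATE TABLE trajectory_stats (cab_id INTEGER, n_points INTEGER)")
conn.executemany(
    "INSERT INTO trajectory_stats VALUES (?, ?)", sorted(points_per_cab.items())
)
result = conn.execute(
    "SELECT cab_id, n_points FROM trajectory_stats ORDER BY cab_id"
).fetchall()
```

Even in this toy version, each crossing needs its own serialization, type re-parsing, and schema definition, which is the cost the slide refers to.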

  4. Two Solutions
     • Decrease Frequency of Boundary Crossings
       • Cram more stuff into the DBMS
         • FunctionDB
         • Probabilistic databases
         • ArrayDBs
         • XML
         • …

  5. FunctionDB
     • DBMS that can fit continuous functions to raw data and query the data represented by these functions using SQL
     [Diagram: Raw data (temp readings) → Regression → Function temp(t)]
     [Diagram: Query: Report when temp crosses threshold, SELECT time WHERE temp = thresh → Solve equation temp(t) = thresh → time]
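
The idea on this slide, fitting a function to readings and then answering the threshold query by solving temp(t) = thresh, can be sketched in a few lines. This is only an illustration under simplifying assumptions, not FunctionDB's implementation: it fits a single straight line by ordinary least squares, and the readings, the function names `fit_line` and `crossing_time`, and the threshold value are made up.

```python
def fit_line(times, temps):
    """Ordinary least-squares fit of temp(t) = slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(temps) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def crossing_time(slope, intercept, thresh):
    """Solve slope * t + intercept = thresh for t; None if the line is flat."""
    if slope == 0:
        return None
    return (thresh - intercept) / slope


# Hypothetical readings: (time, temperature) samples, made up for illustration.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
temps = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(times, temps)
t_cross = crossing_time(slope, intercept, thresh=15.0)
print(t_cross)  # 2.5
```

None of the raw samples equals 15.0, so an equality filter over the raw readings would return nothing, while the fitted function yields a crossing time between two samples.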

  6. Two Solutions
     • Decrease Frequency of Boundary Crossings
       • Cram more stuff into the DBMS
         • FunctionDB
         • Probabilistic databases
         • ArrayDBs
         • XML
         • …
     • Decrease Pain of Boundary Crossings
       • Don’t insist on DBMS-specified storage representation
       • Text and filesystem based tools to manage, edit, manipulate data
       • Don’t insist on SQL
       • Don’t insist on structured data
       • Add transactions, rollback, lineage to Unix toolchain and filesystems
         > checkpoint data.txt
         > awk -f process.awk data.txt > data.txt
         > history data.txt
         awk process.awk
         > rollback data.txt
     • If you can’t beat them, join them
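
The `checkpoint`, `history`, and `rollback` commands on this slide are a proposal, not existing Unix tools. A minimal sketch of how such file-level operations might behave is given below. Everything in it is an assumption made for illustration: the function names, the `.checkpoints` sibling directory, and the `record` helper that logs the command. It covers snapshots and a command log only, not full transactions or lineage.

```python
import os
import shutil
import tempfile


def _store(path):
    # Hypothetical layout: snapshots and the command log live in a sibling directory.
    return path + ".checkpoints"


def checkpoint(path):
    """Copy the file's current contents into a new numbered snapshot."""
    os.makedirs(_store(path), exist_ok=True)
    index = sum(name.endswith(".snap") for name in os.listdir(_store(path)))
    shutil.copy2(path, os.path.join(_store(path), f"{index}.snap"))
    return index


def record(path, command):
    """Append the command that modified the file to its history log."""
    os.makedirs(_store(path), exist_ok=True)
    with open(os.path.join(_store(path), "history.log"), "a") as log:
        log.write(command + "\n")


def history(path):
    """Return the recorded commands, oldest first."""
    log_path = os.path.join(_store(path), "history.log")
    if not os.path.exists(log_path):
        return []
    with open(log_path) as log:
        return log.read().splitlines()


def rollback(path):
    """Restore the file from its most recent snapshot."""
    indices = [
        int(name[: -len(".snap")])
        for name in os.listdir(_store(path))
        if name.endswith(".snap")
    ]
    if not indices:
        raise FileNotFoundError(f"no checkpoint for {path}")
    shutil.copy2(os.path.join(_store(path), f"{max(indices)}.snap"), path)


# Demo in a temporary directory, mirroring the slide's command sequence.
with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "data.txt")
    with open(data, "w") as f:
        f.write("raw line\n")

    checkpoint(data)
    with open(data, "w") as f:  # stand-in for the awk rewrite of data.txt
        f.write("processed line\n")
    record(data, "awk process.awk")

    log = history(data)
    rollback(data)
    with open(data) as f:
        restored = f.read()
```

After the demo, `log` holds the recorded command and `restored` holds the pre-rewrite contents, so the file stays a plain text file that awk or any other tool can read directly.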
