
Presentation Transcript



It’s Always been Big Data…!

Minos Garofalakis

Technical University of Crete

http://www.softnet.tuc.gr/~minos/



“Big” Depending on Context…

  • Grows by Moore’s Law… (see the back-of-the-envelope calculation after this list)

  • 1st VLDB (1975): Big = millions of data points gathered by the US Census Bureau [Simonson, Alsbrooks, VLDB’75]

    • Things have changed since then…

  • In general, Big = data that cannot be handled using standalone, standard tools (on a desktop)

    • Today, this means using Hadoop/MR clusters, Cloud DBMSs, Supercomputers, …
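
A rough back-of-the-envelope illustration of the point above, in Python. The ~2-year doubling period and the 40-year horizon are my own illustrative assumptions, not figures from the talk:

  # "Big" grows by Moore's Law: what counted as big in 1975, rescaled to today.
  big_in_1975 = 1e6                      # "millions of data points" (US Census, VLDB'75)
  doublings = 40 / 2                     # assume ~40 years at one doubling every ~2 years
  big_today = big_in_1975 * 2 ** doublings
  print(f"{big_today:.1e} data points")  # ~1.0e+12, i.e. trillions of points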



The Big Data Pipeline

  • Several major pain points/challenges at each step

  • Throwback to early batch computing of the 1960s!

    • No direct manipulation, interactivity, fast response

    • Processing is opaque, time consuming, costly

      • Typically, using a series of remote VMs

      • Different designs => VERY different temporal/financial implications



Data Analytics is Exploratory by Nature!

  • Can we support interactive exploration and rapid iteration over Big Data?

    • Mimic versatility of local file handling with tools like Excel and scripts (e.g., R)

  • One approach: small-footprint synopses/sketches for fast approximate answers and visualizations (a sketch example follows this list)

    • Sampling already used (in an ad-hoc manner)

    • Much relevant work on approximate query processing (AQP) and streaming

    • But, we must handle the Variety dimension

      • Both in data types and classes of analytics tasks!

    • Another important dimension: Distribution

      • LIFT/LEADS/FERARI projects and BD3 Workshop (this Friday!)
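
To make the synopses/sketches idea concrete, here is a minimal Count-Min sketch in plain Python. This is an illustrative sketch of the general technique, not code from the talk; the width/depth values and the blake2b-based hashing are my own choices.

  import hashlib

  class CountMinSketch:
      """Small-footprint frequency summary: fixed memory, approximate counts."""

      def __init__(self, width=2048, depth=5):
          self.width = width   # counters per row (larger width -> smaller additive error)
          self.depth = depth   # independent rows (more rows -> lower failure probability)
          self.table = [[0] * width for _ in range(depth)]

      def _index(self, item, row):
          # One bucket index per row, derived from a salted hash of the item.
          digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
          return int.from_bytes(digest, "big") % self.width

      def add(self, item, count=1):
          for row in range(self.depth):
              self.table[row][self._index(item, row)] += count

      def estimate(self, item):
          # Taking the minimum over rows gives an estimate that never undercounts.
          return min(self.table[row][self._index(item, row)]
                     for row in range(self.depth))

  # Usage: stream items through the sketch, then query approximate frequencies.
  cms = CountMinSketch()
  for word in ["census", "census", "vldb", "census", "sketch"]:
      cms.add(word)
  print(cms.estimate("census"))  # >= 3; exactly 3 at this tiny scale

A few kilobytes of counters like this can summarize arbitrarily long update streams, which is the property that makes fast approximate answers and visualizations possible at interactive speeds.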



Optimization, Collaboration, Provenance

  • Can we help users to plan/monitor the monetary and time implications of their design decisions?

    • Again, this should be an interactive process

  • Can we enable users to collaborate around Big Data?

    • Share data sources, scripts, experiences, even data runs

    • Work on collaborative mashups/visualization, CSCW

  • Can we help users to explore and exploit the provenance and computation history of the data?

    • “Institutional memory” on data sources and analyses

  • Data synopses/approximation critical to all three…!

    • May just be my personal bias speaking…



A Grand Challenge

Can we take a typical Excel/R user and empower them to become a Big Data Scientist?

  • For non-data-savvy “citizen scientists”, lack of statistical sophistication is a key problem

    • Can lead to poor decisions and results; more “play” than “science”

  • Support for fast interactive exploration, workflow optimization, collaboration, and provenance is critical

    • Relevant work exists in our community but still lots to be done…



A Happy Data Scientist is a Good Thing! 

