It’s Always been Big Data…!

Minos Garofalakis

Technical University of Crete

http://www.softnet.tuc.gr/~minos/

“Big” Depending on Context…
  • What counts as “Big” grows with Moore’s Law…
  • 1st VLDB (1975): Big = millions of data points gathered by the US Census Bureau [Simonson, Alsbrooks, VLDB’75]
    • Things have changed since then…
  • In general, Big = data that cannot be handled using standalone, standard tools (on a desktop)
    • Today, this means using Hadoop/MR clusters, Cloud DBMSs, Supercomputers, … (a toy map/reduce sketch follows this list)
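To make the “clusters, not desktops” point concrete, here is a toy, single-machine Python simulation of the map/reduce model that systems like Hadoop run at cluster scale. The word-count task and all details below are illustrative assumptions, not material from the talk.

```python
# Toy, in-memory illustration of the map/reduce model; real frameworks
# shard the input across machines and shuffle intermediate pairs over
# the network, which is exactly what a single desktop cannot do at scale.
from collections import defaultdict

def map_phase(lines):
    # map: emit (word, 1) for every word in every input line
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # shuffle + reduce: group identical keys and sum their counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    docs = ["Big data is big", "data about data"]
    print(reduce_phase(map_phase(docs)))
    # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```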
The Big Data Pipeline
  • Several major pain points/challenges at each step
  • Throwback to early batch computing of the 1960s!
    • No direct manipulation, interactivity, fast response
    • Processing is opaque, time consuming, costly
      • Typically, using a series of remote VMs
      • Different designs => VERY different temporal/financial implications
Data Analytics is Exploratory by Nature!
  • Can we support interactive exploration and rapid iteration over Big Data?
    • Mimic versatility of local file handling with tools like Excel and scripts (e.g., R)
  • One approach: small-footprint synopses/sketches for fast approximate answers and visualizations (see the Count-Min sketch example after this list)
    • Sampling is already used (in an ad-hoc manner)
    • Much relevant work on AQP and streaming
    • But we must also handle the Variety dimension
      • Both in data types and classes of analytics tasks!
    • Another important dimension: Distribution
      • LIFT/LEADS/FERARI projects and BD3 Workshop (this Friday!)
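As one concrete instance of a small-footprint synopsis, here is a minimal Count-Min sketch in Python. The width/depth parameters and the salted-hash scheme are illustrative choices, not part of the presentation.

```python
import random

class CountMinSketch:
    """Small-footprint frequency synopsis: point queries are never
    underestimated, and overestimate by at most ~N/width w.h.p."""

    def __init__(self, width=2048, depth=5, seed=42):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # one counter per row, chosen by a salted hash of the item
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for x in ["a", "b", "a", "c", "a"]:
    cms.add(x)
print(cms.estimate("a"))  # 3 (may overcount, never undercounts)
```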
Optimization, Collaboration, Provenance
  • Can we help users to plan/monitor the monetary and time implications of their design decisions? (A toy cost/time estimate follows this list.)
    • Again, this should be an interactive process
  • Can we enable users to collaborate around Big Data?
    • Share data sources, scripts, experiences, even data runs
    • Work on collaborative mashups/visualization, CSCW
  • Can we help users to explore and exploit the provenance and computation history of the data?
    • “Institutional memory” on data sources and analyses
  • Data synopses/approximation are critical to all three…!
    • May just be my personal bias speaking…
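As a hedged illustration of planning the monetary and time implications of a cluster design, here is a toy cost/time estimator. Every rate and price in it is a made-up placeholder, not real cloud pricing and not anything proposed in the talk.

```python
def estimate(data_gb, nodes, gb_per_node_hour=50.0,
             price_per_node_hour=0.40, overhead_hours_per_node=0.05):
    # throughput scales with the node count, but startup/shuffle overhead
    # grows with it too, so time and cost move in different directions
    hours = data_gb / (nodes * gb_per_node_hour) + overhead_hours_per_node * nodes
    return hours, hours * nodes * price_per_node_hour

for n in (4, 16, 64):
    hours, dollars = estimate(data_gb=10_000, nodes=n)
    print(f"{n:3d} nodes: ~{hours:5.1f} h, ~${dollars:7.2f}")
```

Even this crude model makes the slide’s point: different designs for the same job can differ sharply in elapsed time and in total cost.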
A Grand Challenge

Can we take a typical Excel/R user and empower them to become a Big Data Scientist?

  • For non-data-savvy “citizen scientists”, a lack of statistical sophistication is a key problem (see the confidence-interval sketch after this list)
    • Can lead to poor decisions and results; more “play” than “science”
  • Support for fast interactive exploration, workflow optimization, collaboration, and provenance is critical
    • Relevant work exists in our community but still lots to be done…
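One concrete example of the missing statistical sophistication: reporting an approximate answer computed from a sample together with a confidence interval, rather than presenting it as exact. The data, sample size, and normal-approximation interval below are illustrative assumptions.

```python
import math
import random

random.seed(0)
population = [random.expovariate(1 / 40.0) for _ in range(1_000_000)]  # stand-in data
sample = random.sample(population, 1_000)  # uniform sample instead of a full scan

n = len(sample)
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
half_width = 1.96 * math.sqrt(var / n)  # 95% CI, normal approximation

print(f"approximate mean: {mean:.1f} ± {half_width:.1f} (95% CI)")
```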