CS 519: Big Data Exploration and Analytics
1: Introduction
Welcome to CS519!
  • Arash Termehchy
    • Assistant professor in the school of EECS
    • Usable data management and exploration.
  • Your turn:
    • Name, field, DB background
The Era of Big Data
  • People and devices generate and share data at staggering rates
    • Your friends: social networks, online games, …
      • 30 billion data items shared on Facebook every month
    • Your cell phone: your positions, daily activities, …
    • Your car
    • Your shopping activities
    • The Web: surface and deep Web
The Era of Big Data
  • Hubble Space Telescope: 50 GB each month
  • High-throughput screening devices
  • Environmental sensor networks
Data is valuable
  • In the mid-1850s, Dr. John Snow plotted cholera deaths on a map, and near a particularly hard-hit group of buildings stood a water pump.
  • A 19th-century version of big data analysis: the map suggested an association between cholera and the pump.
Data is valuable
  • “The Fourth Paradigm: Data-Intensive Scientific Discovery”, Jim Gray
    • Empirical
    • Theoretical
    • Computational
    • Data exploration, eScience
  • The Sloan SkyServer is one of the most cited resources in …
Data is valuable
  • Tracking the spread of diseases by analyzing Google query logs
  • Personalized medicine, drug discovery, …
  • “The Unreasonable Effectiveness of Data” (Halevy, Norvig, and Pereira)
Three V’s of big data: Volume
  • Large Hadron Collider: nearly 500 exabytes per day if all sensor data were recorded.
  • The Sloan Digital Sky Server has to accommodate 30 TB of new data per day by 2016.
  • According to McKinsey & Company:
    • 40% growth in global data each year
  • 90% of the world’s data was generated in the last two years!
Three V’s of big data: Variety
  • Valuable information is scattered across many sources in many forms.
  • A large number of social networks
  • A large number of life science databases
Three V’s of big data: Variety
  • “The systemic risks associated with the subprime lending market and the crash of the housing market in 2007 could have been modeled through a comprehensive integration and analysis of available public datasets. …. Integrating these datasets may have provided financial analysts, regulators and academic researchers, with comprehensive models to enable risk assessment.”


Three V’s of big data: Variety
  • Variety is arguably more challenging than volume, as it requires a deeper understanding of the data.
  • Data integration has been recognized as a hard problem in the DB community.
Three V’s of big data: Velocity
  • Data is rapidly evolving.
    • Web sites, social networks, scientific data, …
  • Trends are changing in a short amount of time.
    • News media, stock market, …
  • We want to get insights quickly.
  • We do not want to rewrite our programs every time the data changes.
An extra V: Veracity
  • Data is often dirty and inconsistent.
  • A common experience among data engineers and data scientists
Data exploration and analysis
  • The focus of the course is on variety and velocity, i.e., data heterogeneity.
  • Why is data heterogeneous, and how can we handle it?
  • We will discuss other issues as well.
  • Prerequisite: CS 540 or equivalent
  • Contact the instructor if you are not sure.
Course format
  • Some basic lectures at the beginning.
  • Mostly paper presentations and discussions.
  • One paper in most sessions.
  • A student presentation followed by a group discussion.
Student presentation
  • Select a paper by the end of this week.
  • Multiple papers for some subjects
  • Choose a paper/subject that excites you.
  • Email the presentation material by 5:00 pm the day before.
Discussion and participation
  • All students must read the paper and post a summary and questions on Piazza.
  • Each student should ask at least one question per week.
  • A short wrap-up quiz.
  • The list of papers is posted on the course Web site.
  • Reference textbooks:
    • Foundations of Databases, Serge Abiteboul, Richard Hull, and Victor Vianu
    • Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom
  • A research project on big data exploration
    • Pick a question
    • Define the problem rigorously and get insights.
    • Provide a solution: building a proof of concept prototype, prove some interesting results
  • Groups of 1–4 students
  • Talk to the instructor
  • The project proposal is due in the third week of class.
Grading Scheme
  • One assignment to cover the basic concepts: 5%
  • Paper review: 15%
  • Paper presentation: 15%
  • Discussion: 10%
  • Quiz: 15%
  • Project: 40%