
Learning Objectives for Big Data


viveca

Presentation Transcript


  1. Learning Objectives for Big Data Define big data and understand how it is differentiated from “regular old” data. Recognize examples and applications of big data. Understand the key problems we are trying to solve when coping with big data. Become aware of the “solutions” that people are using to cope with big data.

  2. Big Data • Definitions differ depending on perspective. • Data that is difficult to process using traditional database and software techniques (abbreviated from Wikipedia/Webopedia). • “Big” is relative to the organization. • “Big” is relative in time: what counts as big today may be routine in a few years.

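  3. Characteristics of big data • Volume: the sheer quantity of data. • Variety: many different types and formats of data. • Variability: data whose structure, meaning, or arrival rate changes. • Velocity: the speed at which data arrives and must be processed. • Veracity: uncertainty about the accuracy and trustworthiness of the data.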

  4. What are the problems with big data? • Dealing with different types of data. • Data that doesn’t have a clear data type. • Data that changes data type. • Unstructured data: does not have a pre-defined data model; usually text. • Storing and accessing incredibly large quantities of data. • Transforming and loading data immediately. • Performing analytics immediately. • Using “big data” to create “real information”.
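The “data that changes data type” problem is easy to hit with semi-structured input. Below is a minimal Python sketch, assuming records arrive as JSON lines; the sample records and the helpers field_types and unstable_fields are hypothetical. It records every type seen for each field and flags fields that arrive with more than one type (null values are ignored).

```python
import json
from collections import defaultdict

# Hypothetical sample records: the same field arrives with different types.
RAW_RECORDS = [
    '{"id": 1, "price": 9.99, "tags": ["a"]}',
    '{"id": 2, "price": "9.99", "note": "free text"}',
    '{"id": "3", "price": null}',
]


def field_types(records):
    """Map each field name to the set of Python type names seen for it."""
    seen = defaultdict(set)
    for line in records:
        doc = json.loads(line)
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


def unstable_fields(types_by_field):
    """Return fields observed with more than one type, ignoring nulls."""
    return sorted(
        field
        for field, types in types_by_field.items()
        if len(types - {"NoneType"}) > 1
    )


if __name__ == "__main__":
    types = field_types(RAW_RECORDS)
    print(types)
    print(unstable_fields(types))  # id and price both change type
```

A check like this does not fix the data; it only surfaces which fields need an explicit conversion rule before analysis.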

  5. Solutions for storing unstructured data • Rows and columns (relational tables) don’t fit unstructured data. • Need a file- or document-oriented management system. • Examples: • MongoDB • VelocityDB • Apache Hadoop (HDFS) • Oracle NoSQL • CouchDB
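To show why a document-oriented store copes with unstructured data where fixed columns do not, here is a toy in-memory sketch in Python. ToyDocumentStore and its insert/find methods are hypothetical and are not the API of MongoDB, CouchDB, or any other product listed above; each document is just a dictionary, so two documents need not share any fields.

```python
from __future__ import annotations

from typing import Any


class ToyDocumentStore:
    """Hypothetical in-memory document store: no fixed columns, and each
    document may carry a different set of fields."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def insert(self, doc: dict[str, Any]) -> int:
        """Store a copy of the document and return its generated id."""
        doc_id = self._next_id
        self._docs[doc_id] = dict(doc)
        self._next_id += 1
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents whose fields equal every criterion; documents
        that lack a queried field simply do not match."""
        return [
            {"_id": doc_id, **doc}
            for doc_id, doc in self._docs.items()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    store = ToyDocumentStore()
    store.insert({"type": "tweet", "text": "big data!", "hashtags": ["#data"]})
    store.insert({"type": "sensor", "device": "t-17", "temp_c": 21.5})
    print(store.find(type="sensor"))
```

A real document database adds persistence, indexes, and a richer query language, but the core idea is the same: the structure lives in each document rather than in a table definition.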

  6. Solutions for storing and accessing big data (1) • Distribute the processing of very large, multi-structured data files across a large cluster of ordinary (commodity) machines/processors. • MapReduce • Sharding/horizontal partitioning • Break the data into parts, which are then loaded into a file system on multiple nodes. • Each part may be replicated multiple times. • Each node processes its own parts; the partial results are then collected and aggregated using a MapReduce algorithm or another distributed algorithm.
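Sharding and MapReduce can be simulated in a single Python process. This is a minimal sketch, not Hadoop: the node count, the CRC32-based node_for hash, and the word-count task are assumptions chosen for illustration, and replication of parts is omitted. The map phase emits (word, 1) pairs per shard, the shuffle routes each key to the node that owns it, and the reduce phase aggregates the values.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from typing import Iterable, Iterator

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Horizontal partitioning: a stable hash of the key picks the owning node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(lines: list[str], num_nodes: int = NUM_NODES) -> list[list[str]]:
    """Break the input into parts, one per node (sharded by line number)."""
    shards: list[list[str]] = [[] for _ in range(num_nodes)]
    for line_no, line in enumerate(lines):
        shards[node_for(str(line_no), num_nodes)].append(line)
    return shards


def map_phase(shard: list[str]) -> Iterator[tuple[str, int]]:
    """Map: each node emits (word, 1) for the lines in its own shard only."""
    for line in shard:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped: Iterable[Iterable[tuple[str, int]]],
            num_nodes: int = NUM_NODES) -> list[dict[str, list[int]]]:
    """Shuffle: route every (key, value) pair to the node that owns the key."""
    buckets: list[dict[str, list[int]]] = [defaultdict(list) for _ in range(num_nodes)]
    for pairs in mapped:
        for key, value in pairs:
            buckets[node_for(key, num_nodes)][key].append(value)
    return buckets


def reduce_phase(bucket: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate all values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(lines: list[str], num_nodes: int = NUM_NODES) -> dict[str, int]:
    shards = partition(lines, num_nodes)
    mapped = [map_phase(shard) for shard in shards]  # would run in parallel, one per node
    result: dict[str, int] = {}
    for bucket in shuffle(mapped, num_nodes):
        result.update(reduce_phase(bucket))  # keys never overlap between buckets
    return result
```

For example, word_count(["big data is big", "data about data"]) returns the counts big 2, data 3, is 1, about 1, whatever the number of nodes, because every occurrence of a key is routed to the same reducer.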

  7. Solutions for storing and accessing big data (2) • Lots of memory; really fast disk • In-memory computing • HANA (SAP) • DB2 BLU (IBM) • Informix (IBM) • ActiveSpaces (TIBCO Software) • Oracle • Database appliance: marketing term for an integrated set of servers, storage, operating system, and DBMS, pre-installed and pre-optimized specifically for data warehousing (Wikipedia)

  8. Solutions for transform/load (TL) immediacy • Load data first and transform it afterwards (ELT instead of ETL). • Perform data loading and transformation continuously. • Problems: • Most data transformation tools are not designed to work well with unstructured data. • Few frameworks currently focus on ETL, because the data is not “mission critical.” • Opportunities!
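“Transform after loading” can be illustrated with a small Python sketch, assuming JSON input and a hypothetical target shape with user and amount fields. The ELTPipeline class is hypothetical: load never parses or rejects anything, so ingestion is immediate, and transform_pending processes only records that arrived since its last run, so calling it repeatedly approximates continuous transformation.

```python
from __future__ import annotations

import json
from typing import Any


class ELTPipeline:
    """Hypothetical ELT sketch: load raw records first, transform them later."""

    def __init__(self) -> None:
        self.raw: list[str] = []                 # loaded as-is, never blocked by parsing
        self.curated: list[dict[str, Any]] = []  # transformed, query-ready records
        self.rejected: list[str] = []            # records the transform could not handle
        self._watermark = 0                      # index of first raw record not yet transformed

    def load(self, raw_record: str) -> None:
        """Load step: store immediately, with no parsing or validation."""
        self.raw.append(raw_record)

    def transform_pending(self) -> int:
        """Transform step: process only records that arrived since the last run
        and return how many were processed."""
        new = self.raw[self._watermark:]
        for record in new:
            try:
                doc = json.loads(record)
                self.curated.append({
                    "user": str(doc["user"]).lower(),
                    "amount": float(doc.get("amount", 0)),
                })
            except (ValueError, KeyError, TypeError):
                # Malformed or unexpected records are kept for later inspection.
                self.rejected.append(record)
        self._watermark = len(self.raw)
        return len(new)
```

Keeping the raw records means a transformation rule can be changed later and re-run over data that was already loaded.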

  9. Solutions for analytics immediacy • Define the need for immediacy: true real time, or close to it? • Streaming analytics: process data as it arrives, usually against a predefined “window” of recent time or data rather than all existing data. The results of the analysis may or may not be stored. • Perpetual analytics: process data as it arrives, compare it against all existing data, and store the results of the analysis.
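The difference between streaming and perpetual analytics can be sketched as two small Python classes. Both are hypothetical illustrations: WindowedAverage keeps only the last window values and stores nothing else, while PerpetualProfile compares each new value with the full history for its key and stores the updated result, using an in-memory dictionary as a stand-in for a persistent results store.

```python
from __future__ import annotations

from collections import deque


class WindowedAverage:
    """Streaming analytics: each new value is compared only with the last
    `window` values; older data is discarded and nothing is persisted."""

    def __init__(self, window: int) -> None:
        self.values: deque[float] = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a value and return the average over the current window."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


class PerpetualProfile:
    """Perpetual analytics: each new value is compared against all history
    for its key, and the updated running totals are stored."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[int, float]] = {}  # stand-in for a persistent store

    def update(self, key: str, value: float) -> float | None:
        """Return the deviation from the historical mean (None on first sight)."""
        count, total = self.store.get(key, (0, 0.0))
        deviation = value - total / count if count else None
        self.store[key] = (count + 1, total + value)  # store the updated result
        return deviation
```

The windowed version uses bounded memory no matter how long the stream runs; the perpetual version grows with the number of keys but can judge every event against everything seen before.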

  10. Solutions for creating information from big data • A culture of data-driven decision making • Data scientists • Information visualization techniques

  11. Skills that define a data scientist

  12. Typical Job Post for Data Scientist
