
Big Data Analytics

Big Data Analytics. A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng. Agenda: Big Data Analytics and its Objectives; Financial Impact; Structured vs Unstructured Data; Users of Big Data; Relevant Technologies (Hadoop, MongoDB); Coding Examples; Future of Analytics.

jdunn

Presentation Transcript


  1. Big Data Analytics A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng

  2. Agenda • Big Data Analytics and its Objectives • Financial Impact • Structured vs Unstructured Data • Users of Big Data • Relevant Technologies (Hadoop, MongoDB) • Coding Examples • Future of Analytics

  3. What is Big Data and why does it matter? • Defining Big Data Analytics • Examining large sets of data • Discovering patterns and trends • Data warehouses are insufficient • Purposes • Uncovering hidden needs of customers • Improving operational efficiency

  4. Big Data & Operational Efficiency • “By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM • Core Objectives • Gain • Analyze • Apply • Optimize

  5. Financial Impact of Big Data • High cost of poor data quality • $3.1 trillion to US government annually • 10-25% of US business revenues • Opportunities for qualified analysts • Business Analyst: $66,000 • Data Analyst: $60,000 • Data Scientist: $113,000

  6. Dimensions of Big Data • Essential Characteristics: • Volume - Data quantity • Velocity - Data speed • Variety - Data types

  7. Structured vs. Unstructured Data Structured Data • Represented as text • Transactional data, formal reports, accounting records of sales and costs • Relational databases / data warehouse • SQL Unstructured Data • May be textual or non-textual • Mobile usage, click stream activity, social media responses, genomic data • No structured database / data lake • NoSQL (Not only SQL), SQL Batch Queries
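
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for a relational database, a handful of JSON strings stand in for click-stream and social media records, and all table names, field names, and values are made up for the example.

```python
import json
import sqlite3

# Structured: every row has the same columns, so SQL can aggregate directly.
# (In-memory SQLite is a stand-in for a relational database / data warehouse.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)
sales_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

# Unstructured / semi-structured: records do not share a schema, so the code
# has to inspect each record instead of relying on fixed columns.
raw_events = [
    '{"user": "a", "action": "click", "page": "/home"}',
    '{"user": "b", "action": "post", "text": "great product", "tags": ["review"]}',
    '{"user": "a", "action": "click"}',
]
events = [json.loads(line) for line in raw_events]
clicks_per_user = {}
for event in events:
    if event.get("action") == "click":
        user = event["user"]
        clicks_per_user[user] = clicks_per_user.get(user, 0) + 1

print(sales_by_region)
print(clicks_per_user)
```

The structured half needs a schema up front but gets aggregation for free; the unstructured half accepts any record shape but pushes the interpretation into application code.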

  8. Illustrative Example Inventory Analyst Insurance Actuary

  9. Interpretations Structured Data Big Data Analytics Big Data Analytics Structured Data

  10. Users of Big Data • Device manufacturers, ERP providers, and consulting firms comprise 7 of the top 10 users of Big Data • Based on a survey conducted by Dell of large corporations in 2014… • 55% now follow a Big Data strategy • 60% of Big Data projects involve a cloud • 32% involve real-time or near real-time processing • 22% use a data lake • 20% of projects by outside consultants

  11. Hadoop • Free, Java-based programming framework • Distributes storage and processing of large data sets • Grew out of a Google File System paper published in October 2003 • Development was furthered by Apache • Named after Doug Cutting’s son’s toy elephant (logo!)

  12. When to Use (and Not Use) Hadoop YES! • Analytics • Search • Data Retention • Log File processing • Analysis of Text, Image, Audio, and Video Content • Recommendation systems like in E-Commerce Websites NO! • Low-latency or near real-time data access • Large number of small files to process • Multiple write scenarios requiring arbitrary writes between files

  13. Who Uses Hadoop?

  14. Hadoop Framework • Hadoop Common: Contains all the libraries and utilities • Hadoop Distributed File System (HDFS): Storage with high bandwidth • Hadoop YARN: Resource-management platform • Hadoop MapReduce: Programming model for data processing

  15. HDFS
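
As a rough illustration of how HDFS spreads a file across a cluster, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 are the usual Hadoop defaults; the function name `place_blocks`, the node names, and the round-robin placement are assumptions made for this example. Real HDFS placement is rack-aware and decided by the NameNode.

```python
import math

BLOCK_SIZE_MB = 128  # common HDFS default block size
REPLICATION = 3      # common HDFS default replication factor


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement.

    Hypothetical helper: it only mimics the idea of blocks plus replicas,
    not the actual HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Each replica of a block lands on a different node.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


layout = place_blocks(300, ["node1", "node2", "node3", "node4"])
for block, replicas in layout.items():
    print(block, replicas)
```

Because every block lives on several nodes, losing one machine does not lose data, and many machines can read different blocks of the same file in parallel.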

  16. MapReduce

  17. MapReduce Example
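
The slide's own example is not part of this transcript. As a stand-in, here is the classic word-count job written in plain Python to show the three stages of the programming model. It is a single-process sketch, not Hadoop code: the function names are chosen for the illustration, and a real job would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big ideas", "data lake"])
print(counts)
```

The map and reduce functions never share state, which is what lets a framework run them on separate nodes and only move data during the shuffle.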

  18. MongoDB

  19. MongoDB = “The database for giant ideas” • Cross-platform document-oriented database • Open-source • Founded in 2007; written to handle specific problems with DoubleClick • Classified as NoSQL database

  20. MongoDB Example Also, we can practice! http://www.w3resource.com/mongodb-exercises/#PracticeOnline
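
The slide's example is not included in the transcript. To give a feel for document-oriented queries without needing a running server, the sketch below imitates the shape of a MongoDB query filter in plain Python. The helpers `matches` and `find` are hypothetical and cover only equality plus the `$gt`, `$lt`, and `$in` operators; the restaurant documents are invented sample data. Real MongoDB supports far more operators and treats missing fields more subtly.

```python
# Small subset of MongoDB-style comparison operators, implemented locally.
OPERATORS = {
    "$gt": lambda actual, expected: actual > expected,
    "$lt": lambda actual, expected: actual < expected,
    "$in": lambda actual, expected: actual in expected,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in the query filter."""
    for field, condition in query.items():
        if field not in doc:
            return False  # simplification: a missing field never matches
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"score": {"$gt": 80}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"cuisine": "Italian"}
            return False
    return True


def find(collection, query):
    """Return all documents in a list that match the query filter."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection do not have to share the same fields.
restaurants = [
    {"name": "Ada's", "cuisine": "Italian", "score": 92},
    {"name": "Bo's", "cuisine": "Thai", "score": 71, "delivery": True},
    {"name": "Cy's", "cuisine": "Italian", "score": 64},
]

top_italian = find(restaurants, {"cuisine": "Italian", "score": {"$gt": 80}})
print([doc["name"] for doc in top_italian])
```

The query itself is just a nested document, which is why the same filter can be written almost unchanged in the MongoDB shell or in a driver.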

  21. The Future of Big Data Analytics

  22. Any Questions?
