
Engineering BIG DATA with HADOOP

This presentation introduces Big Data with Hadoop.

SS_Reddy

Presentation Transcript


  1. BY International School of Engineering {We Are Applied Engineering} ENGINEERING BIG DATA WITH HADOOP Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.

  2. OVERVIEW • WHAT IS BIG DATA? • EXPLOSION OF DATA • DATA CONTRIBUTIONS • DATA EXPLOSION • WHO ARE THE PLAYERS? • BIG DATA–BIG PICTURE– LANDSCAPE • BIG DATA– ENTERPRISE ROLES • WHAT IS HADOOP? • EVOLUTION OF HADOOP • HADOOP ECOSYSTEM • HADOOP ECOSYSTEM MAP • HADOOP: 30,000 FEET VIEW • BIG DATA & ANALYTICS Case studies • VIDEO OF HADOOP ECOSYSTEM

  3. WHAT IS BIG DATA? • High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Gartner) HIGH VOLUME HIGH VARIETY HIGH VELOCITY

  4. EXPLOSION OF DATA

  5. Source: http://www.emc.com/leadership/digital-universe/iview/index.htm

  6. DATA CONTRIBUTIONS

  7. DATA EXPLOSION

  8. Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

  9. Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

  10. WHO ARE THE PLAYERS?

  11. BIG DATA–BIG PICTURE– LANDSCAPE

  12. BIG DATA– ENTERPRISE ROLES

  13. INTRODUCTION TO HADOOP

  14. WHAT IS HADOOP? • Flexible: structured/unstructured, text/binary, schema/schema-less • 100% open source • Scalable: petabytes of data, thousands of nodes Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/

  15. EVOLUTION OF HADOOP How does an Elephant Sneak up on you?

  16. HADOOP ECOSYSTEM Chukwa Sqoop ZooKeeper Pig Avro HBase Mahout Flume MapReduce Engine Whirr Hama Hadoop Distributed File System Hive Hadoop Common

  17. HADOOP ECOSYSTEM MAP Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/

  18. Hadoop Evolution – Map Explained! • How did it all start? Huge data on the web! • Nutch was built to crawl this web data • The huge data had to be stored: HDFS was born! • How to use this data? The MapReduce framework was built for coding and running analytics, in Java or in any language via streaming (Hadoop Streaming) • How to get unstructured data in (web logs, clickstreams, Apache logs, server logs)? FUSE, WebDAV, Chukwa, Flume, Scribe • HIHO and Sqoop for loading data into HDFS: RDBMS data can join the Hadoop bandwagon!

  19. Continued • High-level interfaces were needed over low-level MapReduce programming: Pig, Hive, Jaql • BI tools with advanced UI reporting (drill-down, etc.): Intellicus • Workflow tools over MapReduce processes and high-level languages: Oozie • Monitor and manage Hadoop, run jobs and Hive queries, view HDFS (high-level view): Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia • Support frameworks: Avro (serialization), ZooKeeper (coordination) • More high-level interfaces/uses: Mahout, Elastic MapReduce • OLTP is also possible: HBase
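The MapReduce-plus-streaming idea from slide 18 ("any language" via Hadoop Streaming) can be sketched as a word count in Python. This is a minimal sketch, assuming Hadoop Streaming's convention that the mapper and reducer read lines from stdin and write tab-separated key/value lines to stdout. The script name `wordcount.py` and the `map`/`reduce` command-line argument are hypothetical choices for illustration, and `run_locally` only imitates the framework's shuffle/sort step inside one process.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper prints to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word. Relies on input being sorted by key, which the
    framework guarantees between the map and reduce stages."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Single-process stand-in for map -> shuffle/sort -> reduce."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical CLI convention: 'map' or 'reduce' selects the stage; with no
# argument the module does nothing, so it can also be imported for testing.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

On a cluster, the same file could be shipped to the nodes (for example with `-files`) and passed to the streaming jar through its `-mapper` and `-reducer` options (e.g. `-mapper "python3 wordcount.py map"`), together with `-input` and `-output` paths in HDFS; the jar's location depends on the installation. The framework, not the script, performs the sort between the two stages, which is why `reducer` can assume identical keys arrive together.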

  20. HADOOP: 30,000 FEET VIEW • Distribute data initially • Let processors / nodes work on local data • Minimize data transfer over the network • Replicate data multiple times for increased availability • Write applications at a high level • Programmers should not have to worry about network programming, temporal dependencies, low-level infrastructure, etc. • Minimize communication between nodes (shared-nothing)
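Three of the principles on slide 20 (distribute data up front, replicate it, and let nodes work on local data) can be illustrated with a toy model. This is a minimal sketch, not how HDFS or the Hadoop scheduler are implemented: replica placement here is simple round-robin over hypothetical node names, whereas real HDFS uses a rack-aware policy, and `schedule_local_tasks` is a greedy stand-in for the real scheduler. The 128 MB block size and replication factor of 3 are HDFS's usual defaults, configurable through `dfs.blocksize` and `dfs.replication`.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes; common HDFS default (dfs.blocksize)
REPLICATION = 3                 # common HDFS default (dfs.replication)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs; the last block may be shorter."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct
    nodes. Real HDFS placement is rack-aware; this only ensures distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)}


def schedule_local_tasks(placement, free_nodes):
    """Assign one map task per block, preferring the least-loaded free node
    that already holds a replica (data locality). If none does, fall back to
    any free node, which would mean reading the block over the network."""
    load = {n: 0 for n in free_nodes}
    assignments = {}
    for block, replicas in placement.items():
        local = [n for n in replicas if n in load]
        candidates = local or list(load)
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        assignments[block] = (node, bool(local))
    return assignments
```

Counting how many assignments come back with the locality flag set gives a rough sense of how well a cluster keeps computation next to its data, which is the "minimize data transfer over the network" goal on the slide. Losing a node in this model still leaves two replicas of each of its blocks elsewhere, which is the availability argument for replication.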

  21. Case Studies BIG DATA & ANALYTICS

  22. YAHOO - PERSONALIZATION

  23. YAHOO SEARCH ASSIST

  24. For a detailed description of the HADOOP ECOSYSTEM components, check out our video on

  25. International School of Engineering Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033 For Individuals (+91) 9502334561/62 For Corporates (+91) 9618 483 483 Facebook: www.facebook.com/insofe Slide share: www.slideshare.net/INSOFE
