
excelonlineclasses.co.nr/ excel.onlineclasses@gmail

http://www.excelonlineclasses.co.nr/ excel.onlineclasses@gmail.com. Excel Online Classes offers the following services: online training, development, testing, job support, technical guidance, job consultancy, and any other needs of the IT sector.

elma

Presentation Transcript


  1. http://www.excelonlineclasses.co.nr/ excel.onlineclasses@gmail.com

  2. Excel Online Classes offers the following services: • Online Training • Development • Testing • Job support • Technical Guidance • Job Consultancy • Any needs of the IT sector

  3. HADOOP Nagarjuna K

  4. Why Hadoop, and what is it? • A tool to process big data

  5. What is BIG Data? • Data generated on Facebook, Google+, etc. • Machines generate lots of data too • We are having an online discussion right now; even the number of participants in this conference will be recorded as data.

  6. What is BIG Data? (continued) • Exponential growth of data → challenges for Google, Yahoo, Microsoft, Amazon • They need to go through TBs and PBs of data to answer questions such as: • Which websites and books were popular? • What kinds of ads appeal to users? • Existing tools became inadequate for processing such large data sets.

  7. Why is the data so BIG? • Until a couple of decades ago → floppy disks • After that → CD/DVD drives • Half a decade ago → hard drives (500 GB) • Now → hard drives (1 TB) are widely available

  8. Why is the data so BIG? • So what? • Even the technology to read the data has taken a leap.

  9. Why is the data so BIG?

  10. How to handle such BIG data? • One BIG elephant • Or numerous small chickens?

  11. How to handle such BIG data? • The concept behind torrents • Reduce the time to read the data by reading it from multiple sources simultaneously. • Imagine we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in less than two minutes.
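
The "less than two minutes" figure can be checked with a little arithmetic. The sketch below is only an illustration: the slides do not give the data size or drive speed, so the 1 TB data set and the 100 MB/s sustained transfer rate are assumptions, and `read_time_seconds` is a hypothetical helper.

```python
# Assumed figures (NOT from the slides): a 1 TB data set and a
# sustained transfer rate of 100 MB/s for every drive.
DATA_SIZE_MB = 1_000_000   # 1 TB expressed in MB (decimal units)
TRANSFER_MB_PER_S = 100    # assumed per-drive read speed


def read_time_seconds(data_mb, drives, rate_mb_s):
    """Time to read data_mb when it is split evenly across `drives`
    drives that are all read in parallel at rate_mb_s each."""
    return (data_mb / drives) / rate_mb_s


single = read_time_seconds(DATA_SIZE_MB, 1, TRANSFER_MB_PER_S)
parallel = read_time_seconds(DATA_SIZE_MB, 100, TRANSFER_MB_PER_S)

print(f"1 drive:    {single / 3600:.1f} hours")    # 2.8 hours
print(f"100 drives: {parallel / 60:.1f} minutes")  # 1.7 minutes
```

Under these assumptions a single drive needs almost three hours, while 100 drives reading in parallel finish in 100 seconds. The calculation ignores coordination overhead and the cost of combining the results, which later slides raise as separate problems.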

  12. How to handle such BIG data? Issues • How to handle a system's ups and downs? • How to combine the data from all the systems?

  13. Problem 1: System's ups and downs • Commodity hardware is used for data storage and analysis • Chances of failure are very high • So, keep redundant copies of the same data across several machines • If one machine fails, another still holds the data • Google came up with a file system → GFS (Google File System), which implemented all these details.
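
The redundancy idea on this slide can be illustrated with a toy model. This is a minimal sketch and not GFS's actual placement policy: the round-robin rule, the function names (`place_replicas`, `readable_blocks`), and the node and block names are all assumptions made for the example.

```python
def place_replicas(blocks, machines, copies=2):
    """Assign each block to `copies` distinct machines, round-robin.
    (A toy rule for illustration, not the real GFS policy.)"""
    if copies > len(machines):
        raise ValueError("need at least as many machines as copies")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [
            machines[(i + j) % len(machines)] for j in range(copies)
        ]
    return placement


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live machine."""
    return {
        block
        for block, hosts in placement.items()
        if any(host not in failed for host in hosts)
    }


machines = ["node-a", "node-b", "node-c", "node-d"]
blocks = ["blk-0", "blk-1", "blk-2", "blk-3", "blk-4"]

placement = place_replicas(blocks, machines, copies=2)
survivors = readable_blocks(placement, failed={"node-b"})

# With two copies per block, losing any single machine loses no data.
print(survivors == set(blocks))
```

With two copies, any single machine failure is survivable, but two simultaneous failures can still lose a block, which is why the number of copies is a tuning decision.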

  14. GFS • Divides data into chunks and stores them in the file system • Can store data in the petabyte range
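
Dividing data into chunks can be sketched in a few lines. This is only an illustration of the idea: `split_into_chunks` is a hypothetical helper, and the 32-byte chunk size is chosen so the output is easy to read. Real distributed file systems use chunks measured in megabytes.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut `data` into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = b"0123456789" * 10          # 100 bytes of sample data
chunks = split_into_chunks(data, 32)

print([len(c) for c in chunks])    # [32, 32, 32, 4]

# Concatenating the chunks in order reproduces the original data.
print(b"".join(chunks) == data)    # True
```

Because each chunk is an independent unit, chunks can be stored on different machines and read in parallel.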

  15. Problem 2: How to combine the data? • We analyze data across different machines, but how do we merge the results to get a meaningful outcome? • All (or some) of the data has to travel across the network; only then can the merging occur. • Doing this is notoriously challenging • Again Google → MapReduce

  16. MapReduce • Provides a programming model → abstracts away the problem of disk reads and writes, transforming it into a computation over keys and values. • Two phases • Map • Reduce

  17. So what is Hadoop? • An operating system? • It provides: • A reliable shared storage system • An analysis system

  18. History of Hadoop • Google was the first to launch GFS and MapReduce • They published the MapReduce paper in 2004, announcing a brand new technology to the world • The technology was already well proven inside Google by 2004 (MapReduce paper by Google)

  19. History of Hadoop • Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system, called Hadoop. • Soon after, Yahoo and others rallied around to support this effort. • Hadoop is now a core part of the infrastructure at: • Facebook, Yahoo, LinkedIn, Twitter …

  20. History of Hadoop • GFS → HDFS • MapReduce → MapReduce

  21. HDFS -- A Brief Design → Streaming very large files on a commodity cluster. • Very large files • MBs to PBs • Streaming • Write-once, read-many approach • Once a huge data set is in place, we tend to use the data rather than modify it • The time to read the whole data set is more important than the latency of reading the first record • Commodity cluster • No high-end servers • Yes, a high chance of failure (but HDFS is tolerant enough) • Replication is done

  22. MapReduce -- A Brief • Large-scale data processing in parallel. • MapReduce provides: • Automatic parallelization and distribution • Fault tolerance • I/O scheduling • Status and monitoring • Two phases in MapReduce • Map • Reduce

  23. MapReduce -- A Brief • Map phase • map(in_key, in_value) -> list(out_key, intermediate_value) • Processes an input key/value pair • Produces a set of intermediate pairs • Reduce phase • reduce(out_key, list(intermediate_value)) -> list(out_value) • Combines all intermediate values for a particular key • Produces a set of merged output values (usually just one)
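
The two signatures above can be made concrete with the classic word-count example. The sketch below runs in a single Python process and is not Hadoop's API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the in-memory grouping step stands in for the work a real framework does when it moves intermediate pairs across the network.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list(out_key, intermediate_value).
    in_key is a line number (unused); in_value is the line's text."""
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list(out_value).
    Sums the counts for one word; returns a single merged value."""
    return [sum(intermediate_values)]


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver: map every record, group the
    intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in mapper(in_key, in_value):
            groups[out_key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


lines = [
    (0, "big data needs big tools"),
    (1, "Hadoop handles big data"),
]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

Here every word maps to the pair `(word, 1)`, and the reducer for a given word sees all of its ones together, so `counts["big"]` ends up as `[3]`. The list wrapper mirrors the `list(out_value)` return type in the signature.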

  24. MapReduce -- A Brief

  25. Hadoop Cluster

  26. Hadoop Ecosystems

  27. Versions of Hadoop • We will work with one of: • Apache hadoop-0.20 • Cloudera Hadoop - cdh3

  28. Prerequisites • Core Java • Acquaintance with Linux will help. • For a better experience: install Linux on your machine.

  29. Thank you • Your feedback is highly important for improving our course material and teaching methodologies. • Please email your suggestions to excel.onlineclasses@gmail.com, nagarjuna@outlook.com

  30. Disclaimer • Excel Online Classes acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in any of the training material, including but not limited to the handouts, written material, videos, PowerPoint presentations, etc. All such training materials are provided to our students for learning purposes only. Students shall not use such materials for their private gain, nor may they sell any such materials to a third party. Some of the examples provided in such training materials may not be owned by us, and as such we do not claim any proprietary rights to them. We do not guarantee, nor are we responsible for, such products and projects. We acknowledge that any such information or product that has been lawfully received from any third-party source is free from restriction and without any breach or violation of law whatsoever.
