100062108 李智宇、100062116 林威宏、100062220 施閔耀


Presentation Transcript


  1. 100062108 李智宇、 100062116 林威宏、 100062220 施閔耀

  2. Outline • Introduction • Architecture of Hadoop • HDFS • MapReduce • Comparison • Why Hadoop • Conclusion

  3. What is Hadoop? • Open-source software framework • Processes and stores big data • Easy to use and implement, economical, flexible • Runs on many nodes (servers) • Written in Java • Free license • Created by Doug Cutting and Mike Cafarella in 2005

  4. Advantages of an Interpreted Language • Cross-platform (e.g., Windows, Ubuntu, Mac OS X) • Smaller executable program size • Easier to modify during both development and execution

  5. Architecture of Hadoop

  6. Hadoop in Enterprise The Dell representation of the Hadoop ecosystem.

  7. Hadoop in Enterprise

  8. Who is using Hadoop? By 2013, more than half of the Fortune 50 were using Hadoop.

  9. HDFS • Hadoop Distributed File System • Client: the user • Name node: manages and stores the metadata and the namespace of files • Data node: stores the files • Each data node periodically sends its status to the name node
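
The periodic status report from data nodes to the name node can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class `NameNode`, its methods, and the timeout value are hypothetical names chosen here, not the HDFS API, and timestamps are passed in explicitly so the example stays deterministic.

```python
class NameNode:
    """Toy name node that remembers when each data node last reported in."""

    def __init__(self, timeout_seconds):
        # Assumed rule: a node is considered failed once it has been silent
        # for longer than timeout_seconds.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # data node id -> time of its last status report

    def receive_status(self, node_id, now):
        """Record a periodic status report sent by a data node."""
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        """Data nodes that reported within the timeout window."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_seconds)

    def failed_nodes(self, now):
        """Data nodes that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_seconds)


name_node = NameNode(timeout_seconds=10)
name_node.receive_status("dn1", now=0)
name_node.receive_status("dn2", now=0)
name_node.receive_status("dn1", now=8)   # dn2 stays silent after t=0
```

At time 15 in this run, `dn1` still counts as live and `dn2` as failed, which is the signal the name node needs before it starts handling a node failure.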

  10. HDFS: Writing data in HDFS • Each file is divided into blocks (64 MB or 128 MB each), and each block is stored as three copies on different data nodes. • The client asks the name node for a list of data nodes sorted by distance and sends the file to the nearest one; that data node then forwards the file to the remaining nodes. • When the operation is done, the data node sends “done” to the name node.
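
The write path described on this slide can be sketched in Python as follows. This is a simplified model, not HDFS code: the function names, the `distances` dictionary, and the in-memory `storage` dictionary are assumptions made for illustration, and real HDFS replica placement takes more factors into account than plain distance.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the block sizes named on the slide
REPLICATION = 3                # three copies of every block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def choose_pipeline(distances, replication=REPLICATION):
    """Name node side: pick the nearest data nodes, closest first.

    distances maps a data node id to its (hypothetical) distance from the client.
    """
    return sorted(distances, key=lambda node: (distances[node], node))[:replication]


def write_block(block_id, pipeline, storage):
    """Client sends the block to the first node; each node forwards it onward."""
    for node in pipeline:  # iteration order models the forwarding chain
        storage.setdefault(node, set()).add(block_id)
    return "done"          # the acknowledgement reported to the name node


blocks = split_into_blocks(150 * 1024 * 1024)  # a 150 MB file
pipeline = choose_pipeline({"dn1": 4, "dn2": 1, "dn3": 2, "dn4": 3})
storage = {}
status = write_block("block-0", pipeline, storage)
```

With these inputs the 150 MB file yields two full 64 MB blocks plus one 22 MB block, and the pipeline is `dn2`, `dn3`, `dn4` because they are the three nearest nodes.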

  11. HDFS: Reading data in HDFS • The client sends the filename to the name node, which returns a list of the file's blocks with their locations sorted by distance. • The client uses the list to fetch the file from the data nodes.
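
A minimal Python sketch of the read path, assuming the list returned by the name node gives, for each block, the data nodes that hold it sorted by distance. All names here (`locate_blocks`, `read_file`, `node_data`) are hypothetical; a data node that is missing from `node_data` stands in for a node that cannot be reached.

```python
def locate_blocks(block_locations, distances):
    """Name node side: sort the data nodes holding each block by distance."""
    return {block: sorted(nodes, key=lambda node: (distances[node], node))
            for block, nodes in block_locations.items()}


def read_file(block_order, sorted_locations, node_data):
    """Client side: fetch every block from the nearest node that still has it."""
    parts = []
    for block in block_order:
        for node in sorted_locations[block]:
            data = node_data.get(node, {}).get(block)
            if data is not None:   # this node answered, stop searching
                parts.append(data)
                break
        else:
            raise IOError(f"no reachable copy of block {block}")
    return b"".join(parts)


locations = locate_blocks(
    {"b0": ["dn1", "dn2"], "b1": ["dn2", "dn3"]},
    distances={"dn1": 2, "dn2": 1, "dn3": 3},
)
# dn2 is the nearest node for both blocks but is unreachable in this run.
node_data = {"dn1": {"b0": b"hello "}, "dn3": {"b1": b"world"}}
content = read_file(["b0", "b1"], locations, node_data)
```

Because the client holds the whole sorted list, it can move on to the next data node when the nearest one does not answer.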

  12. HDFS: failure • node failure • communication failure • data corruption

  13. HDFS: handle failure • Handling write failures: the name node skips any data node that does not return an ACK. • Handling read failures: recall that when reading a file, the client gets a list of the data nodes that contain the file, so it can turn to another data node on the list.

  14. HDFS: handle failure • Handling node failures: the name node finds out which data the failed node held, copies that data from the other nodes that hold it, and stores the copies on other data nodes. • Note that HDFS cannot guarantee that at least one copy of the data stays alive.
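
The recovery step for a failed data node can be sketched in Python as below. The function `rereplicate` and the `block_map` structure are assumptions for illustration only; the sketch also shows why a block is lost when the failed node held its only remaining copy.

```python
def rereplicate(block_map, failed_node, live_nodes, replication=3):
    """Plan the copies needed after failed_node goes down.

    block_map maps a block id to the set of data nodes holding it and is
    updated in place. Returns (copies, lost): copies is a list of
    (block, source_node, target_node) and lost lists blocks with no copy left.
    """
    copies, lost = [], []
    for block in sorted(block_map):
        holders = block_map[block]
        holders.discard(failed_node)
        if not holders:
            lost.append(block)  # every copy was on failed nodes
            continue
        candidates = [node for node in sorted(live_nodes)
                      if node != failed_node and node not in holders]
        while len(holders) < replication and candidates:
            source = min(holders)       # any surviving holder can serve the copy
            target = candidates.pop(0)  # next live node without this block
            copies.append((block, source, target))
            holders.add(target)
    return copies, lost


block_map = {
    "b0": {"dn1", "dn2", "dn3"},
    "b1": {"dn1"},                # only one copy existed
    "b2": {"dn2", "dn3", "dn4"},
}
copies, lost = rereplicate(block_map, "dn1", ["dn2", "dn3", "dn4"])
```

In this run block `b0` is copied from `dn2` to `dn4`, block `b2` needs no action, and block `b1` cannot be restored.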

  15. MapReduce • Similar to divide-and-conquer • First, use “Map” to divide the work into tasks • Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.” • Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
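
The three phases can be illustrated with a single-process word count in Python. This is a sketch of the programming model only, not Hadoop's Java API: the function names are chosen here for illustration, and everything runs in memory on one machine instead of across mapper and reducer nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: turn every input line into (word, 1) pairs."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key, as if routing them to one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: apply the user-defined function (here, sum) to each group."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Here `counts` maps `hadoop` and `data` to 2 and the other two words to 1. In a real cluster each phase would run in parallel on many nodes, with the shuffle moving data over the network.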

  16. MapReduce-Map

  17. MapReduce-shuffle

  18. MapReduce-Reduce

  19. MapReduce

  20. Comparison

  21. Comparison

  22. Why Hadoop? (technically) Comparison of Grep Task Result with Vertica and DBMS-X

  23. Why Hadoop? (technically) • Simple structure vs. optimization • Transaction time not minimized • Lower performance with the same number of nodes • No compelling reason to choose Hadoop

  24. Why Hadoop? (commercially)

  25. Why Hadoop? (commercially) • Cheap (buy more servers to beat a DBMS) • Flexible (both in design and deployment) • Easier to design • Easier to scale up • Can be combined with other systems to achieve better performance

  26. Conclusion • Hadoop is much easier for users to implement and more economical • MapReduce advocates should study the techniques used in parallel DBMSs • Hybrid systems are also popular • As its performance improves, we believe Hadoop will lead the trend in big data computing

  27. Reference • http://hadoop.apache.org/ • http://www.runpc.com.tw/content/cloud_content.aspx?id=105318 • http://en.wikipedia.org/wiki/Apache_Hadoo • https://www.facebookbrand.com/ • http://assets.fontsinuse.com/static/use-media-items/15/14246/full-2048x768/522903b7/Yahoo_Logo.png • http://wiki.apache.org/hadoop/PoweredBy • http://semiaccurate.com/assets/uploads/2011/09/Amazon-logo.jpg • http://www.conceptcupboard.com/blog/wp-content/uploads/2013/09/google.jpg

  28. Reference • http://datashieldcorp.com/files/2013/11/adobe-LOGO-2.jpg • http://upload.wikimedia.org/wikipedia/commons/7/77/The_New_York_Times_logo.png • http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/hadoop-introduction.pdf • http://hadoop.intel.com/pdfs/IntelDistributionReferenceArchitecture.pdf • http://www.google.com.tw/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fwww.classcloud.org%2Fcloud%2Fraw-attachment%2Fwiki%2FHinet100402%2F02.HadoopOverview.pdf&ei=IE2XUtLfBMfxiAea_oHQCA&usg=AFQjCNFoIXxLJrOnoul4cKJpQ8v3_kuTYg

  29. Reference • http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Hadoop-Deployment-Comparison-Study.pdf • https://www.google.com.tw/url?sa=t&rct=j&q&esrc=s&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fwww.psgtech.edu%2Fyrgcc%2Fattach%2FMAP%2520REDUCE%2520PROGRAMMING.ppt&ei=7lGXUtvCJsy5iAfWtYH4Bw&usg=AFQjCNGWRKJLal-tvbvORULZV6_Te2y74g&sig2=Ba77ihsV1SEqcNeEFkRzfg • https://www.cs.duke.edu/starfish/files/hadoop-models.pdf • http://dotnetmis91.blogspot.tw/2010/04/hdfs-hadoop-mapreduce.html • http://wiki.apache.org/hadoop/HDFS • http://www.ewdna.com/2013/04/Hadoop-HDFS-Comics.html

  30. Reference • http://en.wikipedia.org/wiki/Interpreted_language • A Comparison of Approaches to Large-Scale Data Analysis by Sam Madden • http://www.cc.ntu.edu.tw/chinese/epaper/0011/20091220_1106.htm • http://web.cs.wpi.edu/~cs561/s12/Lectures/6/Hadoop.pdf • http://www.mobilemartin.com/mobile/show-me-the-mobile-money.jpg
