Big Data and Hadoop

http://www.learntek.org/product/big-data-and-hadoop/ Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management courses. We are dedicated to designing, developing, and implementing training programs for students, corporate employees, and business professionals. www.learntek.org

Presentation Transcript


  1. BIG DATA

  2. The following topics will be covered in our BIG DATA Online Training: Copyright @ 2015 Learntek. All Rights Reserved.

  3. What is Hadoop? • Big Data Hadoop Training: Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of storage capacity. Its distributed file system enables rapid data transfer among nodes and allows the system to continue operating uninterrupted when a node fails. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

  4. Why Hadoop?
     • Large Volumes of Data: The ability to store and quickly process huge amounts of data of any variety (structured, unstructured, and semi-structured). With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration.
     • Computing Power: Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
     • Fault Tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail. Multiple copies of all data are stored automatically.
     • Flexibility: Unlike a traditional relational database, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data such as text, images, and videos.
     • Low Cost: The open-source framework is free and uses commodity hardware to store large quantities of data.
     • Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
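
The fault-tolerance point above can be illustrated with a small sketch. This is a toy model, not HDFS's actual block placement policy: the functions `place_blocks` and `readable_blocks`, the node names, and the round-robin placement are all assumptions made for illustration. The replication factor of 3 is used here only as an example value.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive node indices (mod cluster size) are always distinct here.
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


# Hypothetical five-node cluster holding a ten-block file.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# With three copies of every block, any two node failures leave all data readable.
survivors = readable_blocks(placement, failed={"node2", "node4"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

In this model, losing any two nodes never loses data, because each block lives on three distinct nodes. Losing three nodes that happen to hold all replicas of one block would make that block unreadable, which is why a real cluster re-replicates blocks after a failure.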

  5. Big Data Hadoop Training: Hadoop Introduction

  6. Hadoop Installation

  7. Hadoop Distributed File System (HDFS)

  8. MapReduce Programming
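
As a rough illustration of the MapReduce programming model, the sketch below runs a word count through map, shuffle, and reduce steps in plain Python on a single machine. It does not use the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count`, and the sample input, are assumptions chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical input: two short lines standing in for a large data set.
lines = ["big data and hadoop", "hadoop stores big data"]
counts = word_count(lines)
print(counts)
```

On a real cluster, the mappers and reducers would run in parallel on different nodes and the framework would handle the shuffle. The single-process version here only shows how the data flows between the phases.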

  9. Hive

  10. Pig
     • Pig basics
     • Install and configure Pig on a cluster
     • Pig library functions
     • Pig vs. Hive
     • Write sample Pig Latin scripts
     • Modes of running Pig
     • Running in the Grunt shell
     • Running as a Java program
     • Pig UDFs

  11. HBase

  12. Sqoop
     • Install and configure Sqoop on a cluster
     • Connecting to an RDBMS
     • Installing MySQL
     • Import data from MySQL to Hive
     • Export data to MySQL
     • Internal mechanism of import/export

  13. Oozie
     • Introduction to Oozie
     • Oozie architecture
     • XML file specifications
     • Specifying a workflow
     • Control nodes
     • Oozie job coordinator

  14. Flume
     • Introduction to Flume
     • Configuration and setup
     • Flume sink with example
     • Channel
     • Flume source with example
     • Complex Flume architecture

  15. ZooKeeper
     • Introduction to ZooKeeper
     • Challenges in distributed applications
     • Coordination
     • ZooKeeper: design goals
     • Data model and hierarchical namespace
     • Client APIs

  16. YARN

  17. Prerequisites: • Knowledge of any programming language, databases, and the Linux operating system. Core Java or Python knowledge is helpful.
