
MapReduce in Hadoop Framework

MapReduce serves two essential functions: it filters and parcels out work to the nodes of a cluster, a function referred to as the map step or mapper, and it organizes and reduces the results from each node into a cohesive answer to a query, referred to as the reducer.


Presentation Transcript


  1. www.prwatech.in MapReduce in Hadoop Framework. MapReduce is a core component of the Apache Hadoop software framework. Hadoop enables resilient, distributed processing of massive unstructured data sets across clusters of commodity computers, in which each node of the cluster includes its own storage. MapReduce serves two essential functions: it filters and parcels out work to the nodes of the cluster, a function referred to as the map step or mapper, and it organizes and reduces the results from each node into a cohesive answer to a query, referred to as the reducer. MapReduce programs work in two phases: (1) the map phase and (2) the reduce phase.
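The two phases can be sketched in plain Python using a word count, the usual introductory MapReduce example. This is a single-process illustration, not Hadoop code: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample input are assumptions made for this sketch, and the `shuffle` helper only stands in for the grouping that the framework performs between the two phases on a real cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map function: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key. On a cluster the framework does this between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce function: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the map phase, the grouping step, and the reduce phase in sequence."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Hypothetical input: each string plays the role of one input record.
sample_lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(sample_lines)
print(counts)
```

Because each call to `map_words` looks at one line only, and each call to `reduce_counts` looks at one key only, both phases could in principle run on different machines, which is the property the framework relies on.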

  2. The input to each phase is a set of key-value pairs. In addition, the programmer needs to specify two functions: the map function and the reduce function. How MapReduce works: The original version of MapReduce involved several component daemons: the JobTracker, the master node that manages all the jobs and resources in a cluster; the TaskTrackers, agents deployed to each machine in the cluster to run the map and reduce tasks; and the JobHistory Server, a component that tracks completed jobs and is typically deployed as a separate function or with the JobTracker. MapReduce examples and uses: The power of MapReduce is in its ability to tackle huge data sets by distributing processing across many nodes, and then combining or reducing the results of those nodes. As a basic example, users could list and count the number of times every word appears in a novel as a single-server application, but that is time-consuming. By contrast, users can split the task among 26 people, so that each person takes a page, writes each word on a separate sheet of paper, and takes a new page when finished. This is the map aspect of MapReduce. If a person leaves, another person takes his or her place. This exemplifies MapReduce's fault-tolerant element. When all the pages are processed, the users sort their single-word sheets into 26 boxes, one for each first letter. Each user then takes a box and sorts the words in it alphabetically. Counting the number of sheets that carry the same word is an example of the reduce aspect of MapReduce.

  3. There is a broad range of real-world uses for MapReduce involving complex and seemingly unrelated data sets. For example, a social networking site could use MapReduce to determine users' potential friends, colleagues and other contacts based on site activity, names, locations, employers and many other data elements. A booking website could use MapReduce to examine the search criteria and historical behaviors of users, and create customized offerings for each. An industrial facility could collect equipment data from different sensors across the installation and use MapReduce to tailor maintenance schedules or predict equipment failures, improving overall uptime and cost savings.
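As a rough illustration of the industrial sensor case, the sketch below keys each reading by equipment and reduces the readings for each machine to a small summary. Everything in it is an assumption made for the example: the equipment IDs, the temperature values, the `ALERT_THRESHOLD`, and the function names. It runs in a single process and only mimics the shape of a MapReduce job.

```python
from collections import defaultdict

# Hypothetical readings: (equipment_id, temperature in degrees Celsius).
readings = [
    ("pump-1", 71.0),
    ("pump-2", 64.5),
    ("pump-1", 88.0),
    ("pump-2", 66.5),
    ("fan-7", 40.0),
]


def map_reading(reading):
    """Map function: key each temperature by the equipment that reported it."""
    equipment_id, temperature = reading
    yield equipment_id, temperature


def reduce_readings(equipment_id, temperatures):
    """Reduce function: summarize all readings for one piece of equipment."""
    mean = sum(temperatures) / len(temperatures)
    return equipment_id, {"mean": mean, "max": max(temperatures)}


def summarize(all_readings):
    """Map every reading, group by equipment, then reduce each group."""
    grouped = defaultdict(list)
    for reading in all_readings:
        for key, value in map_reading(reading):
            grouped[key].append(value)
    return dict(reduce_readings(key, values) for key, values in grouped.items())


summary = summarize(readings)

# Illustrative threshold, not taken from the presentation.
ALERT_THRESHOLD = 80.0
needs_inspection = sorted(
    key for key, stats in summary.items() if stats["max"] > ALERT_THRESHOLD
)
print(summary)
print(needs_inspection)
```

A real maintenance or failure-prediction job would use richer features than a mean and a maximum, but the per-key grouping followed by a per-key summary is the same pattern.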
