It’s all about Map Reduce – Part-2

Click here for the previous part.

Reducer Phase:

In the reducer phase, reducers take the shuffled and sorted output of the mapper phase and aggregate it into the final result; a RecordWriter then writes that result to an output file within the Hadoop Distributed File System (HDFS).
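To make that hand-off concrete, here is a minimal sketch of a word-count-style reducer (the class name SumReducer is ours, not from the original post): every (key, sum) pair it emits through context.write() is passed to the framework's RecordWriter, which writes it to the job's output file in HDFS.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: sums all values seen for each key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result); // handed to the RecordWriter -> HDFS output file
    }
}
```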

The shuffle and sort phases run concurrently: map outputs are merged as they are being fetched by the reducers.

Note that the number of reducers can be customized per job; the cluster-wide default is defined in the mapred-site.xml configuration file.
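As a sketch, assuming the standard Hadoop Job API: the cluster-wide default comes from the mapreduce.job.reduces property in mapred-site.xml, and a single job can override it in its driver.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The default reducer count comes from mapred-site.xml
        // (property: mapreduce.job.reduces).
        Job job = Job.getInstance(conf, "reducer count demo");
        job.setNumReduceTasks(4); // per-job override of that default
    }
}
```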

RecordReader:
The RecordReader interacts with the InputSplit and converts its raw data into the key-value pairs that the mapper reads.
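As an illustration of that contract, here is a sketch of a custom RecordReader (the class name LineByLineReader is hypothetical) that delegates to Hadoop's built-in LineRecordReader, surfacing each line of its InputSplit to the mapper as a (byte offset, line text) pair.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical RecordReader that wraps Hadoop's LineRecordReader.
public class LineByLineReader extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException {
        delegate.initialize(split, context); // open the split's byte range
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        return delegate.nextKeyValue(); // advance to the next line, if any
    }

    @Override
    public LongWritable getCurrentKey() { return delegate.getCurrentKey(); } // key = byte offset

    @Override
    public Text getCurrentValue() { return delegate.getCurrentValue(); } // value = line text

    @Override
    public float getProgress() throws IOException { return delegate.getProgress(); }

    @Override
    public void close() throws IOException { delegate.close(); }
}
```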

Combiner:
This phase is optional; it functions as a mini-reducer. The combiner receives the output of the map tasks, aggregates it locally, and passes its results on to the reducer phase.
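Because the SumReducer sketched above performs an associative, commutative aggregation, the same class can safely double as a combiner; a minimal driver snippet (the class name CombinerSetup is ours) might register it like this.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner demo");
        // Pre-aggregate map output locally before the shuffle...
        job.setCombinerClass(SumReducer.class);
        // ...and perform the final aggregation in the reduce phase.
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }
}
```

The framework treats the combiner as an optimization: it may run zero, one, or several times per map task, so it must not change the final result.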

Partitioner:
The number of partitions equals the number of reduce tasks used to summarize the data. The partitioner controls how the keys of the intermediate map outputs are partitioned, which determines the reducer to which each map (or combiner) output record is sent.
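For example, here is a sketch of a custom partitioner (FirstLetterPartitioner is a hypothetical name) that routes all keys sharing a first letter to the same reduce task; it would be registered with job.setPartitionerClass(FirstLetterPartitioner.class).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: keys with the same first letter land in the
// same partition, and therefore at the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String k = key.toString();
        char first = k.isEmpty() ? '\0' : Character.toLowerCase(k.charAt(0));
        return first % numReduceTasks; // chars are non-negative, so no sign masking needed
    }
}
```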

Here is an illustration of how MapReduce and its components process data.

[Figure: MapReduce workflow]
