It’s all about Map Reduce – Part-2

Click here for the previous part.

Reducer Phase:

In the reducer phase, reducers take the shuffled and sorted output of the mapper phase and aggregate it into the final result; a RecordWriter then writes that result to an output file within the Hadoop Distributed File System (HDFS).
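To make that hand-off concrete, here is a minimal sketch of a word-count-style reducer (the class name SumReducer is ours, not from the original post): every (key, sum) pair it emits through context.write() is passed to the framework's RecordWriter, which writes it to the job's output file in HDFS.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: sums all values seen for each key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result); // handed to the RecordWriter -> HDFS output file
    }
}
```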

The shuffle and sort phases run concurrently: map outputs are merged as they are being fetched by the reducers.

Note that the number of reducers can be customized per job; the cluster-wide default is defined in the mapred-site.xml configuration file.
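As a sketch, assuming the standard Hadoop Job API: the cluster-wide default comes from the mapreduce.job.reduces property in mapred-site.xml, and a single job can override it in its driver.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The default reducer count comes from mapred-site.xml
        // (property: mapreduce.job.reduces).
        Job job = Job.getInstance(conf, "reducer count demo");
        job.setNumReduceTasks(4); // per-job override of that default
    }
}
```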

RecordReader:
The RecordReader interacts with the InputSplit and converts its raw data into the key-value pairs that the mapper reads.
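As an illustration of that contract, here is a sketch of a custom RecordReader (the class name LineByLineReader is hypothetical) that delegates to Hadoop's built-in LineRecordReader, surfacing each line of its InputSplit to the mapper as a (byte offset, line text) pair.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical RecordReader that wraps Hadoop's LineRecordReader.
public class LineByLineReader extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException {
        delegate.initialize(split, context); // open the split's byte range
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        return delegate.nextKeyValue(); // advance to the next line, if any
    }

    @Override
    public LongWritable getCurrentKey() { return delegate.getCurrentKey(); } // key = byte offset

    @Override
    public Text getCurrentValue() { return delegate.getCurrentValue(); } // value = line text

    @Override
    public float getProgress() throws IOException { return delegate.getProgress(); }

    @Override
    public void close() throws IOException { delegate.close(); }
}
```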

Combiner:
This phase is optional; it functions as a mini-reducer. The combiner receives the output of the map tasks, aggregates it locally, and passes its results on to the reducer phase.
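Because the SumReducer sketched above performs an associative, commutative aggregation, the same class can safely double as a combiner; a minimal driver snippet (the class name CombinerSetup is ours) might register it like this.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner demo");
        // Pre-aggregate map output locally before the shuffle...
        job.setCombinerClass(SumReducer.class);
        // ...and perform the final aggregation in the reduce phase.
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }
}
```

The framework treats the combiner as an optimization: it may run zero, one, or several times per map task, so it must not change the final result.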

Partitioner:
The number of partitions equals the number of reduce tasks used to summarize the data. The partitioner controls how the keys of the intermediate map outputs are partitioned, which determines the reducer to which each map (or combiner) output record is sent.
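For example, here is a sketch of a custom partitioner (FirstLetterPartitioner is a hypothetical name) that routes all keys sharing a first letter to the same reduce task; it would be registered with job.setPartitionerClass(FirstLetterPartitioner.class).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: keys with the same first letter land in the
// same partition, and therefore at the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String k = key.toString();
        char first = k.isEmpty() ? '\0' : Character.toLowerCase(k.charAt(0));
        return first % numReduceTasks; // chars are non-negative, so no sign masking needed
    }
}
```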

Here is an illustration of how MapReduce and its components process data.

[Figure: MapReduce workflow]
