Why is MapReduce slow?

A common question is why MapReduce is slower than other processing frameworks.

MapReduce is batch-oriented: a job reads its whole input, processes it, and only then produces results. Every job must also supply a mapper function and a reducer function, even when the processing is trivial.
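To make the mapper and reducer roles concrete, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle-and-sort, and reduce phases inside a single process. The names `mapper`, `reducer`, and `run_job` are made up for this illustration and are not part of the Hadoop API.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Sum all counts that were grouped under one key."""
    return word, sum(counts)

def run_job(lines):
    """Hypothetical single-process driver that mimics the three phases."""
    # Map phase: apply the mapper to every input record.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort phase: order pairs by key so equal keys are adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key.
    return dict(
        reducer(key, (count for _, count in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )

result = run_job(["big data big cluster", "data on disk"])
```

Even for two short lines, the job still has to run all three phases, which is the batch-style structure described above.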

MapReduce is designed for massive datasets rather than small ones. Every job goes through the same phases (job setup, map, shuffle and sort, reduce), so a large part of the cost is fixed no matter how big or small the job is. On small jobs this fixed overhead dominates and shows up as high latency. Distributing the work across the cluster also adds scheduling and network-transfer costs, which only pay off when the dataset is large enough.
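The effect of a fixed per-job cost can be sketched with a toy cost model. The numbers below (30 seconds of overhead, 10 microseconds per record) are assumptions chosen only for illustration, not measurements of any real cluster, and `job_seconds` and `overhead_share` are hypothetical helper names.

```python
def job_seconds(records, fixed_overhead_s=30.0, per_record_s=0.00001):
    """Toy model: total time = fixed startup cost + per-record work."""
    return fixed_overhead_s + records * per_record_s

def overhead_share(records, fixed_overhead_s=30.0, per_record_s=0.00001):
    """Fraction of the total job time spent on the fixed overhead."""
    total = job_seconds(records, fixed_overhead_s, per_record_s)
    return fixed_overhead_s / total

# Assumed workloads: a tiny job and a very large one.
small_share = overhead_share(1_000)
large_share = overhead_share(10**10)
```

Under these assumed numbers, almost all of the small job's time is overhead, while for the large job the overhead is negligible.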


The output that each mapper produces is written to the local disk of the node it runs on. Before the reduce stage picks it up, this data is shuffled across the network and sorted. The final reducer output is then written to HDFS, and in a multi-job pipeline every job reads its input from HDFS and writes its result back. All of this disk and network I/O makes MapReduce a lengthy process.

While MapReduce reads and writes data on disk between phases, frameworks such as Spark can keep intermediate data in memory for later stages. As a result, Spark is often cited as being up to 100 times faster than MapReduce for in-memory workloads, although the actual gap depends on the job.
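The difference between the two styles can be sketched in plain Python. This is a simplified illustration, not Hadoop or Spark code: local temporary JSON files stand in for HDFS, and `run_on_disk`, `run_in_memory`, and the stage functions are hypothetical names.

```python
import json
import os
import tempfile

def stage_double(values):
    return [v * 2 for v in values]

def stage_increment(values):
    return [v + 1 for v in values]

def run_on_disk(values, stages, workdir):
    """Write each stage's output to a file, then read it back for the next stage."""
    disk_round_trips = 0
    for i, stage in enumerate(stages):
        values = stage(values)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(values, f)      # materialize the intermediate result
        with open(path) as f:
            values = json.load(f)     # the next stage starts from disk
        disk_round_trips += 1
    return values, disk_round_trips

def run_in_memory(values, stages):
    """Pass each stage's output straight to the next stage."""
    for stage in stages:
        values = stage(values)
    return values

stages = [stage_double, stage_increment, stage_double]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, round_trips = run_on_disk([1, 2, 3], stages, workdir)
memory_result = run_in_memory([1, 2, 3], stages)
```

Both versions compute the same result, but the disk-based one pays one write and one read per stage, and that cost grows with the number of stages and the size of the data.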

In addition to the issues mentioned above, MapReduce jobs are typically written in Java, and even simple jobs require a large amount of code, which makes them harder and slower to develop.

Please let us know if you can think of any further points.
