The big daddy of big data, Google, bid goodbye to MapReduce around 2014-15. It was Google's engineers who inspired the development of Hadoop, with MapReduce as its core parallel processing engine. This certainly sounds like a death knell for MapReduce and Hadoop, and it immediately turns our heads towards Spark. However, let's try to understand the real story.
Almost every article on Spark on the web lambasts MapReduce, citing Spark as 100x faster than MapReduce and making MapReduce and Hadoop appear puny. The reality is that the 100x figure is Spark's best-case performance, reached only in the most idealistic scenarios; in the worst case, Spark is about 3x faster than MapReduce. One of the key reasons for the gap is that MapReduce depends heavily on disk I/O operations, which are slower than Spark's in-memory operations. Well, this is not the end of the story.
Processing speed is not Spark's only USP; its real selling point is speed and flexibility in a single package. Flexibility here refers to the ability to run batch-oriented jobs as well as interactive and iterative workloads, including machine learning. These are the areas where MapReduce falls severely short and takes a backseat, which restricts it to batch-oriented jobs, close to traditional data warehousing applications that predominantly involve ETL operations.
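The disk-versus-memory difference matters most for iterative workloads, where the same data is processed pass after pass. The toy sketch below is plain Python and uses neither Hadoop nor Spark APIs; the `step` function and the JSON state file are hypothetical stand-ins. It contrasts a MapReduce-style loop that writes its intermediate state to disk and reads it back on every pass with a loop that keeps that state in memory between passes, which is conceptually what Spark's caching allows. Both produce the same answer; only the I/O pattern differs.

```python
import json
import os
import tempfile


def step(values):
    # Hypothetical iteration: move every value halfway toward the current mean.
    mean = sum(values) / len(values)
    return [v + (mean - v) / 2 for v in values]


def iterate_via_disk(values, iterations, workdir):
    """MapReduce-style: every pass reads its input from disk and writes its output back."""
    path = os.path.join(workdir, "state.json")
    with open(path, "w") as f:
        json.dump(values, f)
    for _ in range(iterations):
        with open(path) as f:        # disk read at the start of each pass
            current = json.load(f)
        with open(path, "w") as f:   # disk write at the end of each pass
            json.dump(step(current), f)
    with open(path) as f:
        return json.load(f)


def iterate_in_memory(values, iterations):
    """Spark-style (conceptually): intermediate state stays in memory between passes."""
    current = list(values)
    for _ in range(iterations):
        current = step(current)
    return current


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0]
    with tempfile.TemporaryDirectory() as d:
        on_disk = iterate_via_disk(data, 5, d)
    in_memory = iterate_in_memory(data, 5)
    print(on_disk == in_memory)  # same result, very different I/O cost
```

With a handful of values the disk round trips are negligible, but on terabytes of data and dozens of iterations, as in many machine-learning algorithms, re-reading and re-writing the whole dataset on every pass is where MapReduce loses most of its ground.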
These are obviously not doom-and-gloom days for Hadoop and MapReduce; they will stay afloat for quite some time. After all, mainframe systems (the grandfathers of computers) are still in use at major Wall Street financial trading companies. The advent of Spark has simply shifted the momentum away from Hadoop and MapReduce, and the real focus is now on fast, interactive, real-time analytics over streaming data. The promising candidates to look out for in this space include Spark, Storm and Kafka, and definitely not MapReduce, which was never built for this.