Hadoop is a cost-effective solution for Big Data. We keep hearing this. But what is the real cost of Hadoop for Big Data analytics, and how economical is it compared to a traditional RDBMS?
A typical Hadoop cluster is a collection of machines, each acting as a master node, a slave node, or a client machine. Unlike RDBMS deployments, the machines in a Hadoop cluster can be commodity hardware rather than enterprise-class servers. This works because the Hadoop framework implements enough fault tolerance to cope with failures of commodity hardware.
Scaling an RDBMS might require upgrading the existing hardware or buying more enterprise-class RDBMS servers. RDBMS software is also usually very expensive.
Scaling a Hadoop system, however, just requires adding more commodity machines, whose overall cost is far lower than that of RDBMS systems.
We keep saying that Hadoop infrastructure is far more economical than RDBMS systems, but how much cheaper is it exactly? Let’s get down to the numbers:
Cost of an RDBMS system for 1 TB of data – $10,000 to $15,000
Cost of hardware (a processor, a network card and a few hard drives) for a Hadoop node – $4,000
Clearly the difference is massive.
However, these figures do not include software, maintenance, installation, staff salaries and so on. These costs are not negligible, so let’s make another estimate that includes them for a more realistic comparison.
Hadoop Systems
Assume the cluster has 100 nodes, each costing $4,000.
Hadoop-qualified engineers are highly paid. Let’s assume an average annual salary of $150,000 per engineer.
Let’s also assume that the free, open source Apache distribution of Hadoop is deployed, so there is no software license cost.
Based on these assumptions, and amortizing the cost over 3 years, the operational cost comes out to about $32 per hour for the entire system.
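The hourly figure can be reproduced with a short calculation. The sketch below is a reconstruction under stated assumptions, not the original worksheet. The number of engineers is not given above, so it assumes one engineer, which is the count that reproduces the $32 figure. It also assumes the cluster runs 24 hours a day, 365 days a year, and carries no software cost because the free Apache distribution is used. The helper `amortized_hourly_cost` is a hypothetical name used only for illustration.

```python
# Hypothetical helper: spread one-time (capital) costs and recurring annual
# costs evenly over every hour of the amortization period.
HOURS_PER_YEAR = 24 * 365  # assumption: the system runs around the clock


def amortized_hourly_cost(capital_cost: float, annual_cost: float, years: int) -> float:
    """Total cost over `years` divided by the number of hours in that period."""
    return (capital_cost + annual_cost * years) / (HOURS_PER_YEAR * years)


# Hadoop figures from the article; the engineer count is an assumption.
nodes = 100
cost_per_node = 4_000          # USD per commodity node
engineers = 1                  # assumption, not stated in the article
salary_per_engineer = 150_000  # USD per year
software_cost = 0              # free Apache Hadoop

hadoop_hourly = amortized_hourly_cost(
    capital_cost=nodes * cost_per_node + software_cost,
    annual_cost=engineers * salary_per_engineer,
    years=3,
)
print(f"Hadoop: ${hadoop_hourly:.2f} per hour")  # prints Hadoop: $32.34 per hour
```

Here the $400,000 of hardware plus three years of salary ($450,000) are divided across 26,280 hours.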
RDBMS Systems
Now assume an RDBMS system of similar size.
An Oracle database machine with 168 TB of storage costs $650,000.
Its software costs a further $1.68 million, which brings the combined cost to about $14,000 per TB.
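The per-TB figure is simply hardware plus software divided by storage, rounded to the nearest thousand. A minimal check using only the numbers quoted above:

```python
# Oracle database machine figures quoted in the article
hardware_cost = 650_000    # USD, 168 TB machine
software_cost = 1_680_000  # USD
storage_tb = 168

# Combined one-time cost spread over the usable storage
cost_per_tb = (hardware_cost + software_cost) / storage_tb
print(f"${cost_per_tb:,.0f} per TB")  # prints $13,869 per TB
```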
Assume the annual salary of an Oracle database administrator is $95,000.
Based on these assumptions, and amortizing the cost over 3 years, the operational cost comes out to about $99 per hour for the entire system.
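The same amortization can be applied to the RDBMS side. This is again a sketch under assumptions. The article does not say how many administrators are needed, so a single DBA is assumed, which is the count that reproduces the $99 figure. Round-the-clock operation is also assumed. The helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the system runs around the clock


def amortized_hourly_cost(capital_cost: float, annual_cost: float, years: int) -> float:
    """Total cost over `years` divided by the number of hours in that period."""
    return (capital_cost + annual_cost * years) / (HOURS_PER_YEAR * years)


# RDBMS figures from the article; the DBA count is an assumption.
hardware_cost = 650_000    # Oracle database machine, 168 TB
software_cost = 1_680_000  # database software
dbas = 1                   # assumption, not stated in the article
dba_salary = 95_000        # USD per year

rdbms_hourly = amortized_hourly_cost(
    capital_cost=hardware_cost + software_cost,
    annual_cost=dbas * dba_salary,
    years=3,
)
print(f"RDBMS: ${rdbms_hourly:.2f} per hour")  # prints RDBMS: $99.51 per hour
```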
The RDBMS system is therefore about 3 times as costly as a Hadoop system of similar size.
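The "about 3 times" result depends heavily on staffing, which the article leaves implicit. The sketch below is a rough sensitivity check. It keeps a single DBA on the RDBMS side and recomputes the cost ratio for a hypothetical one, two or three Hadoop engineers. All other figures come from the text. The engineer counts are illustrative assumptions, not data.

```python
HOURS = 24 * 365 * 3  # three-year amortization window, assuming 24x7 operation

# RDBMS: hardware + software + one DBA (assumption) for three years
rdbms_hourly = (650_000 + 1_680_000 + 95_000 * 3) / HOURS


def hadoop_hourly(engineers: int) -> float:
    """Hadoop hourly cost for a hypothetical number of engineers."""
    return (100 * 4_000 + engineers * 150_000 * 3) / HOURS


for engineers in (1, 2, 3):
    ratio = rdbms_hourly / hadoop_hourly(engineers)
    print(f"{engineers} engineer(s): RDBMS costs {ratio:.1f}x Hadoop")
# prints 3.1x, 2.0x and 1.5x respectively
```

Under these assumptions the Hadoop system stays cheaper in every case, but the margin narrows as its team grows.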
We have talked about how Big Data solutions using Hadoop can save you big bucks. However, Hadoop is not necessarily a replacement for RDBMS systems, which still have a strong place in transactional data management. Our recommendation is to deploy Hadoop infrastructure alongside existing database management systems to get the best of both worlds.