I have often come across this question, sometimes asked directly by a few of my colleagues and sometimes raised as a point of discussion while designing business intelligence systems for clients.
Data warehousing has been a buzzword for the past two decades, and big data has been a hot trend over the most recent one. Let’s find out what the answer to this question could be.
For anyone who is not deeply familiar with these technologies, the obvious first thought is that the newer big data will replace the older data warehousing. An additional reason for this simple thinking is the similarities they offer:
Still, big data and a data warehouse are not interchangeable. Why?
Data warehousing is the process of extracting data from one or more homogeneous or heterogeneous data sources, transforming it, and loading it into a data repository for analysis. That analysis can be used for reporting and helps an organization take better decisions to improve its performance.
The data repository produced by this process is the data warehouse.
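The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production design: the two sources, their field names, the `orders` table, and the in-memory SQLite database standing in for the warehouse are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export from a sales system (amounts in dollars).
CSV_SOURCE = """order_id,region,amount_usd
1,north,120.50
2,south,80.00
"""

# Hypothetical source 2: a JSON feed from a web shop (amounts in cents).
JSON_SOURCE = '[{"id": 3, "area": "North", "amount_cents": 4550}]'


def extract():
    """Read the raw records from both sources without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Map both layouts onto one schema: (order_id, region, amount_usd, source)."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["order_id"]), row["region"].lower(),
                        float(row["amount_usd"]), "sales_csv"))
    for row in json_rows:
        unified.append((int(row["id"]), row["area"].lower(),
                        row["amount_cents"] / 100, "webshop_json"))
    return unified


def load(rows):
    """Load the unified rows into the repository (an in-memory SQLite table here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount_usd REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_region(conn):
    """A reporting query that the loaded repository can now answer."""
    cur = conn.execute(
        "SELECT region, ROUND(SUM(amount_usd), 2) "
        "FROM orders GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    warehouse = load(transform(*extract()))
    print(revenue_by_region(warehouse))
```

The point of the sketch is that the analysis question (revenue by region) is known up front, and the transform step exists to reshape differently structured sources so that one query can answer it.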
Big data refers to the volume, variety, and velocity of data. How big the data is, how fast it arrives, and how varied it is determine whether it qualifies as so-called “Big Data”. The 3 V’s of big data were articulated by industry analyst Doug Laney in the early 2000s.
The two look similar, but there is a clear difference. Big data is a repository that holds lots of data without a settled idea of what we want to do with it, whereas a data warehouse is designed with the clear intention of supporting informed decisions. Further, a big data solution can be used for data warehousing purposes.
Big data and a data warehouse are two different things; comparing them is like comparing apples to oranges.
Big data is a technology: a means to store and manage large amounts of data. Organizations use various big data solutions to store large volumes of data at a lower cost.
A data warehouse, by contrast, is a framework for organizing data to give a single version of the truth. Typically, a data warehouse is built to consolidate data from varied sources and organize it in an easily readable way. It also offers a data lineage capability that helps trace the origin of the data.
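To make "single version of the truth" and "data lineage" concrete, here is a minimal Python sketch. The `CustomerRecord` type, the source system names, and the rule that the most recently updated record wins are all assumptions chosen for illustration; a real warehouse would apply whatever reconciliation rules the business defines.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: int
    email: str
    updated_on: date
    source: str  # lineage: which system supplied this record


def consolidate(records):
    """Keep one record per customer, assuming the most recent update wins."""
    golden = {}
    for rec in records:
        current = golden.get(rec.customer_id)
        if current is None or rec.updated_on > current.updated_on:
            golden[rec.customer_id] = rec
    return golden


# Two hypothetical source systems disagree about customer 1's email.
records = [
    CustomerRecord(1, "ann@old.example", date(2020, 1, 5), "crm"),
    CustomerRecord(1, "ann@new.example", date(2021, 3, 9), "billing"),
    CustomerRecord(2, "bob@example.com", date(2019, 7, 1), "crm"),
]

golden = consolidate(records)
for rec in golden.values():
    # The source field lets us trace where each surviving value came from.
    print(rec.customer_id, rec.email, "from", rec.source)
```

Each consolidated record carries its `source`, so a report built on it can always be traced back to the system the value originated from.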
As these differences show, big data and a data warehouse are not the same and therefore not interchangeable, so a big data solution will not replace a data warehouse. An organization can have any combination as below, depending on the need (not because they are similar):
This is a guest post by Manjunath Hegde, who has over a decade’s experience in Business Intelligence and in working with analytics-related technologies. He is currently enrolled in the Executive Program in Business Analytics by Jigsaw Academy and MISB Bocconi.