The volume of data generated and the speed at which it is generated have grown rapidly since the early 2000s, driven primarily by the falling cost of storing ever more data on ever smaller devices. Early database management systems (DBMS), and even relational DBMS, were not designed with this growth in mind, and data warehouses also started to be overwhelmed by the influx. In this article, we will learn more about Big Data Architecture.
Big Data usually refers to data sets massive enough to overwhelm the technologies traditionally used for data capture, data management and data processing. Before Big Data took shape, data meant a structured data set with well-defined sources, formats, and processing techniques. Post Big Data, data refers to anything from which value can be derived, including structured, semi-structured and unstructured data.
There is no single agreed definition of Big Data. In short, it is an approach to solving business or real-world problems using all the data relevant to the business or to the problem's context, in any form and arriving at any rate. A popular description uses 3, 4 or 5 V’s to characterize the data consumed by the Big Data ecosystem: Volume, Variety, Velocity, Veracity and Variability. Today, Big Data is often equated with advanced forms of data analytics such as data modelling and predictive analytics.
Big Data’s challenges keep changing depending on the context or the ecosystem and include capture, analysis, storage technology, visualization, sharing, transfer, and privacy concerns. Multiple technologies take part in the Big Data model to handle the V’s mentioned earlier as efficiently as possible. For structured data, an RDBMS or a distributed RDBMS is used. For unstructured data, NoSQL databases like MongoDB are used. For data ingestion, Apache Flume and other specialized tools are used. One of the most prominent and well-known Big Data frameworks is Hadoop. Major market players like Amazon, Microsoft, Oracle and IBM also offer their own custom implementations of the Hadoop ecosystem.
We shall quickly run through a few of the top applications of Big Data in the real world before we dive into Big Data Architecture.
The banking industry uses Big Data for risk analysis and for combating money laundering and other financial fraud, so that such fraud is quickly detected and mitigated. Banks use risk analysis to assess the creditworthiness of a potential customer. In many countries, securities and exchange boards use Big Data for network analytics and natural language processing to catch illegal trading activities.
Media content is driven by analysing customer data at Big Data’s scale to understand behavioural patterns, likes and dislikes of a broad set of customers. YouTube is a big example of Big Data at work, driving ad revenues using the information derived by Big Data analytics.
Big Data is being used to deliver evidence-based diagnosis instead of a battery of medical tests, thus bringing down costs and improving efficiency in medical circles. Big Data is also used in Machine Learning models to detect a condition based on x-ray images, thus saving valuable time and lives.
Big Data today is extensively used in manufacturing for Supply Chain Management, Predictive Maintenance, Predictive Quality, Production Forecasting, improving throughput and yield and more.
Big Data helps insurance companies in minimizing underwriting risks and improving fraud detection in claims.
The biggest application of Big Data is probably in the retail and e-commerce industry, where data from various streams, including social media, is analysed to produce targeted adverts. Sentiment analysis also plays a major role in gauging customer feedback on products, so that issues can be fixed quickly before they turn into a major loss.
Let’s now dive straight into Big Data Architecture.
There are several Big Data products on the market, but you still have to design the system to suit your business’s particular needs. You will need a Big Data Architect to design a Big Data solution that caters to your unique business ecosystem. Big Data has a generic architecture that applies to most businesses at a high level, and a successful implementation does not necessarily need every one of its components.
To start with, a Big Data architecture is usually described as having at least six layers. Each is covered in turn below.
You need your Big Data setup to handle all incoming data streams, whether structured, unstructured or semi-structured and at speeds that match the rate at which data is coming in. This is achieved at the Data Ingestion Layer. The incoming data is prioritized and categorized for a smooth flow into further layers down the line.
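The prioritizing and categorizing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `categorize` heuristic, the `IngestionBuffer` class and the numeric priorities are hypothetical names invented for this example, not part of any particular ingestion tool.

```python
import heapq
import json
from itertools import count


def categorize(payload: str) -> str:
    """Rough, hypothetical guess at the shape of an incoming payload."""
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, (dict, list)):
        return "semi-structured"          # JSON object or array
    if "," in payload and "\n" not in payload.strip():
        return "structured"               # treated as one delimited row
    return "unstructured"                 # anything else, e.g. free text


class IngestionBuffer:
    """Holds incoming records and releases the most urgent one first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: keeps arrival order within a priority

    def ingest(self, payload: str, priority: int) -> None:
        record = {"category": categorize(payload), "payload": payload}
        heapq.heappush(self._heap, (priority, next(self._order), record))

    def next_record(self) -> dict:
        return heapq.heappop(self._heap)[2]


buffer = IngestionBuffer()
buffer.ingest("free-form customer review text", priority=5)
buffer.ingest('{"sensor": "t1", "temp": 71.3}', priority=1)  # lower number = more urgent
buffer.ingest("1001,2021-03-01,49.99", priority=3)

first = buffer.next_record()
print(first["category"])
```

A real ingestion layer would rely on declared schemas or source metadata rather than guessing from the payload, but the flow is the same: tag each record, then hand it on in priority order.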
The data collector layer is concerned with the transportation of data from the data ingestion layer to the rest of the pipeline. Here the components are decoupled to allow analytical processing.
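One common way to get the decoupling mentioned above is to put a bounded buffer between the component that collects data and the component that consumes it. The sketch below uses Python's standard `queue.Queue` and a thread as stand-ins for a real message broker; the names `channel`, `collector` and `consumer` are assumptions made for this example.

```python
import queue
import threading

channel = queue.Queue(maxsize=100)  # bounded buffer between the two layers
SENTINEL = None                     # marks the end of the stream
processed = []


def collector(records):
    """Pushes records into the channel without knowing who reads them."""
    for record in records:
        channel.put(record)         # blocks if the consumer falls behind
    channel.put(SENTINEL)


def consumer():
    """Reads from the channel without knowing who wrote the records."""
    while True:
        record = channel.get()
        if record is SENTINEL:
            break
        processed.append(record.upper())  # placeholder for real processing


worker = threading.Thread(target=consumer)
worker.start()
collector(["click", "purchase", "refund"])
worker.join()
print(processed)
```

Because the two sides share only the channel, either one can be replaced or scaled without changing the other.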
The processing layer is where the analytical process begins: the data needed for analysis is selected, cleaned, and formatted for further analysis and modelling.
This layer is critical to Big Data. After all, it is all about data. The volume and velocity of data directly impact the Data Storage Layer. The storage solution should be in line with the data ingestion requirements of your business ecosystem.
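The select, clean and format steps can be illustrated with a small Python sketch. The sample rows, the `FIELDS` tuple and the `prepare` function are hypothetical; which fields to keep and how to treat missing values are assumptions that depend entirely on the analysis at hand.

```python
from datetime import datetime

raw = [
    {"user": " Alice ", "amount": "19.90", "ts": "2021-03-01", "debug": "x"},
    {"user": "bob", "amount": "", "ts": "2021-03-02", "debug": "y"},
    {"user": "Carol", "amount": "5", "ts": "2021-03-03", "debug": "z"},
]
FIELDS = ("user", "amount", "ts")  # fields assumed to matter for the analysis


def prepare(rows):
    prepared_rows = []
    for row in rows:
        selected = {key: row[key] for key in FIELDS}   # select: drop unused fields
        if not selected["amount"].strip():             # clean: skip rows with no amount
            continue
        prepared_rows.append({                         # format: normalize types
            "user": selected["user"].strip().lower(),
            "amount": float(selected["amount"]),
            "ts": datetime.strptime(selected["ts"], "%Y-%m-%d").date(),
        })
    return prepared_rows


prepared = prepare(raw)
print(len(prepared), prepared[0]["user"])
```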
This layer is where active analytical processing of data takes place.
This layer is concerned with the graphical representation of the information and value gained through analysis. Using rich charts, graphs and maps, the tools in this layer help present a compelling story on which your leadership team can base a decision.
This layer involves Data Profiling and Lineage, Data Quality, Data Cleansing, and Data Loss Prevention.
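As a rough illustration of what data profiling means in practice, the Python sketch below counts missing and distinct values per column. The `profile` function and the sample rows are hypothetical; dedicated data quality tools report far more than this.

```python
def profile(rows):
    """Return, per column, how many values are missing and how many are distinct."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]   # absent keys count as missing
        present = [v for v in values if v is not None and v != ""]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


rows = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": ""},
    {"id": 3, "country": "IN"},
    {"id": 3},                     # duplicate id and no country at all
]
report = profile(rows)
print(report)
```

A distinct count lower than the row count on a column that should be unique, such as `id` here, is the kind of signal a data quality check would flag.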
Here is a representation of Big Data Architecture with just the Big Data components shown.
There are several design patterns that act as templates you can select for your business. These design patterns are specific to the layer in context. We shall mention a few patterns for the ingestion and data storage layers.
Data Source and Ingestion layers
A few design patterns exist for this layer.
Data Storage Layer
Several design patterns have been built for the storage layer around the ACID (atomicity, consistency, isolation, and durability), BASE (basically available, soft state, eventually consistent) and CAP (consistency, availability, and partition tolerance) paradigms.
There are a few more design patterns available that you may attempt to explore.
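To make the ACID paradigm mentioned for the storage layer more concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are hypothetical examples. SQLite is used only because it ships with Python, not because it is a Big Data store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an overdraft
        return False


ok = transfer("a", "b", 500)  # fails: "a" only has 100
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to `b` is applied first and then undone when the debit fails, which is the all-or-nothing behaviour that atomicity promises. BASE-style stores relax this guarantee in exchange for availability.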
Here we list out a few well-known examples of Big Data Architecture shaped according to the business ecosystem they were developed for and evolved into their current state.
This kind of Big Data architecture allows for real-time ingestion of critical business data that needs to be handled or responded to in real time. The velocity and variety of data are the key pivots around which this architecture has evolved.
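Responding to data in real time usually means evaluating each event as it arrives against a recent window of events, rather than waiting for a batch. The Python sketch below shows one such sliding-window check; the `SlidingWindowAlert` class, the 10-second window and the threshold of 3 are all assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowAlert:
    """Signals when too many events arrive within a recent time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()

    def observe(self, timestamp):
        """Record one event and report whether the window is at or over the threshold."""
        self._events.append(timestamp)
        # Evict events that have fallen out of the window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


monitor = SlidingWindowAlert(window_seconds=10, threshold=3)
results = [monitor.observe(t) for t in (0, 2, 4, 30, 31)]
print(results)
```

The third event triggers the alert because three events fall within ten seconds. The later events arrive after the earlier ones have expired, so they do not.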
This kind of Big Data Architecture provides generic storage and processing capabilities that are applicable in most businesses.
Big Data architectures of this kind focus on data that arrives at high velocity, is high in volume, and comes in a variety of types (structured, unstructured or semi-structured).
This Big Data Architecture is inspired by the enterprise data warehouse: it keeps a separate database of historical data going back years and uses it for analytical purposes.
This architecture allows data to be left “in place” in a low-cost storage engine, where ad-hoc queries can be run without the need for separate and expensive clusters.
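The idea of querying data where it already sits can be sketched with plain files. In the Python example below, a temporary directory stands in for low-cost storage and a generator scans the files lazily at query time. The `scan` function, the file names and the columns are hypothetical, and a real deployment would use a query engine over object storage rather than hand-written loops.

```python
import csv
import tempfile
from pathlib import Path


def scan(directory):
    """Lazily yield rows from every CSV file in place; nothing is loaded up front."""
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            yield from csv.DictReader(handle)


with tempfile.TemporaryDirectory() as lake:
    # Stand-in for files that already exist in low-cost storage.
    Path(lake, "2021-03-01.csv").write_text("region,amount\nnorth,10\nsouth,5\n")
    Path(lake, "2021-03-02.csv").write_text("region,amount\nnorth,7\n")

    # Ad-hoc query: total amount for one region, computed directly over the files.
    north_total = sum(
        int(row["amount"]) for row in scan(lake) if row["region"] == "north"
    )

print(north_total)
```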
Big Data is likely to underpin how businesses run, whether on-premises or in the cloud. It facilitates Machine Learning and Artificial Intelligence, and it is a backend skill that is likely to be sought after across industries in the near future.
Big Data analysts are at the vanguard of the journey towards an ever more data-centric world. Because they are such valuable intellectual resources, companies are going the extra mile to hire and retain them. You too can come on board and take this journey with our Big Data Specialization course.