We are all aware of the growing size and variety of data. Can industries rely on their legacy database systems to store and use all of it?
Cisco predicts that by 2014, total internet traffic will be 4.8 zettabytes. Today, anything and everything is data: a like on Facebook; readings from sensors; vibration data from manufacturing equipment; a comment, a share, a tweet; video feeds from CCTV; image and audio files from phones and cameras; and so on. Such varied forms of data make up about 80% of the total data available. Generating insights from these vast sources of data to guide business strategy has become unavoidable.
Traditional relational database management systems (RDBMS) were designed primarily for handling transactional data. They provide the infrastructure required to reliably store and process data that has a structure.
Clearly, storing and processing the emerging forms of big data needs a different design.
Take an example:
Consider a retail company.
Today the company may be interested only in the name and contact number of a customer. But a few months down the line, it may want more details, such as the customer's purchases from the store, social media activity, location, and occupation. Later still, it may want further details about the customer as its business needs change.
Traditional database models are all schema-based, i.e. they require data to have a structure before it can be stored and processed. To insert any data, the user first has to specify a schema and only then insert the data. When the need for data changes dynamically, it is difficult to predict that structure in advance.
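The schema-first constraint can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the customer table, its columns, and the sample values are hypothetical and stand in for the retail example above, and SQLite stands in for any relational system.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# The schema must be declared before any row can be stored.
conn.execute("CREATE TABLE customer (name TEXT, contact TEXT)")
conn.execute("INSERT INTO customer VALUES (?, ?)", ("Asha", "555-0100"))

# Months later the business wants the occupation too.
# The insert fails because the schema has no such column.
try:
    conn.execute(
        "INSERT INTO customer (name, contact, occupation) VALUES (?, ?, ?)",
        ("Ravi", "555-0101", "engineer"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# The schema has to be changed first, and only then can the data go in.
conn.execute("ALTER TABLE customer ADD COLUMN occupation TEXT")
conn.execute(
    "INSERT INTO customer (name, contact, occupation) VALUES (?, ?, ?)",
    ("Ravi", "555-0101", "engineer"),
)

rows = conn.execute(
    "SELECT name, contact, occupation FROM customer ORDER BY name"
).fetchall()
```

Every new attribute the company becomes interested in means another schema change of this kind, and existing rows simply carry an empty value for it.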
SQL models are a good fit for transactional data and data that has a well-defined structure. But with the rise of big data and unstructured data sources, traditional database models become very restrictive. Application developers have also been frustrated by the impedance mismatch between relational data structures and the in-memory data structures of their applications.
NoSQL database models are not schema-based and are built for web scale. They do not require the stored data to have a fixed structure. In the simplest case, data is stored as key-value pairs. Apart from being schema-free, they are also intended to support easy replication and simple APIs. Apache Hadoop, with its distributed file system HDFS, is an open-source framework for storing and processing such data across clusters; strictly speaking it is not a database itself, but NoSQL stores such as Apache HBase run on top of it. There are plenty of NoSQL data systems in the market, with different projects aimed at different aspects of big data: some are designed for text and document data, some for graph data, some for media files, and so on. Cassandra, Hypertable, Accumulo, and MongoDB are some of the popular ones.
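The key-value idea can be sketched in a few lines of Python. This is a toy, in-memory stand-in and not the API of Cassandra, MongoDB, or any other real product; the put and get helpers, the key format, and the sample customers are all hypothetical.

```python
import json

# A plain dict plays the role of the store: key -> serialized document.
store = {}

def put(key, document):
    """Store any JSON-serializable document under a key; no schema is declared."""
    store[key] = json.dumps(document)

def get(key):
    """Fetch the document stored under a key and decode it."""
    return json.loads(store[key])

# Two customers with different sets of fields live side by side.
put("customer:1", {"name": "Asha", "contact": "555-0100"})
put("customer:2", {
    "name": "Ravi",
    "contact": "555-0101",
    "occupation": "engineer",
    "purchases": ["shoes", "phone"],
})

first = get("customer:1")
second = get("customer:2")
```

Here the second customer carries extra fields without any change to how the first one was stored, which is the flexibility the retail example calls for. The trade-off is that the application, not the database, has to cope with records whose fields differ.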
Clearly, NoSQL databases are highly suitable for 21st-century web estates. They are gaining importance mainly because they run well on clusters and are schema-less, which caters to the growing size and variety of big data.