Star schema is a traditional database schema with a central table that can be drawn as a star. It consists of fact tables and dimension tables. Fact tables store measurable numeric values, whereas dimension tables give context to those facts.
It is very simple in structure. A star schema can also be defined as a fact table surrounded by dimension tables. The dimensions are kept in a denormalized form, which means they contain a lot of redundant data, and this can sometimes cause data integrity and storage issues as well.
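The following is a minimal sketch of that structure using Python's built-in `sqlite3` module with an in-memory database. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical. The category `Produce` is repeated in `dim_product`, which is the denormalized redundancy described above. In exchange, every dimension is reachable with a single join from the fact table.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT          -- repeated on every product row (denormalized)
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20240115
    full_date TEXT,
    month     TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,            -- measure
    amount      REAL                -- measure
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Apples", "Produce"),
    (2, "Bananas", "Produce"),
    (3, "Milk", "Dairy"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240115, "2024-01-15", "2024-01"),
    (20240210, "2024-02-10", "2024-02"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240115, 1, 10, 5.0),
    (20240115, 3, 4, 6.0),
    (20240210, 2, 12, 3.6),
])

# A typical "star join": the fact table joins directly to each dimension.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
```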
Star schema is used in query-heavy analytical applications because its simple structure needs only a few joins, which makes data retrieval fast. It can also involve several layers of summarization to retrieve information faster. Star schema is widely used as an approach to develop data warehouses and dimensional data marts.
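One common way to add a layer of summarization, assumed here for illustration, is an aggregate table built from the fact table. The sketch below uses a hypothetical table `agg_monthly_sales` and made-up data. It rolls daily sales up to one row per month, so monthly reports can read a few pre-computed rows instead of scanning every sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table at one row per sale.
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, product_key INTEGER, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("2024-01-03", 1, 5.0),
    ("2024-01-17", 2, 7.5),
    ("2024-02-09", 1, 2.5),
])

# Summary layer: pre-aggregate the detailed rows to one row per month.
conn.execute("""
CREATE TABLE agg_monthly_sales AS
SELECT substr(sale_date, 1, 7) AS month,
       SUM(amount)             AS total_amount,
       COUNT(*)                AS sale_count
FROM fact_sales
GROUP BY month
""")

monthly = conn.execute(
    "SELECT month, total_amount, sale_count FROM agg_monthly_sales ORDER BY month"
).fetchall()
print(monthly)
```

The trade-off is that the aggregate must be refreshed whenever new facts are loaded.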
The whole point of star schema data warehousing and OLAP (Online Analytical Processing) systems is to analyze data and derive meaningful information from it. Fact tables are the tables that hold the data to be analyzed. Because analysis plays a vital role, the numbers stored in a fact table carry most of the significance. A fact table in a star schema has two major kinds of columns: the facts (numeric measures) and the foreign keys of the dimensions.
The fact table can also be defined as the central table in a star schema, surrounded by dimension tables. The primary key of each dimension table appears in the fact table as a foreign key, and together these foreign keys define the grain of the fact table. The grain is very important because it determines what a single row represents and how each row in the fact table is uniquely identified. The remaining columns of the fact table hold the measures that give meaning to the data.
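The grain can be made explicit by declaring the dimension foreign keys as the fact table's composite primary key. In this sketch the grain is assumed to be one row per date, product, and employee. The table and column names are hypothetical, and the dimension tables are omitted for brevity. A second row with the same three keys is rejected because it would break that grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical grain: one row per (date, product, employee).
# The composite primary key on the dimension keys enforces that grain.
conn.execute("""
CREATE TABLE fact_daily_sales (
    date_key     INTEGER,
    product_key  INTEGER,
    employee_key INTEGER,
    quantity     INTEGER,   -- measure
    amount       REAL,      -- measure
    PRIMARY KEY (date_key, product_key, employee_key)
)
""")
conn.execute("INSERT INTO fact_daily_sales VALUES (20240115, 1, 7, 3, 4.5)")

try:
    # Same date, product, and employee as the first row: violates the grain.
    conn.execute("INSERT INTO fact_daily_sales VALUES (20240115, 1, 7, 1, 1.5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```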
To understand this, we could take an e-commerce website as a real-life example, but a simpler scenario works just as well. Let's say that a person has opened a new grocery store in an area and has hired a few employees. After the business starts generating sales, the owner would like to know about the transactions made, the work done by each employee, the kinds of products sold in the last month, and the revenue generated.
Comparing that revenue with the employees' salaries and other expenses indicates whether the business is making a profit or a loss. All of these questions can be answered with the two kinds of columns in the fact table: the measures (such as the sale amount) and the foreign keys of the dimensions (such as the employee, the product, and the date).
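The grocery store scenario can be sketched as a tiny star schema. Everything here is hypothetical: the employee and product names, the March 2024 dates, the sale amounts, and the salary and expense figures. The salary and expense figures are kept as plain Python constants rather than modeled as tables. The foreign keys answer "who sold it" and "what was sold", and the `amount` measure answers "how much".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    sale_date    TEXT,
    employee_key INTEGER REFERENCES dim_employee(employee_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    amount       REAL
);
INSERT INTO dim_employee VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Oil');
INSERT INTO fact_sales VALUES
    ('2024-03-02', 1, 1, 2000.0),
    ('2024-03-15', 2, 2, 1800.0),
    ('2024-03-28', 1, 2, 1400.0);
""")

# Revenue per employee for the month, through the employee foreign key.
by_employee = conn.execute("""
SELECT e.name, SUM(f.amount)
FROM fact_sales f JOIN dim_employee e ON f.employee_key = e.employee_key
WHERE f.sale_date BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY e.name ORDER BY e.name
""").fetchall()

# Which products sold, and for how much, through the product foreign key.
by_product = conn.execute("""
SELECT p.name, SUM(f.amount)
FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.name ORDER BY p.name
""").fetchall()

revenue = sum(total for _, total in by_employee)
salaries, other_expenses = 3000.0, 1500.0  # hypothetical monthly costs
profit = revenue - salaries - other_expenses
print(by_employee, by_product, "profit" if profit > 0 else "loss")
```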
A dimension table contains the descriptive attributes that make the facts meaningful. Fact tables only hold the keys of the dimension tables, whereas the dimension tables hold the information and details. For example, for an employee of a company, the employee dimension table shows details such as his/her office name, phone number, and the team they work for.
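This plain-Python sketch shows why a fact row needs its dimensions. Dictionaries stand in for tables. The employee attributes (office, phone number, team) follow the example above, but the key `7`, the helper `describe`, and all values are hypothetical. Looking up the foreign key plays the role of a join.

```python
# Hypothetical fact row: only keys and a measure, meaningless on its own.
fact_row = {"employee_key": 7, "date_key": 20240115, "amount": 4.5}

# Hypothetical employee dimension: descriptive attributes keyed by employee_key.
dim_employee = {
    7: {"name": "Alice", "office": "Downtown", "phone": "555-0100", "team": "Checkout"},
}


def describe(fact, employees):
    """Resolve the fact's foreign key against the dimension (a lookup join)."""
    emp = employees[fact["employee_key"]]
    return f"{emp['name']} ({emp['team']}, {emp['office']}) sold {fact['amount']}"


print(describe(fact_row, dim_employee))  # Alice (Checkout, Downtown) sold 4.5
```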
The two words ‘slice’ and ‘dice’ are often heard together. To report the outcome of a particular task, a piece of work, or a company's performance, such as profit or loss, we need to slice the data. Slicing is like filtering the data on a single dimension value, for example one month. Dicing selects a smaller sub-cube by filtering on two or more dimensions at once, and the result is then usually aggregated.
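Here is a plain-Python sketch of the two operations on a handful of hypothetical fact rows. The slice fixes one dimension (month). The dice restricts two dimensions (region and product) and then aggregates the resulting sub-cube by month. In a real warehouse these would typically be `WHERE` and `GROUP BY` clauses in SQL, or filters in an OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, region, product, amount)
sales = [
    ("2024-01", "North", "Milk",   10.0),
    ("2024-01", "South", "Milk",    4.0),
    ("2024-01", "North", "Apples",  6.0),
    ("2024-01", "North", "Bread",   3.0),
    ("2024-02", "North", "Milk",    8.0),
    ("2024-02", "South", "Apples",  5.0),
]

# Slice: fix a single dimension value (month = 2024-01).
jan_slice = [row for row in sales if row[0] == "2024-01"]

# Dice: restrict several dimensions at once (region North, products Milk/Apples).
dice = [row for row in sales if row[1] == "North" and row[2] in {"Milk", "Apples"}]

# Aggregate the diced sub-cube by month.
totals = defaultdict(float)
for month, _, _, amount in dice:
    totals[month] += amount

print(len(jan_slice), dict(totals))  # 4 {'2024-01': 16.0, '2024-02': 8.0}
```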
Following are the features of a star schema.
The snowflake schema is very similar to the star schema. The only difference lies in the dimension tables: in a snowflake design they are normalized, whereas a star design keeps them fully denormalized. We can turn a star schema into a more complex snowflake design by normalizing each dimension table into several related tables.
| Snowflake schema | Star schema |
|---|---|
| Normalized dimension tables | Pure de-normalized dimension tables |
| Less maintenance due to less redundancy | More maintenance due to more redundancy |
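The following sketch contrasts the two designs for a single product dimension, again using `sqlite3` with hypothetical names and data. In the star version the category name is repeated on each product row. In the snowflake version it is stored once in `dim_category`, so the same report needs one extra join but keeps each category name in a single place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category name stored directly (denormalized) on each product row.
CREATE TABLE star_dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Snowflake: category moved into its own normalized table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO star_dim_product VALUES (1, 'Apples', 'Produce'), (2, 'Bananas', 'Produce'), (3, 'Milk', 'Dairy');
INSERT INTO dim_category VALUES (10, 'Produce'), (20, 'Dairy');
INSERT INTO snow_dim_product VALUES (1, 'Apples', 10), (2, 'Bananas', 10), (3, 'Milk', 20);
INSERT INTO fact_sales VALUES (1, 5.0), (2, 3.0), (3, 6.0);
""")

star = conn.execute("""
SELECT p.category, SUM(f.amount) FROM fact_sales f
JOIN star_dim_product p ON f.product_key = p.product_key
GROUP BY p.category ORDER BY p.category
""").fetchall()

# The snowflake version needs one more join to reach the category name.
snowflake = conn.execute("""
SELECT c.category, SUM(f.amount) FROM fact_sales f
JOIN snow_dim_product p ON f.product_key = p.product_key
JOIN dim_category c     ON p.category_key = c.category_key
GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star == snowflake, star)  # True [('Dairy', 6.0), ('Produce', 8.0)]
```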
A star schema organizes business process data into facts, which hold measurable, quantitative data about the business. A star schema with a very large number of dimensions is sometimes called a centipede schema.