As data mining took hold in the 1990s, the need for a common methodology that captured lessons learned intensified. CRISP-DM, the CRoss Industry Standard Process for Data Mining, is a process conceived in 1996 and codified by a consortium that included two of the leading software providers of the time, Teradata and SPSS, along with three early adopter corporations: OHRA, NCR, and Daimler. CRISP-DM was not the first data mining methodology.
SAS Institute had developed an alternative called SEMMA (Sample, Explore, Modify, Model, Assess). However, CRISP-DM grew steadily in popularity among practitioners within a year or two of its adoption.
A data mining project can be planned and structured using the CRISP-DM model and methodology. The methodology has a track record of being robust and reliable. Although we do not own it, as practitioners we recommend it for its flexibility, its practicality, and its usefulness in solving thorny business problems. It runs as a common thread through almost every client engagement.
This model is idealized in several respects. In practice, many tasks can be performed in a different order, and it is often necessary to backtrack to earlier tasks and repeat certain actions. The model does not attempt to capture every possible route through the data mining process.
The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model that serves as the basis for a data science process. It has six phases:
Business Understanding
The Business Understanding phase focuses on understanding the project’s objectives and requirements. It has four tasks: determining business objectives, assessing the situation, determining data mining goals, and producing a project plan. Apart from the third task, these are foundational project management activities common to most projects.
Teams often skip this phase, but it is essential for establishing a solid foundation for success.
Data Understanding
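The next phase is Data Understanding. Building on the foundation of Business Understanding, it focuses on identifying, collecting, and analyzing the data sets that can help you achieve the project goals. There are four tasks in this phase: collecting initial data, describing the data, exploring the data, and verifying data quality.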
Data Preparation
In this stage you decide which data to use for analysis. Factors to consider include relevance to your data mining goals, the quality of the data, and technical constraints such as limits on data volume or data types. Data selection covers both attributes (columns) in a table and records (rows) within the table.
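Both kinds of selection, by column and by row, can be sketched in a few lines of plain Python. This is a minimal illustration rather than a prescribed method: the records, the field names, and the quality rule (drop any row with a missing value) are all hypothetical.

```python
# Hypothetical raw records; field names are illustrative only.
raw_records = [
    {"customer_id": 1, "age": 34, "region": "north", "spend": 120.0},
    {"customer_id": 2, "age": None, "region": "south", "spend": 80.5},
    {"customer_id": 3, "age": 51, "region": "north", "spend": 310.0},
]

# Column selection: keep only attributes assumed relevant to the mining goal.
selected_columns = ["customer_id", "age", "spend"]


def select_data(records, columns):
    """Project each record onto the chosen columns and drop incomplete rows."""
    prepared = []
    for row in records:
        projected = {col: row[col] for col in columns}
        # Row selection: a simple (assumed) quality rule that rejects
        # any record with a missing value in a selected column.
        if all(value is not None for value in projected.values()):
            prepared.append(projected)
    return prepared


prepared = select_data(raw_records, selected_columns)
```

Here customer 2 is dropped because `age` is missing, and `region` is removed from every remaining record. In a real project the relevance and quality rules would come out of the Data Understanding phase.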
Modeling
In this phase you will likely build and assess several models based on different modeling techniques. Although the CRISP-DM guide recommends iterating on model building and assessment until you are confident in your chosen model(s), in practice teams should iterate until they find a model that is “good enough,” proceed through the rest of the CRISP-DM lifecycle, and then improve the model in future iterations.
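The “good enough” stopping rule can be sketched as a loop that tries candidate models in turn and stops at the first one that meets a target score. Everything below is an assumption made for illustration: the toy threshold classifiers, the validation data, and the 0.7 accuracy target are hypothetical.

```python
# Hypothetical labelled validation data: (feature value, label).
validation = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1), (4, 1), (5, 0)]


def make_threshold_model(cutoff):
    """A toy classifier: predict 1 when the feature exceeds the cutoff."""
    return lambda x: 1 if x > cutoff else 0


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)


def first_good_enough(candidates, data, target):
    """Try candidates in order; stop at the first that meets the target.

    If none meets the target, the best candidate seen is returned.
    """
    best_name, best_score = None, -1.0
    for name, model in candidates:
        score = accuracy(model, data)
        if score > best_score:
            best_name, best_score = name, score
        if score >= target:
            break  # good enough: move on and refine in a later iteration
    return best_name, best_score


candidates = [(f"cutoff={c}", make_threshold_model(c)) for c in (1, 3, 5, 7)]
chosen, score = first_good_enough(candidates, validation, target=0.7)
```

The loop stops at the second candidate and never evaluates the remaining two. That is the trade-off the paragraph describes: accept a model that clears the bar now, and revisit the others in a later pass through the lifecycle.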
Evaluation
By this point the analyst has built and selected models that appear to be of high quality. The analyst then tests them to confirm that they generalize to data they have never seen before. Next, the analyst verifies that the models adequately address all key business issues. Finally, the champion model or models are selected.
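The generalization check and champion selection can be sketched as follows. This is a simplified illustration, not the CRISP-DM guide's own procedure: the data, the two candidate models, and the rule of picking the champion by holdout accuracy are hypothetical.

```python
def accuracy(predict, data):
    """Fraction of (feature, label) pairs predicted correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


# Hypothetical data: the holdout set is never used during model building.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]
holdout = [(0, 0), (2, 0), (3, 1), (5, 1), (7, 1), (9, 1)]

# Candidate 1: a simple rule. Candidate 2: a lookup table that has
# memorised the training set and answers 0 for anything it has not seen.
memorised = dict(train)
candidates = {
    "simple_rule": lambda x: 1 if x > 3 else 0,
    "lookup": lambda x: memorised.get(x, 0),
}


def evaluate(models, train_data, holdout_data):
    """Score every candidate on both the training and the holdout data."""
    return {
        name: {
            "train": accuracy(predict, train_data),
            "holdout": accuracy(predict, holdout_data),
        }
        for name, predict in models.items()
    }


report = evaluate(candidates, train, holdout)
# Champion: the candidate that performs best on unseen data.
champion = max(report, key=lambda name: report[name]["holdout"])
```

Both candidates are perfect on the training data, but the lookup table drops to 50% accuracy on the holdout set, so the simple rule becomes the champion. A score on unseen data says nothing about business fit, which still has to be checked separately.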
Deployment
Deployment typically means putting a code representation of the model into an operational system, together with a mechanism for scoring or categorizing new data as it arises. That mechanism should feed the new information into the solution of the original business problem. The code representation must also include all the data preparation steps that lead up to modeling, so that the model handles new raw data the same way it did during model development.
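One way to keep data preparation and the model together is to package them in a single object that accepts raw records. The sketch below assumes a hypothetical model with one feature; the class name, the fill-and-scale preparation, and the parameter values are illustrative only.

```python
class ScoringPipeline:
    """Bundles data preparation with the model so both are deployed together."""

    def __init__(self, mean_spend, cutoff):
        # Parameters fixed during model development (hypothetical values).
        self.mean_spend = mean_spend
        self.cutoff = cutoff

    def prepare(self, raw):
        """Apply the same preparation that was used in development."""
        spend = raw.get("spend")
        if spend is None:
            spend = self.mean_spend  # fill missing values with the training mean
        return spend / self.mean_spend  # scale relative to the training mean

    def score(self, raw):
        """Categorize one new raw record."""
        feature = self.prepare(raw)
        return "high_value" if feature > self.cutoff else "standard"


pipeline = ScoringPipeline(mean_spend=100.0, cutoff=1.5)
new_records = [{"spend": 250.0}, {"spend": None}, {"spend": 90.0}]
labels = [pipeline.score(record) for record in new_records]
```

Because `score` always calls `prepare`, a new raw record cannot reach the model without going through the preparation used in development, including the handling of missing values.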
Several characteristics of CRISP-DM contribute to its longevity in a rapidly changing field: