“Which is the best analytics software to learn?” – I am often asked this question by those aspiring to get into the field of analytics.
The analytics software market has numerous players, from billion-dollar corporations to one-man shops, offering everything from highly sophisticated, even self-learning platforms to niche, custom-made solutions.
SAS is undoubtedly the reigning king in the market. Most large global companies and cash-rich businesses tend to go for this software. In India, most medium and large analytics service providers use SAS for predictive modelling and advanced data mining.
The SAS Institute offers many different software products. Some are fairly generic and domain-agnostic, while others are domain-specific platforms that serve a niche purpose. Of all the packages available, Base SAS is the cheapest and most widely used. Base SAS is a code-based tool that uses the SAS language. While some tasks, such as data import, can be performed through the GUI, most tasks require knowledge of the SAS language.
Being a code-based tool, Base SAS is not very easy to learn. However, apart from Excel, it is one of the most popular analytics tools in the business world and hence a handy skill to have.
SAS has another tool called “E-miner” (SAS Enterprise Miner). E-miner is a GUI-based counterpart to Base SAS in which you can perform complex data manipulation and modelling tasks with just clicks and drags. It also has advanced analytics capabilities that are not present in Base SAS. Capabilities such as market basket analysis, decision trees, neural networks and support vector machines make E-miner a very comprehensive and easy-to-learn tool.
The biggest drawback of this amazing software is the extremely high price the SAS Institute insists on charging for it. Deploying this tool in a mid-size organization can run into millions of dollars. In India, apart from large MNC banks such as Citibank and Barclays, very few companies are able to afford it.
IBM: With some major acquisitions in the last few years, IBM has suddenly become a major player in the analytics software market. While Cognos is largely considered a business intelligence tool, SPSS and IBM SPSS Modeler (previously SPSS Clementine) are big players in this market. SPSS is more popular in market research than in business analytics. IBM SPSS Modeler is comparable to SAS E-miner in both features and pricing, and hence has the same pros and cons.
Software for the future
R: R is the most popular open-source (FREE) analytics software in the world. Within the analytics community, its popularity easily surpasses that of any other tool, and it is the tool of choice for most academics and scientists. Businesses, however, have been slow to adopt R, and much of this may be due to intellectual property concerns about code and algorithms written on an open-source platform.
Although R, like Base SAS, is a code-based package, GUIs are available for it. The Revolution Analytics GUI for Revolution R Enterprise is a very welcome addition to the analytics software market. The GUI cuts down the time needed to learn R, and even though it is not free, it still makes for a very cost-effective solution.
In my discussions with various analytics companies in India, many have expressed a desire to move to R. They are, however, unsure how to go about training their existing resources and building a pipeline of trained resources for the future. As the new generation of analysts builds R skills, companies will slowly but surely move to this platform.
With its ever-increasing capability list, and low/no-cost pricing, R is the most exciting analytics software for the future.
WPS: WPS is a tool that has been around for some time but has not gained the popularity it deserves. This tool, now also acquired by IBM, is virtually a clone of Base SAS. It uses the same SAS language, has a similar interface and has identical algorithms. In fact, most of your SAS code will run as-is on WPS, and most SAS users will be able to transition to it with a simple two-day training.
The software is completely legitimate, by the way, having won numerous cases filed by the SAS Institute.
WPS is very attractively priced compared to Base SAS and companies can easily expect to save over 50% in costs if they move from SAS to WPS.
Once IBM gets around to marketing this tool effectively, it could seriously dent SAS’s stranglehold on the market. With a pool of SAS-trained resources available to them, companies will find moving from SAS to WPS a walk in the park.
With such a short learning and adoption time, WPS is the second most exciting tool for the future.
Which tools do you think will dominate the market in the future?
Edit 1 – 24 Jan 2012
This article has generated a lot of interesting responses on LinkedIn. Here is a link to one such discussion. It is interesting to get varied perspectives on this topic.
Edit 2 – 14 July 2015
By Gunnvant Singh
This question is often asked and often evokes varied responses. In this article we try to put things into perspective and take a holistic view. The landscape of analytics software can be divided into two distinct territories: on the one hand, a set of tools rules “product development” and “heavy lifting”; on the other, a set of tools dominates the “data exploration and discovery” arena.
Please excuse any repetition of points made above. They are still relevant and very important, and I wanted to give them their due once again.
The heavy lifters
The products in this category are either standalone number-crunching giants or programming languages suited to numerical routines.
SAS is the great-grandfather of all statistical programming languages. Most classic predictive modelling routines are available in SAS. SAS started as a North Carolina State University project, and its creators soon decided to launch it as a commercial product. The rest is history. Until recently, SAS was widely regarded as the only software accepted by the FDA for analysing drug trial data. SAS is still a popular tool amongst established analytics players and is still heavily used by banks and pharmaceutical companies. The learning curve for SAS is not that steep; in fact, it is the first choice of many people starting to learn serious predictive modelling.
R has become popular fairly recently. It was not a very well-known tool outside academia, but the fact that R is open source, and that its ecosystem of packages provides a solution to almost every conceivable analytics problem, has made it popular amongst practitioners in the analytics industry. The learning curve for R can be steep, but it is worth spending the time, as learning R gives you access to cutting-edge algorithms. Another strong point of R is its ability to produce fairly advanced graphics. R can also serve as a module in a data product, since R functionality can be embedded inside a web application.
Python is a popular scripting language known for its clean syntax, and it is taught as a first programming language in top universities around the world. Python has a fairly extensive set of machine learning modules. Since it is a scripting language, it can be used to build data products very efficiently. Python is also known for its exceptional ability to handle text data.
Microsoft Azure Machine Learning is one product worth keeping an eye on, as it is a machine learning and app hosting cloud service all rolled into one. For serious developers and businesses trying to create data products, it makes sense to take a serious look at this offering from Microsoft.
Weka is a machine learning framework written entirely in Java. Its suite of algorithms includes tree models, clustering algorithms, neural nets and SVMs. Its GUI version is a very powerful and popular tool.
Julia is a fairly new numerical computing language. It is aimed more at an audience that wants to develop data products or code statistical algorithms more efficiently. It is still very new and will take some time to mature.
Data exploration and discovery tools
The tools in this category are used to explore and prepare data for subsequent statistical modelling. Most of these are excellent reporting solutions with some even having the ability to implement statistical algorithms.
Tableau is the de facto data exploration and visualization tool. It is used in industry both for producing reports and for heavy-duty data exploration. Tableau’s visuals allow one to quickly investigate a hypothesis, sanity-check a gut feeling, and simply explore the data before embarking on a treacherous statistical journey.
RapidMiner works through visual programming. It not only provides a very good platform for data cleaning and data exploration, but can also be used to build predictive models, and it offers a large variety of predictive algorithms.
SQL, the good old database solution, is still used for basic reporting and for preparing data for modelling. SQL is here to stay despite the emergence of HDFS and Hadoop, since a great deal of effort in the form of best practices and legacy code has gone into the SQL ecosystem.
KNIME can be considered a product in the same league as RapidMiner; it uses the same visual programming paradigm. It can be integrated with R or Python and can run routines written in these languages.
If you found this article interesting, and want to get started on learning the language of SAS, you will most certainly want to read the article Learn the Language of SAS for Free. Written by Jigsaw Academy trainer Subhashini Tripathi, the article lists many of the free Language of SAS Resources available online.