While browsing data scientist job descriptions on portals like Glassdoor, I noticed how overwhelmingly often Python appears alongside R among the required skills. I use both languages and love working with both. I started with R and picked up Python along the way.
I found many similarities between R and Python when it comes to wrangling data, and these similarities helped me pick up Python quite quickly. One of the most commonly used Python libraries is pandas. It is the library that makes working in Python feel much like working in R.
The slight difference in syntax arises because Python indexes sequences starting from 0, whereas R starts from 1.
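As a minimal illustration of that indexing difference (the list and its values are made up for this sketch and are not from the original code), the first element of a Python sequence sits at index 0, where R would use 1:

```python
# Hypothetical sample data, used only to illustrate zero-based indexing.
languages = ["R", "Python", "Julia"]

first = languages[0]        # first element; R would write languages[1]
first_two = languages[0:2]  # slice stops before index 2; R would write languages[1:2]
last = languages[-1]        # negative indices count from the end in Python

print(first, first_two, last)
```

Note that a negative index means something different in R, where it excludes the element at that position instead of counting from the end.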
Both sets of code produce the plot shown below:
As can be seen, the process of data ingestion is almost identical: both R and Python make use of a “read” function, and both use head() to look at a snapshot of the data.
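For readers who want to try this, here is a small sketch of the Python side, assuming pandas is installed. The column names and values are invented for illustration, and an in-memory string stands in for a CSV file on disk:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = io.StringIO(
    "name,age,city\n"
    "Asha,31,Pune\n"
    "Ben,27,Leeds\n"
    "Chen,45,Taipei\n"
)

# pandas counterpart of R's read.csv(): returns a DataFrame.
df = pd.read_csv(csv_text)

# Counterpart of R's head(df, 2): the first two rows.
preview = df.head(2)
print(preview)

# Zero-based positional access: row 0 is the first row.
first_name = df.iloc[0]["name"]
print(first_name)
```

With a real file, a path would be passed to pd.read_csv() in place of the StringIO object, much as a path is passed to read.csv() in R.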
One can’t fail to notice how similar the two sets of code are; even the function names are similar.
Having used both languages extensively, I believe that any beginner-to-intermediate R user can transition to Python easily. Beyond that, Python offers several additional benefits. Here are a few tasks that are far easier to do in Python than in R:
1. Text processing: Python is very good at processing text data, and many good text processing modules are available for it. As an object-oriented language with a very clean syntax, Python makes working with text data straightforward.
2. Scraping data from websites: Python modules such as Beautiful Soup and Scrapy can be used to scrape data from web pages relatively easily.
3. Image processing: Projects like OpenCV and PIL make processing image data relatively easy. A good data scientist should be able to make sense of data from diverse sources. A quick look at recently launched Kaggle competitions reveals that in many of them the data is nothing but a collection of images, which shows how strongly the notion of “data” is changing. The ability to work with image data will put any analyst at the top of the skill-set ladder in the industry today.
4. Using Big Data frameworks such as Spark: Python is becoming the de facto language for working with Spark, through PySpark. One can accomplish a lot using PySpark. More good news: if you have used pandas, you can pick up the PySpark syntax easily.
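To give a flavor of the first item in the list above, here is a minimal word-count sketch that uses only the Python standard library. The sample sentence is made up, and a regular expression is just one simple way to tokenize; real projects often use a dedicated text processing library.

```python
import re
from collections import Counter

# Hypothetical sample text, made up for this sketch.
text = "Python is good at text. Text processing in Python is concise."

# Lowercase the text and pull out runs of letters as tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Count how often each token occurs.
counts = Counter(tokens)

# The two most frequent tokens, with their counts.
top_two = counts.most_common(2)
print(top_two)
```

Tokenizing and counting take two lines here, which is the kind of brevity that makes Python pleasant for text work.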
Learning Python can yield great dividends. If you are a recent graduate who already knows R, picking up Python won’t be difficult at all. If you have some industry experience and have been stuck with the same type of project for the past couple of years, learning Python will give you the opportunity to work on exciting new projects in areas like text mining, image analysis and Big Data.