If you haven’t heard of Kaggle yet, here’s what it is, in brief:
Founded in 2010, Kaggle is a premier online platform for data processing and predictive modeling competitions. A company arranges with Kaggle to post a data set along with a proposed problem, and the site’s community of computer scientists, mathematicians and econometricians (better known collectively as data scientists) takes on the task by posting proposed solutions.
The key to Kaggle is the community: 85,000 data scientists have entered competitions and each is ranked according to their skill and results in competitions. Every competitor can measure their talents against the veterans of their field. What’s more, the community is all about sharing and caring – in the company’s forums competitors can swap techniques and hone their skills.
Here are a few steps you could follow to become a pro at Kaggle!
Step 1 – Follow basic instructions
Many people get so caught up in the nitty-gritty of coding and model building that they forget the most basic instructions given for the competition, such as the date of the initial submission. It’s important to understand the instructions and the timeline so you can reproduce benchmarks, generate the correct submission format, and so on.
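As a rough illustration of checking the submission format before you upload, here is a minimal Python sketch. It assumes the competition provides a sample submission in CSV form whose first column is a row identifier. The function name check_submission and the column names id and prediction are hypothetical, chosen only for this example, so adapt them to whatever your competition actually specifies.

```python
import csv
import io


def check_submission(sample_csv: str, submission_csv: str) -> list[str]:
    """Compare a submission with the sample file; return a list of problems.

    Both arguments are CSV text. An empty list means no problem was found.
    Assumption: the first column of each file holds the row identifier.
    """
    # Blank lines are skipped so that a trailing newline does no harm.
    sample = [row for row in csv.reader(io.StringIO(sample_csv)) if row]
    submission = [row for row in csv.reader(io.StringIO(submission_csv)) if row]
    if not sample or not submission:
        return ["one of the files is empty"]

    problems = []
    if sample[0] != submission[0]:
        problems.append(f"header mismatch: expected {sample[0]}, got {submission[0]}")
    if len(sample) != len(submission):
        problems.append(
            f"row count mismatch: expected {len(sample) - 1}, got {len(submission) - 1}"
        )
    # The same identifiers should appear in both files, in any order.
    sample_ids = sorted(row[0] for row in sample[1:])
    submission_ids = sorted(row[0] for row in submission[1:])
    if sample_ids != submission_ids:
        problems.append("identifier column does not match the sample file")
    return problems


# Hypothetical file contents, made up for this example.
SAMPLE = "id,prediction\n1,0\n2,0\n3,0\n"
GOOD = "id,prediction\n1,0.2\n2,0.9\n3,0.5\n"
BAD = "id,pred\n1,0.2\n2,0.9\n"

print(check_submission(SAMPLE, GOOD))
print(check_submission(SAMPLE, BAD))
```

In practice you would read the two files from disk instead of using the strings above. A check like this catches only structural slips; it says nothing about how good the predictions are.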
Step 2 – Know your data
In Kaggle competitions, overfitting a model can have disastrous results. Each data set has unique features, so it is advisable to play with the data and figure out its quirks and inconsistencies. Some of these oddities may provide huge insights and reveal the true nature of the data at hand. This first feel for the data needs to come before running any algorithms or preprocessing. In many cases, competitors have stumbled upon huge outliers at a late stage, a surprise that a proper examination of the data would easily have avoided.
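To make that first look at the data a little more concrete, here is a small Python sketch that counts missing values in one numeric column and flags possible outliers. The 1.5 × interquartile-range rule it uses is just one common rule of thumb, not something any competition prescribes. The function name summarize and the sample values are made up for illustration.

```python
import statistics


def summarize(values):
    """Summarize one numeric column given as a list that may contain None.

    Needs at least two non-missing values to compute the quartiles.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # the three quartile cut points
    spread = q3 - q1  # interquartile range
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return {
        "missing": len(values) - len(present),
        "median": statistics.median(present),
        # Values outside [low, high] are flagged for a closer look, not deleted.
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical column: ages with one missing entry and one suspicious value.
ages = [21, 25, 22, None, 24, 23, 250, 26]
print(summarize(ages))
```

Here the value 250 is flagged and the missing entry is counted. Whether a flagged value is an error or a genuine insight is a judgment call that depends on the domain, which is exactly why this look should come before any modeling.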
Step 3 – Understand the goal of data processing
Many beginners tend to worry too much about the technicalities of data processing when what actually matters more is understanding the data and what useful patterns could be modeled. In most cases, all you need to do is familiarize yourself with the data set and the domain and you’re all set!
Step 4 – Be active on the forum
It’s very important to subscribe to the forum to keep yourself updated on issues with the data set or the competition. In addition, it’s worth trying to figure out what your competitors are doing. There are even cases of code sharing within competitions; while it’s not a good idea to rely on such code (which could be guesswork), it’s important to be aware of its existence. Finally, reading the post-competition summaries on the forum is a valuable way of learning from the winners and improving over time.
Step 5 – Do your research
There is a lot of content available on the internet for beginners to work with. For any given problem, it is likely that people have already published papers, benchmarks and code that you can learn from. Aside from actually winning the contest, which depends purely on your talent, effort and technique, gaining deeper knowledge and insight is the only sure reward of a competition. And that is sure to get you closer to winning the next one!
In addition to these key steps, we would also advise that you form and work in teams in the initial competitions. A few heads put together may be more efficient in tackling complex data, but hey! That’s completely your call. Some people work best alone.
We’ll leave you with this: persistence is key. Kaggle contests can be incredibly fascinating and addictive, or downright frustrating. Don’t lose heart. Start off with easy competitions and with data and domains that you are familiar with. With perseverance, you can always work your way up the ranks! Converging on a good model takes its own sweet time, so it’s always good to be enthusiastic and optimistic about it. And, though it has been said ever so many times, this is worth repeating: remember that winning isn’t everything; it’s all about the competition!