Disclaimer: We were made aware by Kaggle of adversarial actions by one of the members of this panel. This panelist is no longer a Kaggle Grandmaster and no longer affiliated with H2O.ai as of January 10th, 2020.
Personally, I'm a firm believer in and fan of Kaggle, and I definitely see it as the home of Data Science. Kaggle Grandmasters are the heroes of Kaggle, or at least mine. I've been on a quest to document and understand their journey into the field, and to find out whether they're still human or have passed into an alternate reality (still not sure about that one).
The H2O World event recently had the biggest Kaggle Grandmaster Panel. This post shares my takeaways from the panel discussion along with a few notes from my previous interviews. Fun fact: I've interviewed all of the grandmasters on the panel who wear spectacles.
Note to the reader: most of the notes are comments from the Grandmasters with added context for readability.
The questions were asked by Arno Candel; they appear here as headings.
Meet the Grandmasters
The panel consisted of 10 of H2O's 13 Grandmasters.
So, you thought cool ML Engineers work with models?
I get to talk to Kaggle GMs every day at work.
Sorry, I had to say that (I still pinch myself every day). Back to the GMs:
An "Avengers Assemble" moment from the video, where every GM introduced themselves and their strengths:
- Kim Montgomery (Feature Engineering)
- Rohan Rao (Single best model, contrary to common practice on Kaggle)
- Shivam Bansal (Creating Data Stories and End to End Solutions)
- Dmitry Larko (Dmitry is one of the pioneers of Driverless AI)
- Pavel Pleskov (Computer Vision and Time Series)
- Yauhen Babakhin (Computer Vision)
- Mark Landry (aka “OG Data Scientist” at H2O. For the record, these aren’t my words)
- Sudalai Raj Kumar (aka SRK) (NLP, EDA Storytelling)
- Olivier Grellier (Time Series, Feature Engineering)
- Branden Murray (Feature Engineering)
How much time would you spend on Kaggle (when active)?
The answers really speak to the passion of the best Kagglers: the majority of the panel agreed that they spent several hours, even half of their day, on Kaggle.
What is the best skill set that Kaggle has taught you?
Kaggle is a great learning platform if you are willing to put in the hours. GM Mark Landry calls it "homework": you take an assignment home, work on it for a few months, and then realize you couldn't do as well as other competitors. This pushes you to improve your skills and close your gaps.
"The learnings are unlike any classroom or book; you won't find anywhere else the knowledge you gain by competing on Kaggle," says GM Babakhin.
More often than not, you team up and end up working remotely with people you would never have met otherwise. GM Pavel says that teaming up remotely and pushing a team to its best was one of his favorite takeaways.
After competing in a multitude of competitions and spending an insane number of hours on the problems, you find common patterns and start to approach new problems in a more structured fashion. Thinking critically, breaking problems into steps, and getting creative with every step are the takeaways for GMs Shivam and SRK.
“You also learn one of the most important real-world skills: Making models that generalize well” – to quote GM Olivier.
What is the secret sauce to Kaggle?
Kaggle competitions are like a game of PUBG where everyone starts from scratch but the seasoned Kagglers know where to find the loot. To me, it feels like a race where noobs (myself and the like) are running barefoot while the Kaggle GMs and Masters just whoosh past us in their supercars of knowledge.
Babakhin says competitions are more like a marathon than a sprint, so you should be prepared to try a lot of ideas, many of which will fail.
"Data Science is all about the data and modeling, you really need to understand how to validate your data and the rest follows after that," according to GM Dmitry.
If you're chasing the win, you'd want to squeeze out every last digit to get to the top of the leaderboard. This requires building a lot of models and having the right ensembling strategy in place, added GM Shivam.
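To make the ensembling point a little more concrete, here is a minimal sketch of the simplest strategy I know of: a weighted average (blend) of predictions from several models. The data and the names `blend`, `model_a` and `model_b` are made up for illustration; nobody on the panel described this exact code.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(predictions, weights=None):
    """Weighted average of several models' predictions.

    predictions: list of prediction lists, one per model (hypothetical data).
    weights: optional list of per-model weights; defaults to equal weights.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Toy example: two models whose errors point in opposite directions.
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]  # overshoots every target by 0.5
model_b = [0.5, 1.5, 2.5, 3.5]  # undershoots every target by 0.5

blended = blend([model_a, model_b])
print(rmse(y_true, model_a), rmse(y_true, model_b), rmse(y_true, blended))
```

In this toy case the two models' errors cancel out completely, which is the best case for blending. Real models are rarely this complementary, so the gain is usually much smaller, and the weights are normally tuned on validation data rather than on the leaderboard.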
Do you think Deep Learning will take care of everything?
Dmitry and Mark agreed that deep learning could help you with modeling, but not with everything else: automatically creating features, validating ideas, and, specifically on Kaggle, thinking critically about whether a feature will hold up on the Private Leaderboard. Deep learning may not be able to do that.
How many ideas would you play with at a given point in time?
The answers differed depending on each Kaggler's style. As GM Olivier pointed out, it might also depend on how far away the end of the competition is, which would determine whether he takes a relaxed or a more serious approach. Each Avenger has their own fighting style though, right?
Kim would spend a lot of time on feature engineering initially and focus on modeling towards the end of the competition.
Rohan would focus on just one competition at a time and run multiple experiments in parallel.
How do you do feature selection?
Rohan Rao now uses Driverless AI for a lot of his feature engineering (FE).
At one point in time, a smart person would have a great library as a wealth of knowledge. In 2019, a smart programmer has a rich library of code.
For Dmitry, it depends if the competition is similar to one he has competed in earlier. According to him, most grandmasters have pre-ready scripts that they can leverage.
More battle stories: Any mistakes or Regrets from a competition?
As GM Olivier mentioned, everyone has at some point made a submission on Kaggle that they regret.
Note to the reader: When you compete on Kaggle, your final rankings are evaluated on a private leaderboard (which is the true rank). To get your rank, you are required to select your final submission at the end.
Olivier shared his takeaway from such an experience, which made him think about generalizing better rather than just focusing on the public leaderboard.
Mark shared a battle story from a competition where a public kernel that looked promising could have cost his team a lot of positions.
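Here is a toy sketch of why chasing the public leaderboard can backfire. It assumes the test set is split into a public part (scored during the competition) and a private part (scored at the end); the 3-of-10 split, the labels and both submissions are invented for illustration and are not taken from any real competition.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leaderboard_scores(y_true, y_pred, public_idx):
    """Return (public_score, private_score) for one submission.

    public_idx: row indices scored on the public leaderboard (an assumed split);
    every other row counts towards the private leaderboard.
    """
    public = set(public_idx)
    pub_t, pub_p, priv_t, priv_p = [], [], [], []
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        if i in public:
            pub_t.append(t)
            pub_p.append(p)
        else:
            priv_t.append(t)
            priv_p.append(p)
    return accuracy(pub_t, pub_p), accuracy(priv_t, priv_p)


# Hypothetical hidden test labels; the first 3 rows form the public split.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
public_idx = [0, 1, 2]

# Submission A is tuned to the public rows; submission B generalizes better.
submission_a = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]
submission_b = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]

scores_a = leaderboard_scores(y_true, submission_a, public_idx)
scores_b = leaderboard_scores(y_true, submission_b, public_idx)
print("A (public, private):", scores_a)
print("B (public, private):", scores_b)
```

Submission A looks better while the competition is running, but B wins on the rows that decide the final rank. That is the trap in picking final submissions by public score alone.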
Rohan advises looking at outliers, based on a competition where removing just ONE outlier would have landed him 1st position.
GM Babakhin also had a very interesting battle story where he just missed the submission deadline by 10 seconds. I can only imagine the adrenaline rush.
If you had limited time, what would you focus on?
This also speaks to dedication, or the 10,000-hour rule, broadly speaking. As Dmitry says, any skill generally requires a lot of time, dedication and focus.
Pavel suggests focussing time on writing quality code.
Most Underrated skills that Grandmasters have?
Personally, my recent favorite quote is from Rohan:
“Kaggle is my favorite second Full-Time Job but it comes at a Sacrifice”
On the panel, he added that time management is crucial: a lot of things happen on and off Kaggle.
How do you stay up to date? Any learning resources?
If it wasn’t obvious, I’m kidding, but do check out the channel or podcast. You can expect interviews with all of the GMs on the panel soon.
For Pavel, his favorite course is fast.ai, which is one of the rare courses that always stays at the cutting-edge of Tech.
Shivam and SRK use Twitter where they follow top researchers and practitioners from the field, along with a few blogs.
If you’d like to find my favorite practitioners on Twitter, you can subscribe to my list.
Teaming up strategy
Mark likes to work on a problem with one person from start to finish. Many GMs are more strategic and team up towards the end to bring in more models, etc.
Why join H2O.ai?
For Branden, the reason was a drive to work on true data science products, which led him to switch industries.
For Shivam, the company's vision and products are among the best in the industry.
There was unanimous agreement that the company has THE STRONGEST DATA SCIENCE TEAM. Even Kaggle doesn’t allow you to team up with 13 Grandmasters on any competition (as per rules).
What were my new impressions and takeaways after the interview?
Here are a few qualities of the panel, and even of Kaggle, that inspired me:
The grandmasters are really smart humans (or maybe super-humans, my research is still underway). On top of that, they have a crazy dedication to Kaggle. At their peak, many spent more than half of their day on Kaggle. GM Pavel Pleskov even quit his full-time role and took up Kaggle full-time.
- Kaggle as a learning platform:
Quoting GM (KazAnova) Marios Michailidis, “There are thousands of people competing on Kaggle. Even when you lose, you win” (in terms of gaining knowledge). There are a few things about real-world data science that only the real world can teach you, but Kaggle is still one of the best platforms to learn about Data Science.
- Diversity of problems:
Kaggle has a wide variety of competitions and problems, so you can pick a problem from any domain and gain experience in it. Different Grandmasters shared their areas of expertise during the discussion, and a lot of their wealth of knowledge comes from Kaggle.
Finally, the spirit of Kaggle-ing. There is a lot to be learned and gained by competing, engaging in forums and teaming up with people halfway across the world. At the end of the day, Kaggle is the home of Data Science and it has to be one of the greatest learning platforms out there.
If you’d like to check out interviews with top practitioners, researchers and Kagglers about their journey, check out the Chai Time Data Science Podcast, available as both audio and video.