January 23rd, 2020
Grandmaster Series: How a Passion for Numbers Turned This Mechanical Engineer into a Kaggle Grandmaster
By: Parul Pandey
In conversation with Sudalai Rajkumar: A Kaggle Double Grandmaster and a Data Scientist at H2O.ai
It is rightly said that one should never seek praise. Instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the data science world, a name that resonates when we speak of humility is that of Sudalai Rajkumar, who is as famous for his humble nature as he is for his analytical prowess. It is indeed a privilege and an absolute honor to be working with him as a colleague and learning new things every day.
In this edition of the interviews, where I bring to light the journeys of successful data scientists, I shall be sharing my interaction with Sudalai Rajkumar, aka SRK, a Kaggle Competitions and Kernels Grandmaster, and a data scientist at H2O.ai. Sudalai completed his Engineering degree at PSG College of Technology and then went on to earn an executive degree in Business Analytics and Intelligence from the Indian Institute of Management-Bangalore.
SRK brings with him a decade of experience in machine learning and data science. He has a large following both in India and abroad, and is a massive inspiration for aspiring data scientists around the world. Apart from achieving high ranks in several Kaggle competitions, SRK is famous for his in-depth kernels. In fact, he is a former No. 1 in the Kernels section of Kaggle.
SRK announced the completion of a decade in the data science industry with a beautiful gratitude note on LinkedIn. What better time, then, to speak to the man himself about his journey into data science and his advice for new entrants to the field?
Below is an excerpt from my conversation with Sudalai:
You have a background in Mechanical Engineering. How did the transition to software engineering happen?
SRK: When I finished my degree, I had two job offers: one from a well-known mechanical engineering firm and the other from a startup analytics firm. The mechanical engineering offer was a dream one for me, just as it would be for any other fresh mechanical engineering graduate. However, the joining date was about four months away from graduation, and so I decided to take the other offer.
Initially, my idea was to join the analytics firm to understand more about the company and the nature of the work, since their interview process was very interesting. In the process, I became thoroughly intrigued by finding patterns in data, even though I was completely new to software engineering. This passion for numbers made me continue in that job, and looking back today, I am extremely pleased that I made that decision.
How did your tryst with Kaggle begin, and what kept you motivated throughout your grandmaster’s journey?
SRK: Coming from mechanical engineering, I had no formal education in software engineering or data science. Hence, I started taking MOOCs to learn the concepts. I came across algorithms like Random Forest, SVM, etc. in these courses but did not see anyone using them on the job. This made me look for avenues to experiment with these new algorithms to understand them better. That is how I stumbled upon Kaggle and started my Kaggle journey.
I would say Kaggle is also an addiction once we start doing it, and I am no exception to that. The addiction to build better models and get better ranks sometimes gets a hold on you. There were several failures in multiple competitions, but obsession and passion got me going. Of course, it took a lot of personal time after office hours, but there was immense learning along the way.
How do you decide which competitions to participate in?
SRK: In the initial days, I used to participate in almost all competitions. We can learn something new from each one of them.
I used to do at least two at a time because when I hit a roadblock in one of them, I used to go to the other one and come back after a couple of days to the first one with a fresh mindset. This has often helped me.
Currently, I participate mainly in NLP competitions and I am not that active in the other ones.
How do you typically approach a Kaggle problem? Any favorite ML resources (MOOCs, blogs, etc.) that you would like to share?
SRK: There are some crucial aspects that I keep in mind while approaching a competition:
- The first thing I do is to set up a proper cross-validation framework for the problem.
- The next thing is to build a simple modeling pipeline to ensure that the end-to-end code is working fine.
- I read the forums and explore the kernels sections to understand more about the problem and the views of other people.
- I make sure to spend a considerable amount of time on exploratory analysis, reading relevant papers or articles, and feature engineering.
- The final steps are about modeling and ensembling.
This is an excellent course on data science competitions. Apart from that, it is good to read past solutions and try to implement some of the ideas in the ongoing ones. I created one handy kernel to help with this.
SRK’s Kernel on winning solutions of Kaggle competitions
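The first two steps in SRK's list above, a cross-validation framework and a simple end-to-end pipeline, can be illustrated with a minimal sketch. This is not SRK's own code: the helper names (`k_fold_indices`, `MeanBaseline`, `cross_validate`), the toy data, and the choice of mean absolute error as the metric are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import random


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


class MeanBaseline:
    """Hypothetical baseline model: always predicts the training-set mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def mean_absolute_error(y_true, y_pred):
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def cross_validate(model_factory, X, y, n_splits=5, seed=0):
    """Fit a fresh model on each training fold and score it on the held-out fold."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(X), n_splits, seed):
        model = model_factory().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        preds = model.predict([X[i] for i in valid_idx])
        scores.append(mean_absolute_error([y[i] for i in valid_idx], preds))
    return scores


if __name__ == "__main__":
    # Toy data, invented purely to exercise the pipeline end to end.
    X = [[float(i)] for i in range(20)]
    y = [2.0 * i + 1.0 for i in range(20)]
    scores = cross_validate(MeanBaseline, X, y, n_splits=5)
    print("per-fold MAE:", scores)
    print("mean MAE:", sum(scores) / len(scores))
```

Once a loop like this runs end to end with a trivial baseline, any later model or feature can be swapped in behind the same `fit`/`predict` interface and compared on identical folds.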
As a Data Scientist at H2O.ai, what are your roles and in which specific areas do you work?
SRK: As a Data Scientist at H2O.ai, I am involved in the development of Driverless AI, an automated machine learning platform, specifically in the Natural Language Processing (NLP) area. My role consists of exploring the recent developments in the NLP field and integrating them into the product. NLP has seen tremendous changes in the last two years, and so my work is fun since it involves catching up with them.
SRK along with H2O.ai colleagues during H2O World NY’19.
If you were to team up with grandmasters at H2O.ai, who would they be and why?
SRK: We are already working as a single team here at H2O.ai building the automated machine learning platform 😃
In the case of Kaggle competitions, all of them, of course! I have teamed up with Rohan, Marios, Mark and Mathias so far in Kaggle but not with others, and so would love to have an opportunity to team up with other GMs at H2O.ai to learn more new things.
What are some of the best things that you have learned via Kaggle that you apply in your professional work at H2O.ai?
SRK: Structured and logical thinking are some of the vital things that I learnt from Kaggle. In the data science domain, it is easy to get lost trying to solve a problem. Solving Kaggle problems helped me develop a structured and logical thinking process that I apply at my work.
Most of the time, when new techniques or models emerge in the ML field, they are first tried out in Kaggle competitions before becoming mainstream. Models like XGBoost, which is widely used in the community today, were first used extensively on Kaggle. So we can pick up these trends early and keep ourselves updated.
Model generalization and feature engineering techniques are a couple more crucial learnings from Kaggle that have helped me in my professional work.
Data science is rapidly evolving. How do you manage to keep up with all the latest developments?
SRK: Blogs and social media are two ways in which I keep myself updated about the latest developments. Some of the blogs I follow regularly are Analytics Vidhya, Towards Data Science, and KDnuggets. These blogs help me understand the concepts in detail.
To keep up with the latest trends instantly, I follow a lot of data scientists and ML researchers on Twitter. The ML community is very active on Twitter. Whenever something new comes up, be it a paper or a project, there is usually a lot of activity about it on Twitter.
Are there any specific areas or problems where you would want to apply your expertise in ML?
SRK: I wish to apply machine learning to problems that help uplift society. For example, during a recent drought I collected water data from Chennai (the city where I live) to understand the water levels in its different reservoirs. A simple forecasting model would have helped anticipate the situation in advance and plan accordingly. Such projects will be of enormous benefit to people and help improve society.
One other area of my interest is to apply Natural Language Processing techniques for vernacular languages in India. There are a lot of resources for NLP in English, but there are not many available for vernacular languages.
Any words of advice for data science aspirants who have just started or wish to start their data science journey?
SRK: First and foremost, data science enthusiasts must understand whether this field is of actual interest to them or whether they want to get associated with it just because of the hype around it. It is a rapidly evolving field and requires continuous learning, and so only passion will sustain you in the long run.
Once a person understands the basic concepts of machine learning, either from courses or books, the crucial step is to get hands-on knowledge. There are multiple ways to do that, including participating in data science hackathons, contributing to open source projects, writing blogs, doing internships, etc. One can take up one or a few of these activities to hone and showcase their skills.
SRK’s journey is an inspiration for all of us. However, one important takeaway from this discussion is that it is essential to realize whether your interest truly lies in data science. This field is driven by passion and requires a lot of self-motivation and learning. Don’t choose this field merely because of the hype around it, for that may fizzle out easily. Think of why you would want to become a data scientist in the first place and how you will put those skills to use in the long run.