In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journeys, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.
In this interview, I share my interaction with Fatih Öztürk. He is a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Fatih obtained a Bachelor’s degree in Industrial Engineering with honors from Boğaziçi University, Istanbul. He worked as a Data Scientist at UrbanStat before joining H2O.ai. Fatih joined Kaggle almost four years ago and has won seven gold medals, including a solo one. He also holds the Master status in the discussion tier.
In this interview, we will learn more about his academic background, his passion for Kaggle, and his work as a Data Scientist. Here is an excerpt from my conversation with Fatih.
Fatih: My primary focus in Industrial Engineering was on Operations Research (OR), Supply Chains, and Statistics. Apart from these main courses, we also had the option to choose electives based on our interests. In my last semester, I took “Data Mining” as one of my electives, partly because it was a popular course. It was while studying data mining that I first came across concepts like random forests, classification, and prediction. I found it pretty interesting and analogous to playing a competitive game. I realized that my passion lay in data analysis, and I instantly knew which field I wanted to pursue after graduation.
Fatih: My first job was as a Junior Data Scientist at a tech startup. I was the only data scientist there, and we worked exclusively for insurance-related companies. A few months after I joined, my boss found out about the Porto Seguro competition on Kaggle and asked me if I could look at it, since it was an insurance use case. I was pleased with what I found in that competition because I saw that people were sharing a lot. So during that competition, I realized two main things:
Competing and learning on Kaggle go hand in hand. Learning is my primary motivation for participating in any competition. Becoming a Master or a Grandmaster is just a natural result of this process.
Fatih: I liked the Home Credit Default Risk competition. The datasets were not fully anonymized, and hence there was a lot of room for feature engineering. Trying to understand the domain of the competition and then being able to generate useful features was fun. Moreover, our team had a good validation strategy that turned out to be very successful for the private leaderboard in the end. We went from 29th place on the public leaderboard to 10th on the private one.
Fatih: For any competition, my first step is always to set up a reliable validation scheme. Having a well-correlated relationship between the cross-validation (CV) score and the leaderboard (LB) score is everything. So how do you achieve this? It mostly depends on the right exploratory data analysis (EDA). Figuring out how the test set differs from the train set (if it does) and then mimicking this in your validation scheme is a good starting point. Besides doing EDA with plots and numbers, I also check adversarial validation scores in this regard.
After having a good validation strategy, I focus on finding useful things that are not shared on the public forum because having different tricks is crucial to land a good rank at the end.
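The adversarial validation check mentioned above can be illustrated with a small sketch. The idea is to label train rows 0 and test rows 1, fit a classifier to tell them apart, and look at the out-of-fold AUC: a score near 0.5 suggests the two sets look alike, while a score well above 0.5 suggests the test set is distributed differently. This is not Fatih's own code. The function names (`auc`, `fit_logistic`, `adversarial_auc`) and all parameter values are illustrative assumptions, and a plain logistic regression in standard-library Python stands in for the gradient-boosting model one would more typically use.

```python
import math
import random


def auc(labels, scores):
    """ROC AUC from the rank-sum statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def predict(w, b, row):
    """Logistic-regression probability, with the logit clamped for safety."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the log loss (hypothetical stand-in model)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = predict(w, b, row) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_auc(train_rows, test_rows, n_folds=5, seed=0):
    """Out-of-fold AUC of a classifier separating train rows from test rows."""
    rows = [list(r) for r in train_rows] + [list(r) for r in test_rows]
    labels = [0] * len(train_rows) + [1] * len(test_rows)
    # Standardize each column so gradient descent behaves reasonably.
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        for r in rows:
            r[j] = (r[j] - mean) / std
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    oof = [0.0] * len(rows)
    for fold in range(n_folds):
        held_out = set(idx[fold::n_folds])
        fit_idx = [i for i in idx if i not in held_out]
        w, b = fit_logistic([rows[i] for i in fit_idx],
                            [labels[i] for i in fit_idx])
        for i in held_out:
            oof[i] = predict(w, b, rows[i])
    return auc(labels, oof)
```

Calling `adversarial_auc(train_rows, test_rows)` on two lists of numeric feature rows returns a single score. When that score is high, a common follow-up is to inspect which features drive the separation and to build a validation split that resembles the test set more closely.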
Fatih: I use Python and, most of the time, work with JupyterLab. I also have a Google Colab Pro account to get access to GPUs since I don’t have a local one. I find it a good investment since we have limited GPU hours per week on Kaggle notebooks.
My favorite modeling algorithm is LightGBM. I still think that it is a very efficient and production-friendly algorithm, given how easy it is to tune and how quickly it reaches sufficiently good scores.
Fatih: I find people’s interest in data science in Turkey quite noteworthy, and it’s increasing every day. More and more students are choosing Computer Science as their major over other engineering majors. The main reason for this popularity is the overall adoption of data science in every industry.
The number of Turkish people that I encounter in Kaggle competitions is also growing quite fast. This is heartwarming since this was not the case a few years ago. A similar situation is reflected in the meetup community, where there has been a rapid rise in both the number of events and the number of students involved. Recently, a lot of Turkish companies have started hosting in-class competitions on Kaggle.
Fatih: I’m involved in POCs and other customer-related projects to help customers benefit more from Driverless AI. Besides, I develop new apps with the Wave framework and test Driverless AI with new datasets.
Fatih: I think social networks are the key to this. It’s almost impossible to remain up to date just by yourself. However, if you are in the right Slack channels and have a meaningful LinkedIn feed, it’s easier to follow the news. Apart from this, joining Kaggle competitions and regularly following the threads in competition forums is another useful resource.
Fatih: I want to join Computer Vision competitions in 2021. I’d be delighted to be placed in the top 50 as a solo competitor in one of these competitions. A gold medal as a team would also be fantastic, of course.
Fatih: I would suggest not worrying too much about questions such as where to start, which courses to take, or which tools to learn. Instead of dealing with all these questions initially, it is advisable to jump directly into a data science project or a competition and learn from others’ code. This is how I improved myself: by getting my hands dirty early on. Analyzing other people’s code and asking questions such as “What does this code snippet do here?”, “Why did the author code it like this?”, and “How does it help in this project or competition?” allowed me to hone my skills. The next task is to answer these questions. One could either search for the answers on the internet or make use of the discussion forums.
Fatih’s Kaggle achievements reflect his passion for problem-solving and his consistent hard work. How he transitioned from industrial engineering into data science and then went on to achieve the title of Kaggle Grandmaster in a span of two years is commendable.