February 15th, 2021

Learning from others is imperative to success on Kaggle, says this Turkish Grandmaster

In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster.

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.

In this interview, I shall be sharing my interaction with Fatih Öztürk. He is a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Fatih obtained a Bachelor’s in industrial engineering with honors at Boğaziçi University, Istanbul. He worked as a Data Scientist at UrbanStat before joining H2O.ai. Fatih joined Kaggle almost four years ago and has won seven golds, including a solo one. He also holds Master status in the discussion tier.

In this interview, we shall learn more about his academic background, his passion for Kaggle, and his work as a Data Scientist. Here is an excerpt from my conversation with Fatih.


 

You have a background in Industrial Engineering. What prompted you to choose Data Science as a career?

Fatih: My primary focus in Industrial Engineering was on Operations Research (OR), Supply Chains, and Statistics. Apart from these main courses, we also had the option to choose specific electives based on our interests. In my last semester, I took “Data Mining” as one of my elective courses. One of the reasons for this choice was its popularity. While studying data mining, I came across concepts like random forests, classification, and prediction for the first time. I found it pretty interesting and analogous to playing a competitive game. I realized that my passion lay in the field of data analysis, and I instantly knew what field I had to pursue after my graduation.

How did your tryst with Kaggle begin, and what kept you motivated throughout your grandmaster’s journey?

Fatih’s Kaggle profile

Fatih: My first job was as a Junior Data Scientist in a tech startup. I was the only data scientist there, and we worked only for insurance-related companies. A few months after I joined the company, my boss found out about the Porto Seguro competition on Kaggle, and he asked me if I could look at it since it was an insurance use case. I was pleased with what I found in that competition because I saw that people were sharing a lot. So during that competition, I realized two main things:

  • My learning rate was much higher when I was around kernels and discussions. 
  • My competitive side was triggered, and I learned that I liked competing a lot.

Competing and learning on Kaggle go hand in hand. That combination is my primary motivation for participating in any competition. Being a Master or a Grandmaster is just a natural result of this process.

Can you tell us a little about your favorite Kaggle competition?

Fatih: I liked the Home Credit Default Risk competition. The datasets were not fully anonymized, and hence there was a lot of room for feature engineering. Trying to understand the domain of the competition and then being able to generate useful features was fun. Moreover, our team had a good validation strategy that turned out to be very successful for the private leaderboard in the end. We went from 29th place on the public leaderboard to 10th on the private one. 

How do you typically approach a Kaggle problem? 

Fatih: For any competition, my first step is always to set up a reliable validation scheme. Having well-correlated cross-validation (CV) and leaderboard (LB) scores is everything. So how do you achieve this? It mostly depends on the right exploratory data analysis (EDA). Figuring out how the test set differs from the train set (if it does) and then mimicking this in your validation scheme is a good starting point. Besides doing EDA with plots and numbers, I also check adversarial validation scores in this regard.
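
As a rough illustration of the adversarial validation check mentioned above (a sketch, not Fatih’s actual code): label training rows 0 and test rows 1, then measure how well the two groups can be told apart. In practice one would usually train a classifier, such as a gradient-boosted model, on all features. The dependency-free sketch below instead scores a single numeric feature with AUC, and the function names `auc` and `adversarial_auc` as well as the synthetic data are assumptions made for illustration. An AUC near 0.5 suggests the feature looks alike in train and test, while a value near 1.0 signals a shift worth mimicking in the validation split.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic."""
    # Rank all scores in ascending order, giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def adversarial_auc(train_values, test_values):
    """How separable train (label 0) and test (label 1) are on one feature."""
    scores = list(train_values) + list(test_values)
    labels = [0] * len(train_values) + [1] * len(test_values)
    a = auc(scores, labels)
    return max(a, 1 - a)  # the direction of the shift does not matter


# Synthetic data (hypothetical): one test set matches train, one is shifted.
random.seed(0)
train = [random.gauss(0, 1) for _ in range(500)]
test_same = [random.gauss(0, 1) for _ in range(500)]
test_shifted = [random.gauss(2, 1) for _ in range(500)]

same_auc = adversarial_auc(train, test_same)
shifted_auc = adversarial_auc(train, test_shifted)
print(f"similar distributions: AUC = {same_auc:.3f}")
print(f"shifted distribution:  AUC = {shifted_auc:.3f}")
```

Features with a high adversarial AUC are the ones to inspect first when designing a validation split that resembles the test set.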

After having a good validation strategy, I focus on finding useful things that are not shared on the public forum because having different tricks is crucial to land a good rank at the end.

Could you give us a sneak peek into your toolkit, such as your favorite programming language, IDE, algorithms, etc.?

Fatih: I use Python and, most of the time, work with JupyterLab. I also have a Google Colab Pro account to get access to GPUs since I don’t have a local one. I find it a good investment since we have limited GPU hours per week on Kaggle notebooks.

My favorite modeling algorithm is LightGBM. I still think that it is a very efficient and production-friendly algorithm given how easy it is to tune and how fast it can get sufficiently good scores.

You regularly speak at meetup events. How is the data science landscape in and around Turkey?

Fatih as one of the speakers at the Istanbul Tech Week event

Fatih: I find people’s interest in data science quite noteworthy in Turkey, and it’s increasing every day. More and more students are choosing Computer Science as their major over other engineering majors. The main reason for this popularity is the overall adoption of data science in every industry.

The number of Turkish people that I encounter in Kaggle competitions is also growing quite fast. This is heartwarming since this was not the case a few years ago. A similar situation is reflected in the meetup community as well. There has also been a rapid rise in both the number of events and the number of students involved. Recently, a lot of Turkish companies have started hosting in-class competitions on Kaggle.

As a Data Scientist at H2O.ai, what are your roles, and in which specific areas do you work?

Fatih, along with fellow Kaggle Grandmasters at H2O.ai

Fatih: I’m involved in POCs and other customer-related projects to help customers benefit more from Driverless AI. Besides, I develop new apps with the Wave framework and test Driverless AI with new datasets.

ExploRNA Wave app created by Fatih. You can read more about the app here.

The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?

Fatih: I think social networks are the key to this. It’s almost impossible to remain up to date just by yourself. However, if you are in the right Slack channels and have a meaningful LinkedIn feed, it’s easier to follow the news. Apart from this, joining Kaggle competitions and regularly following the threads in competition forums are other useful resources.

How do you plan to spend your time on Kaggle in 2021? Any special milestones you want to achieve?

Fatih: I want to join Computer Vision competitions in 2021. I’d be delighted to be placed in the top 50 as a solo competitor in one of these competitions. A gold medal as a team would also be fantastic, of course. 😃

A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey?

Fatih: I would suggest not worrying too much about questions like where to start, which courses to take, or which tools to learn. Instead of dealing with all these questions initially, it is advisable to jump directly into a data science project or a competition and learn from others’ code. This is how I improved: by getting my hands dirty early on. Analyzing other people’s code and asking questions like “What does this code snippet do here?”, “Why did the author code it like this?”, and “How does it help in this project or competition?” were some of the ways that allowed me to hone my skills. The next task is to answer these questions. One could either search for the answers on the internet or make use of the discussion forums.


 

Fatih’s Kaggle achievements reflect his passion for problem-solving and his penchant for hard work. How he transitioned from industrial engineering into data science and then went on to achieve the title of Kaggle Grandmaster in a span of two years is commendable.


About the Author

Parul Pandey

Parul focuses on the intersection of H2O.ai, data science and community. She works as a Principal Data Scientist and is also a Kaggle Grandmaster in the Notebooks category.
