Ever wondered why data science is so competitive? After a highly successful H2O World event last week, we’re shining some light on what we learned from some of the world’s best data scientists about how they go about winning data science challenges such as those on Kaggle. In case you missed it, we held a Competitive Data Science Panel at H2O World, for which we invited top-notch data scientists, and we are very lucky that they shared some of their priceless secrets with us!
Our panelists were (from left to right):

+ Jose Guerrero, #8 at Kaggle, formerly #1
+ Guocong Song, #12 at Kaggle, formerly #8
+ Mark Landry, #123 at Kaggle, formerly #110
+ Chris Severs, data scientist at Ebay
+ Arno Candel, H2O.ai (moderator)

Disclaimer: The views and opinions expressed herein are those of the author and the panelists and do not reflect the views and opinions of anyone else. Your chances of winning a Data Science competition will remain ~~infinitesimally~~ small. The information set forth herein has been obtained or derived from sources believed by the author to be reliable. However, the author does not make any representation or warranty, express or implied, as to the information’s accuracy or completeness, nor does the author recommend that the attached information serve as the basis of any data science challenge submission.

And here’s what you’ve been waiting for! The key takeaways from the world’s top Kagglers!
Question: What’s the point of data science competitions?
Jose recommended that we watch the following video:
Fairly convincing, eh? Alright. Let’s get back to data science, and see what the experts had to say!
Question: What’s more important? Data exploration? Feature engineering/mining? Model tuning? Better algorithms?
Guocong: Netflix Prize, took Andrew Ng’s Coursera course on Machine Learning, was an EE in a former career, learned programming (which can help a lot with being time-effective)
Jose: Kaggle is very addictive, reached the top position in December 2013, has since gotten involved in lots of projects
Jose: Placed 3rd in my first competition (got $0, first place took $500k), will never forget
Question: What tools would it take to make you even better at Kaggle?
Jose: A tool to control sampling for cross-validation, bagging, time-series, geographical data, grouping. Need to keep data in the same folds, same bag to get fair estimates, also for ensembles
Guocong: Writes his own tools, data work flow, open-source projects, always looking for new tools
Mark: Workflow helper tool to keep track of stuff (log transform, change data, no more need) – checks correlation, for example
Chris: GPU support for H2O!
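Jose's wish for a tool that controls sampling can be illustrated with a small sketch. It shows one reading of his point about grouping: all rows that share a group label (a customer, a location, a time window) land in the same fold, so the validation score is not inflated by near-duplicate rows leaking into training. The function name `grouped_kfold` and the greedy balancing strategy are assumptions made for illustration; they are not a tool any panelist described.

```python
from collections import defaultdict


def grouped_kfold(groups, n_folds):
    """Yield (train_idx, valid_idx) pairs in which no group is split across folds.

    `groups` holds one group label per row. Hypothetical helper for illustration.
    """
    # Collect the row indices that belong to each group label.
    rows_by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        rows_by_group[label].append(idx)

    if n_folds < 2 or n_folds > len(rows_by_group):
        raise ValueError("n_folds must be between 2 and the number of distinct groups")

    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(rows_by_group, key=lambda g: (-len(rows_by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(rows_by_group[label])

    # Each fold serves once as the validation set; the rest is training data.
    for k in range(n_folds):
        valid = sorted(folds[k])
        train = sorted(i for j in range(n_folds) if j != k for i in folds[j])
        yield train, valid


if __name__ == "__main__":
    # Toy group labels, made up for this example.
    example_groups = ["a", "a", "b", "c", "c", "c", "d"]
    for train_idx, valid_idx in grouped_kfold(example_groups, n_folds=3):
        print("train:", train_idx, "valid:", valid_idx)
```

Reusing the same fold assignment for every model in an ensemble keeps their out-of-fold predictions comparable. For real projects, scikit-learn's `GroupKFold` provides grouped splitting along these lines.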
Question: What kind of hardware are you using?
Jose: Dual-Xeon server with 256GB
Guocong: 4-core with 32GB
Mark: Laptop with 8GB + EC2
Chris: 4000 node Hadoop cluster, 64 GPUs