September 15th, 2020

My Experience at the World’s Best AI Company

RSS icon RSS Category: Makers

Blog post by Spencer Loggia

When H2O announced that remote work would continue through the summer due to Covid-19, I was a little disappointed. I expected that it would be difficult to connect with others as a new employee, especially as an intern.

My internship now comes to an end, and I realize how completely wrong I was. I’ve met and worked with people across the company, each with a unique set of skills I have learned from. Everyone was willing to help with whatever problem I had, and I felt that any question was valid. I was given real responsibility and felt like a full employee instead of an intern. 

There were definitely challenges unique to the remote experience. I had to be comfortable reaching out to people I’d never met before. I had to stay focused and efficient without any immediate guidance. I learned to navigate and work with the infrastructure of a whole company without anyone physically there to walk me through it. Ultimately, this all may have been a gift, because I had to understand the software I was working with and, to some extent, myself on a deeper level in order to succeed. 

My time at H2O wasn’t spent working on an isolated “Intern Project” as is common at other companies. That being said, my contributions can be broadly broken down into two separate projects below, as well as a variety of smaller tasks. 

Reducing the size of the MOJO Pipeline

The Mojo scoring pipeline is a lightweight and platform independent way for users to productionize completed experiments. Mojo works by serializing models using Google’s protobuf, and then loading them to make predictions using either a Java or C++ runtime. 

I determined that the Mojo pipeline was especially large for certain time series experiments. For anyone who doesn’t know, a time series is when the sequence of the data rows is important. For many datasets order is arbitrary, for example a task like image recognition. However, for others the order itself may be the most important feature, like when predicting the value of a product some distance into the future. Some methods for dealing with sequential data involve the use of “lag intervals”. This allows information from some prior rows to be used for the current prediction. 

For each lag interval, a transformation may be applied to the lagged data. This transformer is serialized to be used in the mojo scoring pipeline. With an ordinary experiment there should be no circumstance in which many identical transformers act on the same data. But with lag this is exactly what occurs, and it can result in whole columns being redundantly serialized and stored for each lag interval.

The solution in the end was rather simple, just hashing the relevant key and value data, then referencing previous protobuf files if they had been created. However, it required me to learn how DAI transforms data, how protobuf works, and how serialized models are read by the mojo runtime. In the end, I achieved an order of magnitude decrease in the size of the pipeline for certain time series experiments, which can help clients to successfully deploy mojos for very large datasets.

Enterprise Steam Client Testing

I was lucky enough to be able to work with H2O Enterprise Steam as well, which is a product that allows admins to securely manage H2O-3 and Sparkling Water clusters on Hadoop and Driverless instances on Kubernetes. Early on I was assigned a small task related to Steam, a bug fix or something of that nature. It turned out that the team could use help developing a testing suite for the Steam python client, which gave me a chance to see a completely different side of the company, and to learn more about Hadoop, Spark, and Docker. 

It was challenging at times to be split between unrelated tasks, but I think it painted a more accurate picture of what life at a rapidly growing company looks like than I would have gotten otherwise.

To Conclude

Besides that, I was able to work with some of the brightest minds in the field, and to use Driverless AI every day, which made even complicated, large-scale ML problems breathtakingly simple and efficient. I also had the invaluable experience of the hectic push before new version releases, it was a pleasure to participate in all the testing and bug catching necessary for new features. 

Everything I worked on might seem pretty mundane. After all, optimization, testing, and bug-chasing isn’t exactly glamorous – except that it really is. Certain new technologies have changed the very fabric of society, and it seems clear that AI is next. With its mission of democratizing AI, H2O will ensure that change happens in the best way possible. I am happy to have been able to contribute to that in some small way. 

About the Author

Spencer Loggia is a Junior at Johns Hopkins University majoring in Computer Science and Neuroscience. He has worked in three research labs focusing on resolving the structure of the neural networks involved in attention selectivity, developing software for modeling protein complex assembly, and viral engineering. He is especially interested in brain machine interfaces, general AI, and the use of machine learning to better understand biological systems.

About the Author

Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities around Europe, US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.

Leave a Reply

Three Keys to Ethical Artificial Intelligence in Your Organization

There’s certainly been no shortage of examples of AI gone bad over the past few

September 23, 2022 - by Team
Using GraphQL, HTTPX, and asyncio in H2O Wave

Today, I would like to cover the most basic use case for H2O Wave, which is

September 21, 2022 - by Martin Turoci
머신러닝 자동화 솔루션 H2O Driveless AI를 이용한 뇌에서의 성차 예측

Predicting Gender Differences in the Brain Using Machine Learning Automation Solution H2O Driverless AI 아동기 뇌인지

August 29, 2022 - by Team
Make with Recap: Validation Scheme Best Practices

Data Scientist and Kaggle Grandmaster, Dmitry Gordeev, presented at the Make with session on

August 23, 2022 - by Blair Averett
Integrating VSCode editor into H2O Wave

Let’s have a look at how to provide our users with a truly amazing experience

August 18, 2022 - by Martin Turoci
5 Tips for Improving Your Wave Apps

Let’s quickly uncover a few simple tips that are quick to implement and have a

August 9, 2022 - by Martin Turoci

Start Your Free Trial