In September 2019, H2O.ai became a silver partner of the Faculty of Informatics at Czech Technical University in Prague. The main goal of this partnership is to connect students with companies and create an environment where students can apply their knowledge in practice and gain real work experience.
In general, within the partnership, a company can offer internships, full-time or part-time jobs, or concrete project assignments, for example as part of a final thesis. Companies can present their offers via a web portal or at job fairs, which the university organises twice a year.
At H2O.ai, we decided to offer internships in the form of project assignments rather than regular jobs. Our main objective in this cooperation is to show students how a fast-growing AI company works and to propose meaningful assignments, rather than to hand off tedious, trivial work to them. Last year we primarily targeted bachelor's and master's students of informatics specializing in AI. This year we have also prepared several web- and QA-oriented topics.
During the job fairs, we like to talk with students from across CTU and motivate them to study AI. We like the idea that a student can study and work in symbiosis. Students usually look for a part-time job to start gaining experience as soon as possible; however, such a job can also tempt them to leave their studies too early.
During the academic year 2019/2020, we completed two great projects. Both students contributed to our open-source machine learning platform, H2O-3, by implementing algorithms that were missing from the library: TF-IDF and Extended Isolation Forest.
Adam worked on the implementation of a new algorithm for anomaly detection. The standard Isolation Forest failed to detect the structure of the data and treated it as one rectangular blob with extensive rectangular bands. That is why the idea of Extended Isolation Forest came up. The algorithm is described in more detail in this paper. The image below shows how the Extended Isolation Forest algorithm improves on the Isolation Forest anomaly detection algorithm: the “ghost” clusters near (0,0) and (10,10) are reduced.
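To make the difference concrete, here is a minimal sketch of a single tree split in both variants. It is not the H2O-3 implementation. It follows our understanding of the published algorithm, in which the standard Isolation Forest cuts along one randomly chosen feature, while Extended Isolation Forest cuts with a randomly oriented hyperplane. The function names and the list-of-tuples data layout are assumptions made for illustration only.

```python
import random


def axis_parallel_split(points, rng):
    """Standard Isolation Forest style: cut along one randomly chosen feature."""
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[dim] < threshold]
    right = [p for p in points if p[dim] >= threshold]
    return left, right


def random_hyperplane_split(points, rng):
    """Extended Isolation Forest style: cut with a randomly oriented hyperplane."""
    n_dims = len(points[0])
    # Random normal vector: each component is drawn from a standard normal.
    normal = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
    # Random intercept point drawn uniformly inside the bounding box of the data.
    intercept = [
        rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
        for d in range(n_dims)
    ]

    def side(p):
        # Signed position of p relative to the hyperplane through `intercept`.
        return sum((p[d] - intercept[d]) * normal[d] for d in range(n_dims))

    left = [p for p in points if side(p) <= 0]
    right = [p for p in points if side(p) > 0]
    return left, right


rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
left, right = random_hyperplane_split(data, rng)
```

Because the axis-parallel cut can only produce horizontal or vertical boundaries, the resulting anomaly score map tends to show rectangular bands. Cuts at arbitrary angles avoid that bias.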
The reason was straightforward. I was looking for good supervision and a diploma thesis assignment with added value for the real world. Among the other SSP portal assignments at that time, H2O.ai had an assignment titled “Ask us for more information” with an interesting reward on top. So I asked, H2O.ai reacted immediately, and after the first interview I had a strong feeling that I had the opportunity to work with experts who love and know their field. Veronika came up with a brand new assignment designed for my diploma thesis. It turned out that I could connect my Java developer skills with data science, contribute to an open-source project for the first time, and bring a new algorithm into a production environment. When I considered that, I could hardly ask for a better assignment.
The most challenging part for me was diving into the anomaly detection field, which I knew nothing about, learning to use the H2O-3 open-source machine learning platform, and digging through the large codebase of this open-source product.
I got an invitation to an interview, where we talked about my knowledge and preferences. Since my preference was a diploma thesis, we focused only on the big tasks. After I agreed to the assignment, I got some initial sources to start with, and it was up to me to discuss and ask for anything I needed to know. It was no problem to get an appointment or an online call. Last but not least, I got enough space to finish my university duties and plenty of help with writing the thesis.
Besides gaining experience contributing to a large and well-known open-source project, I successfully finished my diploma thesis and my studies and, last but not least, I applied for and got a full-time job at H2O.ai.
Totally! I cooperated with industry partners on both of my theses, and I was delighted with both experiences. I had heard a lot about supervisors who are difficult to coordinate with, do not respond to email, do not provide feedback, and more. I wanted to avoid this experience. My conviction was that if a company provides a project for a student together with one of its employees, it actually cares about the project’s result and wants to help lead the student to a successful finish. I was right.
You can get a motivated supervisor who wants to finish the project at least as much as you do, sometimes even more. You get in touch with a company, and all contacts from business are valuable. Last but not least, the cherry on top of all your hard work on the thesis and final exams will be an extra reward for your effort. Why not combine business with pleasure?
Jan worked on the implementation of an algorithm for text data pre-processing. TF-IDF is a statistical measure that aims to reflect how important a word is to a document in a collection of documents (also known as a corpus). You can find a single page that explains TF-IDF here: http://www.tfidf.com.
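As a small illustration of the idea, the sketch below computes TF-IDF scores for a toy corpus in plain Python. It is not the H2O-3 implementation. It assumes one common textbook weighting, where the term frequency is the share of a document's words taken up by the term and the inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The function name `tf_idf` and the whitespace tokenization are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed weighting (one common textbook variant):
      tf(w, d) = count of w in d / number of words in d
      idf(w)   = ln(N / number of documents containing w)
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    scores = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
weights = tf_idf(corpus)
```

With this weighting, a word that appears in every document gets a score of zero, and a word that is frequent in one document but rare in the rest of the corpus scores highest. In the toy corpus, "cat" therefore outranks "the" in the first document, even though "the" occurs more often there.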
In the final year of my bachelor studies, I had just a few courses left, and I wanted to use my free time to its full potential. I already had some software development experience from my internships and part-time jobs, but I lacked exposure to the machine learning (ML) field that I was very interested in. I kept looking for interesting projects on the portal, and I found an offer from H2O.ai which seemed to be a great fit for me. It required you to work on an actual ML algorithm implementation, which is unusual in practice. So I got in touch with Veronika, and we found a topic which sounded interesting to me – “Implementation of TF-IDF”. Another great benefit of this collaboration was getting to contribute to an open-source project.
Generally, I could rely quite a lot on Veronika and other H2O members, and if I had any questions, I asked them via email or on GitHub. This made the collaboration much easier. But if I had to single out one thing, it would be getting familiar with a rather large codebase and getting used to the Map-Reduce style of models used in the H2O framework.
When I first got in touch with Veronika, I was doing a full-time internship abroad. We discussed the possibilities, and later, when I got back, we agreed on the actual topic. The first thing to do was to get familiar with the codebase. Then I studied TF-IDF, we discussed how it could be implemented in the framework, and I worked on the implementation. Besides TF-IDF itself, a few other things needed to be implemented, such as the string “group by” used by the TF-IDF implementation.
I got experience in the implementation of actual machine learning algorithms. On top of that, I experienced the whole process of open-source contribution to a rather large and well-known open-source project. And besides all the experience, I also got a financial reward as a bonus.
Definitely. I believe this kind of experience gives you a head start in your career. Also, if you are not sure about your focus, you can use these projects to get exposure to some real-world work and find out whether this particular field is for you. Besides that, it allows you to make some money during your studies.
COFIT job fair, October 2019, and part of the H2O.ai Prague team
For the academic year 2020/2021, we offer a lot of new and interesting assignments. Last year, all the assignments were about contributing to the open-source H2O-3 platform. This year we have also prepared closed-source assignments from the Driverless AI and Steam platforms. Examples include the implementation of a Timeseries AutoML UI/UX, a CHIRP classifier in Python, a security analysis of the Driverless AI web app, a distance-preserving matrix sketch, and the implementation of a new algorithm in the open-source H2O-3 platform. If you are interested and you are a student of information technology at CTU in Prague, contact us at academic-prague@h2o.ai, or find all our assignments and their detailed descriptions on the Cooperation with Industry Portal (https://is.fit.cvut.cz/group/ssp).