March 27th, 2014

Google-scale Machine Learning & Deep Learning get a principal platform in Apache Mahout with Spark and H2O

H2O’s vision is direct and simple: scaling machine learning to power intelligent applications. Our focus is distributed machine learning and a full-featured set of industrial-grade algorithms.
Apache Mahout is where people earn their chops in machine learning. Like R, it’s the “hello world”: the first place many new users are exposed to algorithms on big data. Making that experience beautiful, accessible and value-driven will make machine learning ubiquitous and make Mahout a movement to rival the success and utility of, say, Lucene and Hadoop.
Apache Spark has great developer momentum, and its in-memory design makes it ideal for implementing and extending algorithms.
Our vision and motivation is to re-ignite the community and double down on the identical founding visions of Mahout and H2O. Under one umbrella, Mahout can power intelligent applications for enterprises and users.
Creating great software is hard; creating passionate communities is harder. Our belief is that a product is not complete without its community. This convergence will make Mahout the principal platform for integrating multiple ways of mining insights from data.
These are exciting times for Mahout. These initiatives will drive momentum to Mahout as the umbrella platform for machine learning. Its success will drive wide-scale adoption of scalable machine learning algorithms in the enterprise, and H2O is committed to that unified vision. Spark is a terrific in-memory platform for that. Stratosphere will be another. Scala, R, Python, JS, Java and the Matrix APIs make it a polyglot modeling and programming universe. This will be fun.
We are excited at the possibilities of this convergence. We are fans of Mahout’s vision and of how it has captured the imagination of machine learning enthusiasts over the years. (We still fondly recollect Isabel’s spirited talk at ApacheCon years ago!) What is needed is a real product, hacker and open source developer culture. The R community has also been looking for a package that solves distributed in-memory frames, along with parallel packages for the algorithms behind them. Our team has executed on many of these inspirations, fast and furiously, in open source over the past two years. We hope to enrich and fulfill the day-to-day workflows of machine learning users worldwide through Apache Mahout.
It all starts with the end user’s experience of machine learning and how we can make it better.
