October 25th, 2013

Strata NYC & Hadoop World: How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O

How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O
Srisatish Ambati (0xdata Inc), Cliff Click (0xdata Inc)
5:05pm Tuesday, 10/29/2013
Data Science Beekman Parlor – Sutton North
Data modeling has been constrained by scale; sampling still rules the day for ad hoc analytics. Scale brings much-needed change to the modeling world. In this talk we present the predictive power of applying sophisticated algorithms to big datasets. Large datasets bring the particularly hard problem of unbalanced data with multiple, asymmetrically rare classes. Missing features pose unique problems for most classification and regression algorithms, and handling them properly can lead to greater predictive power. In the race for better predictions, H2O makes practical techniques accessible to everyone through an easy-to-use software product.
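The two data problems named above have common, simple mitigations. The Python sketch below shows one of each, as general techniques rather than H2O's own handling: inverse-frequency class weights, so that rare classes are not drowned out in the loss, and mean imputation paired with a missing-value indicator column, so that a model can still learn from the fact that a value was absent. All function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_rows / (n_classes * class_count).

    Every class then carries the same total weight: a class with 1 row out
    of 100 contributes as much summed weight to the loss as one with 96 rows.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def row_weights(labels):
    """Per-row weights, e.g. for a learner that accepts sample weights."""
    w = inverse_frequency_weights(labels)
    return [w[y] for y in labels]


def is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_with_indicator(column):
    """Fill missing entries with the column mean and return a parallel 0/1
    flag column marking which entries were missing."""
    present = [v for v in column if not is_missing(v)]
    mean = sum(present) / len(present) if present else 0.0
    filled = [mean if is_missing(v) else v for v in column]
    flags = [1 if is_missing(v) else 0 for v in column]
    return filled, flags


if __name__ == "__main__":
    labels = ["ok"] * 96 + ["fraud"] * 3 + ["chargeback"] * 1
    print(inverse_frequency_weights(labels))
    print(impute_with_indicator([1.0, None, 3.0, float("nan")]))
```

The indicator column matters because "missing" is often informative in itself; plain mean imputation alone would erase that signal.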
H2O is an open source math and machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping familiar interfaces: the widely used R language and a JSON API. It integrates neatly into popular data ecosystems such as Hadoop, Amazon S3, NoSQL, and SQL. We briefly discuss design choices in the implementation of Distributed Random Forest and Generalized Linear Modeling, and in bringing speed and scale to R, the vox populi of data science. We also take a peek at the elegant, lego-like infrastructure that brings fine-grained parallelism to math over simple distributed arrays.
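The idea of "math over simple distributed arrays" can be illustrated as map/reduce over chunks: each chunk of a large column computes a small partial result locally, and the partial results are merged pairwise. The Python sketch below is only an analogy under that assumption; H2O itself is written in Java, none of these names are its API, and threads stand in for cluster nodes. It computes a mean and variance by merging (count, mean, sum of squared deviations) summaries, which gives the same answer however the data is chunked.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Partial result for one chunk: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def map_chunk(chunk):
    """'Map' step: runs locally on one chunk and returns a tiny summary."""
    n = len(chunk)
    if n == 0:
        return Moments()
    mean = sum(chunk) / n
    return Moments(n, mean, sum((x - mean) ** 2 for x in chunk))


def reduce_moments(a, b):
    """'Reduce' step: merge two partial results (Chan et al. pairwise update)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    return Moments(
        n,
        a.mean + delta * b.n / n,
        a.m2 + b.m2 + delta * delta * a.n * b.n / n,
    )


def chunked_mean_var(data, n_chunks=4):
    """Mean and population variance of `data`, computed chunk by chunk."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // n_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Threads stand in for the nodes that would each own one chunk.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = reduce(reduce_moments, partials, Moments())
    return total.mean, total.m2 / total.n
```

Because `reduce_moments` is mathematically associative, partial results can be combined in any grouping, which is what allows this style of computation to spread across many machines.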
A short data-hacking demo presents the life cycle of data science: powerful data manipulation via R at scale, interactive summarization over large datasets, modeling using Elastic Net (GLM), grid search for the best parameters, and low-latency scoring.
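The modeling and tuning steps of the demo can be sketched in plain Python. The sketch assumes a glmnet-style elastic net objective for squared-error regression, (1/2n)*||y - b0 - X b||^2 + lambda*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2), fitted by cyclic coordinate descent with soft-thresholding, plus an exhaustive grid search over `lambda` and `alpha` scored by validation mean squared error. It is not H2O's GLM implementation, and all function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the objective
    (1/2n)*||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    X is a list of rows, y a list of targets. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering lets the unpenalized intercept be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals with all coefficients at 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


def predict(intercept, beta, X):
    """Scoring is one dot product per row."""
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def mse(pred, actual):
    return sum((u - v) ** 2 for u, v in zip(pred, actual)) / len(actual)


def grid_search(X_tr, y_tr, X_va, y_va, lambdas, alphas):
    """Fit every (lambda, alpha) pair; keep the lowest validation MSE."""
    best = None
    for alpha in alphas:
        for lam in lambdas:
            b0, beta = fit_elastic_net(X_tr, y_tr, lam, alpha)
            err = mse(predict(b0, beta, X_va), y_va)
            if best is None or err < best[0]:
                best = (err, lam, alpha, b0, beta)
    return best
```

Setting `alpha=1` gives the lasso and `alpha=0` gives ridge regression under this parameterization. Once a model is chosen, scoring a row is just an intercept plus a dot product, which is why linear models can score with very low latency.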
