By H2O.ai Team | minute read | December 31, 2013
Stephen Boyd's favorite way of summarizing a dataset at hand: “Understand the pathology of data. Sometimes it's not the pathology.” It's structure: dimensions, factors, outliers and principal components.
It's very much what data scientists want from ad hoc analytics: scope the data from enough angles, and with different tools, to get real intuition about its structure. This often comes long before any advanced algorithms are run.
Like Linus Pauling, look for forces and bonds within the data, and gather context by fusing more sources. Then fire up the imagination to probe and ask, leading to insights that drive business decisions. An immediate consequence of fusing multiple data sources is the curse of dimensionality.
One simply has far more informative dimensions about one's customers these days. Knowing the top 100 good ones would enable faster categorization and modeling. And this pathology can come in simple and subtle ways, for example:
Single Feature Characteristics
Many single-feature characteristics are useful: range, standard deviation, mean, distribution, and scatter plots.
Is it a constant column? Or mostly missing elements / NAs?
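The single-feature checks above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not an H2O API: the function name `summarize_column`, the `missing_threshold` parameter, and its default of 0.5 are assumptions made for this example, and `None` or NaN is treated as a missing value.

```python
import math
import statistics

def summarize_column(values, missing_threshold=0.5):
    """Summarize one numeric column (hypothetical helper, not an H2O call).

    None and NaN count as missing. The 0.5 threshold for "mostly missing"
    is an arbitrary choice for illustration.
    """
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n_total = len(values)
    missing_fraction = 1.0 - len(present) / n_total if n_total else 1.0
    summary = {
        "missing_fraction": missing_fraction,
        "mostly_missing": missing_fraction > missing_threshold,
        # A column with zero or one distinct observed value carries no signal.
        "is_constant": len(set(present)) <= 1,
    }
    if present:
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["mean"] = statistics.fmean(present)
        # The sample standard deviation needs at least two observations.
        summary["stdev"] = statistics.stdev(present) if len(present) > 1 else 0.0
    return summary

if __name__ == "__main__":
    print(summarize_column([3, 3, 3, None]))
    print(summarize_column([1.0, None, None, None]))
```

Columns flagged as constant or mostly missing are natural candidates to drop or inspect before any modeling.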
Multi-Feature & Inter-feature Characteristics
Which features are nearly identical or share a linear relationship? (e.g., delay vs. arrival_time and departure_time)
Which features share a non-linear relationship?
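One common way to probe these two questions pairwise is to compare the Pearson correlation (sensitive to linear relationships) with the Spearman rank correlation (sensitive to any monotonic relationship, linear or not). The sketch below is an assumption about how one might do this in plain Python, not the method used in the post; the helper names `pearson`, `ranks`, and `spearman` are made up for illustration. Pairwise checks like this would not, on their own, reveal that a feature such as delay is a combination of two other features.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Undefined when either feature is constant.
        return float("nan")
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    linear = [2 * v + 1 for v in x]   # exact linear relationship
    cubic = [v ** 3 for v in x]       # monotonic but non-linear
    print(pearson(x, linear), pearson(x, cubic), spearman(x, cubic))
```

A pair with a Spearman correlation near 1 but a noticeably lower Pearson correlation hints at a monotonic, non-linear relationship; a Pearson correlation near 1 or -1 suggests the two features are close to redundant.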
And how do those relationships and feature characteristics influence the inquiry about the dataset at hand? Machine learning can help. So does big data: with more observations, models are less prone to overfitting, a regularizing effect that is hard to dispute.
It's a slick mystery: Different features intertwined in your data like characters in a Hitchcock thriller. Dial 'M' for Model.