H2O.ai Blog
AI for Climate Science: Insights from the LEAP Atmospheric Physics Competition
In April 2024, Kaggle hosted the LEAP-Atmospheric Physics using AI (ClimSim) competition. The competition aimed to use AI to improve climate modeling, challenging participants to develop machine learning models that could enhance climate projections and reduce uncertainty in future climate trends. The goal was to employ faster ML model...
Model Selection | Routing you to the best LLM
Learn how h2oGPTe routes user queries to the best LLM based on preferences for latency, cost, or accuracy for chat and retrieval augmented generation. Welcome to Enterprise h2oGPTe, your Generative AI platform for interacting with a wide range of LLMs for chat, document question answering with Retrieval Augmented Generation, new content ...
LLM DataStudio - V6.0 Release
H2O LLM DataStudio is a no-code application created to streamline data preparation tasks for Large Language Models (LLMs). The tool features three main components: Curate, Prepare, and Custom Eval. Curate - Conversion of documents (PDFs, DOC & audio/video files) into question-answer pairs and summarization pairs Prepare - Prepar...
H2O LLM DataStudio: V4.1 Release
H2O LLM DataStudio is a comprehensive no-code application designed to simplify data preparation tasks for Large Language Models (LLMs). This tool comprises three key components: Curate, Prepare, and Augment. Curate - Conversion of documents (PDFs, DOC & audio/video files) into question-answer pairs and summarization pairs Prepare ...
Reducing False Positives in Financial Transactions with AutoML
In an increasingly digital world, combating financial fraud is a high-stakes game. However, the systems we deploy to safeguard ourselves are raising too many false alarms, with over 90% of fraud alerts being false positives. These false positives, not only frustrating for consumers but also costly for financial institutions, can eclipse t...
Winner's Insight: Navigating the Parkinson's Disease Prediction Challenge with AI
Parkinson’s disease, a condition affecting movement, cognition, and sleep, is escalating rapidly. By 2037, it is projected that around 1.6 million U.S. residents will be confronting this disease, resulting in significant societal and economic challenges. Studies have hinted that disruptions in proteins or peptides could be instrumental in...
Generating LLM Powered Apps using H2O LLM AppStudio – Part1: Sketch2App
sketch2app is an application that lets users instantly convert sketches to fully functional AI applications. This blog is Part 1 of the LLM AppStudio Blog Series and introduces sketch2app. The H2O.ai team is dedicated to democratizing AI and making it accessible to everyone. One of the focus areas of our team is to simplify the adoption of...
H2O LLM DataStudio: Streamlining Data Curation and Data Preparation for LLMs related tasks
A no-code application and toolkit to streamline data preparation tasks related to Large Language Models (LLMs). H2O LLM DataStudio is a no-code application designed to streamline data preparation tasks specifically for Large Language Models (LLMs). It offers a comprehensive range of preprocessing and preparation functions such as text cl...
Navigating the challenges of time series forecasting
Jon Farland is a Senior Data Scientist and Director of Solutions Engineering for North America at H2O.ai. For the last decade, Jon has worked at the intersection of research, technology and energy sectors with a focus on developing large scale and real-time hierarchical forecasting systems. The machine learning models that drive these for...
10 Tips to Become a Successful Data Scientist
Data science is here to stay. Data scientists use their skills to help companies make better decisions about their products and services, optimize processes, cut costs, and improve profitability. Becoming a successful data scientist involves many aspects and continuous study, since it is a...
H2O.ai at NeurIPS 2022
H2O.ai is proud to participate in the 36th Conference on Neural Information Processing Systems (NeurIPS) 2022, one of the biggest and most prestigious international conferences in artificial intelligence. NeurIPS 2022 will be a Hybrid Conference from Monday, November 28th through Friday, December 9th, with an in-person event at the New Or...
Make with H2O.ai Recap: Validation Scheme Best Practices
Data Scientist and Kaggle Grandmaster, Dmitry Gordeev, presented at the Make with H2O.ai session on validation scheme best practices, our second accuracy masterclass. The session covered key concepts, different validation methods, data leaks, practical examples, and validation and ensembling. Key Concepts While the validation topics cove...
Data Science with H2O.ai: An Introduction to Machine Learning and Predictive Modeling
Our own Jonathan Farland recently recorded a talk about machine learning and predictive modeling. In his talk, Jon also gave an overview of open source H2O and H2O AI Cloud. This video is a great resource for getting up to speed with the latest technology from H2O in half an hour. Some of you may prefer to go through the slides while l...
Tackling Illegal, Unreported, and Unregulated (IUU) Fishing with AI
According to a report by the High-Level Panel for a Sustainable Ocean Economy, it is estimated that illegal, unreported, and unregulated (IUU) fishing accounts for 20 percent of the seafood and up to 50 percent in some areas. These activities not only affect the marine ecosystem but, in a way, are linked to climate change on the planet a...
Revisiting the Miracle of Istanbul
Introduction: On May 25th, 2005, the UEFA Champions League final between AC Milan and Liverpool was held at the Atatürk Olympic Stadium in Istanbul. The match is still considered one of the greatest finals in football history. AC Milan took a 3-0 lead in the first half but Liverpool made a miraculous comeback in the second half to tie the g...
Shapley Values - A Gentle Introduction
If you can’t explain it to a six-year-old, you don’t understand it yourself. – Albert Einstein One fear caused by machine learning (ML) models is that they are blackboxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...
1st Place Winner's Blog - Kaggle 2021 Data Science and Machine Learning Survey
Kaggle, the largest global community of data scientists, conducted the 5th annual industry-wide survey that presented a truly comprehensive view of the state of data science and machine learning. A total of 25,973 responses were collected from participants from over 60 countries. Kaggle also launched the Data Science Survey Challenge in w...
An Introduction to Time Series Modeling: Traditional Time Series Models and Their Limitations
In the first article in this series, we broke down the preprocessing and feature engineering techniques needed to build high-performing time series models. But we didn’t discuss the models themselves. In this article, we will dig into this. As a quick refresher, time series data has time on the x-axis and the value you are measuring (dema...
Amazon Redshift Integration for H2O.ai Model Scoring
We consistently work with our partners on innovative ways to use models in production here at H2O.ai, and we are excited to demonstrate our AWS Redshift integration for model scoring. Amazon Redshift is a very popular data warehouse on AWS. We wanted to expand on the existing capacities of using data from Redshift to train a model on the ...
MLB Player Digital Engagement Forecasting
Are you a baseball fan? If so, you may notice that things are heating up right now as the Major League Baseball (MLB) World Series between the Houston Astros and Atlanta Braves is tied at 1-1. (MLB Postseason 2021 results as of October 28.) This also reminded me of the MLB Player Digital Engagement Forecasting competition in which my coll...
An Introduction to Time Series Modeling: Time Series Preprocessing and Feature Engineering
Time is the only nonrenewable resource – Sri Ambati, Founder and CEO, H2O.ai. Prediction is very difficult, especially if it’s about the future – Niels Bohr, Nobel Prize-Winning Physicist. Despite its inherent difficulty, every business needs to make predictions. You may want to forecast sales or estimate demand or gauge future inventory ...
Time Series Forecasting Best Practices
Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud. (Note: The video links with ...
From the game of Go to Kaggle: The story of a Kaggle Grandmaster from Taiwan
In conversation with Kunhao Yeh: A Data Scientist and Kaggle Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand...
What does it take to win a Kaggle competition? Let's hear it from the winner himself.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster. In this interview, I shall be ...
What it takes to become a World No 1 on Kaggle
In conversation with Guanshuo Xu: A Data Scientist, Kaggle Competitions Grandmaster, and a Ph.D. in Electrical Engineering. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. The intention behind these interviews...
Safer Sailing with AI
In the last week, the world watched as responders tried to free a cargo ship that had gone aground in the Suez Canal. This incident blocked traffic through a waterway that is critical for commerce. While the location was an unusual one, ship collisions, allisions, and groundings are not uncommon. With all the technology that mariners hav...
H2O AI Cloud: Democratizing AI for Every Person and Every Organization
Harnessing AI’s true potential by enabling every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications. Democratization is an essential step in the development of AI, and AutoML technologies lie at the heart of it. AutoML tools have played a pivotal role in transforming the way we consume an...
Using Python's datatable library seamlessly on Kaggle
Managing large datasets on Kaggle without fearing the out-of-memory error. Datatable is a Python package for manipulating large dataframes. It has been created to provide big data support and enable high performance. This toolkit resembles pandas very closely but is more focused on speed. It supports out-of-memory datasets, multi-thr...
Meet the Data Scientist who just cannot stop winning on Kaggle.
In conversation with Philipp Singer: A Data Scientist, Kaggle Double Grandmaster, and a Ph.D. in Computer Science. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate an...
Grandmaster Series: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋
In conversation with Gábor Fodor: A Data Scientist at H2O.ai and a Kaggle Competitions’ Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage othe...
Automate your Model Documentation using H2O AutoDoc
Create model documentation for Supervised learning models in H2O-3 and Scikit-Learn — in minutes. The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus ...
From GLM to GBM – Part 2
How an Economics Nobel Prize could revolutionize insurance and lending. Part 2: The Business Value of a Better Model. Introduction: In Part 1, we proposed better revenue and managing regulatory requirements with machine learning (ML). We made the first part of the argument by showing how gradient boosting machines (GBM), a type of ML, can mat...
From GLM to GBM - Part 1
How an Economics Nobel Prize could revolutionize insurance and lending. Part 1: A New Solution to an Old Problem. Introduction: Insurance and credit lending are highly regulated industries that have relied heavily on mathematical modeling for decades. In order to provide explainable results for their models, data scientists and statisticians i...
Lessons of COVID-19 and Moving Forward: Key Takeaways
This week, we hosted our second virtual panel focused on how AI can empower healthcare organizations to make better decisions and save lives. Improved forecasting and predictions lead to higher chances in managing and mitigating adverse events, such as the COVID-19 pandemic. I’m proud to acknowledge that H2O.ai is committed to helping cus...
Brief Perspective on Key Terms and Ideas in Responsible AI
Introduction: As fields like explainable AI and ethical AI have continued to develop in academia and industry, we have seen a litany of new methodologies that can be applied to improve our ability to trust and understand our machine learning and deep learning models. As a result of this, we’ve seen several buzzwords emerge. In this short po...
Three Ways Data and AI is Helping Against COVID19
We are in the midst of a global crisis that epidemiologists have warned us about. As of today, 180 countries and sovereign regions have confirmed cases of patients infected with COVID19 (from here). Putting aside evidence that indicates the virulence of the disease could be much worse, the fast spread of the virus and the presence of hi...
Igniting the AI in Healthcare Community
Yesterday we held our first Community Discussion on AI in Healthcare. Our CEO and founder, Sri Ambati, led the discussion between Niki Athanasiadou, Marios Michailidis, one of our Grandmasters, and myself. We had nearly 1,300 participants registered from over 45 countries, and over half of those joined live; others are viewing the replay. ...
COVID-19: Doing Good with Data + AI
During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have been at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless still essential. The efforts of the over 10,000 women code breakers of W...
How H2O.ai is Reinventing Healthcare with AI
H2O.ai is hosting a virtual Meetup on AI and Healthcare: Best Practices for Better Outcomes. Join us on 26th March, for a community discussion to collaborate with us and leading healthcare organizations to share ideas and best practices including predicting hospital staffing needs, ICU transfers, as well as sepsis detection and more. Reg...
Summary of a Responsible Machine Learning Workflow
A paper resulting from a collaboration between H2O.AI and BLDS, LLC was recently published in a special “Machine Learning with Python” issue of the journal, Information (https://www.mdpi.com/2078-2489/11/3/137). In “A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing...
It is a privilege to serve the world in its hour of need – H2O.ai response to the COVID-19 pandemic
During the COVID-19 pandemic, our world, our nations, states, counties, cities and communities face an unprecedented challenge with an urgent need to help our citizens and ultimately our national and global economy. At highest risk are senior citizens, at-risk populations (individuals with immunodeficiency, hypertension, diabetes) and our...
Detecting Money Laundering Networks Using H2O Driverless AI
Note: Dr. Ashrith Barthur (Principal Security Scientist, H2O.ai) and Sandip Sharma (Director of Solution Engineering, H2O.ai) will be speaking about solving money laundering and other real-world problems using machine learning at our upcoming webinar. You can grab a spot here. Artificial Intelligence has evolved from being a buzz word t...
AI & ML Platforms: My Fresh Look at H2O.ai Technology
2020: A new year, a new decade, and with that, I’m taking a new and deeper look at the technology H2O.ai offers for building AI and machine learning systems. I’ve been interested in H2O.ai since its early days as a company (it was 0xdata back then) in 2014. My involvement had been only peripheral, but now I’ve begun to work with this comp...
Interview with Patrick Hall | Machine Learning, H2O.ai & Machine Learning Interpretability
In this episode of Chai Time Data Science, Sanyam Bhutani interviews Patrick Hall, Sr. Director of Product at H2O.ai. Patrick has a background in Math and has completed an MS course in Analytics. In this interview they talk all about Patrick’s journey into ML, ML Interpretability and his journey at H2O.ai, how his work has ev...
Key Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning
We are named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...
Speed up your Data Analysis with Python’s Datatable package
A while ago, I did a write-up on Python’s Datatable library. The article was an overview of the datatable package whose focus is on big data support and high performance. The article also compared datatable’s performance with the pandas library on certain parameters. This is the second article in the series with a two-fold objective: ...
Parallel Grid Search in H2O
H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models or even parsing, to utilize the distributed computation model. In ord...
The Super Bowl and Data Science: Changing the NFL with the Power of Machine Learning
Super Bowl LIV came and went. The San Francisco 49ers vs the Kansas City Chiefs. Personally, being from the Bay, I was rooting for the 49ers, but you can’t always get what you want. Whoever came out on top, though, we were all looking forward to a great game full of fantastic plays and the kind of gridiron tenacity where players lay i...
Grandmaster Series: How a Passion for Numbers Turned This Mechanical Engineer into a Kaggle Grandmaster
In conversation with Sudalai Rajkumar: A Kaggle Double Grandmaster and a Data Scientist at H2O.ai. It is rightly said that one should never seek praise. Instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the data science ...
Interview with Arno Candel | AutoML | Physics | CTDS.Show
In this episode, Sanyam Bhutani interviews Dr. Arno Candel, CTO at H2O.ai. They talk about Arno’s journey into the field, with comments and insights from Arno that apply across the field. They also discuss ML and automated machine learning, broadly speaking. Arno’s journey from Physics to Software Engineering to Machine L...
Climbing the AI and ML Maturity Model Curve
AI/ML Maturity Model Curve/Steps: AI/ML maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation, automating processes in a short period, and making AI the DNA/core of the business. One of the biggest challenges for businesses today is to clearly define what succ...
How to write a Transformer Recipe for Driverless AI
What is a transformer recipe? A transformer (or feature) recipe is a collection of programmatic steps, the same steps that a data scientist would write as code to build a column transformation. The recipe makes it possible to engineer the transformer in training and in production. The transformer recipe, and recipes in general, provide a...
Takeaways from the World’s largest Kaggle Grandmaster Panel
Disclaimer: We were made aware by Kaggle of adversarial actions by one of the members of this panel. This panelist is no longer a Kaggle Grandmaster and no longer affiliated with H2O.ai as of January 10th, 2020. Personally, I’m a firm believer and fan of Kaggle and definitely look at it as the home of Data Science. ...
A Full-Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer
Content originally posted in HackerNoon and Towards Data Science. 15th of October, 2019 marks a special milestone, actually quite a few milestones. So I considered sharing it in the form of a blog post, on a publication that has been home to all of my posts. The online community has been too kind to me and these blog posts have been a method ...
Predicting Failures from Sensor Data using AI/ML — Part 2
This is Part 2 of the blog post series and a continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1. Missing Values & Data Imbalance: One of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...
H2O Driverless AI: The Workbench for Data Science
This blog was written by Rohan Gupta and originally published here. 1. Introduction: In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...
H2O Driverless AI Acceleration with Intel DAAL
This week at Strata NY 2019 we will be demoing a custom recipe that incorporates the Intel Data Analytics Acceleration Library (DAAL) algorithm into Driverless AI. This blog will provide an introduction to Intel DAAL and how the Make-Your-Own-Recipe capability extends H2O Driverless AI. If you are at Strata NY 2019, stop by the Intel bo...
Custom recipes for Driverless AI: Prophet and pmdarima cases
Last updated: 09/23/19. H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm, any scorer/metric and any feature transformer. A user can create custom recipes using Python utilizing any external library or his/her o...
New Innovations in Driverless AI
What’s new in Driverless AI: We’re super excited to announce the latest release of H2O Driverless AI. This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...
Detecting Sarcasm is difficult, but AI may have an answer
Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...
Custom Machine Learning Recipes: The ingredients for success
Last updated: 07/23/19. Machine learning is akin to cooking in several ways. A perfect dish originates from a tried-and-tested recipe, has the right combination of ingredients, and is baked at just the right temperature. Successful AI solutions work on the same principle. One needs fresh and right quality ingredients in the form of data, ...
Leads to Leases
There is such a large amount of unstructured data being produced by companies. I personally find it so interesting that there is so much meaning and hidden value in text, audio, and visual content. Until recently, much of this data would go unused. However, since the rise of machine learning and artificial intelligence, it became possibl...
ArmadaHealth Uses AI to Match Patients with Specialists to Improve Health Outcomes
As an intern for H2O.ai, I am amazed to see how instrumental AI has been in transforming people’s lives for the better. Especially in healthcare, AI is bringing increased efficiency, ease, and helping people lead healthier lives. In this blog, I learned about how AI is helping potential patients find the right specialist for their needs a...
Toward AutoML for Regulated Industry with H2O Driverless AI
Predictive models in financial services must comply with a complex regime of regulations including the Equal Credit Opportunity Act (ECOA), the Fair Credit Reporting Act (FCRA), and the Federal Reserve’s S.R. 11-7 Guidance on Model Risk Management. Among many other requirements, these and other applicable regulations stipulate predictive ...
Underwrite.ai Transforms Credit Risk Decision-Making Using AI
Determining credit has been done by traditional techniques for decades. The challenge with traditional credit underwriting is that it doesn’t take into account all of the various aspects or features of an individual’s credit ability. Underwrite.ai, a new credit startup, saw this as an opportunity to apply machine learning and AI to impro...
The Reproductive Science Center of SF Bay Area uses AI to Treat Infertility
Having your own baby may be a dream that many people have but some cannot realize until they seek specialized help. The Reproductive Science Center of SF Bay Area is one of the pioneer organizations conducting in-vitro fertilization. They strive to produce healthy babies for their patients. However, every patient has their own set of obst...
An Overview of Python’s Datatable package
This blog originally appeared on Towardsdatascience.com “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days”: Eric Schmidt If you are an R user, chances are that you have already been using the data.ta...
Building an Interpretable & Deployable Propensity AI/ML Model in 7 Steps…
To start with, you may have a tabular data set with a combination of dates/timestamps, categorical values, text strings, and numeric values. A business sponsor wants to build a Propensity to Buy model from historical data. How many steps does it take? Let’s find out. We are going to use H2O’s Driverless AI instance with 1 GPU (optional...
H2O.ai Automatic Machine Learning on Red Hat OpenShift Container Platform Delivers Data Science Ease and Flexibility at Scale
Last week at Red Hat Summit in Boston, Sri Ambati, CEO and Founder, demonstrated how to use our award-winning automatic machine learning platform, H2O Driverless AI, on Red Hat OpenShift Container Platform. You can watch the replay here. What we showed not only helps data scientists achieve results, it also enables them to scale their ...
AI/ML Projects — Don’t get stymied in the last mile
Data Scientists build AI/ML models from data and then deploy them to production, in addition to a plethora of tasks around data insights, data cleansing, etc. Part of the Data Scientist job description/requirement is making models available for transparency, auditability as well as explainability for both regulators as well as internal bu...
Hortifrut uses AI to Determine the Freshness of Blueberries
Who doesn’t love sweet, delicious blueberries? Providing a steady supply of beautiful, tasty berries to the market is no small effort and Hortifrut, based in Chile, has been growing and distributing berries for the last 30 years. Today, they are using AI to provide fresh berries to the world every day. Hortifrut, the largest global producer ...
Can Your Machine Learning Model Be Hacked?!
I recently published a longer piece on security vulnerabilities and potential defenses for machine learning models. Here’s a synopsis. Introduction: Today it seems like there are about five major varieties of attacks against machine learning (ML) models and some general concerns and solutions of which to be aware. I’ll address them one-by-o...
H2O World Explainable Machine Learning Discussions Recap
Earlier this year, in the lead up to and during H2O World, I was lucky enough to moderate discussions around applications of explainable machine learning (ML) with industry-leading practitioners and thinkers. This post contains links to these discussions, written answers and pertinent resources for some of the most common questions asked ...
H2O-3, Sparkling Water and Enterprise Steam Updates
We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam. Below are some of the new features we have added: H2O-3 Yates (3.24.0.1) – 3/31/2019. Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html Bug [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local mach...
Building AI/ML models on Lending Club Data, with H2O.ai — Part 1
Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...
How to explain a model with H2O Driverless AI
The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models but how does one actually explain a model? H2O Driverless AI provides robust inte...
What is Your AI Thinking? Part 3
In the past two posts we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed: exploratory data analysis (EDA), accurate and interpretable models, global explanations, local...
What is Your AI Thinking? Part 2
Explaining AI to the Business Person. Welcome to part 2 of our blog series: What is Your AI Thinking? We will explore some of the most promising testing methods for enhancing trust in AI and machine learning models and systems. We will also cover the best practice of model documentation from a business and regulatory standpoint. More Techniq...
What is Your AI Thinking? Part 1
Explaining AI to the Business Person. Explainable AI is in the news, and for good reason. Financial services companies have cited the ability to explain AI-based decisions as one of the critical roadblocks to further adoption of AI for their industry. Moreover, interpretability, fairness, and transparency of data-driven decision support sy...
What Business Leaders Need to Know About AI
The interest around artificial intelligence (AI) is at an all-time fever pitch right now, and it’s important to understand why. AI can solve real business problems and address very complex situations. Organizations and business leaders should start with the idea of how AI can help by identifying a business problem or use case that they c...
For Today’s BI Analyst - Accelerating your AI/ML efforts with Driverless AI
Whether you are starting out as a novice data scientist or are a veteran in AI and Machine Learning, modern tools can guide you in creating some of the best models from your data. Not to mention the ease of moving models to production. Also, don’t forget the experienced BI Analysts in your organization, who want to play with data science, only t...
Anomaly Detection with Isolation Forests using H2O
Introduction. Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failu...
Launching the Academic Program … OR ... What Made My First Four Weeks at H2O.ai so Special!
We just launched the H2O.ai Academic Program at our sold-out H2O World London. With nearly 1000 people in attendance, we received the first online sign-up forms submitted by professors and students alike. This program will massively democratize AI in academia, increasing the number of AI-skilled graduates – with both technical and busine...
How This AI Tool Breathes New Life Into Data Science
Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?”; Data Collection & Curation; Are there Outliers?; What is the Distribu...
What does NVIDIA’s Rapids platform mean for the Data Science community?
Today NVIDIA announced the launch of the RAPIDS suite of software libraries to enable GPU acceleration for data science workflows, and we’re excited to partner with NVIDIA to bring GPU-accelerated open source technology to the machine learning and AI community. “Machine learning is transforming businesses and NVIDIA GPUs are speeding...
Automatic Feature Engineering for Text Analytics - The Latest Addition to Our Kaggle Grandmasters' Recipes
According to Kaggle’s ‘The State of Machine Learning and Data Science’ survey, text data is the second most used data type at work for data scientists. There are a lot of interesting text analytics applications like sentiment prediction, product categorization, document classification and so on. In the latest version (1.3) of our Driver...
H2O for Inexperienced Users
Some background: I am a rising senior in high school, and in the summer of 2018 I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer in both ...
The different flavors of AutoML
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software (e.g. H2O, scikit-learn, keras). Although these tools have made it easy to train and evaluate ma...
How to Frame Your Business Problem for Automatic Machine Learning
Over the last several years, machine learning has become an integral part of many organizations’ decision-making at various levels. With not enough data scientists to fill the increasing demand for data-driven business processes, H2O.ai has developed a product called Driverless AI that automates several time-consuming aspects of a typica...
AI in Healthcare - Redefining Patient & Physician Experiences
Register for the Meetup Here. Patients, physicians, nurses, health administrators and policymakers are beneficiaries of the rapid transformations in health and life sciences. These transformations are being driven by new discoveries (etiology, therapies, and drugs/implants), market reconfiguration and consolidation, a movement to value-bas...
Come meet the Makers!
NVIDIA’s GPU Technology Conference (GTC) Silicon Valley, March 26-29th, is the premier AI and deep learning event, providing you with training, insights, and direct access to the industry’s best and brightest. It’s where you will see the latest breakthroughs in self-driving cars, smart cities, healthcare, high-performance computing, virtu...
New features in H2O 3.18
Wolpert Release (H2O 3.18). There’s a new major release of H2O and it’s packed with new features and fixes! We named this release after David Wolpert, who is famous for inventing Stacking (aka Stacked Ensembles). Stacking is a central component in H2O AutoML, so we’re very grateful for his contributions to machine learning! He is also fa...
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
November 30, 2017 | Data Science, Machine Learning | H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
Laying a Strong Foundation for Data Science Work
By William Merchan, CSO, DataScience.com. In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leverag...
H2O.ai Releases H2O4GPU, the Fastest Collection of GPU Algorithms on the Market, to Expedite Machine Learning in Python
H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn Python API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algor...
Scalable Automatic Machine Learning: Introducing H2O's AutoML
Prepared by: Erin LeDell, Navdeep Gill & Ray Peck. In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...
Stacked Ensembles and Word2Vec now available in H2O!
Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles. R: ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python: ensemble = H2OStackedEnsembleEstimator(base_models=my_models); ensemble.train(x=x, y=y, training...
Using Sentiment Analysis to Measure Election Surprise
Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is that its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
The Problem: Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The Solution: Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...
When is the Best Time to Look for Apartments on Craigslist?
A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...
Distracted Driving
Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications, such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...
Fatal Traffic Accidents Rise in 2015
On Tuesday, August 30th, the National Highway Traffic Safety Administration released their annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of an increase of 7.2% in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of soci...
Red herring bites
At the Bay Area R User Group in February I presented progress on big-join in H2O, which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...
Fast csv writing for R
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...