Welcome to the Community

We look forward to seeing what you make, maker!

Learn

Self-paced Courses

View All

Technical Documentation

View All

Blogs

Read All

YouTube

Watch All

H2O.ai Fights Fire Challenge

Help first responders and the public with new AI applications that can be used to help save lives and property

Learn More

Find A Meetup Near You

View on Meetup

Slack Community

Discuss, learn about, and explore the H2O AI Cloud platform, products, and services with peers and H2O.ai employees.

Join the Slack Community

Stack Overflow

Drunkpiano

Get rule interpretations in h2o rulefit model

Following the H2O RuleFit example from the documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/rulefit.html), I checked the variable importance of the rules and linear terms used in the model. However, the variables were labelled with identifiers such as "M0T49N14", which hide the actual conditions derived from the tree models. The last column, which was supposed to detail the rules, was NA (screenshot: https://i.sstatic.net/TM9votrJ.jpg). Any idea how the RuleFit model should be correctly interpreted, or is it a bug? Thanks!
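For readers puzzling over labels like "M0T49N14", here is a minimal sketch of how such an identifier could be split into parts. It assumes, without confirmation from the question or the linked documentation, that the label follows an `M<model>T<tree>N<node>` pattern. The names `RuleId` and `parse_rule_id` are hypothetical helpers, not part of the h2o package, and the sketch does not recover the rule's conditions.

```python
import re
from typing import NamedTuple, Optional


class RuleId(NamedTuple):
    """Hypothetical container for the three numbers in a rule label."""
    model: int
    tree: int
    node: int


# Assumed pattern: "M" + model index, "T" + tree index, "N" + node index.
_RULE_ID = re.compile(r"^M(\d+)T(\d+)N(\d+)$")


def parse_rule_id(label: str) -> Optional[RuleId]:
    """Split a label such as 'M0T49N14' into integers.

    Returns None when the label does not match the assumed pattern,
    for example when it names a linear term rather than a tree rule.
    """
    match = _RULE_ID.match(label)
    if match is None:
        return None
    return RuleId(*(int(group) for group in match.groups()))


print(parse_rule_id("M0T49N14"))
```

If the assumption holds, the parts only locate a rule within the underlying tree ensemble. The human-readable conditions would still have to come from the model's own rule column, which is the part reported as NA in the question.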

Ezzeddine Saaid

Machine Failure Prediction Using H2O AutoMl code

I just started learning about H2O AutoML. I'm working on a project in Google Colab, trying to write code for machine failure prediction using the NASA Turbofan Jet Engine Data Set from [https://Kaggle.com/datasets/behrad3d/nasa-cmaps][1]. When I run AutoML, the RMSE is not right: it returns either 0, a value close to zero such as 0.06, or a value like 5.72724e-05. I tried a lot of things but nothing worked. As I mentioned, I'm still learning. Can someone check my code and explain what I should do? Or just fix my code, but please add comments, because I want to understand my mistake. Thanks.

Note: a friend sent the code to a person who claims to have a PhD, and an hour later that person sent back a screenshot showing an RMSE of 18. When my friend asked for the code, the person requested $2,000 for it, which I don't understand. Why so much? Maybe he thought I needed it for a master's or PhD thesis.

My code:

    # Mount Google Drive to access the dataset
    from google.colab import drive
    drive.mount('/content/drive')

    # Install necessary packages
    !pip install h2o pandas numpy scikit-learn matplotlib seaborn

    # Import required libraries
    import h2o
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from h2o.automl import H2OAutoML
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error, mean_absolute_error

    # Initialize H2O
    h2o.init()

    # Define base path for dataset folder
    dataset_path = "/content/drive/MyDrive/CMaps/"

    # Function to load and preprocess the dataset
    def load_dataset(file_path, rul_file=None, is_train=True):
        """Loads and preprocesses the dataset.

        Args:
            file_path (str): Path to the dataset file.
            rul_file (str, optional): Path to the RUL file (for test data). Defaults to None.
            is_train (bool, optional): Whether it's training data. Defaults to True.

        Returns:
            pandas.DataFrame: The loaded and preprocessed dataframe.
        """
        # Define column names
        columns = ["unit_number", "time_in_cycles", "operational_setting_1", "operational_setting_2", "operational_setting_3"] + \
                  [f"sensor_{i}" for i in range(1, 22)]  # 21 sensors

        # Load data into Pandas DataFrame
        df = pd.read_csv(file_path, sep=" ", header=None, names=columns, engine="python")

        # Replace missing values (NaN) with 0
        df = df.fillna(0)  # Replace NaN with 0

        # Calculate Remaining Useful Life (RUL) for each engine
        max_cycles = df.groupby("unit_number")["time_in_cycles"].max()
        df["RUL"] = df.apply(lambda row: max_cycles[row["unit_number"]] - row["time_in_cycles"], axis=1)

        return df

    # Load training and test data
    train_file = dataset_path + "train_FD001.txt"
    test_file = dataset_path + "test_FD001.txt"
    rul_file = dataset_path + "RUL_FD001.txt"  # Corrected to match the actual filename

    train_df = load_dataset(train_file, is_train=True)
    test_df = load_dataset(test_file, rul_file, is_train=False)

    # Check if data is loaded correctly
    print(train_df.head())
    print(test_df.head())

    # Define path to the training dataset file (update to the correct path)
    file_path = "/content/drive/MyDrive/CMaps/train_FD001.txt"  # Update to the correct path

    # Define column names for the dataset
    columns = ["unit_number", "time_in_cycles", "operational_setting_1", "operational_setting_2", "operational_setting_3"] + \
              [f"sensor_{i}" for i in range(1, 22)]  # 21 sensors

    # Load data into Pandas DataFrame
    df = pd.read_csv(file_path, sep=" ", header=None, names=columns, engine="python")

    # Remove empty columns (if any) due to formatting issues
    df = df.dropna(axis=1, how="all")

    # Calculate Remaining Useful Life (RUL) for each engine
    max_cycles = df.groupby("unit_number")["time_in_cycles"].max()
    df["RUL"] = df.apply(lambda row: max_cycles[row["unit_number"]] - row["time_in_cycles"], axis=1)

    # Loading Data
    df = pd.read_csv(file_path, sep=" ", header=None, names=columns, engine="python")

    # Replace missing values (NaN) with 0 instead of removing rows/columns
    df = df.fillna(0)  # Replace NaN with 0

    # Calculate Remaining Useful Life (RUL) again for each engine after filling missing values
    max_cycles = df.groupby("unit_number")["time_in_cycles"].max()
    df["RUL"] = df.apply(lambda row: max_cycles[row["unit_number"]] - row["time_in_cycles"], axis=1)

    # Select relevant sensors for the analysis
    selected_sensors = [
        "sensor_2", "sensor_3", "sensor_4", "sensor_7", "sensor_8", "sensor_9",
        "sensor_11", "sensor_12", "sensor_13", "sensor_14", "sensor_15",
        "sensor_17", "sensor_20", "sensor_21"
    ]

    # Define features and target variable
    features = ["time_in_cycles"] + selected_sensors
    X = df[features]
    y = df["RUL"]

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Convert the data into H2OFrame format
    train_h2o = h2o.H2OFrame(pd.concat([X_train, y_train], axis=1))
    test_h2o = h2o.H2OFrame(pd.concat([X_test, y_test], axis=1))

    # Define input columns and target column
    target = "RUL"
    features = X_train.columns.tolist()

    # Initialize AutoML and train the model
    aml = H2OAutoML(max_models=20, seed=42, max_runtime_secs=600)  # You can adjust max_runtime_secs as per your preference
    aml.train(x=features, y=target, training_frame=train_h2o, validation_frame=test_h2o)

    # Check the leaderboard to view the models' performance
    leaderboard = aml.leaderboard
    print(leaderboard)

[1]: https://www.kaggle.com/datasets/behrad3d/nasa-cmaps
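The RUL calculation described in the question (last observed cycle per engine minus the current cycle) can be sketched on a tiny synthetic table, together with a split that keeps each engine's rows on one side only. This is an illustrative sketch, not a confirmed fix for the near-zero RMSE: the helper names `add_rul` and `split_by_unit` and the toy data are assumptions introduced here, and a random row-level split is only one thing worth ruling out. Printing the loaded frame and checking that `unit_number` and `time_in_cycles` hold the expected integers is another.

```python
import numpy as np
import pandas as pd


def add_rul(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with RUL = last cycle of the engine - current cycle."""
    out = df.copy()
    # transform("max") broadcasts each engine's last cycle back onto its rows,
    # which avoids a slow row-by-row apply.
    last_cycle = out.groupby("unit_number")["time_in_cycles"].transform("max")
    out["RUL"] = last_cycle - out["time_in_cycles"]
    return out


def split_by_unit(df: pd.DataFrame, valid_fraction: float = 0.2, seed: int = 42):
    """Split rows so that no engine appears in both parts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    units = rng.permutation(df["unit_number"].unique())
    n_valid = max(1, int(round(len(units) * valid_fraction)))
    is_valid = df["unit_number"].isin(units[:n_valid])
    return df[~is_valid], df[is_valid]


# Synthetic stand-in for the real data: five engines with short histories.
toy = pd.DataFrame({
    "unit_number":    [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5],
    "time_in_cycles": [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3],
})
toy = add_rul(toy)
train, valid = split_by_unit(toy)

print(toy.head())
print("validation engines:", sorted(valid["unit_number"].unique()))
```

Splitting by engine means the validation score reflects engines the model has never seen, which is usually closer to how a failure-prediction model would be used.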

Jorge Ovalle

H2O Python AutoML differences between model_performance(train=True) and model_performance(test_data=data_train)

I am working on a binary classification task with class imbalance using H2O AutoML with Python. After training models with H2O AutoML, I get very different results from `model_performance(train=True)` and `model_performance(test_data=data_train)`. To my understanding they should output the same results, but with the former I obtain an AUC of ~0.7 and with the latter an AUC of ~0.98. The `balance_classes` option is set to `False`, and I use `fold_column` for proper stratification and the `weights_column` option to balance each fold.

Soma Holiday

h2o Dataframe Causes Function to Hang h2o.remove()

I have a data science project using h2o where I set up a loop of heatmap visualizations for explainability and to measure overfitting. I want to be able to call the heatmap via a reusable function so I can return the heatmap to display alone or export a series of them to PDF. When I return the figure from the function, it hangs. I've debugged by checking the time just before the return and at the first statement after it, and the return takes around 200 seconds. I spent a lot of time trying to debug the timing, but no matter what I did, it didn't return for 200 seconds. Eventually, I figured out that some sort of garbage collection was happening with the H2OFrame when the function returned. I was able to confirm this by adding the line h2o.remove(shocked_hf) to the function: that statement now took 200 seconds and the function returned fine. Here is a snippet of code that shows how the H2OFrame was created:

    # create dataframe with simulated data to test model
    shocked_df = pd.DataFrame(shocked_rows)
    # this h2o frame is only 625 rows by 107 columns
    shocked_hf = h2o.H2OFrame(shocked_df)
    # this next statement takes around 200 seconds
    h2o.remove(shocked_hf)

What is going on here? I'd like to call this function multiple times, so there is really no reason to clean up this variable. Even if you do clean it up, there has to be a faster way. I've seen suggestions to use manual garbage collection, but I think that will just introduce other issues. I may need to move the loop inside the function as a stopgap solution, but that doesn't feel right.

View More on Stack Overflow

Product Resources

Get started with our products

Generative AI

Predictive AI

On-Premise Platform

Managed Cloud

Hybrid Cloud

Industry Solutions

Use Cases

H2O.ai Hospital Occupancy Simulator

Strategic Transformation

View All Case Studies

FINANCIAL SERVICES

TELECOM

ENERGY

MARKETING

Partners

Resources

Open Source

Join H2O University

Support

Events

H2O.ai Wiki

Responsible AI

Company

Submit AI 100 2025 Nomination

2025 Gartner® Magic Quadrant™

H2O AI 100 2024

Product Resources

Datatable

H2O-3

H2O AI Feature Store

H2O Document AI

H2O Driverless AI

H2O Hydrogen Torch

H2O MLOps

H2O Sparkling Water

H2O Wave

Try the H2O AI Cloud for free for 90 days

Become part of our community by trying H2O.ai with a free 90-day trial
