
Welcome to the Community

We look forward to seeing what you make, maker!

Learn


Self-paced Courses

View All

 

Docs


Technical Documentation

View All

 


Blogs

Read All

 


YouTube

Watch All

 

H2O.ai Fights Fire Challenge

Help first responders and the public with new AI applications that can save lives and property.

Learn More

June 7, 2022

 

No-code Deep Learning w/ H2O AI Cloud + Call for Speaker + GOTOams Ticket Raffle
 

June 8, 2022

 

Beers & Bytes: AI and ML Networking Event



June 8, 2022

 

How to Scale and Operationalize AI Models in Public Sector Organizations
 

April 28, 2022

 

Deep Dive into New Capabilities of the H2O AI Cloud Platform


Slack Community

Discuss, learn, and explore the H2O AI Cloud platform, products, and services with peers and H2O.ai employees.

Join the Slack Community

 


Stack Overflow

How do I access h2o xgb model input features after saving a model to disk and reloading it?

I'm using H2O's XGBoost implementation in Python. I've saved a model to disk and I'm trying to load it later for analysis and prediction. I'm trying to access the list of input features or, even better, the list of features actually used by the model, which excludes the features it decided not to use.

The usual advice is to use the `varimp` function to get the variable importances. While this does drop the features that aren't used in the model, it gives the variable importance of the intermediate features created by one-hot encoding (OHE) the categorical features, not the original categorical feature names.

I've searched for how to do this, and so far I've found the following, but no concrete way to do it:

1. [Someone asking something very similar][1] to this and being told the feature has been requested in Jira.
2. [Said Jira ticket][2], which has been marked resolved; as I read it, it says this was implemented but is not customer-visible.
3. [A similar ticket][3] requesting this feature (original categorical feature importance) for variable importance heatmaps, but it is still open.
4. [Someone else who found an unofficial way to access the columns][4] with `model._model_json['output']['names']`, but that doesn't tell you which features weren't used by the model, and they are told to use a different method [that doesn't work if you have saved the model to disk and reloaded it][5] (which I am doing).

The only option I see is to take the varimp feature names, split them on the period character to break up the OHE feature names, keep the first part of each split, and then take a set over everything to get the unique column names. But I'm hoping there's a better way to do this.

[1]: https://stackoverflow.com/questions/51543937/h2o-model-list-dtypes-for-each-feature
[2]: https://h2oai.atlassian.net/browse/PUBDEV-5801
[3]: https://h2oai.atlassian.net/browse/PUBDEV-7830?jql=text%20~%20%22varimp%20categorical%22
[4]: https://stackoverflow.com/questions/45153176/is-there-a-supported-way-to-get-list-of-features-used-by-a-h2o-model-during-its
[5]: https://stackoverflow.com/questions/45153176/is-there-a-supported-way-to-get-list-of-features-used-by-a-h2o-model-during-its#comment120188912_45154577
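The split-on-period workaround described in the question can be sketched in plain Python, with one refinement: when the original column names are available (for example from the unofficial `model._model_json['output']['names']` mentioned above), match each encoded name against them by longest prefix, so a source column whose name itself contains a period is not cut short. This is a minimal sketch, assuming encoded names follow the `<column>.<level>` pattern the question describes; `original_columns_from_varimp` is a hypothetical helper, not an H2O API. The variable names could come from the first element of each row returned by `model.varimp()`.

```python
from typing import Iterable, List


def original_columns_from_varimp(varimp_names: Iterable[str],
                                 original_names: Iterable[str]) -> List[str]:
    """Map one-hot-encoded importance names back to their source columns.

    Hypothetical helper. Assumes encoded names look like "<column>.<level>".
    Order follows the importance list, so the result stays sorted by importance.
    """
    # Try longer column names first so "zip.code" wins over "zip".
    candidates = sorted(set(original_names), key=len, reverse=True)
    used: List[str] = []
    seen = set()
    for name in varimp_names:
        match = next(
            (c for c in candidates if name == c or name.startswith(c + ".")),
            None,
        )
        # Fall back to the split-on-period heuristic when no known column matches.
        column = match if match is not None else name.split(".", 1)[0]
        if column not in seen:
            seen.add(column)
            used.append(column)
    return used


varimp_names = ["color.red", "age", "color.blue", "zip.code.10001", "color.missing(NA)"]
original = ["age", "color", "zip.code", "zip", "income"]
print(original_columns_from_varimp(varimp_names, original))  # ['color', 'age', 'zip.code']
```

Names in `original_names` that never appear in the importance list (such as the unused `income` here) are simply dropped, which is the "features actually used" behavior the question asks for.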

How do I generate the archetypes of a new dataset from the GLRM predict function?

I have used these sites as references, and though they have been helpful, I'm unable to regenerate the reduced dimensions of new datasets via the GLRM predict function:

- https://bradleyboehmke.github.io/HOML/GLRM.html
- https://github.com/h2oai/h2o-tutorials/blob/master/best-practices/glrm/GLRM-BestPractices.Rmd

I work in the sparklyr environment with H2O. I'm keen to use the GLRM function to reduce dimensions for clustering. From the model, I am able to access the PCAs or archetypes, but I would like to generate the archetypes from the GLRM predict function on new datasets. I appreciate your help.

Here is the training of the GLRM model on the training dataset:

```r
glrm_model <- h2o.glrm(
  training_frame = train,
  cols = glrm_cols,
  loss = "Absolute",
  model_id = "rank2",
  seed = 1234,
  k = 5,
  transform = "STANDARDIZE",
  loss_by_col_idx = losses$index,
  loss_by_col = losses$loss
)

# Decompose training frame into XY
X <- h2o.getFrame(glrm_model@model$representation_name)  # as H2O frame
```

The archetypes from the training dataset:

```r
X
        Arch1      Arch2       Arch3      Arch4      Arch5
1  0.10141381 0.10958071  0.26773514 0.11584502 0.02865024
2  0.11471676 0.06489475  0.01407714 0.24536782 0.10223535
3  0.08848878 0.26742082  0.04915022 0.11693702 0.03530641
4 -0.03062604 0.29793032 -0.07003814 0.01927975 0.52451867
5  0.09497268 0.12915324  0.21392107 0.08574152 0.03750636
6  0.05857743 0.18863508  0.14570623 0.08695144 0.03448957
```

But when I use the trained GLRM model on a new dataset to regenerate these archetypes, I get the full dimensions instead of the archetypes as in X above. I'm using these archetypes as features for clustering.
```r
# Generate predictions on a validation set (if necessary):
glrm_pred <- h2o.predict(glrm_model, newdata = test)
glrm_pred
  reconstr_price reconstr_bedrooms reconstr_bathrooms reconstr_sqft_living reconstr_sqft_lot reconstr_floors reconstr_waterfront reconstr_view reconstr_condition reconstr_grade reconstr_sqft_above reconstr_sqft_basement reconstr_yr_built reconstr_yr_renovated
1 -0.8562455 -1.03334892 -1.9903167 -1.3950774 -0.2025564 -1.6537486 0 4 5 13 -1.20187061 -0.6584413 -1.25146116 -0.3042907
2 -0.7940549 -0.29723926 -0.7863867 -0.4364751 -0.1666500 -0.8527297 0 4 5 13 -0.13831432 -0.6545514 0.54821146 -0.3622902
3 -0.7499614 -0.18296317 0.1970824 -0.3989486 -0.1532677 0.4914559 0 4 5 13 -0.09100889 -0.6614534 1.38779632 -0.1844416
4 -1.0941432 0.08954988 0.7872987 -0.2087964 -0.1599888 0.8254916 0 4 5 13 0.11973488 -0.6623575 2.70176558 -0.2363486
5 0.3727360 0.82848389 0.4965246 1.1134378 -0.9013011 -1.3388791 0 4 5 13 0.08427185 2.1354440 -0.07213625 -1.2213866
6 -0.4042458 -0.59876839 -0.9685556 -0.7093578 -0.1745297 -0.5061798 0 4 5 13 -0.43503836 -0.6628391 -0.55165408 -0.2207544
  reconstr_lat reconstr_long reconstr_sqft_living15 reconstr_sqft_lot15
1 -0.07307503 -0.4258959 -1.0132778 -0.1964471
2 -0.52124543 0.7283153 0.1242903 -0.1295341
3 -0.56113519 0.6011221 -0.1616215 -0.1624136
4 -0.99759090 1.3032420 0.1556193 -0.1569607
5 0.70028433 -0.6436112 1.1400189 -0.9272790
6 -0.02222403 -0.2257382 -0.4859787 -0.1817499

[6416 rows x 18 columns]
```

Thank you.
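The two printouts reflect the GLRM factorization: the (transformed) training frame is approximated as A ≈ XY, where X (n × k, here the `Arch1` to `Arch5` columns) holds each row's weights on the k archetypes and Y (k × p) holds the archetypes themselves. The 18 `reconstr_` columns suggest that `h2o.predict` returns the reconstruction X_new·Y rather than X_new. Below is a minimal NumPy sketch of the missing step, recovering X_new for a fixed Y, under simplifying assumptions: quadratic loss and no regularization (the question uses Absolute loss with per-column losses, so H2O's own X_new would differ), numeric columns only, and a Y matrix already extracted from the trained model. `project_onto_archetypes` is a hypothetical helper, not an H2O function.

```python
import numpy as np


def project_onto_archetypes(a_new: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return X_new (n x k) that best reconstructs A_new (n x p) as X_new @ Y.

    Hypothetical helper. Assumes quadratic loss, no regularization, and that
    A_new has already been transformed exactly like the training data.
    """
    # Minimizing ||A_new - X_new Y||_F^2 over X_new is the least-squares
    # problem Y^T X_new^T = A_new^T, which lstsq solves for all rows at once.
    x_t, *_ = np.linalg.lstsq(y.T, a_new.T, rcond=None)
    return x_t.T


# Toy example: k = 2 archetypes over p = 5 features, n = 3 new rows.
rng = np.random.default_rng(0)
y = rng.normal(size=(2, 5))        # archetypes (k x p), fixed after training
x_true = rng.normal(size=(3, 2))   # archetype weights we want to recover
a_new = x_true @ y                 # new rows in the original feature space

x_new = project_onto_archetypes(a_new, y)
print(x_new.shape)        # (3, 2): k archetype features per row, usable for clustering
print((x_new @ y).shape)  # (3, 5): the full-width reconstruction
```

Using `lstsq` instead of explicitly inverting Y·Yᵀ avoids numerical trouble when archetypes are nearly collinear; with a non-quadratic loss such as Absolute, the per-row problem would need an iterative solver instead.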

AttributeError: 'H2OFrame' object has no attribute 'to_html'

I just installed the most recent version of h2o for Python, and it generates the following error:

```python
import h2o

h2o.init()
h2o_df = h2o.H2OFrame(some_df)
```

The error:

```
Traceback (most recent call last):
  File "C:\Users\some_user\AppData\Local\JetBrains\DataSpell 2022.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_console_utils.py", line 417, in execTableCommand
    success, res = exec_table_command(command, command_type,
  File "C:\Users\some_user\AppData\Local\JetBrains\DataSpell 2022.1\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_tables.py", line 43, in exec_table_command
    res += repr(tmp_var.head().to_html(notebook=True,
AttributeError: 'H2OFrame' object has no attribute 'to_html'
```

The same traceback is printed several times in a row, and it also dumps all previous h2o calls and their output. What is wrong here?

**UPDATE** I should add that I am running this in DataSpell. Everything seems to be fine in Jupyter Notebook.
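The traceback points at DataSpell's console helper (`pydevd_tables.py`) rather than at h2o: to preview the variable, the IDE calls `tmp_var.head().to_html(notebook=True, ...)`, a pandas `DataFrame` method that `H2OFrame` does not implement. That is consistent with the update that the same code works in Jupyter. One practical workaround is to inspect a pandas copy, for example `h2o_df.as_data_frame()` (requires pandas), instead of the raw `H2OFrame`. The sketch below illustrates the duck-typing mismatch with a defensive preview function that checks for `to_html` before calling it; `render_table_preview` and both stand-in classes are hypothetical and only mimic the relevant methods, they are not the IDE's or H2O's code.

```python
from typing import Any


def render_table_preview(obj: Any, n_rows: int = 5) -> str:
    """Render a table preview without assuming a pandas-style API.

    Hypothetical viewer helper: uses .head() and .to_html() only if present,
    otherwise falls back to repr().
    """
    # pandas DataFrame.head(n) and H2OFrame.head(rows) both take the row count first.
    preview = obj.head(n_rows) if hasattr(obj, "head") else obj
    to_html = getattr(preview, "to_html", None)
    if callable(to_html):
        return to_html()
    return repr(preview)


class PandasLikeFrame:
    """Stand-in for a pandas DataFrame: has head() and to_html()."""

    def head(self, n: int = 5) -> "PandasLikeFrame":
        return self

    def to_html(self) -> str:
        return "<table></table>"


class H2OLikeFrame:
    """Stand-in for an H2OFrame: has head() but no to_html()."""

    def head(self, rows: int = 10) -> "H2OLikeFrame":
        return self

    def __repr__(self) -> str:
        return "H2OLikeFrame(...)"


print(render_table_preview(PandasLikeFrame()))  # <table></table>
print(render_table_preview(H2OLikeFrame()))     # H2OLikeFrame(...)
```

Calling `H2OLikeFrame().head().to_html()` directly would raise the same kind of `AttributeError` shown in the traceback, which is the assumption the IDE helper makes.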

Product Resources

Get started with our products

Datatable
 

View on GitHub
 

H2O-3
 

View on GitHub
 

H2O AI Feature Store
 

Learn More

H2O Document AI
 

View on GitHub
Learn More

H2O Driverless AI
 

View on GitHub
Learn More

H2O Hydrogen Torch
 

Learn More
Product Brief

H2O MLOps
 

Learn More
Product Brief

H2O Sparkling Water
 

View on GitHub
Learn More

Try the H2O AI Cloud for free for 90 days

Get Started
 

Become part of our community by trying H2O.ai with a free 90-day trial