
Welcome to the Community

We look forward to seeing what you make, maker!

Learn

- Self-paced Courses (View All)
- Technical Documentation (View All)
- Blogs (Read All)
- YouTube (Watch All)

H2O.ai Fights Fire Challenge

Help first responders and the public with new AI applications that can save lives and property.

Learn More

Find a Meetup Near You

- June 7, 2022: No-code Deep Learning w/ H2O AI Cloud + Call for Speakers + GOTOams Ticket Raffle
- June 8, 2022: Beers & Bytes: AI and ML Networking Event
- June 8, 2022: How to Scale and Operationalize AI Models in Public Sector Organizations
- April 28, 2022: Deep Dive into New Capabilities of the H2O AI Cloud Platform

View on Meetup

Slack Community

Discuss, learn, and explore the H2O AI Cloud platform, products, and services with peers and H2O.ai employees.

Join the Slack Community

Stack Overflow

Skipgram probabilities with H2O.ai using R

I am trying to create skipgram probabilities using H2O.ai, with the goal of predicting next-word / surrounding-word probabilities in a large corpus of medical ICD-10 diagnoses, e.g., "D126 K5730 R109 R1011 R1084". I read the excellent tutorial by Julia Silge on using R and the tidyverse to create vector representations of news articles [here][1], and I have managed to replicate it with a small sample of my data. Unfortunately, I cannot scale the code to my full data set, which consists of 7M patient records spanning several years (I would share the data, but it is highly sensitive). My problems start right at model creation, where I exceed my Mac's 16 GB of memory ("vector memory space exceeded") even with a small sample of my data (500k patients x 450 diagnoses):

```r
tidy_skipgrams <- hacker_news_text %>%
  unnest_tokens(ngram, text, token = "ngrams", n = 8) %>%
  mutate(ngramID = row_number()) %>%
  unite(skipgramID, postID, ngramID) %>%
  unnest_tokens(word, ngram)
```

Would it be possible to replicate Julia's tutorial using a multi-threaded program such as H2O.ai or another tool? I am specifically interested in words that occur together; I would use this to find diagnoses that occur together:

```r
normalized_prob %>%
  filter(word1 == "facebook") %>%
  arrange(-p_together)
```

I would also use what Julia calls "word math" in a scenario such as: a patient has diabetes but not hypertension; what diseases might they encounter?

```r
mystery_product <- word_vectors["iphone",] - word_vectors["apple",] + word_vectors["microsoft",]
search_synonyms(word_vectors, mystery_product)
```

If anyone has seen similar examples in R, I would appreciate you sharing them.

[1]: https://juliasilge.com/blog/tidy-word-vectors/
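
One possible direction for the question above, as a minimal sketch: the h2o R package includes a multi-threaded word2vec with a skip-gram mode that trains inside the H2O cluster rather than in R's memory, which should sidestep the limit described. The file name, column name, and hyperparameters below are illustrative assumptions, not part of the original question:

```r
# Minimal sketch: skip-gram embeddings with H2O's multi-threaded word2vec.
# Training runs in the H2O cluster, not in R's heap, so it avoids the
# "vector memory" limit hit above. "patients.csv" and the "diagnoses"
# column are hypothetical placeholders.
library(h2o)
h2o.init()

patients <- h2o.importFile("patients.csv")
patients$diagnoses <- as.character(patients$diagnoses)  # ensure string type before tokenizing

# One patient per row, codes separated by spaces; h2o.tokenize returns a
# single-column frame of tokens with NA rows marking sequence boundaries,
# which is the input format h2o.word2vec expects.
words <- h2o.tokenize(patients$diagnoses, split = " ")

# Skip-gram embeddings; the hyperparameters here are illustrative, not tuned.
w2v <- h2o.word2vec(words,
                    word_model = "SkipGram",
                    vec_size = 100,
                    window_size = 5,
                    min_word_freq = 5,
                    epochs = 10)

# Codes that occur in similar contexts to a given diagnosis (the analogue of
# filtering normalized_prob by word1 in the tutorial).
h2o.findSynonyms(w2v, "D126", count = 10)
```

The "word math" step is not covered by h2o.findSynonyms alone; the vectors would have to be pulled back into R (for example via h2o.transform on single-code frames) and combined by hand before searching for nearest neighbours.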

Product Resources

Get started with our products

- Datatable: View on GitHub
- H2O-3: View on GitHub
- H2O AI Feature Store: Learn More
- H2O Document AI: View on GitHub, Learn More
- H2O Driverless AI: View on GitHub, Learn More
- H2O Hydrogen Torch: Learn More, Product Brief
- H2O MLOps: Learn More, Product Brief
- H2O Sparkling Water: View on GitHub, Learn More

Try the H2O AI Cloud free for 90 days

Get Started

Become part of our community by trying H2O.ai with a free 90-day trial.