This course builds on the Level 1 overview with a deeper look at Large Language Models and practical GenAI workflows.
Learn how to work with RAG techniques, fine-tune models, prepare datasets, and evaluate performance using tools like Enterprise h2oGPTe, LLM DataStudio, EvalGPT, and the GenAI AppStore.
Led by Kaggle Grandmaster Sanyam Bhutani, the course includes Python labs, research-based materials, and guided practice using H2O.ai tools across the GenAI ecosystem.
What you'll learn
Large Language Model Fundamentals
Build understanding of how LLMs work and their role in enterprise AI applications.
Fine-Tuning with LLM DataStudio
Configure and train language models using H2O's specialized fine-tuning platform.
Dataset Preparation Best Practices
Structure and prepare data effectively for training and evaluating language models.
Model Evaluation Methodologies
Use H2O.ai EvalGPT and assessment frameworks to measure model performance and quality.
H2O GenAI Platform Navigation
Work with GenAI AppStore, H2O.ai Wave, and integrated ecosystem tools for end-to-end workflows.
Course Playlist on YouTube
Welcome to our hands-on GenAI LLM training! Dive into the entire life cycle of Large Language Models (LLMs) with practical exercises.
From foundational concepts to advanced topics like RAG with LLMs, dataset prep, model fine-tuning, and app creation, we've got you covered.
Explore each step with Python notebooks and interactive exercises.
Whether you're new or experienced, this course equips you with valuable insights and skills for mastering GenAI tech. Join us!
In this video, we will dive into the basics of the large language model (LLM) pipeline. We'll explore how these models can do more than just predict the next word in a sentence – they can think, reason, and even philosophize.
We'll try to understand how LLMs are trained and why they're so powerful. Then, we'll take a look at the inner workings of LLMs, focusing on their architecture and how data preparation plays a crucial role in their performance. We'll also discuss evaluation methods and techniques for enhancing model performance.
One exciting aspect we'll cover is retrieval-augmented generation (RAG), where LLMs use stored documents to generate responses.
We'll also touch on prompt engineering, which can further improve LLM performance, and the importance of guardrails in keeping these models on track and unbiased.
Finally, we'll briefly mention GenAI Apps – applications that leverage LLMs – and how you can explore them further.
This video aims to set the stage for practical exercises in upcoming labs. Ready to explore the world of large language models? Let's dive in!
Learn to summarize documents, create LinkedIn posts, and uncover insights with Enterprise GPTe and Python Notebooks.
Follow our hands-on exercises for practical skills and valuable insights, whether you're a beginner or advanced user.
Instructions to access Enterprise H2O GPTe: You can gain access to Enterprise H2O GPTe by logging in with your H2O.ai Managed Cloud account or your Gmail or GitHub credentials via the following link: h2ogpte.genai.h2o.ai
Access to the h2oGPT research paper pdf file: h2oGPT: Democratizing Large Language Models
The Link for the Python LAB 1 can be found here: LAB 1 - RAG.ipynb
Please use the following link for the rag_url variable: rag_url = 'https://h2ogpte.genai.h2o.ai/'
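To give you a feel for what the notebook covers before you open it, here is a minimal sketch of a RAG round trip with the h2ogpte Python client. The API key, collection name, and file name are placeholders, and the exact method names may differ slightly from the version used in LAB 1, so treat this as an outline rather than the notebook itself.

```python
# Rough sketch of a RAG workflow with the h2ogpte Python client (pip install h2ogpte).
# The API key and file name are placeholders; method names may vary between client versions.
from h2ogpte import H2OGPTE

rag_url = 'https://h2ogpte.genai.h2o.ai/'
client = H2OGPTE(address=rag_url, api_key='<your-api-key>')  # placeholder key

# Create a collection and ingest a document (e.g. the h2oGPT research paper PDF).
collection_id = client.create_collection(
    name='LLM Course Lab 1',
    description='RAG over the h2oGPT research paper',
)
with open('h2ogpt_paper.pdf', 'rb') as f:            # placeholder file name
    upload_id = client.upload('h2ogpt_paper.pdf', f)
client.ingest_uploads(collection_id, [upload_id])

# Ask a question grounded in the ingested document.
chat_session_id = client.create_chat_session(collection_id)
with client.connect(chat_session_id) as session:
    reply = session.query('Summarize this paper in three bullet points.', timeout=90)
    print(reply.content)
```

The same pattern (ingest a document, open a chat session, query it) underlies the summarization and LinkedIn-post exercises in the lab.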
Discover the potential of Gen AI Apps in this brief video.
Explore our Gen Cloud for free access to a myriad of innovative ideas and inspirations. From Call Center GPT to Enterprise H2O GPTe, our offerings provide a glimpse into the possibilities. Dive in, browse, and experiment at your leisure—all at no cost.
This class does not have any Python Notebook Lab.
Instructions to access GenAI AppStore: You can gain access to GenAI AppStore via the following link: genai.h2o.ai/appstore
Instructions to access the H2O.ai Wave Documentation App: You can gain access to Wave App Documentation via the following link: wave.h2o.ai
Lab Three introduces fine-tuning language models. You'll learn to fine-tune both small and large models, beginning with a simple model using Hugging Face's library and progressing to larger models like GPT-3 using LLM Studio.
The process involves dataset preparation, tokenization, model setup, and training. Finally, you'll compare fine-tuning traditional models with large language models and explore dataset preparation techniques as homework.
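As a rough preview of the small-model part of the lab, the sketch below fine-tunes a tiny causal language model with Hugging Face's Trainer. The model name, toy dataset, and hyperparameters are illustrative assumptions chosen to run quickly, not the notebook's actual configuration.

```python
# Illustrative small-model fine-tuning with Hugging Face (not the exact LAB 3 code).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "distilgpt2"                      # small model chosen for this example
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token      # GPT-2 style models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction-style dataset; in the lab this comes from your prepared data.
texts = ["Instruction: greet the user.\nResponse: Hello, how can I help?",
         "Instruction: say goodbye.\nResponse: Goodbye, have a great day!"]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="finetune-demo", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```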
Here's how to access LLM DataStudio for training purposes:
1. Visit our Aquarium platform at aquarium.h2o.ai.
2. Watch the following video to learn how to create an account on Aquarium: Accessing h2o.ai Aquarium Labs.
3. After you've gained access to Aquarium, navigate to the LLM Data Studio Lab.
4. Start an instance to access the user interface through the LLM Data Studio URL link at the page's bottom.
The instance will be available for you to use for 120 minutes, at the end of which all its data will be erased. Enjoy your training session with LLM Data Studio!
💡Watch our Aquarium walkthrough here: https://youtu.be/FSBlJeSadgw
The Link for the Python LAB 3 can be found here: LAB 3 - Fine Tuning.ipynb
In this fourth lab, we'll focus on dataset preparation for downstream NLP tasks. We'll explore various techniques programmatically in Python, using libraries like PyTorch Transformers, pandas, NumPy, and Matplotlib.
The dataset we'll work with consists of LinkedIn influencer posts collected in 2021, containing metadata such as the influencer's name, number of followers, timespan, content, media type, and more. After loading the dataset from the S3 bucket, we'll examine its contents, including the number of examples and influencers.
Next, we'll sample a subset of the dataset and begin cleaning it. We'll remove profanity using a threshold approach and conduct quality checks based on the Flesch-Kincaid Grade Level. Additionally, we'll write custom functions to handle whitespace, maximum length, and column selection.
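As a rough illustration of that cleaning pass, the sketch below combines pandas, the textstat package for the Flesch-Kincaid grade, and a simple profanity scorer. The column names, library choices, subset size, and thresholds are assumptions for demonstration and will differ from the notebook.

```python
# Illustrative cleaning pass (pip install pandas textstat alt-profanity-check).
# Column names and thresholds are assumptions, not the LAB 4 notebook's exact code.
import pandas as pd
import textstat                                  # Flesch-Kincaid grade level
from profanity_check import predict_prob         # simple profanity scorer (assumed choice)

df = pd.read_csv("influencers_data.csv")         # LinkedIn influencer posts
df = df.dropna(subset=["content"])
df = df.sample(n=2000, random_state=42)          # illustrative subset size

# Drop posts the profanity model scores above a threshold.
df = df[predict_prob(df["content"].tolist()) < 0.5]

# Keep posts whose readability falls in a sensible Flesch-Kincaid range.
df["fk_grade"] = df["content"].apply(textstat.flesch_kincaid_grade)
df = df[(df["fk_grade"] >= 4) & (df["fk_grade"] <= 14)]

# Custom helpers for whitespace, maximum length, and column selection.
df["content"] = df["content"].str.replace(r"\s+", " ", regex=True).str.strip()
df = df[df["content"].str.len() <= 3000]
df = df[["name", "content", "reactions"]]        # assumed column names
```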
After cleaning the dataset, we'll further refine it by selecting the top-performing posts based on reactions. With the cleaned dataset in hand, we'll utilize H2O GPT to generate titles for the influencer content, employing zero-shot prompting.
For fine-tuning, we'll create instructions for H2O GPT and run it over the entire dataset. Alternatively, we'll explore LLM Data Studio, a tool specifically designed for LLM-based tasks. This tool streamlines the data preparation process by automatically converting files into question-answer pairs and providing options for cleaning, augmenting, and quality checking.
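Here is one way the instruction-building step could look in code: each cleaned post is turned into a prompt asking for a title, and the prompt/response records are written out as JSONL. The prompt wording and file format are assumptions for illustration, not the lab's exact approach.

```python
# Illustrative conversion of cleaned posts into zero-shot title prompts /
# instruction records (prompt wording and JSONL format are assumptions).
import json

instruction = "Write a short, engaging title for the following LinkedIn post."

records = []
for _, row in df.iterrows():                      # `df` is the cleaned dataset from above
    records.append({
        "prompt": f"{instruction}\n\nPost:\n{row['content']}\n\nTitle:",
        "response": "",                           # to be filled in by h2oGPT generations
    })

with open("title_prompts.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```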
Your homework for this lab is to upload your own documents to Data Studio, experiment with different settings, and observe the outputs. Understanding the nuances of data preparation for LLMs is essential for effectively utilizing these models. Once you've completed this task, we'll move on to the final lab, where we'll learn how to evaluate LLMs.
Here's how to access LLM DataStudio for training purposes:
1. Visit our Aquarium platform at aquarium.h2o.ai.
2. Watch the following video to learn how to create an account on Aquarium: Accessing h2o.ai Aquarium Labs.
3. After you've gained access to Aquarium, navigate to the LLM Data Studio Lab.
4. Start an instance to access the user interface through the LLM Data Studio URL link at the page's bottom.
The instance will be available for you to use for 120 minutes, at the end of which all its data will be erased. Enjoy your training session with LLM Data Studio!
Please be aware that the h2oGPT exercise featured in the current video (found in the One Step Further section of LAB 4 accompanying this notebook) is solely for demonstration purposes. The endpoint used in the demonstration will not function for you.
You can access the influencers_data.csv file at the following link: LinkedIn Influencers' Data
The Link for the Python LAB 4 can be found here: LAB 4 - Data Preparation.ipynb
To access h2oGPT for learning purposes, visit our h2oGPT platform using the link provided: gpt.h2o.ai.
You'll have open access using the credentials:
username: guest
password: guest
In this final lab, you will focus on evaluating large language models (LLMs) programmatically.
You will learn to compare LLMs using methods like the BLEU score and ROUGE score, but these methods have limitations. The lab introduces a more effective approach: using a third language model as a judge to compare LLMs.
Scores are assigned by comparing responses from the different models; GPT-3.5 is used as the judge in this case, but any capable model could serve in that role. The lab concludes by encouraging you to explore model evaluation further, watch additional lectures on H2O LLM evaluation, and consider taking a quiz for certification.
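For a feel of the metric-based comparison, here is a small sketch using Hugging Face's evaluate package; the candidate outputs and reference text are made up for illustration. In the judge-based approach, you would instead send both candidates and the original question to a third model (GPT-3.5 in the lab) and ask it to score or rank them.

```python
# Metric-based comparison sketch (pip install evaluate sacrebleu rouge_score).
# The reference and model outputs below are made-up examples.
import evaluate

reference = "Large language models generate text by predicting the next token."
candidates = {
    "model_a": "LLMs generate text by predicting the next token in a sequence.",
    "model_b": "Language models are trained on lots of text data.",
}

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

for name, output in candidates.items():
    b = bleu.compute(predictions=[output], references=[[reference]])
    r = rouge.compute(predictions=[output], references=[reference])
    print(f"{name}: BLEU={b['bleu']:.3f}  ROUGE-L={r['rougeL']:.3f}")
```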
Feel free to take a look at a more detailed presentation of our LLM EvalGPT app made by Andreea Turcu at the following link: Introducing H2O LLM EvalGPT
Instructions to access H2O.ai EvalGPT: You can gain access publicly to H2O.ai EvalGPT via the following link: evalgpt.ai
Please be aware that the h2oGPT exercise featured in the current video (found in the One Step Further section of LAB 5 accompanying this notebook) is solely for demonstration purposes. The endpoint used in the demonstration will not function for you.
You can access the influencers_data.csv file at the following link: LinkedIn Influencers' Data
The Link for the Python LAB 5 can be found here: LAB 5 - Evaluation.ipynb
To access h2oGPT for learning purposes, visit our h2oGPT platform using the link provided: gpt.h2o.ai.
You'll have open access using the credentials:
username: guest
password: guest
1. Mastering GenAI LLMs: Hands-On Training Guide (3:33)
2. Understanding the Foundations of Large Language Models (9:05)
3. Practical RAG Techniques: Interacting with Enterprise H2O GPTe (7:41)
4. GenAI AppStore: Your Gateway to Innovative Solutions (2:25)
5. A Comprehensive Guide to Fine-Tuning Language Models (12:55)
6. Mastering Dataset Preparation: Techniques and Best Practices (12:19)
7. Mastering LLM Evaluation: Metrics and Methodologies (8:41)
Sanyam Bhutani is a Machine Learning Engineer and AI Content Creator at H2O.ai. He is a Machine Learning Practitioner recognized by inc42 and Economic Times (links to the interviews: inc42, Economic Times). Sanyam is an active Kaggler, where he is a Triple Tier Expert ranked in the global top 1% across all categories, and an active AI blogger on Medium and Hackernoon (Medium blog link) with over 1 million views overall. He is also the host of the Chai Time Data Science Podcast, where he interviews top practitioners, researchers, and Kagglers. You can follow him on Twitter or subscribe to his podcast.