Today, I would like to cover the most basic use case for H2O Wave, which is collecting a bunch of data and displaying it in a nice, clean way. The goal is to build a simple dashboard that shows how H2O Wave compares against its main competitors in terms of popularity and codebase metrics. The main competitors in question are:
Basic project setup
- Create an empty directory.
- Create a script.py file.
- Create and activate a Python virtual environment (python3 -m venv venv && source venv/bin/activate).
- Install H2O Wave (pip install h2o-wave).
Afterward, let’s write some very basic setup code.
For the script to run, we need to have a Wave server already running, and then simply run python script.py within our activated virtual env.
Why script and not an app?
The reason is simple: we don't need user interactions. All we want is to display data in a nice, readable way. This also makes it cheaper to run in production and easier to maintain, since data updates do not require restarts, unlike an app. For more detailed info, check the official documentation on apps and scripts.
For this project, we will need two things to communicate with the API endpoints:
- GitHub personal access token (see obtaining instructions).
- Twitter bearer token (see obtaining instructions).
When done, to keep things simple, let's just set them as variables. IMPORTANT: Do not commit them to your repo. Better yet, read them from environment variables instead.
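For instance (the variable and environment-variable names are my own choice):

```python
import os

# Read the tokens from environment variables so they never end up in the
# repo; the fallback placeholders are only there for quick local runs.
GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN', '<paste-github-token-here>')
TWITTER_BEARER_TOKEN = os.environ.get(
    'TWITTER_BEARER_TOKEN', '<paste-twitter-token-here>'
)
```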
GraphQL is a relatively new alternative to traditional REST APIs. Its main advantage is that you can query the data and handpick only the parts you need, instead of receiving full payloads you may never use.
Let’s define our query that should get us some interesting data about the GitHub repositories in question:
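The original query isn't reproduced in this excerpt; a sketch built against the GitHub GraphQL schema could look like the following. The three competitor repositories here are placeholder assumptions, so swap in the actual ones being compared:

```python
# A GraphQL query sketch: one reusable fragment, four aliased repository
# queries. The competitor repos below are assumptions, not the post's list.
GITHUB_QUERY = """
fragment repoFields on Repository {
  name
  stargazerCount
  forkCount
  issues(states: OPEN) { totalCount }
}

{
  wave: repository(owner: "h2oai", name: "wave") { ...repoFields }
  streamlit: repository(owner: "streamlit", name: "streamlit") { ...repoFields }
  dash: repository(owner: "plotly", name: "dash") { ...repoFields }
  shiny: repository(owner: "rstudio", name: "shiny") { ...repoFields }
}
"""
```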
Let’s break down the query. The first thing to note is the fragment definition. This is just a reusable blueprint that saves us from manually copy-pasting the same fields for all 4 repositories. Inside, we specify what data (fields) we are interested in. You can browse the whole GitHub GraphQL schema to explore all the possibilities it offers.
The second part consists of the actual queries for the 4 repositories we want to compare. The nice thing about GraphQL is that it gets all the needed data in a single request, whereas the REST approach would take 4 separate ones (one per repo).
H2O Wave ships with the HTTP client library HTTPX as one of its own dependencies, so let's take advantage of that and use it as well; no additional installation is needed.
First, we need to define helper functions for data fetching from the designated data sources.
All the functions take a client parameter, which is an HTTP client instance (more on that in the next section) used to make HTTP requests, and a data parameter, a dictionary that aggregates all the fetched data for later display.
In order to gain performance and maintenance benefits, we use httpx.Client.
So far so good. The code is simple and easy to reason about. We fetch the data one by one, and the whole data fetching takes around 6.5 seconds. Since we are creating a static dashboard where data is fetched just once, that's not a big deal, but let's learn how to optimize for cases when performance matters.
Asyncio to the rescue: a library that brings a concurrency model to Python. The main bottleneck of our code is that request 2 does not start until request 1 is finished, and so on (request chaining). That would only make sense if the requests depended on each other, but they don't. What we want instead is to start all the requests simultaneously (without waiting) and only wait until they are all finished.
Firstly, we need to wrap all the Wave script code into an async function; let's call it main. Then, we need to run the main function in a so-called event loop to take advantage of concurrency.
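A minimal sketch of that wrapping, assuming nothing about the original script beyond what's described above:

```python
import asyncio

async def main() -> dict:
    # All the Wave script code (data fetching + card rendering) moves here.
    data: dict = {}
    # ... concurrent fetching and dashboard building will go here ...
    return data

if __name__ == '__main__':
    # asyncio.run() creates an event loop, runs main() to completion,
    # and closes the loop afterwards.
    asyncio.run(main())
```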
Then convert all helper data fetching functions to async/await as well.
The last remaining thing is to call all the functions. For that, we first need to understand asyncio.gather: a helper function that takes a list of awaitables and resolves once they are all resolved. It also returns a future itself.
Equipped with this knowledge, we are ready for the final concurrent data fetching code.
The async code now fetches all the data in around 0.8 seconds, which is roughly an 8x performance increase. Not bad at all 🙂
Display the data
The hardest part is done, let’s unleash the power of H2O Wave and see how simple it actually is to create the dashboard (spoiler: < 80 lines of code).
Here is our full code (about 200 lines). Note that you need to obtain the auth tokens first (see the Authentication tokens section at the beginning of this post) to run the code successfully.
After rendering, this should result in the following dashboard.
As you can see, H2O Wave still lacks popularity, so asking questions on StackOverflow and showing off your apps/scripts on Twitter or other social networks is highly appreciated!