February 9th, 2022

How to Create Your Spotify EDA App with H2O Wave


In this article, I will show you how to build a Spotify Exploratory Data Analysis (EDA) app using H2O Wave from scratch.

H2O Wave is an open-source Python development framework for interactive AI apps. You do not need to know Flask, HTML, CSS, etc. H2O Wave has ready-to-use user-interface components and charts, including dashboard templates, dialogs, themes, widgets, and many more. You just need to customize them for your needs and create your apps with very little effort.

Spotify EDA App

Downloading Spotify Data

In order to download your data, you have to sign in to your Spotify account. Click on “Account” and you should be able to find the “download your data” section in the privacy settings tab. From there you can request your data. It may take up to 3–4 days to reach your e-mail. Check your files and find the stream history data in JSON format. That is the data we need.

Download Data from Spotify

Installing H2O Wave

To install H2O Wave, follow the instructions here. Please note that the installation process has been simplified since version 0.20 with pip install h2o-wave. You will also need to install data manipulation libraries like Pandas. We recommend installing Wave and other libraries in a virtual environment so do check out the documentation for more information.

Data Preparation

We can use the read_json function from Pandas to load our JSON data. After transforming the data into a basic data frame, we will have data like this:

Spotify Data

We just have 4 columns: end time, artist name, track name, and msPlayed (playing time in milliseconds).
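As a minimal sketch of this loading step, the snippet below reads a small in-memory sample shaped like the streaming history file. The three records are made up for illustration, and the column names (endTime, artistName, trackName, msPlayed) are assumed to match the export. With a real file you would pass its path to the hypothetical load_history helper instead.

```python
import io

import pandas as pd

# Made-up records shaped like the streaming history export (illustrative only).
SAMPLE_JSON = """[
  {"endTime": "2021-01-05 17:16", "artistName": "Artist B", "trackName": "Song 2", "msPlayed": 240000},
  {"endTime": "2021-01-05 12:42", "artistName": "Artist A", "trackName": "Song 1", "msPlayed": 180000},
  {"endTime": "2021-01-05 17:20", "artistName": "Artist A", "trackName": "Song 3", "msPlayed": 60000}
]"""


def load_history(source) -> pd.DataFrame:
    """Read streaming history JSON (a path or file-like object) into a data frame."""
    df = pd.read_json(source)
    # Parse the end time explicitly so later date arithmetic works.
    df["endTime"] = pd.to_datetime(df["endTime"])
    # Sort by time, since later steps compare each song with the previous one.
    return df.sort_values("endTime").reset_index(drop=True)


df = load_history(io.StringIO(SAMPLE_JSON))
print(df)
```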

Let’s do some basic feature engineering and create some new features (Day, Hour, Month, Year, WeekdayOrNot, Minutes) with the script below.

Basic Feature Engineering
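The feature engineering step could look roughly like the sketch below. It assumes the endTime and msPlayed column names from the export, and the helper name add_time_features and the two sample rows are hypothetical. WeekdayOrNot is taken here to mean "Monday to Friday", which is one possible reading of the feature name.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add Day, Hour, Month, Year, WeekdayOrNot and Minutes columns."""
    out = df.copy()
    out["endTime"] = pd.to_datetime(out["endTime"])
    out["Day"] = out["endTime"].dt.day_name()          # e.g. "Tuesday"
    out["Hour"] = out["endTime"].dt.hour               # 0-23
    out["Month"] = out["endTime"].dt.month             # 1-12
    out["Year"] = out["endTime"].dt.year
    out["WeekdayOrNot"] = out["endTime"].dt.dayofweek < 5  # Monday=0 ... Sunday=6
    out["Minutes"] = out["msPlayed"] / 60_000          # milliseconds to minutes
    return out


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "endTime": ["2021-01-05 12:42", "2021-01-09 17:16"],
        "artistName": ["Artist A", "Artist B"],
        "trackName": ["Song 1", "Song 2"],
        "msPlayed": [180000, 240000],
    }
)
print(add_time_features(sample))
```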

Creating Session ID

There is no session id in the given data. Luckily, we have the end time for each song. With this column, we can create a fictional session id with the code below. It compares each song’s end time with the previous song’s end time (the data is sorted by time). If the gap is more than 30 minutes, it assigns a new session id by increasing the previous row’s session id by 1; otherwise, the song keeps the previous row’s session id. A simple for loop applies this check to every song in the data.

Adding Session ID

For example, there is a gap of more than 30 minutes between the end time of the first song (12:42) and that of the second song (17:16). Therefore, we increase the session id of the second song by 1.
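The session logic described above can be sketched as follows. The function name add_session_id, the SessionID column name, and the sample rows are assumptions made for illustration; the 30-minute threshold and the for loop follow the description in the text.

```python
import pandas as pd


def add_session_id(df: pd.DataFrame, gap_minutes: int = 30) -> pd.DataFrame:
    """Assign a session id that increases whenever the gap between
    consecutive end times exceeds `gap_minutes`."""
    out = df.copy()
    out["endTime"] = pd.to_datetime(out["endTime"])
    out = out.sort_values("endTime").reset_index(drop=True)

    max_gap = pd.Timedelta(minutes=gap_minutes)
    session_ids = []
    current_id = 1
    previous_end = None
    for end_time in out["endTime"]:
        # Start a new session when the pause since the previous song is too long.
        if previous_end is not None and end_time - previous_end > max_gap:
            current_id += 1
        session_ids.append(current_id)
        previous_end = end_time

    out["SessionID"] = session_ids
    return out


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "endTime": [
            "2021-01-05 12:42",
            "2021-01-05 17:16",
            "2021-01-05 17:20",
            "2021-01-05 18:30",
        ],
        "trackName": ["Song 1", "Song 2", "Song 3", "Song 4"],
    }
)
print(add_session_id(sample))
```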

We can now find out how many sessions we have and the average session duration easily by grouping the data as shown below:
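A possible version of this grouping is sketched below. It assumes the data frame already has SessionID and Minutes columns, and it treats a session's duration as the sum of minutes played in that session, which is an assumption rather than the article's confirmed definition.

```python
import pandas as pd


def session_summary(df: pd.DataFrame):
    """Return (number of sessions, average session duration in minutes)."""
    # Total listening time per session.
    per_session = df.groupby("SessionID")["Minutes"].sum()
    return len(per_session), float(per_session.mean())


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "SessionID": [1, 2, 2, 3],
        "Minutes": [3.0, 4.0, 2.0, 6.0],
    }
)
n_sessions, avg_minutes = session_summary(sample)
print(n_sessions, avg_minutes)
```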

Finding the Longest Session

Since we have session ids, we can find the longest session and its duration with the code below. We can use the results as new features and feed them into the Wave App later.

Longest Session
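One way to find the longest session is sketched below, again assuming SessionID and Minutes columns and measuring a session by its total minutes played. The helper name longest_session and the sample values are hypothetical.

```python
import pandas as pd


def longest_session(df: pd.DataFrame):
    """Return (session id, total minutes) for the longest session."""
    per_session = df.groupby("SessionID")["Minutes"].sum()
    session_id = per_session.idxmax()  # id of the session with the largest total
    return int(session_id), float(per_session.loc[session_id])


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "SessionID": [1, 2, 2, 3],
        "Minutes": [3.0, 4.0, 5.0, 6.0],
    }
)
print(longest_session(sample))
```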

Most Streamed Songs and Artists

With a simple groupby, we can find the top 5 most streamed songs and artists as shown below:

Top Songs

Top Artists
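A sketch of this ranking is shown below. It assumes the trackName, artistName, and Minutes columns and ranks by total minutes played; ranking by play count would be an equally reasonable choice. The helper name top_streamed and the sample rows are made up.

```python
import pandas as pd


def top_streamed(df: pd.DataFrame, column: str, n: int = 5) -> pd.Series:
    """Total minutes per value of `column`, largest first, limited to `n` entries."""
    totals = df.groupby(column)["Minutes"].sum()
    return totals.nlargest(n)


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "artistName": ["Artist A", "Artist A", "Artist B", "Artist C"],
        "trackName": ["Song 1", "Song 1", "Song 2", "Song 3"],
        "Minutes": [3.0, 4.0, 10.0, 1.0],
    }
)
print(top_streamed(sample, "trackName"))
print(top_streamed(sample, "artistName"))
```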

Most Streamed Artist by Month

Now, for our Wave app, we will create a new data frame that shows the most streamed artist for each month. We can create a rank column whose value is based on the total minutes played. After that, we can filter the data easily and find the top-ranked artist for each month.

Most Streamed Artist by Month
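The rank-and-filter idea can be sketched as follows, assuming Year, Month, artistName, and Minutes columns. The helper name and sample rows are hypothetical, and ties are broken by row order here (rank with method="first"), which is a design choice rather than something the article specifies.

```python
import pandas as pd


def top_artist_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (Year, Month) with the artist that has the most minutes."""
    monthly = df.groupby(["Year", "Month", "artistName"], as_index=False)["Minutes"].sum()
    # Rank artists within each month; rank 1 is the most streamed.
    monthly["Rank"] = monthly.groupby(["Year", "Month"])["Minutes"].rank(
        method="first", ascending=False
    )
    return monthly[monthly["Rank"] == 1].reset_index(drop=True)


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "Year": [2021, 2021, 2021, 2021, 2021],
        "Month": [1, 1, 1, 2, 2],
        "artistName": ["Artist A", "Artist A", "Artist B", "Artist A", "Artist B"],
        "Minutes": [6.0, 4.0, 5.0, 2.0, 7.0],
    }
)
df_monthly_artists = top_artist_by_month(sample)
print(df_monthly_artists)
```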

Hour, Day, and Month Trends

Similarly, we can use the groupby method to visualize other trends.

Visualizing Trends
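As a rough sketch, the same grouping pattern can produce the numbers behind each trend chart. It assumes the Hour, Day, and Minutes columns created earlier; the helper name minutes_by and the sample rows are made up, and the plotting itself is left out.

```python
import pandas as pd


def minutes_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Total minutes played for each value of `column` (e.g. Hour, Day, Month)."""
    return df.groupby(column, as_index=False)["Minutes"].sum()


# Made-up sample rows for illustration.
sample = pd.DataFrame(
    {
        "Hour": [12, 17, 17],
        "Day": ["Tuesday", "Tuesday", "Saturday"],
        "Minutes": [3.0, 4.0, 1.0],
    }
)
for column in ("Hour", "Day"):
    print(minutes_by(sample, column))
```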

Putting Everything Together – H2O Wave App

We want our app to be interactive so that users can upload and analyze their own Spotify data.

For the Wave app, we just need one Python file. It starts with the @app decorator and the page route (e.g. /spotify). @app is just a decorator for your query handler (or request handler). After that, we have a function that defines the page design. In a Wave app, we can access the page using the query context q. The query context carries useful information about the active UI event, including who issued the event, what information was entered, which buttons were clicked, and so on. q.page always refers to the page at the route passed to the decorator, @app('/spotify') in this case.

To add a card for data upload, we visit the H2O Wave example features section here. We need to create a q.page["Stream_History"] section and fill it with a ui.form_card card object. In Wave apps, we can add card objects to pages and fill the page with these cards. We can think of these cards as the base design units. Inside this card object, there should be a text component to show our messages to users and a file_upload component that enables uploading the file to the system.

What about the positioning? That is easy. Using the ui.form_card card object, we can define the card position on the screen with the box attribute. For example, box='1 1 3 6' means “starting from the first column and first row, create a card that is 3 columns wide and 6 rows high”.

After that, we can define q.client.data_path and create the folder on the app server if it does not exist yet. When the user uploads a file, q.args.datasets (we named the file_upload component ‘datasets’) is populated with the list of paths to the uploaded files. So we can write an if statement: if q.args.datasets is set, meaning data has been submitted, we call another function (handle_uploaded_data) that shows new cards to the user.

This is the landing page with the first “upload data” card

We see that our “Upload Data” card starts at the first column and row (the top-left corner of the screen) and spans 3 columns and 6 rows.

In the handle_uploaded_data function, which we call after every data upload, there should be code for data preparation and data visualization. First, we read the location of the uploaded file from q.args.datasets. We then download the file from the Wave server to the app’s local data folder with the q.site.download method and get back the local file path. With that, we can carry out the data manipulation steps described above.

Adding More Visualizations

Let’s add the most streamed artist for each month as a graph. First, let’s check out the H2O Wave App Gallery for ready-to-use plot code:

We can create a ui.plot_card and fill it with relevant information. First, we prepare a df_monthly_artists data frame showing the most streamed artists by month. Then we can use ui.plot_card to visualize the data as shown below:

For this graph, we can use box='4 1 3 3' so that it is shown next to the “upload data” card. The graph should look like this:

Adding a Table

Let’s look at the Wave App gallery again for another code example.

Again, we create a new df_top_songs data frame to store the relevant information. We can add another ui.form_card like the previous example and use the q.page.add method to add this new form card to our page. In ui.form_card, we can also add a ui.table object to show the table.

Running the App Locally

You can continue to add more cards to the page as shown below. When you are done with the Python script (e.g. spotify_app.py), you can start the app with wave run spotify_app in a terminal and visit http://localhost:10101/spotify in a browser.

Spotify EDA App

Wave App Deployment

Ready to give it a try? H2O AI Cloud provides a user-friendly, one-stop service for hosting Wave apps. I have uploaded the app to H2O AI Cloud. You can find the Spotify app in the “App Store”.

Spotify EDA App on H2O AI Cloud

You can also upload your own Wave app to the “App Store”. First, you will need to package the Wave app into a zip file. Check out this video for more information.

For this Spotify app example, I have already prepared the zip file so you can just download h2o_wave_spotify_eda.zip from this GitHub repository and import the app as shown below. You can also find the source code from the same repository.

H2O AI Cloud → My Apps → Import new App
You can also change the app visibility (private/public) with just a few clicks

Shortly after that, you will be able to visit the Spotify EDA app from “My Apps”.

That’s it. I hope you find this tutorial useful. Sign up for a 90-day free trial today to get a hands-on experience.

About the Author

H2O.ai Team
