H2O.ai WIKI

Shapley Values

What are Shapley Values?

Shapley values are a method from cooperative game theory that distributes the total gains (or losses) of a game among its players according to each player's marginal contribution. This is particularly useful when multiple players work together and their contributions are unequal. In machine learning, Shapley values are used to explain a model's output by attributing each prediction to the individual feature values of the input.
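As a minimal illustration (a sketch, not taken from any H2O product), the Python code below computes exact Shapley values for a hypothetical three-player game. The payoff table and the helper name shapley_by_orderings are assumptions made for this example. Each player's value is its marginal contribution averaged over every order in which the players could join the group.

```python
from itertools import permutations


def shapley_by_orderings(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible order in which the players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p given the players that joined before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical 3-player game: the payoff each group earns working together.
PAYOFF = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 20,
    frozenset("C"): 30,
    frozenset("AB"): 40,
    frozenset("AC"): 50,
    frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

phi = shapley_by_orderings("ABC", PAYOFF.__getitem__)
print(phi)  # {'A': 20.0, 'B': 30.0, 'C': 40.0}
```

The three values add up to 90, the payoff of the full group, so the entire gain is distributed. Player C, whose marginal contributions are the largest, receives the biggest share.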

 

Use Cases of Shapley Values

Because Shapley values allocate credit according to marginal contributions, they can be applied in many different industries. Below are some suggested use cases.

Marketing Analytics

Marketers can reach potential customers through many channels. By treating each channel as a contributor, Shapley values let marketers estimate which channels are most and least effective at bringing in customers.
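A minimal sketch of Shapley-based channel attribution follows. The customer journeys are invented, and defining a coalition's value as the number of conversions from journeys that touched only channels in that coalition is one modeling choice among several. The function names are hypothetical. The code applies the weighted-subset form of the Shapley formula directly.

```python
from itertools import combinations
from math import factorial

# Hypothetical converting customer journeys: the channels each buyer touched.
JOURNEYS = [
    {"email"}, {"email"}, {"search"}, {"search", "social"},
    {"email", "search"}, {"social"}, {"email", "social", "search"},
]
CHANNELS = ["email", "search", "social"]


def conversions(coalition):
    """Assumed value function v(S): conversions from journeys that
    touched only channels inside the coalition S."""
    return sum(1 for journey in JOURNEYS if journey <= coalition)


def channel_shapley(channels, value):
    """Exact Shapley value of each channel via the weighted-subset formula."""
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {ch}) - value(s))
        result[ch] = phi
    return result


credit = channel_shapley(CHANNELS, conversions)
```

Under these assumed journeys, email receives the most credit (about 2.83 of the 7 conversions), search about 2.33, and social the least (about 1.83). The three credits sum to the 7 observed conversions.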

 

Human Resources

Human resources teams can apply this method to scheduling and staffing. Each worker is treated as a contributor, and the resulting values can help pair employees effectively with one another at important times.

Machine learning (ML) models

Shapley values can be used to explain ML models by attributing each prediction to the model's input features, which can help reveal unfair or biased behavior. The most common family of methods for computing them in ML is known as SHAP (SHapley Additive exPlanations). In Python, SHAP values can be computed with the open-source shap package.
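The additive property behind SHAP is easy to see with a linear model. Assuming the features are independent, the SHAP value of feature i for a linear model f(x) = b + Σ w_i x_i is w_i (x_i − E[x_i]), and the base value (the average prediction) plus all SHAP values equals the prediction being explained. The sketch below uses made-up weights and background data; it is a hand-written illustration, not the shap package's implementation.

```python
from statistics import fmean

# Hypothetical linear model: f(x) = BIAS + sum(w_i * x_i)
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = 1.0


def predict(x):
    return BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))


# Hypothetical background data that defines the "average" input.
BACKGROUND = [[1.0, 2.0, 4.0], [3.0, 0.0, 2.0], [2.0, 1.0, 0.0]]


def linear_shap(x, background):
    """SHAP values for a linear model, assuming independent features:
    phi_i = w_i * (x_i - mean of feature i in the background data)."""
    means = [fmean(col) for col in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(WEIGHTS, x, means)]


x = [4.0, 1.0, 6.0]
phi = linear_shap(x, BACKGROUND)
base_value = fmean(predict(b) for b in BACKGROUND)

# Additivity: base value + SHAP values reproduces the prediction.
print(base_value + sum(phi), predict(x))  # 11.0 11.0
```

Here the base value is 5.0 and the SHAP values are 4.0, 0.0 and 2.0, so the second feature, which sits exactly at its background average, receives no credit.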

 

Calculating Shapley Values

Below are simplified steps for calculating the Shapley value of a single feature (F) for one model prediction.

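  1. Create the set of all possible combinations (coalitions) of the other features, i.e., every subset of features that excludes F

  2. Calculate the average model prediction (the baseline)

  3. For each coalition, calculate the difference between the model’s prediction using only the coalition’s features (without F) and the average prediction; features outside the coalition are typically filled in with values from a background dataset

  4. For each coalition, calculate the difference between the model’s prediction using the coalition’s features plus F and the average prediction

  5. For each coalition, subtract the step 3 value from the step 4 value. The result is how much F changed the model’s prediction, i.e., the marginal contribution of F to that coalition (the average prediction cancels out, so this is simply the prediction with F minus the prediction without F)

  6. The Shapley value is the weighted average of all the values calculated in step 5. With n features in total, a coalition S of |S| features receives the weight |S|!(n − |S| − 1)!/n!, which is the fraction of all feature orderings in which exactly the features of S come before F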
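In formula form, with N the set of all n features and v(S) the average prediction when only the features in S are known, the Shapley value of F is a weighted average of its marginal contributions over the coalitions S that exclude F:

```latex
\phi_F = \sum_{S \subseteq N \setminus \{F\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{F\}) - v(S)\bigr]
```

The bracketed term is the marginal contribution from step 5, and the fraction is the weight of coalition S: the share of all n! feature orderings in which exactly the features of S come before F. The weights sum to 1.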

In short, the Shapley value of a feature (F) is its average marginal contribution to the model’s prediction across all possible coalitions, weighted so that every ordering of the features counts equally.
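The steps above can be sketched in Python as follows. This is a brute-force illustration under one common convention for "removing" a feature: features outside a coalition take their values from each row of a small background dataset, and the predictions are averaged. The function names, the toy model, and the data are hypothetical, and the loop is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from statistics import fmean


def coalition_value(model, x, background, coalition):
    """Average prediction when the features in `coalition` are fixed to x's
    values and the remaining features take values from each background row."""
    preds = []
    for row in background:
        mixed = [x[i] if i in coalition else row[i] for i in range(len(x))]
        preds.append(model(mixed))
    return fmean(preds)


def exact_shapley(model, x, background, feature):
    """Exact Shapley value of one feature for the prediction model(x)."""
    n = len(x)
    others = [i for i in range(n) if i != feature]
    baseline = coalition_value(model, x, background, set())  # step 2: average prediction
    phi = 0.0
    for size in range(n):
        # Standard Shapley weight for a coalition with `size` features.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):  # step 1: coalitions without F
            s = set(subset)
            without_f = coalition_value(model, x, background, s) - baseline  # step 3
            with_f = coalition_value(model, x, background, s | {feature}) - baseline  # step 4
            phi += weight * (with_f - without_f)  # step 5, accumulated into the weighted average
    return phi


def model(z):
    """Hypothetical model with an interaction between features 0 and 1."""
    return z[0] * z[1] + z[2]


x = [2.0, 3.0, 1.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
values = [exact_shapley(model, x, background, f) for f in range(len(x))]
```

For this toy model the three values come out to approximately 2.5, 3.0 and 0.5. They sum to 6.0, which equals the prediction for x (7.0) minus the average prediction over the background data (1.0).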

 

Limitations and disadvantages of Shapley values

While Shapley values have many advantages in ML models, they also have disadvantages. The most prominent is the time required to calculate them: with n features there are 2^n possible coalitions, so the computing time grows exponentially as features are added.
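A quick calculation shows how fast the work grows. For one prediction with n features, each feature has 2^(n−1) coalitions of the other features, and a brute-force computation needs a value for all 2^n coalitions (each of which may itself require many model calls when a background dataset is used). The helper below is hypothetical and simply performs this arithmetic.

```python
def exact_cost(n_features):
    """Coalitions per feature, and distinct coalitions overall,
    for brute-force Shapley values of a single prediction."""
    coalitions_per_feature = 2 ** (n_features - 1)  # subsets of the other features
    distinct_coalitions = 2 ** n_features  # every subset S needs a value v(S)
    return coalitions_per_feature, distinct_coalitions


for n in (5, 10, 20, 30):
    per_feature, total = exact_cost(n)
    print(f"{n:>2} features: {per_feature:,} coalitions per feature, {total:,} coalition values")
```

At 30 features, more than a billion coalition values (1,073,741,824) would be needed to explain a single prediction exactly.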

Because exact computation over every coalition quickly becomes infeasible for models with many features, most implementations approximate Shapley values, for example by sampling coalitions or feature orderings. Efficient exact algorithms exist only for certain model types, such as tree ensembles (for example, the TreeSHAP algorithm) and linear models. Between long computation times and inexact results, Shapley values may not be the right choice for every model.
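One widely used approximation, Monte Carlo permutation sampling, estimates a feature's Shapley value by averaging its marginal contribution over randomly sampled feature orderings instead of enumerating every coalition. The sketch below assumes that features outside a coalition take their values from a randomly chosen row of a background dataset; the function name, toy model, and data are hypothetical, and this is not a description of how H2O computes Shapley values.

```python
import random
from statistics import fmean


def sampled_shapley(model, x, background, feature, n_samples=2000, seed=0):
    """Monte Carlo estimate of one feature's Shapley value: sample a random
    feature ordering and a random background row, then measure how much the
    prediction changes when `feature` switches from the row's value to x's value."""
    rng = random.Random(seed)
    n = len(x)
    contributions = []
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        row = rng.choice(background)
        preceding = set(order[: order.index(feature)])
        # Features ahead of `feature` in the ordering take x's values; the rest take the row's.
        without_f = [x[i] if i in preceding else row[i] for i in range(n)]
        with_f = list(without_f)
        with_f[feature] = x[feature]
        contributions.append(model(with_f) - model(without_f))
    return fmean(contributions)


def model(z):
    """Hypothetical toy model with an interaction between features 0 and 1."""
    return z[0] * z[1] + z[2]


x = [2.0, 3.0, 1.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
estimates = [sampled_shapley(model, x, background, f) for f in range(len(x))]
```

For this toy model the exact Shapley values are 2.5, 3.0 and 0.5, and the estimates should land close to them. The estimation error shrinks roughly with the square root of n_samples, which is the trade-off between run time and exactness described above.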

 

Shapley Values and H2O

Shapley and SHAP values are included features in H2O AI Hybrid Cloud (through H2O Driverless AI) and in H2O-3. Driverless AI lets the user compare the Shapley values of the original features with those of the transformed (engineered) features, showing how the contributions of the two feature sets differ.

 
