The Shapley value is a method from cooperative game theory for distributing a total gain or loss among contributors according to each one's marginal contribution. It is particularly useful when multiple players work together and their contributions are unequal. In machine learning, Shapley values are also used to explain how much each feature value contributes to a model's output.
Because Shapley values allocate credit by marginal contribution, they can be applied in many different industries. Below are some suggested use cases.
In marketing, many channels are available for reaching potential customers. Shapley values let marketers analyze which channels are most and least effective at bringing in customers.
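As a rough illustration of channel attribution, the sketch below treats three marketing channels as players and averages each channel's marginal contribution over every order in which the channels could be added. The channel names and the figures in `CONVERSIONS` are invented for the example and are not real data. `channel_shapley` is a hypothetical helper, not part of any library.

```python
from itertools import permutations

# Hypothetical conversions produced by each combination of channels.
# These numbers are invented for illustration only.
CONVERSIONS = {
    frozenset(): 0,
    frozenset({"email"}): 10,
    frozenset({"search"}): 20,
    frozenset({"social"}): 5,
    frozenset({"email", "search"}): 40,
    frozenset({"email", "social"}): 20,
    frozenset({"search", "social"}): 30,
    frozenset({"email", "search", "social"}): 60,
}

def channel_shapley(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Credit p with the change in value caused by adding it.
            totals[p] += value[with_p] - value[coalition]
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}

credit = channel_shapley(["email", "search", "social"], CONVERSIONS)
for channel, share in credit.items():
    print(f"{channel}: {share:.2f}")
```

The credited amounts add up to the 60 conversions of the full channel mix, so no conversion is counted twice or left unassigned.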
Human resources teams can apply this method to their scheduling and staffing needs. Each worker is treated as a contributor, and the resulting values can help pair employees appropriately at important times.
In machine learning, Shapley values can be used to explain a model's predictions, which supports checking results for fairness and bias. In this setting they are more commonly referred to as SHAP values (SHapley Additive exPlanations), and several methods exist for computing them.
Below are simplified steps for calculating the Shapley value of a single feature (F).
1. Create a set of all possible feature combinations (coalitions).
2. Calculate the average model prediction.
3. For each coalition (feature combination), calculate the difference between the model’s prediction without F and the average prediction.
4. For each coalition, calculate the difference between the model’s prediction with F and the average prediction.
5. For each coalition, calculate how much F changed the model’s prediction from the average (subtract the step 3 value from the step 4 value). This is the marginal contribution of F.
6. The Shapley value is the average of all the values calculated in step 5 (the average of F’s marginal contributions). In the full definition this average is weighted by coalition size, so that every ordering of the features counts equally.
In short, the Shapley value of a feature (F) is the average marginal contribution F provides the model across all possible coalitions.
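The steps above can be sketched for a tiny model. This is a minimal sketch under stated assumptions, not a production implementation. The `model`, the instance `x`, and the `baseline` are hypothetical. A feature that is absent from a coalition is replaced by its baseline value, which is one common simplification. Each coalition of size |S| gets the standard weight |S|!(n-|S|-1)!/n!. Because the average prediction cancels when step 3 is subtracted from step 4, the code computes the difference between the two predictions directly.

```python
from itertools import combinations
from math import factorial

def model(features):
    """Toy model (hypothetical): a linear part plus one interaction."""
    a, b, c = features
    return 2 * a + 3 * b + a * c

def predict_with(coalition, x, baseline):
    """Prediction when only features in `coalition` take their real values."""
    mixed = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(mixed)

def shapley_value(f, x, baseline):
    """Exact Shapley value of feature index `f` for the instance `x`."""
    n = len(x)
    others = [i for i in range(n) if i != f]
    total = 0.0
    for size in range(n):
        # Weight shared by every coalition of this size that excludes f.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for coalition in combinations(others, size):
            without_f = predict_with(set(coalition), x, baseline)
            with_f = predict_with(set(coalition) | {f}, x, baseline)
            # Marginal contribution of f to this coalition.
            total += weight * (with_f - without_f)
    return total

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = [shapley_value(f, x, baseline) for f in range(len(x))]
print(phi)
```

The three values sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `a * c` is split evenly between the two features involved.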
While there are many advantages to using Shapley values in ML models, there are some disadvantages. A prominent one is the time required to calculate results. The number of possible coalitions grows exponentially with the number of features, so the computing time for determining the output is significant.
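To give a sense of scale, the short sketch below counts the coalitions involved in an exact calculation. It assumes a brute-force approach with no caching, in which each coalition costs two model evaluations (one with the feature and one without). The function names are hypothetical.

```python
def coalition_count(n_features):
    """Number of coalitions that exclude one given feature: 2 ** (n - 1)."""
    return 2 ** (n_features - 1)

def model_calls_for_all_features(n_features):
    """Rough count of model evaluations: two per coalition, per feature."""
    return n_features * 2 * coalition_count(n_features)

for n in (5, 10, 20, 30):
    print(n, coalition_count(n), model_calls_for_all_features(n))
```

Each additional feature doubles the number of coalitions, which is why exact calculation is usually limited to models with a small number of features.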
Exact Shapley values require evaluating every coalition, or subset, of features, so the work grows rapidly as the total number of features increases. To keep the calculation feasible, implementations use approximation techniques, or algorithms specialized for tree or linear models when applicable, and for large general models exact values are usually out of reach. Between the long run times and the inexact results, the Shapley value method may not be the right choice for every model.
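One widely used approximation is to sample random feature orderings instead of enumerating every coalition. The sketch below illustrates that idea only. It is not the algorithm of any particular library, and the `model`, the baseline-replacement rule for absent features, and the `sampled_shapley` helper are all assumptions made for the example.

```python
import random

def model(features):
    """Toy model (hypothetical), used only to exercise the estimator."""
    a, b, c = features
    return 2 * a + 3 * b + a * c

def sampled_shapley(f, x, baseline, n_samples=2000, seed=0):
    """Estimate the Shapley value of feature `f` from random feature orders."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    total = 0.0
    for _ in range(n_samples):
        rng.shuffle(indices)
        # Features that come before f in this random order form the coalition.
        before = set(indices[:indices.index(f)])
        without_f = [x[i] if i in before else baseline[i] for i in range(len(x))]
        with_f = list(without_f)
        with_f[f] = x[f]
        # Marginal contribution of f for this sampled coalition.
        total += model(with_f) - model(without_f)
    return total / n_samples

estimate = sampled_shapley(0, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(estimate)
```

For this toy model the exact value for the first feature is 3.5, and the estimate should approach it as `n_samples` grows. The trade-off is the one described above: fewer model evaluations in exchange for an inexact result.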
Shapley and SHAP values are included in AI Hybrid Cloud and H2O-3, through the use of H2O Driverless AI. This makes it possible to compare the Shapley values of the original features with those of the transformed features, so the user can note differences in the predictive power of the two sets.