
[FEA] Implement model interpretation/explanation #311

Open
zhiruiwang opened this issue Mar 31, 2022 · 2 comments
Labels
P2 question Further information is requested status/needs-triage

Comments

@zhiruiwang

🚀 Feature request

I would like to understand whether there is any plan to implement or integrate model interpretation/explanation techniques, such as DeepLIFT or GradientExplainer from SHAP, into Merlin.

SHAP natively supports model interpretation for TF and PyTorch, but the data and TF/PyTorch models in Merlin are wrapped in Merlin classes, so it's not straightforward to apply SHAP directly on top of the model outcome.

If your team can integrate SHAP into Merlin that would be tremendously helpful!

Motivation

As recommendation systems are generally applied to real-world business problems, the ability to make the model a white box is extremely important when presenting the model outcome to business stakeholders. SHAP already supports TF and PyTorch models; if it can be integrated into Merlin Models, the maturity of the product will go up to the next level!

@rnyak rnyak added question Further information is requested P2 and removed question Further information is requested labels Sep 13, 2022
@EvenOldridge
Member

We're working on a model evaluation framework that allows for slicing, but we don't have plans for this on our roadmap yet. @zhiruiwang is this something you'd be interested in contributing?

@zhiruiwang
Author

Hi @EvenOldridge,

I actually tried applying SHAP to Merlin models myself earlier.

Since Merlin Models modifies the Keras input, output, and layering structure (Blocks), I can't directly apply DeepExplainer or GradientExplainer to Merlin models, even though the underlying model is tf.keras. It would require someone very familiar with the internals of Merlin Models to modify the SHAP package extensively to make those two explainers fit Merlin's structure.
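The reason a workaround is still possible is that SHAP's KernelExplainer is model-agnostic: it only needs a plain function mapping a matrix of feature rows to predictions, so any framework-specific model can be hidden behind a thin adapter. Here is a minimal, framework-free sketch of that adapter idea; the toy model and all names are illustrative stand-ins, not Merlin or SHAP APIs:

```python
def toy_model_predict(dataset):
    """Stand-in for a framework model's predict(): it expects a
    column-oriented dataset (dict of column name -> list of values)."""
    return [2 * a + b for a, b in zip(dataset["a"], dataset["b"])]

# Column names the toy model expects, in the order features appear in each row.
COLUMNS = ["a", "b"]

def model_fn(rows):
    """Adapter: a model-agnostic explainer passes plain rows of features;
    rebuild the column-oriented structure the model needs, then predict."""
    dataset = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}
    return toy_model_predict(dataset)

# The explainer never sees the model's native input format, only model_fn.
print(model_fn([[1, 2], [3, 4]]))  # [4, 10]
```

The real code below applies the same pattern: numpy rows from SHAP are converted back into a schema-aware Merlin `Dataset` before calling `model.predict`.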

However, I did make it work using KernelExplainer to interpret Merlin models. I will post some quick code here; your team can take it and put it into example notebooks or integrate it into the codebase as functions.

The code below is applied to the DLRM model after it is trained in the RecSys 22 tutorial notebook 2.

import pandas as pd
import shap
from shap import KernelExplainer
from merlin.io import Dataset

# Turn the Merlin Dataset into pandas, since SHAP only accepts pd or np data
valid_pd = (valid.to_ddf().compute().to_pandas()
            [schema.column_names] # Select relevant columns
            .sample(frac=1, random_state=42) # Shuffle the dataset
            .reset_index(drop=True))

def model_fn(inputs):
    """Wrap the Merlin model in a function that takes the numpy array
    passed by SHAP and returns predictions.
    """
    # SHAP converts the data to a numpy array; convert it back to a pd DataFrame
    inputs_pd = pd.DataFrame(inputs)
    # Assign column names to the pd DataFrame
    inputs_pd.columns = schema.column_names
    # Wrap the pd DataFrame in a Merlin Dataset and attach the schema
    dataset = Dataset(inputs_pd)
    dataset.schema = schema
    # Make predictions
    return model.predict(dataset, batch_size=1024).flatten()

# Use a selection of 100 samples to represent "typical" feature values
explainer = KernelExplainer(model_fn, valid_pd.iloc[:100, :])
# Use 500 perturbation samples to estimate the SHAP values for 200 samples
shap_values = explainer.shap_values(valid_pd.iloc[300:500, :], nsamples=500)

# Plot SHAP summary plot for 200 samples
shap.summary_plot(shap_values, valid_pd.iloc[300:500,:])

[Image: SHAP summary plot for the 200 samples]
