Merge pull request #74 from ZanMervic/scoring_sheet
[BLOG] Explain: Scoring Sheet widget blog
markotoplak authored Jan 11, 2024
2 parents 688e35e + 3af5f4a commit 277a601
Showing 8 changed files with 87 additions and 0 deletions.
@@ -0,0 +1,58 @@
---
author: "Žan Mervič"
date: "2024-01-11"
draft: false
title: "Scoring Sheets: Transform Data into Insightful Scores"
url: "explain-scoring-sheet"
thumbImage: "thumbnail-image.png"
frontPageImage: "thumbnail-image.png"
blog: ["explain", "prototypes", "scoring sheet"]
shortExcerpt: "Orange's Scoring Sheet widget provides explainable machine learning predictions using a simple scoring system. Each feature's influence obtains an integer score, making it easier to understand and communicate the model, crucial in sectors where transparency is vital."
longExcerpt: "Orange's Scoring Sheet widget provides explainable machine learning predictions using a simple scoring system. Each feature's influence obtains an integer score, making it easier to understand and communicate the model, crucial in sectors where transparency is vital."
---

Machine learning models are becoming increasingly powerful and complex. That power is useful in areas such as finance, where accuracy is paramount. However, model complexity can be a double-edged sword, especially when the reasoning behind a prediction needs to be explained. Consider the field of medicine, where a model might predict the probability of cancer. In such a scenario, the stakes are incredibly high. It is not just about the model spitting out a prediction; the reasoning behind that number is just as important. Why should a doctor or a patient trust this result?

This is where the Scoring Sheet widget and its companion, the Scoring Sheet Viewer, come into play. These widgets strive to provide explainable predictions, enabling professionals in fields like healthcare to make better-informed decisions.

By now, you should understand the need for explainable and interpretable machine learning models, so let's dive into the Scoring Sheet and see how it works. Let's try to apply the Scoring Sheet widget in a real-world scenario in which we will try to predict the risk of heart disease using the [Heart Disease](https://archive.ics.uci.edu/ml/datasets/heart+Disease) dataset.

<WindowScreenshot src="workflow.png"/>

The workflow above shows the most straightforward way of using the Scoring Sheet widgets. Here, after training the Scoring Sheet model using our dataset, we input it into the Scoring Sheet Viewer widget, which presents us with a, you guessed it, scoring sheet. It shows each feature's contribution to the final score, where a higher score indicates a greater chance for an individual to be classified with the target class. Each feature's contribution can be positive or negative, indicating whether it increases or decreases the risk.

<WindowScreenshot src="scoring-sheet-widget.png"/>

Before we continue with the example, let's try to understand how to use the Scoring Sheet widget and what each parameter does.

- **Number of Attributes After Feature Selection** - The underlying model requires binary features; therefore, it discretizes continuous features and one-hot encodes categorical ones. This parameter helps to reduce the potentially large number of resulting features and ensures faster learning by selecting only the best features for model training.

- **Maximum Number of Decision Parameters** - Sets the model size, balancing complexity against explainability. More parameters (features) can increase accuracy but make the model harder to explain.

- **Maximum Points per Decision Parameter** - The range of points each decision parameter (feature) can contribute. A wider range can increase model complexity and accuracy but may reduce explainability.

- **Number of Input Features Used** - Specifies how many original features (before binarization) the decision parameters can originate from. This can be used to ensure that each parameter originates from a unique feature, or to restrict the model to a subset of features.
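
To make the binarization mentioned in the first parameter more concrete, here is a minimal sketch in plain Python. It is not the widget's implementation: the function name `binarize`, the thresholds, and the example values are all assumptions chosen for illustration, and the post does not describe how the widget actually picks its cut points.

```python
def binarize(rows, thresholds):
    """Turn rows with mixed features into rows of 0/1 features.

    Continuous features get one "name<=t" feature per threshold
    (a simple form of discretization); categorical (string) features
    get one "name=value" feature per observed category (one-hot).
    """
    # Collect the categories seen for every categorical feature.
    categories = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str):
                seen = categories.setdefault(name, [])
                if value not in seen:
                    seen.append(value)

    binary_rows = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if isinstance(value, str):
                for cat in categories[name]:
                    out[f"{name}={cat}"] = int(value == cat)
            else:
                for t in thresholds.get(name, []):
                    out[f"{name}<={t}"] = int(value <= t)
        binary_rows.append(out)
    return binary_rows


# Hypothetical rows and thresholds, for illustration only.
rows = [
    {"age": 63, "chest pain": "asymptomatic"},
    {"age": 41, "chest pain": "non-anginal"},
]
binary = binarize(rows, thresholds={"age": [45, 55]})
print(binary[0])
```

Two original features already become four binary ones here, which is why selecting only the best binary features before training can matter for speed.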

To keep this blog at a reasonable length, we can't go into too much detail on how the Scoring Sheet widget works. However, we should mention the backbone of the widget, a clever algorithm called [FasterRisk](https://github.com/jiachangliu/FasterRisk). If you want to learn more about it, consult the [paper](https://arxiv.org/abs/2210.05846).

Let's return to the example and focus on the Scoring Sheet Viewer widget.

<WindowScreenshot src="workflow-table.png"/>

I've modified the workflow by dividing the data, with a portion routed to the Table widget. This setup allows us to select instances and observe how the Scoring Sheet performs with new, unseen data.

The model for heart disease data features five decision parameters, with points ranging from -5 to 5. We have set the target class to '1,' indicating the 'presence' of heart disease. Thus, positive-value decision parameters increase the risk of heart disease, while those with negative values reduce it.

Consider the selected instance from the Data Table widget. It has a 'slope peak exc ST' attribute value of 'upsloping,' which reduces risk by 3 points. However, it also has the 'chest pain' attribute set to 'asymptomatic,' increasing the risk by 5 points. This results in a total score of 2, corresponding to a 71.6% probability of having heart disease.
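
The arithmetic behind this reading can be sketched in a few lines of Python. The two point values come from the example above; everything else is an assumption. In particular, the post does not state how a score is converted into a probability, so the logistic mapping in `risk` and its `intercept` and `multiplier` arguments are placeholders and are not expected to reproduce the 71.6% shown by the widget.

```python
import math

# Points for the two decision parameters mentioned in the text.
# The actual sheet has five decision parameters; the rest are omitted here.
points = {
    "slope peak exc ST=upsloping": -3,
    "chest pain=asymptomatic": 5,
}


def total_score(active_parameters, points):
    """Sum the points of the decision parameters that hold for an instance."""
    return sum(points[name] for name in active_parameters if name in points)


def risk(score, intercept, multiplier):
    """Map a score to a probability with a logistic function.

    Assumption: a logistic link with a model-specific intercept and
    multiplier. The real constants are learned by the model.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + score) / multiplier))


instance = ["slope peak exc ST=upsloping", "chest pain=asymptomatic"]
score = total_score(instance, points)
print(score)  # 2, i.e. -3 + 5
```

Under this assumed mapping, a higher score always yields a higher probability, which matches how the sheet is read: positive points push the risk up and negative points pull it down.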

And there you have it. Understanding and reading the scoring sheet is straightforward. I am sure you can't wait to try out the new Scoring Sheet widgets yourself. Still, before you do, I should tell you that there also exists an Explain add-on for Orange, which offers other visualizations that can help you better understand your models.

<WindowScreenshot src="visualizations.png"/>

While the Scoring Sheet provides easily understandable explanations, its requirement for binarized attributes can limit its usability in some scenarios. Its visualization tool, the Scoring Sheet Viewer, is also exclusive to this model. A Nomogram offers more freedom because it is not limited to binarized features, but it is only compatible with Naive Bayes and some linear models. Feature Importance highlights the most influential variables in your model, which is perfect for feature selection. SHAP offers instance-level explanations of the contributions of feature combinations. Lastly, ICE Plots show how changing a feature's value affects predictions, which is invaluable for “black-box” models. Unlike the Scoring Sheet and Nomogram, Feature Importance, SHAP, and ICE Plots can be used with any model.

<WindowScreenshot src="scoring.png"/>

While the main focus of this blog was explainability, accuracy is also important. We've compared the Scoring Sheet model with some other models on different datasets. From the results in the Test and Score widget, we can see that the Scoring Sheet model is not the most accurate but is also not far from it. However, the Scoring Sheet model is much more easily explainable than most other models.

If you want to try the Scoring Sheet widgets for yourself, you should download the Prototypes add-on and the [workflow](explain-scoring-sheet.ows), which we used for the example.
@@ -0,0 +1,29 @@
<?xml version='1.0' encoding='utf-8'?>
<scheme version="2.0" title="" description="">
<nodes>
<node id="0" name="Datasets" qualified_name="Orange.widgets.data.owdatasets.OWDataSets" project_name="Orange3" version="" title="Datasets" position="(71.0, 74.0)" />
<node id="1" name="Scoring Sheet" qualified_name="orangecontrib.prototypes.widgets.owscoringsheet.OWScoringSheet" project_name="Orange3-Prototypes" version="" title="Scoring Sheet" position="(297.0, 66.0)" />
<node id="2" name="Scoring Sheet Viewer" qualified_name="orangecontrib.prototypes.widgets.owscoringsheetviewer.OWScoringSheetViewer" project_name="Orange3-Prototypes" version="" title="Scoring Sheet Viewer" position="(472.0, 74.0)" />
<node id="3" name="Data Sampler" qualified_name="Orange.widgets.data.owdatasampler.OWDataSampler" project_name="Orange3" version="" title="Data Sampler" position="(188.0, 74.0)" />
<node id="4" name="Data Table" qualified_name="Orange.widgets.data.owtable.OWDataTable" project_name="Orange3" version="" title="Data Table" position="(287.0, 161.0)" />
</nodes>
<links>
<link id="0" source_node_id="1" sink_node_id="2" source_channel="Model" sink_channel="Classifier" enabled="true" source_channel_id="model" sink_channel_id="classifier" />
<link id="1" source_node_id="0" sink_node_id="3" source_channel="Data" sink_channel="Data" enabled="true" source_channel_id="data" sink_channel_id="data" />
<link id="2" source_node_id="3" sink_node_id="1" source_channel="Data Sample" sink_channel="Data" enabled="true" source_channel_id="data_sample" sink_channel_id="data" />
<link id="3" source_node_id="3" sink_node_id="4" source_channel="Remaining Data" sink_channel="Data" enabled="true" source_channel_id="remaining_data" sink_channel_id="data" />
<link id="4" source_node_id="4" sink_node_id="2" source_channel="Selected Data" sink_channel="Data" enabled="true" source_channel_id="selected_data" sink_channel_id="data" />
</links>
<annotations />
<thumbnail />
<node_properties>
<properties node_id="0" format="literal">{'controlAreaVisible': True, 'header_state': b'\x00\x00\x00\xff\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\xcc\x00\x00\x00\x07\x01\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\xff\x00\x00\x00\x81\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00 \x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x01\x06\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00O\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00N\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00d\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00d\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\xe8\x00\x00\x00\x00d', 'savedWidgetGeometry': b'\x01\xd9\xd0\xcb\x00\x03\x00\x00\xff\xff\xff\xd5\xff\xff\xf9\xd8\x00\x00\x04"\xff\xff\xfb\xea\xff\xff\xff\xd6\xff\xff\xf9\xf6\x00\x00\x04!\xff\xff\xfb\xe9\x00\x00\x00\x01\x00\x00\x00\x00\x0c\x00\xff\xff\xff\xd6\xff\xff\xf9\xf6\x00\x00\x04!\xff\xff\xfb\xe9', 'selected_id': 'core\\heart_disease.tab', 'splitter_state': b'\x00\x00\x00\xff\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x01,\x00\x00\x00\xc8\x01\xff\xff\xff\xff\x01\x00\x00\x00\x02\x00', '__version__': 1}</properties>
<properties node_id="1" format="literal">{'auto_apply': True, 'controlAreaVisible': True, 'custom_features_checkbox': False, 'learner_name': '', 'max_points_per_param': 5, 'num_attr_after_selection': 20, 'num_decision_params': 5, 'num_input_features': 1, 'savedWidgetGeometry': b'\x01\xd9\xd0\xcb\x00\x03\x00\x00\x00\x00\x03\xce\x00\x00\x00\xb7\x00\x00\x04\xe0\x00\x00\x02\x00\x00\x00\x03\xcf\x00\x00\x00\xd5\x00\x00\x04\xdf\x00\x00\x01\xff\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x03\xcf\x00\x00\x00\xd5\x00\x00\x04\xdf\x00\x00\x01\xff', '__version__': 1}</properties>
<properties node_id="2" format="literal">{'controlAreaVisible': True, 'savedWidgetGeometry': b'\x01\xd9\xd0\xcb\x00\x03\x00\x00\x00\x00\x00\xb7\x00\x00\x01j\x00\x00\x03\x0e\x00\x00\x02\xee\x00\x00\x00\xb8\x00\x00\x01\x88\x00\x00\x03\r\x00\x00\x02\xed\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\xb8\x00\x00\x01\x88\x00\x00\x03\r\x00\x00\x02\xed', '__version__': 1}</properties>
<properties node_id="3" format="literal">{'compatibility_mode': False, 'controlAreaVisible': True, 'number_of_folds': 10, 'replacement': False, 'sampleSizeNumber': 1, 'sampleSizePercentage': 95, 'sampleSizeSqlPercentage': 0.1, 'sampleSizeSqlTime': 1, 'sampling_type': 0, 'savedWidgetGeometry': b'\x01\xd9\xd0\xcb\x00\x03\x00\x00\x00\x00\x02\x9d\x00\x00\x00\xd4\x00\x00\x03c\x00\x00\x02\\\x00\x00\x02\x9d\x00\x00\x00\xd4\x00\x00\x03c\x00\x00\x02\\\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x02\x9d\x00\x00\x00\xd4\x00\x00\x03c\x00\x00\x02\\', 'selectedFold': 1, 'sql_dl': False, 'stratify': False, 'use_seed': True, '__version__': 2}</properties>
<properties node_id="4" format="literal">{'auto_commit': True, 'color_by_class': True, 'controlAreaVisible': False, 'dist_color_RGB': (220, 220, 220, 255), 'savedWidgetGeometry': b'\x01\xd9\xd0\xcb\x00\x03\x00\x00\x00\x00\x02\x16\x00\x00\x01\x18\x00\x00\x03\xee\x00\x00\x02d\x00\x00\x02\x17\x00\x00\x016\x00\x00\x03\xed\x00\x00\x02c\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x02\x17\x00\x00\x016\x00\x00\x03\xed\x00\x00\x02c', 'select_rows': True, 'selected_cols': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 'selected_rows': [4], 'show_attribute_labels': True, 'show_distributions': False, '__version__': 2}</properties>
</node_properties>
<session_state>
<window_groups />
</session_state>
</scheme>