From fc41bdd8a95065784135a90492f29e3e23fbe614 Mon Sep 17 00:00:00 2001
From: AryanAgarwal27 <67140930+AryanAgarwal27@users.noreply.github.com>
Date: Wed, 1 Jan 2025 20:03:08 -0800
Subject: [PATCH] #1021 Fintech example documentation for Cleanlab
implementation (#96)
---
1021_fintech_documentation/Final.ipynb | 2707 ++++++++++++++++++++
1021_fintech_documentation/Requirement.txt | 4 +
2 files changed, 2711 insertions(+)
create mode 100644 1021_fintech_documentation/Final.ipynb
create mode 100644 1021_fintech_documentation/Requirement.txt
diff --git a/1021_fintech_documentation/Final.ipynb b/1021_fintech_documentation/Final.ipynb
new file mode 100644
index 0000000..dd6e17c
--- /dev/null
+++ b/1021_fintech_documentation/Final.ipynb
@@ -0,0 +1,2707 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "b2eebf0d-31ff-4ce0-b2b7-4d82ee61150b",
+ "metadata": {
+ "id": "b2eebf0d-31ff-4ce0-b2b7-4d82ee61150b"
+ },
+ "source": [
+ "# Detecting Data Quality Issues in Credit Card Fraud Detection Using Cleanlab\n",
+ "\n",
+ "In this 5-minute quickstart tutorial, we will use **Cleanlab's Datalab** to detect various issues in a tabular dataset commonly encountered in financial applications. This tutorial focuses on the **Credit Card Fraud Detection dataset**, which contains thousands of transaction records labeled as fraudulent or non-fraudulent. The dataset includes features such as transaction amount and anonymized variables for privacy.\n",
+ "\n",
+ "### Cleanlab Helps Uncover:\n",
+ "- **Label errors**: Mislabeled transactions, such as fraudulent cases incorrectly marked as non-fraudulent.\n",
+ "- **Outliers**: Transactions with abnormal patterns that deviate significantly from the rest of the dataset.\n",
+ "- **Near-duplicates**: Repeated transactions or entries that may distort results or impact model performance.\n",
+ "\n",
+ "Using Cleanlab, we automatically identify examples that are likely mislabeled or problematic, improving the overall data quality for better fraud detection performance. You can adapt this tutorial to detect and correct issues in your own financial tabular datasets.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "27fcddca-534f-4851-8c80-688e7cb7ff79",
+ "metadata": {
+ "id": "27fcddca-534f-4851-8c80-688e7cb7ff79"
+ },
+ "source": [
+ "## Quickstart\n",
+ "\n",
+ "Already have (out-of-sample) `pred_probs` from a model trained on your original data labels?\n",
+ "Have a `knn_graph` computed between dataset examples (reflecting similarity in their feature values)?\n",
+ "Run the code below to find issues in your dataset.\n"
+ ]
+ },
+ {
+ "cell_type": "raw",
+ "id": "3dc09d4e-499e-4aed-942e-57f1df4deca7",
+ "metadata": {
+ "id": "3dc09d4e-499e-4aed-942e-57f1df4deca7"
+ },
+ "source": [
+ "from cleanlab import Datalab\n",
+ "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n",
+ "lab.find_issues(pred_probs=your_pred_probs, knn_graph=knn_graph)\n",
+ "\n",
+ "lab.get_issues()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "791717d7-d140-4d85-b516-9a6e6f28c7c0",
+ "metadata": {
+ "id": "791717d7-d140-4d85-b516-9a6e6f28c7c0"
+ },
+ "source": [
+ "# 1. Install Required Dependencies\n",
+ "\n",
+ "To get started, install the required packages for this tutorial using pip:\n",
+ "\n",
+ "```bash\n",
+ "!pip install \"cleanlab[datalab]\" scikit-learn pandas numpy\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Install required libraries with correct versions\n",
+ "!pip install \"cleanlab[datalab]\" \"numpy\" \"pandas==1.3.3\" \"scikit-learn==1.0.2\" \"scikit-image==0.18.3\"\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "f7TpME1a6-Db",
+ "outputId": "9f5642dc-ae12-464b-e65f-77d0f57b4ce0"
+ },
+ "id": "f7TpME1a6-Db",
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.22.0)\n",
+ "Requirement already satisfied: pandas==1.3.3 in /usr/local/lib/python3.10/dist-packages (1.3.3)\n",
+ "Requirement already satisfied: scikit-learn==1.0.2 in /usr/local/lib/python3.10/dist-packages (1.0.2)\n",
+ "Requirement already satisfied: scikit-image==0.18.3 in /usr/local/lib/python3.10/dist-packages (0.18.3)\n",
+ "Requirement already satisfied: cleanlab[datalab] in /usr/local/lib/python3.10/dist-packages (2.5.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.10/dist-packages (from pandas==1.3.3) (2.8.2)\n",
+ "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.10/dist-packages (from pandas==1.3.3) (2024.2)\n",
+ "Requirement already satisfied: scipy>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn==1.0.2) (1.11.4)\n",
+ "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.10/dist-packages (from scikit-learn==1.0.2) (1.4.2)\n",
+ "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn==1.0.2) (3.5.0)\n",
+ "Requirement already satisfied: matplotlib!=3.0.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (3.8.0)\n",
+ "Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (3.4.2)\n",
+ "Requirement already satisfied: pillow!=7.1.0,!=7.1.1,>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (11.0.0)\n",
+ "Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (2.36.1)\n",
+ "Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (2024.9.20)\n",
+ "Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image==0.18.3) (1.4.1)\n",
+ "Requirement already satisfied: tqdm>=4.53.0 in /usr/local/lib/python3.10/dist-packages (from cleanlab[datalab]) (4.66.6)\n",
+ "Requirement already satisfied: termcolor>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from cleanlab[datalab]) (2.5.0)\n",
+ "Requirement already satisfied: datasets>=2.7.0 in /usr/local/lib/python3.10/dist-packages (from cleanlab[datalab]) (3.2.0)\n",
+ "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (3.16.1)\n",
+ "Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (17.0.0)\n",
+ "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (0.3.8)\n",
+ "Requirement already satisfied: requests>=2.32.2 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (2.32.3)\n",
+ "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (3.5.0)\n",
+ "Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (0.70.16)\n",
+ "Requirement already satisfied: fsspec<=2024.9.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets>=2.7.0->cleanlab[datalab]) (2024.9.0)\n",
+ "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (3.11.10)\n",
+ "Requirement already satisfied: huggingface-hub>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (0.26.5)\n",
+ "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (24.2)\n",
+ "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.7.0->cleanlab[datalab]) (6.0.2)\n",
+ "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.18.3) (1.2.1)\n",
+ "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.18.3) (0.12.1)\n",
+ "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.18.3) (4.55.3)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.18.3) (1.4.7)\n",
+ "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.18.3) (3.2.0)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7.3->pandas==1.3.3) (1.17.0)\n",
+ "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (2.4.4)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (1.3.1)\n",
+ "Requirement already satisfied: async-timeout<6.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (4.0.3)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (24.2.0)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (1.5.0)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (6.1.0)\n",
+ "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (0.2.1)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.7.0->cleanlab[datalab]) (1.18.3)\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.23.0->datasets>=2.7.0->cleanlab[datalab]) (4.12.2)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->datasets>=2.7.0->cleanlab[datalab]) (3.4.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->datasets>=2.7.0->cleanlab[datalab]) (3.10)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->datasets>=2.7.0->cleanlab[datalab]) (2.2.3)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->datasets>=2.7.0->cleanlab[datalab]) (2024.8.30)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "2af19c9f-970f-4f2e-ab48-392b376c0b98",
+ "metadata": {
+ "id": "2af19c9f-970f-4f2e-ab48-392b376c0b98"
+ },
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "from sklearn.model_selection import cross_val_predict\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.neighbors import NearestNeighbors\n",
+ "\n",
+ "\n",
+ "from cleanlab import Datalab\n",
+ "\n",
+ "# Set random seed for reproducibility\n",
+ "SEED = 42\n",
+ "np.random.seed(SEED)\n",
+ "random.seed(SEED)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ef577c5-b3f1-4d5b-9b26-9324651b12fd",
+ "metadata": {
+ "id": "3ef577c5-b3f1-4d5b-9b26-9324651b12fd"
+ },
+ "source": [
+ "# 2. Load and Process the Data\n",
+ "\n",
+ "We will now load the Credit Card Fraud Detection dataset, which contains features like transaction amounts and anonymized variables, along with labels indicating whether the transaction is fraudulent (`1`) or non-fraudulent (`0`).\n",
+ "\n",
+ "First, we load the dataset and display the first few rows to get an overview of the data structure.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "ea80ab8d-6461-47d9-ac45-7048540b4650",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "ea80ab8d-6461-47d9-ac45-7048540b4650",
+ "outputId": "35249fcf-22d4-433c-faa9-5871138a6db0"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " TransactionID TransactionDate Amount MerchantID \\\n",
+ "0 1 2024-04-03 14:15:35.462794 4189.27 688 \n",
+ "1 2 2024-03-19 13:20:35.462824 2659.71 109 \n",
+ "2 3 2024-01-08 10:08:35.462834 784.00 394 \n",
+ "3 4 2024-04-13 23:50:35.462850 3514.40 944 \n",
+ "4 5 2024-07-12 18:51:35.462858 369.07 475 \n",
+ "\n",
+ " TransactionType Location IsFraud \n",
+ "0 refund San Antonio 0 \n",
+ "1 refund Dallas 0 \n",
+ "2 purchase New York 0 \n",
+ "3 purchase Philadelphia 0 \n",
+ "4 purchase Phoenix 0 "
+ ],
+ "text/html": [
+ "\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " TransactionID | \n",
+ " TransactionDate | \n",
+ " Amount | \n",
+ " MerchantID | \n",
+ " TransactionType | \n",
+ " Location | \n",
+ " IsFraud | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " 1 | \n",
+ " 2024-04-03 14:15:35.462794 | \n",
+ " 4189.27 | \n",
+ " 688 | \n",
+ " refund | \n",
+ " San Antonio | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " 2 | \n",
+ " 2024-03-19 13:20:35.462824 | \n",
+ " 2659.71 | \n",
+ " 109 | \n",
+ " refund | \n",
+ " Dallas | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " 3 | \n",
+ " 2024-01-08 10:08:35.462834 | \n",
+ " 784.00 | \n",
+ " 394 | \n",
+ " purchase | \n",
+ " New York | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 3 | \n",
+ " 4 | \n",
+ " 2024-04-13 23:50:35.462850 | \n",
+ " 3514.40 | \n",
+ " 944 | \n",
+ " purchase | \n",
+ " Philadelphia | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 4 | \n",
+ " 5 | \n",
+ " 2024-07-12 18:51:35.462858 | \n",
+ " 369.07 | \n",
+ " 475 | \n",
+ " purchase | \n",
+ " Phoenix | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "fraud_data",
+ "summary": "{\n \"name\": \"fraud_data\",\n \"rows\": 100000,\n \"fields\": [\n {\n \"column\": \"TransactionID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 28867,\n \"min\": 1,\n \"max\": 100000,\n \"num_unique_values\": 100000,\n \"samples\": [\n 75722,\n 80185,\n 19865\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionDate\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 100000,\n \"samples\": [\n \"2024-08-18 01:11:35.918051\",\n \"2024-06-09 07:44:35.939541\",\n \"2024-06-10 08:55:35.558368\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1442.4159985963513,\n \"min\": 1.05,\n \"max\": 4999.77,\n \"num_unique_values\": 90621,\n \"samples\": [\n 3273.37,\n 4040.01,\n 4120.55\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"MerchantID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 288,\n \"min\": 1,\n \"max\": 1000,\n \"num_unique_values\": 1000,\n \"samples\": [\n 702,\n 152,\n 346\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionType\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"purchase\",\n \"refund\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Location\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Houston\",\n \"Dallas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"IsFraud\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 3
+ }
+ ],
+ "source": [
+ "fraud_data = pd.read_csv(\"credit_card_fraud_dataset.csv\")\n",
+ "fraud_data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "a2e0a535-7ece-4607-ba19-ee6e7eac9374",
+ "metadata": {
+ "id": "a2e0a535-7ece-4607-ba19-ee6e7eac9374"
+ },
+ "outputs": [],
+ "source": [
+ "# Select relevant features and labels\n",
+ "X_raw = fraud_data[[\"Amount\", \"TransactionType\", \"Location\"]]\n",
+ "y = fraud_data[\"IsFraud\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2600e93b-60d9-4d54-b8a6-092563c19aff",
+ "metadata": {
+ "id": "2600e93b-60d9-4d54-b8a6-092563c19aff"
+ },
+ "source": [
+ "We will now preprocess the dataset to prepare it for analysis. This involves:\n",
+ "1. Selecting relevant features (e.g., `Amount`, `TransactionType`, `Location`).\n",
+ "2. Encoding categorical variables (e.g., `TransactionType` and `Location`) using one-hot encoding.\n",
+ "3. Standardizing numerical variables (e.g., `Amount`) to ensure all features are on a similar scale.\n",
+ "\n",
+ "Next, we assign the preprocessed features to `X` and the labels (`IsFraud`) to `y`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "814c39d6-4490-47db-8a0a-ded45fc5a09a",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "814c39d6-4490-47db-8a0a-ded45fc5a09a",
+ "outputId": "6d8567f0-8396-450c-abc1-9fba07e1c4f5"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " Amount TransactionType_refund Location_Dallas Location_Houston \\\n",
+ "0 1.173161 True False False \n",
+ "1 0.112740 True True False \n",
+ "2 -1.187661 False False False \n",
+ "3 0.705284 False False False \n",
+ "4 -1.475326 False False False \n",
+ "\n",
+ " Location_Los Angeles Location_New York Location_Philadelphia \\\n",
+ "0 False False False \n",
+ "1 False False False \n",
+ "2 False True False \n",
+ "3 False False True \n",
+ "4 False False False \n",
+ "\n",
+ " Location_Phoenix Location_San Antonio Location_San Diego \\\n",
+ "0 False True False \n",
+ "1 False False False \n",
+ "2 False False False \n",
+ "3 False False False \n",
+ "4 True False False \n",
+ "\n",
+ " Location_San Jose \n",
+ "0 False \n",
+ "1 False \n",
+ "2 False \n",
+ "3 False \n",
+ "4 False \n"
+ ]
+ }
+ ],
+ "source": [
+ "# One-hot encode categorical features\n",
+ "categorical_features = [\"TransactionType\", \"Location\"]\n",
+ "X_encoded = pd.get_dummies(X_raw, columns=categorical_features, drop_first=True)\n",
+ "\n",
+ "# Standardize numerical features\n",
+ "numeric_features = [\"Amount\"]\n",
+ "scaler = StandardScaler()\n",
+ "X_encoded[numeric_features] = scaler.fit_transform(X_encoded[numeric_features])\n",
+ "\n",
+ "# Display preprocessed data\n",
+ "print(X_encoded.head())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d394b6cd-34fd-4ec0-8f6f-99d13dd481ab",
+ "metadata": {
+ "id": "d394b6cd-34fd-4ec0-8f6f-99d13dd481ab"
+ },
+ "source": [
+ "### 3. Select a Classification Model and Compute Out-of-Sample Predicted Probabilities\n",
+ "\n",
+ "To detect potential label errors in the **Credit Card Fraud Detection dataset**, Cleanlab requires **probabilistic predictions** for every data point. However, predictions generated on the same data used for training can be **overfitted** and unreliable. For accurate results, Cleanlab works best with **out-of-sample** predicted class probabilities—i.e., predictions for data points excluded from the model during training.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### Why Use Out-of-Sample Predictions?\n",
+ "\n",
+ "Out-of-sample predictions ensure that the model hasn't seen the data points during training. This approach:\n",
+ "- **Prevents overfitting**: Predictions are not biased by the training process.\n",
+ "- **Improves reliability**: Probabilities are closer to real-world performance.\n",
+ "- **Supports Cleanlab's analysis**: Enables Cleanlab to accurately identify mislabeled data and other issues.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### How We Generate Out-of-Sample Predictions\n",
+ "\n",
+ "We use **K-fold cross-validation**, which:\n",
+ "1. Splits the dataset into `K` folds.\n",
+ "2. Trains the model on `K-1` folds and predicts probabilities on the excluded fold.\n",
+ "3. Repeats this for all folds so that every data point gets a prediction from a model that has not seen it during training.\n",
+ "\n",
+ "This ensures every data point has **out-of-sample predicted probabilities**.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### Model: Logistic Regression\n",
+ "\n",
+ "For this tutorial, we use **Logistic Regression**, a simple and interpretable model commonly used in fraud detection tasks. It predicts the probability of each class (`0` for non-fraud, `1` for fraud) based on the input features.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### Predicted Probabilities\n",
+ "\n",
+ "The output of cross-validation is an array of **predicted probabilities** (`pred_probs`):\n",
+ "- **Rows** correspond to individual transactions.\n",
+ "- **Columns** represent the probabilities of each class (`0` and `1`).\n",
+ "\n",
+ "For example:\n",
+ "| Transaction ID | Probability (Non-Fraud) | Probability (Fraud) |\n",
+ "|----------------|--------------------------|----------------------|\n",
+ "| 1 | 0.92 | 0.08 |\n",
+ "| 2 | 0.65 | 0.35 |\n",
+ "| ... | ... | ... |\n",
+ "\n",
+ "These probabilities are a critical input for Cleanlab to identify potential label issues in the dataset.\n",
+ "\n",
+ "Next, we will use these probabilities to construct a **K-Nearest Neighbors (KNN) graph** for analyzing data quality.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "dd2ccbc6-824a-44cd-bd1d-26e0a39e23a4",
+ "metadata": {
+ "id": "dd2ccbc6-824a-44cd-bd1d-26e0a39e23a4"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the classification model\n",
+ "clf = LogisticRegression(max_iter=1000, random_state=SEED)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "bb9327e7-bc63-45a0-8d2c-7950186c961f",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "bb9327e7-bc63-45a0-8d2c-7950186c961f",
+ "outputId": "1024212d-a062-4a11-8599-f792c7d48892"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Shape of predicted probabilities: (100000, 2)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Perform K-fold cross-validation to compute out-of-sample predicted probabilities\n",
+ "num_crossval_folds = 5\n",
+ "pred_probs = cross_val_predict(\n",
+ " clf,\n",
+ " X_encoded, # Preprocessed feature matrix\n",
+ " y, # Labels\n",
+ " cv=num_crossval_folds,\n",
+ " method=\"predict_proba\" # Get predicted probabilities\n",
+ ")\n",
+ "\n",
+ "# Display the shape of the predicted probabilities array\n",
+ "print(\"Shape of predicted probabilities:\", pred_probs.shape)"
+ ]
+ },
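+ {
+ "cell_type": "markdown",
+ "id": "pred-probs-sanity-check",
+ "metadata": {
+ "id": "pred-probs-sanity-check"
+ },
+ "source": [
+ "As a quick sanity check, you could peek at the first few out-of-sample predictions; each row holds the probabilities for classes `0` and `1` and should sum to 1:\n",
+ "\n",
+ "```python\n",
+ "# Inspect the first few predicted probabilities; each row sums to 1\n",
+ "print(pred_probs[:5])\n",
+ "print(\"Row sums:\", pred_probs[:5].sum(axis=1))\n",
+ "```\n"
+ ]
+ },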
+ {
+ "cell_type": "markdown",
+ "id": "2681aad2-d84d-4713-bed8-aa1204223fd5",
+ "metadata": {
+ "id": "2681aad2-d84d-4713-bed8-aa1204223fd5"
+ },
+ "source": [
+ "# 4. Construct K Nearest Neighbors Graph\n",
+ "\n",
+ "The **KNN graph** represents the similarity between examples in the dataset. It helps Cleanlab identify issues like:\n",
+ "- **Outliers**: Data points that are far from others in feature space.\n",
+ "- **Duplicates or Near-Duplicates**: Examples that are unusually close to each other.\n",
+ "\n",
+ "For tabular data, we define similarity using the **Euclidean distance** between feature values.\n",
+ "\n",
+ "We use scikit-learn's `NearestNeighbors` class to construct this graph:\n",
+ "1. Compute pairwise distances between all examples.\n",
+ "2. Represent the graph as a sparse matrix, with nonzero entries indicating the distance to nearest neighbors.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "6c49780c-5d65-4360-ad0b-90e6717ecddb",
+ "metadata": {
+ "id": "6c49780c-5d65-4360-ad0b-90e6717ecddb"
+ },
+ "outputs": [],
+ "source": [
+ "# Create a KNN model with Euclidean distance as the metric\n",
+ "knn = NearestNeighbors(metric=\"euclidean\")\n",
+ "\n",
+ "# Fit the KNN model to the preprocessed feature values\n",
+ "knn.fit(X_encoded.values)\n",
+ "\n",
+ "# Construct the KNN graph as a sparse matrix\n",
+ "knn_graph = knn.kneighbors_graph(mode=\"distance\")"
+ ]
+ },
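+ {
+ "cell_type": "markdown",
+ "id": "knn-graph-structure-note",
+ "metadata": {
+ "id": "knn-graph-structure-note"
+ },
+ "source": [
+ "Note that `NearestNeighbors` defaults to `n_neighbors=5`, so the resulting sparse matrix stores, for each of the 100,000 examples, the distances to its 5 nearest neighbors. A quick check of its structure:\n",
+ "\n",
+ "```python\n",
+ "# The KNN graph is an (n_examples x n_examples) sparse CSR matrix\n",
+ "print(knn_graph.shape)                      # (100000, 100000)\n",
+ "print(knn_graph.nnz // knn_graph.shape[0])  # neighbors stored per row: 5\n",
+ "```\n"
+ ]
+ },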
+ {
+ "cell_type": "markdown",
+ "id": "b27cf2de-e276-438a-8f0a-9a5de3d1757a",
+ "metadata": {
+ "id": "b27cf2de-e276-438a-8f0a-9a5de3d1757a"
+ },
+ "source": [
+ "# 5. Use Cleanlab to Find Dataset Issues\n",
+ "\n",
+ "With the given labels, predicted probabilities, and the KNN graph, Cleanlab can help us identify various issues in the **Credit Card Fraud Detection dataset**, such as:\n",
+ "\n",
+ "- **Label Issues**: Transactions where the assigned label (fraud or non-fraud) is likely incorrect.\n",
+ "- **Outliers**: Transactions with anomalous patterns that differ significantly from the rest.\n",
+ "- **Near-Duplicates**: Transactions that are highly similar or repeated.\n",
+ "- **Class Imbalance**: Uneven representation of classes in the dataset.\n",
+ "\n",
+ "We use Cleanlab's **Datalab** class to audit the dataset for these issues. The process involves:\n",
+ "1. Wrapping the dataset (preprocessed features and labels) into a dictionary format.\n",
+ "2. Creating a `Datalab` object to analyze the dataset.\n",
+ "3. Detecting and reporting various types of data quality issues."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "58ca8740-1e44-4959-8567-ec4ee2535bfa",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "58ca8740-1e44-4959-8567-ec4ee2535bfa",
+ "outputId": "dc353a68-3607-46e0-fb87-07fcf651c288"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Finding label issues ...\n",
+ "Finding outlier issues ...\n",
+ "Finding near_duplicate issues ...\n",
+ "Finding non_iid issues ...\n",
+ "Finding class_imbalance issues ...\n",
+ "Finding underperforming_group issues ...\n",
+ "\n",
+ "Audit complete. 12043 issues found in the dataset.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from cleanlab import Datalab\n",
+ "# Wrap the dataset into a dictionary\n",
+ "data = {\"X\": X_encoded.values, \"y\": y}\n",
+ "\n",
+ "# Create a Datalab object\n",
+ "lab = Datalab(data, label_name=\"y\")\n",
+ "\n",
+ "# Use Cleanlab to find issues in the dataset\n",
+ "lab.find_issues(pred_probs=pred_probs, knn_graph=knn_graph)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "ef82352b-eb2e-4c7a-91d0-436fc61e16ac",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ef82352b-eb2e-4c7a-91d0-436fc61e16ac",
+ "outputId": "73358440-6c05-4fb6-9beb-ad5fc35055cc"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Dataset Information: num_examples: 100000, num_classes: 2\n",
+ "\n",
+ "Here is a summary of various issues found in your data:\n",
+ "\n",
+ " issue_type num_issues\n",
+ " near_duplicate 8639\n",
+ " outlier 1797\n",
+ "class_imbalance 1000\n",
+ " label 607\n",
+ "\n",
+ "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n",
+ "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n",
+ "\n",
+ "Data indices corresponding to top examples of each issue are shown below.\n",
+ "\n",
+ "\n",
+ "------------------ near_duplicate issues -------------------\n",
+ "\n",
+ "About this issue:\n",
+ "\tA (near) duplicate issue refers to two or more examples in\n",
+ " a dataset that are extremely similar to each other, relative\n",
+ " to the rest of the dataset. The examples flagged with this issue\n",
+ " may be exactly duplicated, or lie atypically close together when\n",
+ " represented as vectors (i.e. feature embeddings).\n",
+ " \n",
+ "\n",
+ "Number of examples with this issue: 8639\n",
+ "Overall dataset quality in terms of this issue: 0.5894\n",
+ "\n",
+ "Examples representing most severe instances of this issue:\n",
+ " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n",
+ "62583 True 0.0 [55080] 0.0\n",
+ "30333 True 0.0 [13617] 0.0\n",
+ "12827 True 0.0 [15703] 0.0\n",
+ "66741 True 0.0 [82920] 0.0\n",
+ "45125 True 0.0 [95476] 0.0\n",
+ "\n",
+ "\n",
+ "---------------------- outlier issues ----------------------\n",
+ "\n",
+ "About this issue:\n",
+ "\tExamples that are very different from the rest of the dataset \n",
+ " (i.e. potentially out-of-distribution or rare/anomalous instances).\n",
+ " \n",
+ "\n",
+ "Number of examples with this issue: 1797\n",
+ "Overall dataset quality in terms of this issue: 0.3784\n",
+ "\n",
+ "Examples representing most severe instances of this issue:\n",
+ " is_outlier_issue outlier_score\n",
+ "43484 True 0.003062\n",
+ "4659 True 0.007290\n",
+ "67602 True 0.007582\n",
+ "91994 True 0.007898\n",
+ "52696 True 0.008608\n",
+ "\n",
+ "\n",
+ "------------------ class_imbalance issues ------------------\n",
+ "\n",
+ "About this issue:\n",
+ "\tExamples belonging to the most under-represented class in the dataset.\n",
+ "\n",
+ "Number of examples with this issue: 1000\n",
+ "Overall dataset quality in terms of this issue: 0.0100\n",
+ "\n",
+ "Examples representing most severe instances of this issue:\n",
+ " is_class_imbalance_issue class_imbalance_score given_label\n",
+ "68852 True 0.01 1\n",
+ "22652 True 0.01 1\n",
+ "33819 True 0.01 1\n",
+ "5781 True 0.01 1\n",
+ "44573 True 0.01 1\n",
+ "\n",
+ "Additional Information: \n",
+ "Rarest Class: 1\n",
+ "\n",
+ "\n",
+ "----------------------- label issues -----------------------\n",
+ "\n",
+ "About this issue:\n",
+ "\tExamples whose given label is estimated to be potentially incorrect\n",
+ " (e.g. due to annotation error) are flagged as having label issues.\n",
+ " \n",
+ "\n",
+ "Number of examples with this issue: 607\n",
+ "Overall dataset quality in terms of this issue: 0.9939\n",
+ "\n",
+ "Examples representing most severe instances of this issue:\n",
+ " is_label_issue label_score given_label predicted_label\n",
+ "6901 True 0.006965 1 0\n",
+ "7933 True 0.007031 1 0\n",
+ "13204 True 0.007065 1 0\n",
+ "16276 True 0.007086 1 0\n",
+ "7546 True 0.007124 1 0\n"
+ ]
+ }
+ ],
+ "source": [
+ "lab.report()"
+ ]
+ },
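+ {
+ "cell_type": "markdown",
+ "id": "other-issue-types-note",
+ "metadata": {
+ "id": "other-issue-types-note"
+ },
+ "source": [
+ "The sections below examine the label, outlier, and near-duplicate findings in detail. The other issue types in the report can be retrieved the same way; for example, the class-imbalance results (`imbalance_results` is just an illustrative name):\n",
+ "\n",
+ "```python\n",
+ "# Class-imbalance results are retrieved like any other issue type\n",
+ "imbalance_results = lab.get_issues(\"class_imbalance\")\n",
+ "print(imbalance_results[\"is_class_imbalance_issue\"].sum())  # 1000 flagged examples\n",
+ "```\n"
+ ]
+ },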
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Label Issues\n",
+ "The report indicates that Cleanlab identified several label issues in the dataset. These are data entries where the given labels may not match the actual label, as estimated by Cleanlab. Each issue includes a numeric label score that quantifies how likely the label is correct (lower scores indicate higher likelihood of being mislabeled)."
+ ],
+ "metadata": {
+ "id": "qBcATrTFCWqJ"
+ },
+ "id": "qBcATrTFCWqJ"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Retrieve label issues\n",
+ "label_issues = lab.get_issues(\"label\")\n",
+ "print(label_issues.head())\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "pee_lWpiCiIV",
+ "outputId": "d5bcf570-0051-4b92-df49-20c3479b88b1"
+ },
+ "id": "pee_lWpiCiIV",
+ "execution_count": 11,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " is_label_issue label_score given_label predicted_label\n",
+ "0 False 0.990469 0 0\n",
+ "1 False 0.991203 0 0\n",
+ "2 False 0.988302 0 0\n",
+ "3 False 0.990321 0 0\n",
+ "4 False 0.991149 0 0\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Filter rows with label issues\n",
+ "label_issues_filtered = label_issues[label_issues['is_label_issue'] == True]\n",
+ "print(label_issues_filtered.head())\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "8pGqVz8RDeoF",
+ "outputId": "322a6eb8-4b2f-4597-9b8f-614c97887a45"
+ },
+ "id": "8pGqVz8RDeoF",
+ "execution_count": 12,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " is_label_issue label_score given_label predicted_label\n",
+ "190 True 0.007187 1 0\n",
+ "191 True 0.007622 1 0\n",
+ "208 True 0.007177 1 0\n",
+ "319 True 0.008984 1 0\n",
+ "506 True 0.009220 1 0\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Sort the label issues by label_score (lower scores indicate higher likelihood of being mislabeled)\n",
+ "sorted_issues = label_issues.sort_values(\"label_score\").index\n",
+ "\n",
+ "# View the most likely label errors\n",
+ "X_raw.iloc[sorted_issues].assign(\n",
+ " given_label=y.iloc[sorted_issues],\n",
+ " predicted_label=label_issues[\"predicted_label\"].iloc[sorted_issues]\n",
+ ").head()\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "m1KP2zEWDfaE",
+ "outputId": "6fc9c1b0-30a0-4c3f-f015-44e42202c166"
+ },
+ "id": "m1KP2zEWDfaE",
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Amount TransactionType Location given_label predicted_label\n",
+ "6901 346.13 purchase San Jose 1 0\n",
+ "7933 25.91 refund San Jose 1 0\n",
+ "13204 963.84 purchase San Jose 1 0\n",
+ "16276 1093.22 purchase San Jose 1 0\n",
+ "7546 598.78 refund San Jose 1 0"
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Amount | \n",
+ " TransactionType | \n",
+ " Location | \n",
+ " given_label | \n",
+ " predicted_label | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 6901 | \n",
+ " 346.13 | \n",
+ " purchase | \n",
+ " San Jose | \n",
+ " 1 | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 7933 | \n",
+ " 25.91 | \n",
+ " refund | \n",
+ " San Jose | \n",
+ " 1 | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 13204 | \n",
+ " 963.84 | \n",
+ " purchase | \n",
+ " San Jose | \n",
+ " 1 | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 16276 | \n",
+ " 1093.22 | \n",
+ " purchase | \n",
+ " San Jose | \n",
+ " 1 | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ " 7546 | \n",
+ " 598.78 | \n",
+ " refund | \n",
+ " San Jose | \n",
+ " 1 | \n",
+ " 0 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \")\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 438.6116871789898,\n \"min\": 25.91,\n \"max\": 1093.22,\n \"num_unique_values\": 5,\n \"samples\": [\n 25.91,\n 598.78,\n 963.84\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionType\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"refund\",\n \"purchase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Location\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"San Jose\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"given_label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 1,\n \"max\": 1,\n \"num_unique_values\": 1,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"predicted_label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 0,\n \"num_unique_values\": 1,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Example Review of Label Issues\n",
+ "\n",
+ "The dataframe below shows the original label (`given_label`) for examples that Cleanlab finds most likely to be mislabeled, as well as an alternative `predicted_label` for each example.\n",
+ "\n",
+ "| Amount | TransactionType | Location | given_label | predicted_label |\n",
+ "|---------|------------------|-----------|-------------|-----------------|\n",
+ "| 346.13 | purchase | San Jose | 1 | 0 |\n",
+ "| 25.91 | refund | San Jose | 1 | 0 |\n",
+ "| 963.84 | purchase | San Jose | 1 | 0 |\n",
+ "| 1093.22 | purchase | San Jose | 1 | 0 |\n",
+ "| 598.78 | refund | San Jose | 1 | 0 |\n",
+ "\n",
+ "These examples have been labeled incorrectly and should be carefully re-examined:\n",
+ "- **Entry 1**: A purchase of 346.13 labeled as fraudulent (`1`) is predicted to be non-fraudulent (`0`).\n",
+ "- **Entry 2**: A refund of 25.91 is similarly labeled as fraudulent but predicted as non-fraudulent.\n",
+ "- **Entry 4**: A purchase of $1093.22 also seems misclassified as fraudulent.\n",
+ "\n",
+ "The predicted labels suggest a potential mislabeling pattern for transactions in `San Jose`. Transactions with relatively lower amounts or refunds might have been mislabeled as fraudulent. This should be reviewed with additional domain knowledge or transaction metadata for confirmation.\n",
+ "\n",
+ "Such insights are crucial for improving the dataset's quality and ensuring the model learns from accurate labels.\n"
+ ],
+ "metadata": {
+ "id": "-ApyX5r6FTmI"
+ },
+ "id": "-ApyX5r6FTmI"
+ },
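+ {
+ "cell_type": "markdown",
+ "id": "handle-label-issues-sketch",
+ "metadata": {
+ "id": "handle-label-issues-sketch"
+ },
+ "source": [
+ "A minimal sketch of one remediation strategy: exclude the flagged examples (or, better, route them to human review) before retraining a model. The names `flagged`, `X_clean`, and `y_clean` are just illustrative:\n",
+ "\n",
+ "```python\n",
+ "# Drop the examples flagged as likely mislabeled (in practice, reviewing\n",
+ "# and correcting the labels is often preferable to dropping rows)\n",
+ "flagged = label_issues.query(\"is_label_issue\").index\n",
+ "X_clean = X_encoded.drop(index=flagged)\n",
+ "y_clean = y.drop(index=flagged)\n",
+ "print(f\"Removed {len(flagged)} suspect labels; {len(y_clean)} examples remain.\")\n",
+ "```\n"
+ ]
+ },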
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "### Outlier Issues\n",
+ "\n",
+ "According to the report, our dataset contains some outliers. We can see which examples are outliers (and a numeric quality score quantifying how typical each example appears to be) via the `get_issues` method. We sort the resulting DataFrame by Cleanlab’s outlier quality score to see the most severe outliers in our dataset."
+ ],
+ "metadata": {
+ "id": "_zzPdWl0GFOY"
+ },
+ "id": "_zzPdWl0GFOY"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "outlier_results = lab.get_issues(\"outlier\")\n",
+ "sorted_outliers = outlier_results.sort_values(\"outlier_score\").index\n",
+ "\n",
+ "X_raw.iloc[sorted_outliers].head()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "D7VClp15GIXC",
+ "outputId": "ec6c23c3-1802-42ba-d2aa-69537251da5d"
+ },
+ "id": "D7VClp15GIXC",
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Amount TransactionType Location\n",
+ "43484 4999.73 purchase Chicago\n",
+ "4659 2114.37 refund Philadelphia\n",
+ "67602 3255.47 purchase San Jose\n",
+ "91994 1147.93 refund Chicago\n",
+ "52696 4005.05 purchase San Antonio"
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Amount | \n",
+ " TransactionType | \n",
+ " Location | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 43484 | \n",
+ " 4999.73 | \n",
+ " purchase | \n",
+ " Chicago | \n",
+ "
\n",
+ " \n",
+ " 4659 | \n",
+ " 2114.37 | \n",
+ " refund | \n",
+ " Philadelphia | \n",
+ "
\n",
+ " \n",
+ " 67602 | \n",
+ " 3255.47 | \n",
+ " purchase | \n",
+ " San Jose | \n",
+ "
\n",
+ " \n",
+ " 91994 | \n",
+ " 1147.93 | \n",
+ " refund | \n",
+ " Chicago | \n",
+ "
\n",
+ " \n",
+ " 52696 | \n",
+ " 4005.05 | \n",
+ " purchase | \n",
+ " San Antonio | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"X_raw\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1519.3915375570575,\n \"min\": 1147.93,\n \"max\": 4999.73,\n \"num_unique_values\": 5,\n \"samples\": [\n 2114.37,\n 4005.05,\n 3255.47\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionType\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"refund\",\n \"purchase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Location\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Philadelphia\",\n \"San Antonio\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "#### **Key Observations**:\n",
+ "1. **Entry 1**: A purchase transaction with an unusually high amount of `$4999.73` in Chicago may represent a legitimate but rare high-value transaction or could be indicative of an error.\n",
+ "2. **Entry 2**: A refund for `$2114.37` in Philadelphia seems unusually high compared to typical refund amounts and should be verified.\n",
+ "3. **Entry 5**: Another high-value purchase transaction of `$4005.05` in San Antonio is rare and should be reviewed for validity.\n",
+ "\n",
+ "#### **Next Steps**:\n",
+ "- **Investigate Outliers**:\n",
+ " - Validate whether these transactions are legitimate or the result of data errors.\n",
+ " - Cross-check these entries against metadata such as timestamps, merchants, and customer profiles for better context.\n",
+ "- **Handle Outliers**:\n",
+ " - **Retain**: If the transaction is valid, keep it in the dataset for training.\n",
+ " - **Remove**: If the transaction is deemed erroneous or unrepresentative, exclude it from the dataset to avoid skewing the model's learning.\n",
+ "\n",
+ "These steps will ensure that the dataset is representative and does not include suspicious entries that could affect the performance of fraud detection models.\n",
+ " "
+ ],
+ "metadata": {
+ "id": "qQYl8X5RG9F6"
+ },
+ "id": "qQYl8X5RG9F6"
+ },
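+ {
+ "cell_type": "markdown",
+ "id": "handle-outliers-sketch",
+ "metadata": {
+ "id": "handle-outliers-sketch"
+ },
+ "source": [
+ "A minimal sketch of the remove option, assuming manual review has deemed the flagged transactions erroneous (the names `outlier_idx`, `X_filtered`, and `y_filtered` are illustrative):\n",
+ "\n",
+ "```python\n",
+ "# Exclude flagged outliers (only after review; valid rare transactions\n",
+ "# should be retained)\n",
+ "outlier_idx = outlier_results.query(\"is_outlier_issue\").index\n",
+ "print(f\"{len(outlier_idx)} transactions flagged as outliers\")\n",
+ "X_filtered = X_encoded.drop(index=outlier_idx)\n",
+ "y_filtered = y.drop(index=outlier_idx)\n",
+ "```\n"
+ ]
+ },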
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Near-Duplicate Issues\n",
+ "\n",
+ "According to the report, our dataset contains some sets of nearly duplicated examples. We can see which examples are (nearly) duplicated (and a numeric quality score quantifying how dissimilar each example is from its nearest neighbor in the dataset) via `get_issues`. We sort the resulting DataFrame by Cleanlab’s near-duplicate quality score to see the examples in our dataset that are most nearly duplicated.\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "STlYZFJRRDtO"
+ },
+ "id": "STlYZFJRRDtO"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "duplicate_results = lab.get_issues(\"near_duplicate\")\n",
+ "duplicate_results.sort_values(\"near_duplicate_score\").head()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "VHcPnNYbQZ-n",
+ "outputId": "7dc6f1fe-ac78-4c77-96e5-176c7f3a6a16"
+ },
+ "id": "VHcPnNYbQZ-n",
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " is_near_duplicate_issue near_duplicate_score near_duplicate_sets \\\n",
+ "62583 True 0.0 [55080] \n",
+ "30333 True 0.0 [13617] \n",
+ "12827 True 0.0 [15703] \n",
+ "66741 True 0.0 [82920] \n",
+ "45125 True 0.0 [95476] \n",
+ "\n",
+ " distance_to_nearest_neighbor \n",
+ "62583 0.0 \n",
+ "30333 0.0 \n",
+ "12827 0.0 \n",
+ "66741 0.0 \n",
+ "45125 0.0 "
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " is_near_duplicate_issue | \n",
+ " near_duplicate_score | \n",
+ " near_duplicate_sets | \n",
+ " distance_to_nearest_neighbor | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 62583 | \n",
+ " True | \n",
+ " 0.0 | \n",
+ " [55080] | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " 30333 | \n",
+ " True | \n",
+ " 0.0 | \n",
+ " [13617] | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " 12827 | \n",
+ " True | \n",
+ " 0.0 | \n",
+ " [15703] | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " 66741 | \n",
+ " True | \n",
+ " 0.0 | \n",
+ " [82920] | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ " 45125 | \n",
+ " True | \n",
+ " 0.0 | \n",
+ " [95476] | \n",
+ " 0.0 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"duplicate_results\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"is_near_duplicate_issue\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 1,\n \"samples\": [\n true\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"near_duplicate_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 0.0,\n \"max\": 0.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 0.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"near_duplicate_sets\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"distance_to_nearest_neighbor\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 0.0,\n \"max\": 0.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 0.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The results above show which examples Cleanlab considers nearly duplicated (rows where is_near_duplicate_issue == True). Here, we see some examples that Cleanlab has flagged as being nearly duplicated. Let’s view these examples to see how similar they are."
+ ],
+ "metadata": {
+ "id": "0FyG5cJtRNGb"
+ },
+ "id": "0FyG5cJtRNGb"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Identify the row with the lowest near_duplicate_score\n",
+ "lowest_scoring_duplicate = duplicate_results[\"near_duplicate_score\"].idxmin()\n",
+ "\n",
+ "# Extract the indices of the lowest scoring duplicate and its near duplicate sets\n",
+ "indices_to_display = [lowest_scoring_duplicate] + duplicate_results.loc[lowest_scoring_duplicate, \"near_duplicate_sets\"].tolist()\n",
+ "\n",
+ "# Display the relevant rows from the original dataset\n",
+ "X_raw.iloc[indices_to_display]\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 143
+ },
+ "id": "IqgcWEVIROAP",
+ "outputId": "eb36a8cd-a66e-4f3d-eb68-c7aac6ef27b5"
+ },
+ "id": "IqgcWEVIROAP",
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Amount TransactionType Location\n",
+ "73 3374.61 refund New York\n",
+ "19427 3374.61 refund New York\n",
+ "30450 3374.63 refund New York"
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Amount | \n",
+ " TransactionType | \n",
+ " Location | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 73 | \n",
+ " 3374.61 | \n",
+ " refund | \n",
+ " New York | \n",
+ "
\n",
+ " \n",
+ " 19427 | \n",
+ " 3374.61 | \n",
+ " refund | \n",
+ " New York | \n",
+ "
\n",
+ " \n",
+ " 30450 | \n",
+ " 3374.63 | \n",
+ " refund | \n",
+ " New York | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"X_raw\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.011547005383782014,\n \"min\": 3374.61,\n \"max\": 3374.63,\n \"num_unique_values\": 2,\n \"samples\": [\n 3374.63,\n 3374.61\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionType\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"refund\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Location\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"New York\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "These examples are exact duplicates! Perhaps the same information was accidentally recorded multiple times in this data.\n",
+ "\n",
+ "Similarly, let’s take a look at another example and the identified near-duplicate sets:"
+ ],
+ "metadata": {
+ "id": "6nhecZHHSuv9"
+ },
+ "id": "6nhecZHHSuv9"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Identify the next row not in the previous near duplicate set\n",
+ "second_lowest_scoring_duplicate = duplicate_results[\"near_duplicate_score\"].drop(indices_to_display).idxmin()\n",
+ "\n",
+ "# Extract the indices of the second lowest scoring duplicate and its near duplicate sets\n",
+ "next_indices_to_display = [second_lowest_scoring_duplicate] + duplicate_results.loc[second_lowest_scoring_duplicate, \"near_duplicate_sets\"].tolist()\n",
+ "\n",
+ "# Display the relevant rows from the original dataset\n",
+ "X_raw.iloc[next_indices_to_display]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 112
+ },
+ "id": "94gQWzVkRW53",
+ "outputId": "106f3513-d065-4483-dc76-e6c28e614b39"
+ },
+ "id": "94gQWzVkRW53",
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Amount TransactionType Location\n",
+ "167 1796.39 refund New York\n",
+ "53564 1796.39 refund New York"
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Amount | \n",
+ " TransactionType | \n",
+ " Location | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 167 | \n",
+ " 1796.39 | \n",
+ " refund | \n",
+ " New York | \n",
+ "
\n",
+ " \n",
+ " 53564 | \n",
+ " 1796.39 | \n",
+ " refund | \n",
+ " New York | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"X_raw\",\n \"rows\": 2,\n \"fields\": [\n {\n \"column\": \"Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 1796.39,\n \"max\": 1796.39,\n \"num_unique_values\": 1,\n \"samples\": [\n 1796.39\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TransactionType\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"refund\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Location\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"New York\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We identified another set of exact duplicates in our dataset! Including near/exact duplicates in a dataset may have unintended effects on models; be wary about splitting them across training/test sets. Learn more about handling near duplicates detected in a dataset from the FAQ.\n",
+ "\n",
+ "This tutorial highlights a straightforward approach to detect potentially incorrect information in any tabular dataset. Just use Cleanlab with any ML model – the better the model, the more accurate the data errors detected by Cleanlab will be!"
+ ],
+ "metadata": {
+ "id": "6vexriCMTCAG"
+ },
+ "id": "6vexriCMTCAG"
+ },
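+ {
+ "cell_type": "markdown",
+ "id": "dedup-sketch",
+ "metadata": {
+ "id": "dedup-sketch"
+ },
+ "source": [
+ "A minimal deduplication sketch: keep one representative per near-duplicate set by dropping every flagged example whose set already contains a lower index. This is just one simple heuristic (which copy to keep may depend on your application), and the names `dupes`, `to_drop`, and `deduped` are illustrative:\n",
+ "\n",
+ "```python\n",
+ "# Keep only the lowest-index member of each near-duplicate set\n",
+ "dupes = duplicate_results.query(\"is_near_duplicate_issue\")\n",
+ "to_drop = [idx for idx, row in dupes.iterrows()\n",
+ "           if any(other < idx for other in row[\"near_duplicate_sets\"])]\n",
+ "deduped = fraud_data.drop(index=to_drop)\n",
+ "print(f\"Dropped {len(to_drop)} near-duplicate rows; {len(deduped)} remain.\")\n",
+ "```\n"
+ ]
+ }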
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.3"
+ },
+ "colab": {
+ "provenance": []
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/1021_fintech_documentation/Requirement.txt b/1021_fintech_documentation/Requirement.txt
new file mode 100644
index 0000000..2758a9a
--- /dev/null
+++ b/1021_fintech_documentation/Requirement.txt
@@ -0,0 +1,4 @@
+numpy==1.22.0
+pandas==1.3.3
+scikit-learn==1.0.2
+scikit-image==0.18.3
\ No newline at end of file