feat: Update MLJAR AutoML notebook with explain level and documentation

SverreNystad · Mar 29, 2024 · 0a825ce
1 parent 53e04dd
Showing 1 changed file with 25 additions and 6 deletions.
diff --git a/models/mljar.ipynb b/models/mljar.ipynb
@@ -90,15 +90,31 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0.0, test_size=0.1)\n",
-    "train_data = pd.concat([x_train, y_train], axis=1)"
+    "x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0.0, test_size=0.1)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Train model"
+    "## Train model\n",
+    "\n",
+    "**Evaluation metrics:**\n",
+    "- for binary classification: `logloss`, `auc`, `f1`, `average_precision`, `accuracy` - default is `logloss` (if left \"auto\")\n",
+    "- for multiclass classification: `logloss`, `f1`, `accuracy` - default is `logloss` (if left \"auto\")\n",
+    "- for regression: `rmse`, `mse`, `mae`, `r2`, `mape`, `spearman`, `pearson` - default is `rmse` (if left \"auto\")\n",
+    "\n",
+    "**Explain level:**\n",
+    "Specifies the amount of interpretability detail generated for the trained models, ranging from 0 (minimal) to 2 (extensive), so users can balance simplicity against depth of insight into how the model makes its decisions.\n",
+    "\n",
+    "**Golden features:**\n",
+    "Activates the creation of new features from existing ones by exploring their interactions, which can uncover patterns that improve model accuracy.\n",
+    "\n",
+    "**n_jobs:**\n",
+    "Determines the number of CPU cores used for parallel processing, with -1 utilizing all available cores to speed up the training process.\n",
+    "\n",
+    "**stack_models:**\n",
+    "Enables stacking: the predictions of multiple models are used as inputs to a final model, which combines their strengths and can improve overall accuracy."
    ]
   },
   {
@@ -108,12 +124,15 @@
    "outputs": [],
    "source": [
     "# Initialize MLJAR AutoML\n",
+    "time_limit = 4 * 60  # seconds (4 minutes); use 24 * 60 * 60 for a 24-hour run\n",
     "predictor = AutoML(mode=\"Explain\", \n",
     "    random_state=42,\n",
+    "    total_time_limit=time_limit,\n",
     "    n_jobs=-1, \n",
     "    golden_features=True,\n",
     "    features_selection=True,\n",
-    "    stack_models=True\n",
+    "    stack_models=True,\n",
+    "    explain_level=2,\n",
     "    )\n",
     "\n",
     "# Train the model\n",
@@ -139,7 +158,7 @@
     "print(\"Test Accuracy: \", test_accuracy)\n",
     "print(\"Test Classification Report:\\n\", classification_report(y_test, y_test_pred))\n",
     "# MLJAR also provides a leaderboard with model performance\n",
-    "predictor.report()\n"
+    "predictor.report()"
    ]
   },
   {
@@ -156,7 +175,7 @@
    "outputs": [],
    "source": [
     "x_test = prepare_test_data()\n",
-    "final_predictions = predictor.predict(x_test)\n",
+    "final_predictions = pd.DataFrame(predictor.predict(x_test))\n",
     "\n",
     "save_predictions(final_predictions, 'mljar_automl')"
    ]