feat: Update MLJAR AutoML notebook with explain level and documentation
SverreNystad committed Mar 29, 2024
1 parent 53e04dd commit 0a825ce
Showing 1 changed file with 25 additions and 6 deletions.
31 changes: 25 additions & 6 deletions models/mljar.ipynb
@@ -90,15 +90,31 @@
"metadata": {},
"outputs": [],
"source": [
"x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0.0, test_size=0.1)\n",
"train_data = pd.concat([x_train, y_train], axis=1)"
"x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0.0, test_size=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model"
"## Train model\n",
"\n",
"**Evaluation metrics:**\n",
"- for binary classification: `logloss`, `auc`, `f1`, `average_precision`, `accuracy` - default is `logloss` (if left \"auto\")\n",
"- for multiclass classification: `logloss`, `f1`, `accuracy` - default is `logloss` (if left \"auto\")\n",
"- for regression: `rmse`, `mse`, `mae`, `r2`, `mape`, `spearman`, `pearson` - default is `rmse` (if left \"auto\")\n",
"\n",
"**Explain level:**\n",
"Specifies the amount of interpretability detail provided with the model's predictions, ranging from 0 (minimal) to 2 (extensive), letting users balance simplicity against depth of insight into how the model makes its decisions.\n",
"\n",
"**Golden features:**\n",
"Enables the creation of new features from existing ones by exploring their interactions, which can uncover useful patterns and improve model accuracy.\n",
"\n",
"**n_jobs:**\n",
"Determines the number of CPU cores used for parallel processing, with -1 utilizing all available cores to speed up the training process.\n",
"\n",
"**stack_models:**\n",
"Enables stacking of multiple models to improve predictions, leveraging the strengths of various models by using their predictions as inputs to a final model, thereby potentially increasing overall accuracy."
]
},
{
@@ -108,12 +124,15 @@
"outputs": [],
"source": [
"# Initialize MLJAR AutoML\n",
"time_limit = 4 * 60  # seconds (4 minutes); use 24 * 60 * 60 for a 24-hour run\n",
"predictor = AutoML(mode=\"Explain\", \n",
" random_state=42,\n",
" total_time_limit=time_limit,\n",
" n_jobs=-1, \n",
" golden_features=True,\n",
" features_selection=True,\n",
" stack_models=True\n",
" stack_models=True,\n",
" explain_level=2,\n",
" )\n",
"\n",
"# Train the model\n",
@@ -139,7 +158,7 @@
"print(\"Test Accuracy: \", test_accuracy)\n",
"print(\"Test Classification Report:\\n\", classification_report(y_test, y_test_pred))\n",
"# MLJAR also provides a leaderboard with model performance\n",
"predictor.report()\n"
"predictor.report()"
]
},
{
@@ -156,7 +175,7 @@
"outputs": [],
"source": [
"x_test = prepare_test_data()\n",
"final_predictions = predictor.predict(x_test)\n",
"final_predictions = pd.DataFrame(predictor.predict(x_test))\n",
"\n",
"save_predictions(final_predictions, 'mljar_automl')"
]
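The options documented in the new "Train model" cell can be checked before they reach `AutoML`. The sketch below is not part of this commit: `build_automl_kwargs` and `ALLOWED_METRICS` are hypothetical names introduced for illustration, the metric lists are copied from the markdown cell above, and the task names assume MLJAR's `ml_task` values. It uses only the standard library, so it runs without `mljar-supervised` installed.

```python
# Metric names copied from the notebook's "Evaluation metrics" list.
# The dictionary keys assume MLJAR's ml_task naming.
ALLOWED_METRICS = {
    "binary_classification": ["logloss", "auc", "f1", "average_precision", "accuracy"],
    "multiclass_classification": ["logloss", "f1", "accuracy"],
    "regression": ["rmse", "mse", "mae", "r2", "mape", "spearman", "pearson"],
}


def build_automl_kwargs(task, eval_metric="auto", explain_level=2, time_limit=4 * 60):
    """Hypothetical helper: validate settings and return keyword arguments.

    Mirrors the configuration used in this commit; it does not train anything.
    """
    if task not in ALLOWED_METRICS:
        raise ValueError(f"unknown task: {task!r}")
    if eval_metric != "auto" and eval_metric not in ALLOWED_METRICS[task]:
        raise ValueError(f"metric {eval_metric!r} is not listed for task {task!r}")
    if explain_level not in (0, 1, 2):
        raise ValueError("explain_level must be 0, 1 or 2")
    return {
        "mode": "Explain",
        "ml_task": task,
        "eval_metric": eval_metric,
        "random_state": 42,
        "total_time_limit": time_limit,  # seconds
        "n_jobs": -1,                    # use all available CPU cores
        "golden_features": True,
        "features_selection": True,
        "stack_models": True,
        "explain_level": explain_level,
    }


if __name__ == "__main__":
    print(build_automl_kwargs("binary_classification", eval_metric="auc"))
```

Assuming the notebook imports `AutoML` from `supervised.automl`, the returned dictionary could be passed on as `AutoML(**build_automl_kwargs("binary_classification"))`. Validating up front turns a mistyped metric name into an immediate error instead of one raised later inside the library.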
