* latest HTML output * latest HTML output * latest HTML output * latest HTML output --------- Co-authored-by: aepanchi <[email protected]>
Showing 412 changed files with 82,049 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 05538de980fa100591b86cc89801ada2 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary files not shown (14 files).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,386 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3768ec43", | ||
"metadata": {}, | ||
"source": [ | ||
"# Intel® Extension for Scikit-learn ElasticNet for Airlines DepDelay dataset" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "b1b922d1", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from timeit import default_timer as timer\n", | ||
"from sklearn import metrics\n", | ||
"from sklearn.model_selection import train_test_split\n", | ||
"import warnings\n", | ||
"from sklearn.datasets import fetch_openml\n", | ||
"from sklearn.preprocessing import LabelEncoder\n", | ||
"from IPython.display import HTML\n", | ||
"\n", | ||
"warnings.filterwarnings(\"ignore\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "34e460a7", | ||
"metadata": {}, | ||
"source": [ | ||
"### Download the data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "00c2277b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"x, y = fetch_openml(name=\"Airlines_DepDelay_10M\", return_X_y=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "06d309c0", | ||
"metadata": {}, | ||
"source": [ | ||
"### Preprocessing\n", | ||
"Let's encode categorical features with LabelEncoder" | ||
] | ||
}, | ||
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "2ff35bc2", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for col in [\"UniqueCarrier\", \"Origin\", \"Dest\"]:\n", | |
"    le = LabelEncoder().fit(x[col])\n", | |
"    x[col] = le.transform(x[col])" | |
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "38637349", | ||
"metadata": {}, | ||
"source": [ | ||
"Split the data into train and test sets" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "0d332789", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"((9000000, 9), (1000000, 9), (9000000,), (1000000,))" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)\n", | ||
"x_train.shape, x_test.shape, y_train.shape, y_test.shape" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "246f819f", | |
"metadata": {}, | |
"source": [ | |
"Standardize the target values (zero mean, unit variance) using statistics computed on the training set" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "454a341c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from sklearn.preprocessing import StandardScaler\n", | ||
"\n", | ||
"scaler_y = StandardScaler()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "df400504", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"y_train = y_train.to_numpy().reshape(-1, 1)\n", | ||
"y_test = y_test.to_numpy().reshape(-1, 1)\n", | ||
"\n", | ||
"scaler_y.fit(y_train)\n", | ||
"y_train = scaler_y.transform(y_train).ravel()\n", | ||
"y_test = scaler_y.transform(y_test).ravel()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fe1d4fac", | ||
"metadata": {}, | ||
"source": [ | ||
"### Patch original Scikit-learn with Intel® Extension for Scikit-learn\n", | ||
"Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "ef6938df", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from sklearnex import patch_sklearn\n", | ||
"\n", | ||
"patch_sklearn()" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "20c5ab48", | |
"metadata": {}, | |
"source": [ | |
"Intel® Extension for Scikit-learn patching affects performance of specific Scikit-learn functionality. Refer to the [list of supported algorithms and parameters](https://intel.github.io/scikit-learn-intelex/latest/algorithms.html) for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, [submit an issue on GitHub](https://github.com/intel/scikit-learn-intelex/issues)." | |
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "f80273e7", | |
"metadata": {}, | |
"source": [ | |
"Train the ElasticNet algorithm with Intel® Extension for Scikit-learn on the Airlines DepDelay dataset" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"id": "a4dd1c7e", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"'Intel® extension for Scikit-learn time: 0.28 s'" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | |
"source": [ | |
"from sklearn.linear_model import ElasticNet\n", | |
"\n", | |
"params = {\n", | |
"    \"alpha\": 0.3,\n", | |
"    \"fit_intercept\": False,\n", | |
"    \"l1_ratio\": 0.7,\n", | |
"    \"random_state\": 0,\n", | |
"    \"copy_X\": False,\n", | |
"}\n", | ||
"start = timer()\n", | ||
"model = ElasticNet(**params).fit(x_train, y_train)\n", | ||
"train_patched = timer() - start\n", | ||
"f\"Intel® extension for Scikit-learn time: {train_patched:.2f} s\"" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "f10b51fc", | |
"metadata": {}, | |
"source": [ | |
"Predict on the test set and compute the MSE of the ElasticNet model trained with Intel® Extension for Scikit-learn" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"id": "d4295a26", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"'Patched Scikit-learn MSE: 1.0109113399224974'" | ||
] | ||
}, | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"y_predict = model.predict(x_test)\n", | ||
"mse_metric_opt = metrics.mean_squared_error(y_test, y_predict)\n", | ||
"f\"Patched Scikit-learn MSE: {mse_metric_opt}\"" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "cbe6db0d", | |
"metadata": {}, | |
"source": [ | |
"### Train the same algorithm with original Scikit-learn\n", | |
"To disable the optimizations, call *unpatch_sklearn* and re-import the ElasticNet class" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"id": "6f64ba97", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from sklearnex import unpatch_sklearn\n", | ||
"\n", | ||
"unpatch_sklearn()" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "f242c6da", | |
"metadata": {}, | |
"source": [ | |
"Train the ElasticNet algorithm with the original Scikit-learn library on the Airlines DepDelay dataset" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 11, | ||
"id": "67243849", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"'Original Scikit-learn time: 3.96 s'" | ||
] | ||
}, | ||
"execution_count": 11, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from sklearn.linear_model import ElasticNet\n", | ||
"\n", | ||
"start = timer()\n", | ||
"model = ElasticNet(**params).fit(x_train, y_train)\n", | ||
"train_unpatched = timer() - start\n", | ||
"f\"Original Scikit-learn time: {train_unpatched:.2f} s\"" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "c85a125c", | |
"metadata": {}, | |
"source": [ | |
"Predict on the test set and compute the MSE of the ElasticNet model trained with original Scikit-learn" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"id": "cd9e726c", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"'Original Scikit-learn MSE: 1.0109113399545733'" | ||
] | ||
}, | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"y_predict = model.predict(x_test)\n", | ||
"mse_metric_original = metrics.mean_squared_error(y_test, y_predict)\n", | ||
"f\"Original Scikit-learn MSE: {mse_metric_original}\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 13, | ||
"id": "a2edbb65", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | |
"data": { | |
"text/html": [ | |
"<h3>Compare MSE metric of patched Scikit-learn and original</h3>MSE metric of patched Scikit-learn: 1.0109113399224974 <br>MSE metric of unpatched Scikit-learn: 1.0109113399545733 <br>Metrics ratio: 0.9999999999682703 <br><h3>With Scikit-learn-intelex patching you can:</h3><ul><li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li><li>Train and predict with Scikit-learn models faster;</li><li>Get similar quality;</li><li>Get a speedup of <strong>14.2</strong> times.</li></ul>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"HTML(\n", | |
"    f\"<h3>Compare MSE metric of patched Scikit-learn and original</h3>\"\n", | |
"    f\"MSE metric of patched Scikit-learn: {mse_metric_opt} <br>\"\n", | |
"    f\"MSE metric of unpatched Scikit-learn: {mse_metric_original} <br>\"\n", | |
"    f\"Metrics ratio: {mse_metric_opt/mse_metric_original} <br>\"\n", | |
"    f\"<h3>With Scikit-learn-intelex patching you can:</h3>\"\n", | |
"    f\"<ul>\"\n", | |
"    f\"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>\"\n", | |
"    f\"<li>Train and predict with Scikit-learn models faster;</li>\"\n", | |
"    f\"<li>Get similar quality;</li>\"\n", | |
"    f\"<li>Get a speedup of <strong>{(train_unpatched/train_patched):.1f}</strong> times.</li>\"\n", | |
")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
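The comparison the notebook makes (time one fit with patching, time the same fit without it, then report the speedup and the MSE ratio) can be sketched without scikit-learn installed. This is only an illustrative sketch, not part of the committed notebook: the helper names `time_call` and `speedup` are hypothetical, and the numbers are copied from the recorded cell outputs in the diff above, where the timings are already rounded to two decimals.

```python
from timeit import default_timer as timer


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds).

    Hypothetical helper that mirrors the start/stop timer pattern
    used around ElasticNet(...).fit(...) in the notebook.
    """
    start = timer()
    result = fn(*args, **kwargs)
    return result, timer() - start


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds


# Values copied from the recorded outputs in the diff (times rounded to 2 decimals).
train_unpatched = 3.96  # 'Original Scikit-learn time: 3.96 s'
train_patched = 0.28    # 'Intel® extension for Scikit-learn time: 0.28 s'
mse_patched = 1.0109113399224974
mse_original = 1.0109113399545733

print(f"Speedup from rounded times: {speedup(train_unpatched, train_patched):.1f}x")
print(f"MSE ratio (patched / original): {mse_patched / mse_original}")
```

Because the displayed timings are rounded, the speedup computed here comes out at about 14.1, slightly below the 14.2 that the notebook derived from its unrounded timings. A single timed run is also noisy, so repeating the measurement several times and comparing medians would likely give a more stable figure.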