split import cells and remove env variables from notebooks #822

Merged · 5 commits · Nov 14, 2023
Changes from all commits
22 changes: 13 additions & 9 deletions nbs/common.base_auto.ipynb
@@ -529,12 +529,6 @@
"outputs": [],
"source": [
"#| hide\n",
"\n",
"#| hide\n",
"import os\n",
"os.environ[\"PYTORCH_ENABLE_MPS_FALLBACK\"] = \"1\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"\n",
"import optuna\n",
"import pandas as pd\n",
"from neuralforecast.models.mlp import MLP\n",
@@ -647,7 +641,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ea77450f",
"id": "463d4dc0-b25a-4ce6-9172-5690dc979f0b",
"metadata": {},
"outputs": [],
"source": [
@@ -657,8 +651,18 @@
"from neuralforecast.models.mlp import MLP\n",
"from neuralforecast.utils import AirPassengersDF as Y_df\n",
"from neuralforecast.tsdataset import TimeSeriesDataset\n",
"from neuralforecast.losses.pytorch import MAE, MSE\n",
"\n",
"from neuralforecast.losses.pytorch import MAE, MSE"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "882c8331-440a-4758-a56c-07a78c0b1603",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"# Unit tests to guarantee that losses are correctly instantiated\n",
"Y_train_df = Y_df[Y_df.ds<='1959-12-31'] # 132 train\n",
"Y_test_df = Y_df[Y_df.ds>'1959-12-31'] # 12 test\n",
"\n",
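The environment variables deleted above are no longer set inside the notebooks. For a local run that still needs them, a minimal sketch (a plain Python session, not part of this PR; values mirror the deleted cell, and the variables must be set before `torch` is first imported):

```python
# Hedged sketch: restore the removed settings in your own session if your
# hardware needs them. Not part of the documented notebooks after this PR.
import os

os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # fall back to CPU for ops missing on Apple MPS
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")         # expose only the first CUDA device

import torch  # import torch only after the variables are in place
```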
10 changes: 8 additions & 2 deletions nbs/common.base_multivariate.ipynb
@@ -21,13 +21,19 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# BaseMultivariate\n",
"\n",
"> The `BaseWindows` class contains standard methods shared across window-based multivariate neural networks; in contrast to recurrent neural networks these models commit to a fixed sequence length input. <br><br>The standard methods include data preprocessing `_normalization`, optimization utilities like parameter initialization, `training_step`, `validation_step`, and shared `fit` and `predict` methods.These shared methods enable all the `neuralforecast.models` compatibility with the `core.NeuralForecast` wrapper class. "
"> The `BaseWindows` class contains standard methods shared across window-based multivariate neural networks; in contrast to recurrent neural networks these models commit to a fixed sequence length input."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard methods include data preprocessing `_normalization`, optimization utilities like parameter initialization, `training_step`, `validation_step`, and shared `fit` and `predict` methods.These shared methods enable all the `neuralforecast.models` compatibility with the `core.NeuralForecast` wrapper class. "
]
},
{
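The markdown cells above describe the shared `fit` and `predict` interface. A minimal sketch of that flow, assuming the public `NeuralForecast` wrapper and the multivariate `StemGNN` model; the model choice, the toy two-series panel, and all hyperparameters are illustrative, not part of this PR:

```python
# Hedged sketch of the shared fit/predict flow the cells above describe.
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import StemGNN
from neuralforecast.utils import AirPassengersDF

# Toy two-series panel, since multivariate models train on n_series > 1.
df2 = AirPassengersDF.copy()
df2["unique_id"] = "Airline2"
panel = pd.concat([AirPassengersDF, df2])

model = StemGNN(h=12, input_size=24, n_series=2, max_steps=5)  # toy values
nf = NeuralForecast(models=[model], freq="M")
nf.fit(df=panel)      # shared fit() from the multivariate base class
preds = nf.predict()  # shared predict()
```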
10 changes: 7 additions & 3 deletions nbs/common.base_recurrent.ipynb
@@ -29,12 +29,16 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The `BaseRecurrent` class contains standard methods shared across recurrent neural networks; these models possess the ability to process variable-length sequences of inputs through their internal memory states. The class is represented by `LSTM`, `GRU`, and `RNN`, along with other more sophisticated architectures like `MQCNN`.\n",
"\n",
"> The `BaseRecurrent` class contains standard methods shared across recurrent neural networks; these models possess the ability to process variable-length sequences of inputs through their internal memory states. The class is represented by `LSTM`, `GRU`, and `RNN`, along with other more sophisticated architectures like `MQCNN`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard methods include `TemporalNorm` preprocessing, optimization utilities like parameter initialization, `training_step`, `validation_step`, and shared `fit` and `predict` methods.These shared methods enable all the `neuralforecast.models` compatibility with the `core.NeuralForecast` wrapper class."
]
},
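As with the multivariate base class, a hedged sketch of how a recurrent model rides on this shared interface; hyperparameters are toy values, not from this PR:

```python
# Illustrative only: recurrent models reuse the same fit/predict contract.
from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[LSTM(h=12, input_size=24, max_steps=5)], freq="M")
nf.fit(df=AirPassengersDF)  # variable-length sequences handled by internal states
preds = nf.predict()
```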
30 changes: 23 additions & 7 deletions nbs/common.base_windows.ipynb
@@ -23,14 +23,21 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "12fa25a4",
"id": "1e0f9607-d12d-44e5-b2be-91a57a0bca79",
"metadata": {},
"source": [
"# BaseWindows\n",
"\n",
"> The `BaseWindows` class contains standard methods shared across window-based neural networks; in contrast to recurrent neural networks these models commit to a fixed sequence length input. The class is represented by `MLP`, and other more sophisticated architectures like `NBEATS`, and `NHITS`.<br><br>The standard methods include data preprocessing `_normalization`, optimization utilities like parameter initialization, `training_step`, `validation_step`, and shared `fit` and `predict` methods.These shared methods enable all the `neuralforecast.models` compatibility with the `core.NeuralForecast` wrapper class. "
"> The `BaseWindows` class contains standard methods shared across window-based neural networks; in contrast to recurrent neural networks these models commit to a fixed sequence length input. The class is represented by `MLP`, and other more sophisticated architectures like `NBEATS`, and `NHITS`."
]
},
{
"cell_type": "markdown",
"id": "1730a556-1574-40ad-92a2-23b924ceb398",
"metadata": {},
"source": [
"The standard methods include data preprocessing `_normalization`, optimization utilities like parameter initialization, `training_step`, `validation_step`, and shared `fit` and `predict` methods.These shared methods enable all the `neuralforecast.models` compatibility with the `core.NeuralForecast` wrapper class. "
]
},
{
@@ -817,16 +824,25 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b2fd48a7",
"id": "8927f2e5-f376-4c99-bb8f-8cbb73efe01e",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"# add h=0,1 unit test for _parse_windows \n",
"from neuralforecast.losses.pytorch import MAE\n",
"from neuralforecast.utils import AirPassengersDF\n",
"from neuralforecast.tsdataset import TimeSeriesDataset, TimeSeriesDataModule\n",
"\n",
"from neuralforecast.tsdataset import TimeSeriesDataset, TimeSeriesDataModule"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61490e69-f014-4087-83c5-540d5bd7d458",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"# add h=0,1 unit test for _parse_windows \n",
"# Declare batch\n",
"AirPassengersDF['x'] = np.array(len(AirPassengersDF))\n",
"AirPassengersDF['x2'] = np.array(len(AirPassengersDF)) * 2\n",
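A hedged sketch of the dataset plumbing the unit-test cells above rely on; it assumes `TimeSeriesDataset.from_df` keeps its `(dataset, indices, dates, ds)` return signature and that the datamodule's batches are dicts with a `temporal` tensor:

```python
# Build the dataset and datamodule the _parse_windows tests operate on.
from neuralforecast.tsdataset import TimeSeriesDataset, TimeSeriesDataModule
from neuralforecast.utils import AirPassengersDF

dataset, indices, dates, ds = TimeSeriesDataset.from_df(AirPassengersDF)
datamodule = TimeSeriesDataModule(dataset=dataset, batch_size=32)
batch = next(iter(datamodule.train_dataloader()))  # dict; 'temporal' key assumed
```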
58 changes: 37 additions & 21 deletions nbs/common.scalers.ipynb
@@ -24,33 +24,38 @@
},
{
"cell_type": "markdown",
"id": "56742bfb",
"id": "83d112c7-18f8-4f20-acad-34e6de54cebf",
"metadata": {},
"source": [
"# TemporalNorm\n",
"\n",
"> Temporal normalization has proven to be essential in neural forecasting tasks, as it enables network's non-linearities to express themselves. Forecasting scaling methods take particular interest in the temporal dimension where most of the variance dwells, contrary to other deep learning techniques like `BatchNorm` that normalizes across batch and temporal dimensions, and `LayerNorm` that normalizes across the feature dimension. Currently we support the following techniques: `std`, `median`, `norm`, `norm1`, `invariant`, `revin`.<br><br>**References**<br>- [Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski (2023). \"HINT: Hierarchical Mixture Networks For Coherent Probabilistic Forecasting\". Neural Information Processing Systems, submitted. Working Paper version available at arxiv.](https://arxiv.org/abs/2305.07089)<br>- [Taesung Kim and Jinhee Kim and Yunwon Tae and Cheonbok Park and Jang-Ho Choi and Jaegul Choo. \"Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift\". ICLR 2022.](https://openreview.net/pdf?id=cGDAkQo1C0p).<br>- [David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski (2020). \"DeepAR: Probabilistic forecasting with autoregressive recurrent networks\". International Journal of Forecasting.](https://www.sciencedirect.com/science/article/pii/S0169207019301888)<br>"
"> Temporal normalization has proven to be essential in neural forecasting tasks, as it enables network's non-linearities to express themselves. Forecasting scaling methods take particular interest in the temporal dimension where most of the variance dwells, contrary to other deep learning techniques like `BatchNorm` that normalizes across batch and temporal dimensions, and `LayerNorm` that normalizes across the feature dimension. Currently we support the following techniques: `std`, `median`, `norm`, `norm1`, `invariant`, `revin`."
]
},
{
"cell_type": "markdown",
"id": "9319296d",
"id": "fee5e60b-f53b-44ff-9ace-1f5def7b601d",
"metadata": {},
"source": [
"![Figure 1. Illustration of temporal normalization (left), layer normalization (center) and batch normalization (right). The entries in green show the components used to compute the normalizing statistics.](imgs_models/temporal_norm.png)"
"## References"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5400f41",
"cell_type": "markdown",
"id": "f9211dd2-99a4-4d67-90cb-bb1f7851685e",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"import os\n",
"os.environ[\"PYTORCH_ENABLE_MPS_FALLBACK\"] = \"1\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\""
"* [Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski (2023). \"HINT: Hierarchical Mixture Networks For Coherent Probabilistic Forecasting\". Neural Information Processing Systems, submitted. Working Paper version available at arxiv.](https://arxiv.org/abs/2305.07089)\n",
"* [Taesung Kim and Jinhee Kim and Yunwon Tae and Cheonbok Park and Jang-Ho Choi and Jaegul Choo. \"Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift\". ICLR 2022.](https://openreview.net/pdf?id=cGDAkQo1C0p)\n",
"* [David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski (2020). \"DeepAR: Probabilistic forecasting with autoregressive recurrent networks\". International Journal of Forecasting.](https://www.sciencedirect.com/science/article/pii/S0169207019301888)"
]
},
{
"cell_type": "markdown",
"id": "9319296d",
"metadata": {},
"source": [
"![Figure 1. Illustration of temporal normalization (left), layer normalization (center) and batch normalization (right). The entries in green show the components used to compute the normalizing statistics.](imgs_models/temporal_norm.png)"
]
},
{
@@ -68,14 +73,23 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b7250387",
"id": "0f08562b-88d8-4e92-aeeb-bc9bc4c61ab7",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"from nbdev.showdoc import show_doc\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5201e067-f7c0-4ca3-89a7-d879001b1908",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"plt.rcParams[\"axes.grid\"]=True\n",
"plt.rcParams['font.family'] = 'serif'\n",
"plt.rcParams[\"figure.figsize\"] = (4,2)"
@@ -86,7 +100,7 @@
"id": "ef461e9c",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 1. Auxiliary Functions </span>"
"# 1. Auxiliary Functions"
]
},
{
@@ -167,7 +181,7 @@
"id": "a7a486a2",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 2. Scalers </span>"
"# 2. Scalers"
]
},
{
@@ -185,8 +199,10 @@
" [0,1] range. This transformation is often used as an alternative \n",
" to the standard scaler. The scaled features are obtained as:\n",
"\n",
" $$\\mathbf{z} = (\\mathbf{x}_{[B,T,C]}-\\mathrm{min}({\\mathbf{x}})_{[B,1,C]})/\n",
" (\\mathrm{max}({\\mathbf{x}})_{[B,1,C]}- \\mathrm{min}({\\mathbf{x}})_{[B,1,C]})$$\n",
" $$\n",
" \\mathbf{z} = (\\mathbf{x}_{[B,T,C]}-\\mathrm{min}({\\mathbf{x}})_{[B,1,C]})/\n",
" (\\mathrm{max}({\\mathbf{x}})_{[B,1,C]}- \\mathrm{min}({\\mathbf{x}})_{[B,1,C]})\n",
" $$\n",
"\n",
" **Parameters:**<br>\n",
" `x`: torch.Tensor input tensor.<br>\n",
@@ -587,7 +603,7 @@
"id": "e87e828c",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 3. TemporalNorm Module </span>"
"# 3. TemporalNorm Module"
]
},
{
@@ -766,7 +782,7 @@
"id": "3e2968e0",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> Example </span>"
"# Example"
]
},
{
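The minmax docstring formula reformatted in this diff maps directly to tensor code. A minimal sketch in plain `torch`, independent of the `TemporalNorm` module itself:

```python
# Minmax scaling over the temporal dimension of a [B, T, C] tensor, matching
# the docstring formula above; the statistics keep a [B, 1, C] shape.
import torch

x = torch.rand(2, 24, 3)                   # [batch, time, channels]
x_min = x.min(dim=1, keepdim=True).values  # min over T -> [B, 1, C]
x_max = x.max(dim=1, keepdim=True).values  # max over T -> [B, 1, C]
z = (x - x_min) / (x_max - x_min)          # elementwise scale to [0, 1]
```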
18 changes: 12 additions & 6 deletions nbs/core.ipynb
@@ -40,10 +40,6 @@
"outputs": [],
"source": [
"#| hide\n",
"import os\n",
"os.environ[\"PYTORCH_ENABLE_MPS_FALLBACK\"] = \"1\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"\n",
"import shutil\n",
"from fastcore.test import test_eq, test_fail\n",
"from nbdev.showdoc import show_doc\n",
@@ -907,13 +903,23 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8898e349-8000-4668-a1c5-52c03c69e85a",
"id": "5d6ef366-daec-4ec6-a2ae-199c6ea39a51",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"import logging\n",
"import warnings\n",
"import warnings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ac1aa65-40a4-4909-bdfb-1439c30439b8",
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"logging.getLogger(\"pytorch_lightning\").setLevel(logging.ERROR)\n",
"warnings.filterwarnings(\"ignore\")"
]
2 changes: 1 addition & 1 deletion nbs/examples/Neuralforecast_Map.ipynb
@@ -145,5 +145,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
12 changes: 6 additions & 6 deletions nbs/losses.numpy.ipynb
@@ -49,6 +49,7 @@
"outputs": [],
"source": [
"#| hide\n",
"from IPython.display import Image\n",
"from nbdev.showdoc import show_doc"
]
},
@@ -59,7 +60,6 @@
"outputs": [],
"source": [
"#| hide\n",
"from IPython.display import Image\n",
"WIDTH = 600\n",
"HEIGHT = 300"
]
@@ -100,7 +100,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\">1. Scale-dependent Errors </span>\n",
"# 1. Scale-dependent Errors\n",
"\n",
"These metrics are on the same scale as the data."
]
@@ -304,7 +304,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 2. Percentage errors </span>\n",
"# 2. Percentage errors\n",
"\n",
"These metrics are unit-free, suitable for comparisons across series."
]
@@ -446,7 +446,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 3. Scale-independent Errors </span>\n",
"# 3. Scale-independent Errors\n",
"\n",
"These metrics measure the relative improvements versus baselines."
]
@@ -596,7 +596,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> 4. Probabilistic Errors </span>\n",
"# 4. Probabilistic Errors\n",
"\n",
"These measure absolute deviation non-symmetrically, that produce under/over estimation."
]
@@ -763,7 +763,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <span style=\"color:DarkBlue\"> Examples and Validation </span>"
"# Examples and Validation"
]
},
{
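A hedged usage sketch of the numpy metrics this notebook documents; the `mae`/`mape` names assume the exports of `neuralforecast.losses.numpy`, and the data is made up:

```python
# Scale-dependent vs. percentage errors on toy data.
import numpy as np
from neuralforecast.losses.numpy import mae, mape

y = np.array([100.0, 110.0, 120.0])     # actuals
y_hat = np.array([98.0, 112.0, 118.0])  # forecasts

print(mae(y, y_hat))   # same units as the data (scale-dependent)
print(mape(y, y_hat))  # unit-free percentage, comparable across series
```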