diff --git a/CHANGELOG.md b/CHANGELOG.md index 3584fffb32..972af3054a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,9 +11,15 @@ but cannot always guarantee backwards compatibility. Changes that may **break co **Improved** -- Added `IQRDetector`, that allows to detect anomalies using the interquartile range algorithm. [#2441] by [Igor Urbanik](https://github.com/u8-igor). +- Added `IQRDetector`, which detects anomalies using the interquartile range algorithm. [#2441](https://github.com/unit8co/darts/issues/2441) by [Igor Urbanik](https://github.com/u8-igor). - Added hyperparameters controlling the hidden layer sizes for the feature encoders in `TiDEModel`. [#2408](https://github.com/unit8co/darts/issues/2408) by [eschibli](https://github.com/eschibli). +- Added hyperparameter `activation` to `BlockRNNModel` to specify the activation function in case of a multi-layer output network. [#2408](https://github.com/unit8co/darts/issues/2408) by [eschibli](https://github.com/eschibli). - Added support for broadcasting to TimeSeries on component and sample level. [#2476](https://https://github.com/unit8co/darts/pull/2476) by [Joel L.](https://github.com/Joelius300). +- Added support for computing metrics, backtest, and residuals on one or multiple quantiles `q`, either from probabilistic predictions or predicted quantiles. [#2530](https://github.com/unit8co/darts/issues/2530) by [Dennis Bader](https://github.com/dennisbader). +- Added quantile interval metrics: `miw` (Mean Interval Width, time aggregated) and `iw` (Interval Width, per time step / non-aggregated), which compute the width of quantile intervals `q_interval` (expected to be a tuple or a sequence of tuples of (lower quantile, upper quantile)). [#2530](https://github.com/unit8co/darts/issues/2530) by [Dennis Bader](https://github.com/dennisbader). +- Added property `TimeSeries.shape` to get the shape of the time series. [#2530](https://github.com/unit8co/darts/issues/2530) by [Dennis Bader](https://github.com/dennisbader). +- Added support for parameters `enable_optimization` and `predict_likelihood_parameters` to the forecasting models' `backtest()` and `residuals()` methods. [#2530](https://github.com/unit8co/darts/issues/2530) by [Dennis Bader](https://github.com/dennisbader). +- Helper function `darts.utils.utils.generate_index()` now accepts datetime strings as `start` and `end` parameters to generate the pandas DatetimeIndex. [#2522](https://github.com/unit8co/darts/pull/2522) by [Dennis Bader](https://github.com/dennisbader). - Various improvements in the documentation: - Made README's forecasting model support table more colorblind-friendly. [#2433](https://github.com/unit8co/darts/pull/2433) - Updated the Ray Tune Hyperparameter Optimization example in the [user guide](https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html) to work with the latest `ray` versions (`>=2.31.0`). [#2459](https://github.com/unit8co/darts/pull/2459) by [He Weilin](https://github.com/cnhwl). @@ -22,6 +28,11 @@ but cannot always guarantee backwards compatibility. Changes that may **break co **Fixed** +- Fixed a bug when predicting with `predict_likelihood_parameters=True`, `n > 1` and a `RegressionModel` with `multi_models=False` that uses a `likelihood`. The prediction now works without raising an exception. [#2545](https://github.com/unit8co/darts/pull/2545) by [Dennis Bader](https://github.com/dennisbader). 
+- Fixed a bug when performing probabilistic optimized historical forecasts (`num_samples>1, retrain=False, enable_optimization=True`) with regression models, where reshaping the array resulted in a wrong order of samples across components and forecasts. [#2534](https://github.com/unit8co/darts/pull/2534) by [Dennis Bader](https://github.com/dennisbader). +- Fixed a bug when plotting a probabilistic multivariate series, where all confidence intervals (starting from 2nd component) had the same color as the median line. [#2532](https://github.com/unit8co/darts/pull/2532) by [Dennis Bader](https://github.com/dennisbader). +- Fixed a bug where passing an empty array to `TimeSeries.prepend/append_values()` raised an error. [#2522](https://github.com/unit8co/darts/pull/2522) by [Alessio Pellegrini](https://github.com/AlessiopSymplectic). +- Fixed a bug with `TimeSeries.prepend/append_values()`, where the name of the (time) index was lost. [#2522](https://github.com/unit8co/darts/pull/2522) by [Alessio Pellegrini](https://github.com/AlessiopSymplectic). - Fixed a bug when using `from_group_dataframe()` with a `time_col` of type integer, where the resulting time index was wrongly converted to a DatetimeIndex. [#2512](https://github.com/unit8co/darts/pull/2512) by [Alessio Pellegrini](https://github.com/AlessiopSymplectic) - Fixed a bug when using `historical_forecasts()` with a pre-trained `RegressionModel` that has no target lags `lags=None` but uses static covariates. [#2426](https://github.com/unit8co/darts/pull/2426) by [Dennis Bader](https://github.com/dennisbader). - Fixed a bug with `xgboost>=2.1.0`, where multi output regression was not properly handled. [#2426](https://github.com/unit8co/darts/pull/2426) by [Dennis Bader](https://github.com/dennisbader). @@ -30,6 +41,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co - Fixed a bug preventing TimeSeries to be divided by xarray or ndarray. [#2476](https://https://github.com/unit8co/darts/pull/2476) by [Joel L.](https://github.com/Joelius300). - Fixed a bug when using `save()` and `load()` with a `RegressionEnsembleModel` that ensembles any `TorchForecastingModel`. [#2437](https://github.com/unit8co/darts/issues/2437) by [He Weilin](https://github.com/cnhwl). - Fixed a bug with `CrostonModel`, which actually does not support future covariates. [#2511](https://github.com/unit8co/darts/pull/2511) by [Antoine Madrona](https://github.com/madtoinou). +- Fixed the docstring of `scorers_are_univariate` in class `AnomalyModel`. [#2542](https://github.com/unit8co/darts/pull/2542) by [He Weilin](https://github.com/cnhwl). **Dependencies** diff --git a/darts/ad/anomaly_model/anomaly_model.py b/darts/ad/anomaly_model/anomaly_model.py index be66758a0f..97f0934ba4 100644 --- a/darts/ad/anomaly_model/anomaly_model.py +++ b/darts/ad/anomaly_model/anomaly_model.py @@ -317,7 +317,7 @@ def show_anomalies( @property def scorers_are_univariate(self): - """Whether any of the Scorers is trainable.""" + """Whether any of the Scorers is univariate.""" return any(s.is_univariate for s in self.scorers) @property diff --git a/darts/ad/anomaly_model/forecasting_am.py b/darts/ad/anomaly_model/forecasting_am.py index a70c5b21bb..fd3eb9a33a 100644 --- a/darts/ad/anomaly_model/forecasting_am.py +++ b/darts/ad/anomaly_model/forecasting_am.py @@ -132,6 +132,7 @@ def fit( `train_length`. enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. 
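A rough usage sketch of the quantile-metric additions described in the changelog above (a minimal example on assumed toy data; `mae` with `q` and `miw` with `q_interval` follow the signatures introduced later in this diff):

```python
import numpy as np
from darts import TimeSeries
from darts.metrics import mae, miw

np.random.seed(0)
# toy target and a probabilistic "forecast" with 500 samples per time step
actual = TimeSeries.from_values(np.arange(20, dtype=float))
pred = TimeSeries.from_values(
    np.arange(20, dtype=float)[:, None, None]
    + np.random.normal(scale=0.5, size=(20, 1, 500))
)

# deterministic metric evaluated at several quantiles of the samples
print(mae(actual, pred, q=[0.1, 0.5, 0.9]))

# mean width of the 0.1-0.9 quantile interval (the new `miw` metric)
print(miw(actual, pred, q_interval=(0.1, 0.9)))
```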
model_fit_kwargs Parameters to be passed on to the forecast model `fit()` method. @@ -212,6 +213,7 @@ def score( `train_length`. enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. return_model_prediction Whether to return the forecasting model prediction along with the anomaly scores. @@ -299,6 +301,7 @@ def predict_series( `train_length`. enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. Returns ------- @@ -394,6 +397,7 @@ def eval_metric( `train_length`. enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. metric The name of the metric function to use. Must be one of "AUC_ROC" (Area Under the Receiver Operating Characteristic Curve) and "AUC_PR" (Average Precision from scores). @@ -499,6 +503,7 @@ def show_anomalies( `train_length`. enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. anomalies The ground truth of the anomalies (1 if it is an anomaly and 0 if not). names_of_scorers diff --git a/darts/metrics/__init__.py b/darts/metrics/__init__.py index 7a8cb445da..2d0b5aaa2e 100644 --- a/darts/metrics/__init__.py +++ b/darts/metrics/__init__.py @@ -2,7 +2,10 @@ Metrics ------- -For deterministic forecasts (point predictions with `num_samples == 1`): +For deterministic forecasts (point predictions with `num_samples == 1`), probabilistic forecasts (`num_samples > 1`), +and quantile forecasts. For probabilistic and quantile forecasts, use parameter `q` to define the quantile(s) on which to +compute the deterministic metrics: + - Aggregated over time: Absolute metrics: - :func:`MERR <darts.metrics.metrics.merr>`: Mean Error @@ -42,8 +45,10 @@ - Aggregated over time: - :func:`MQL <darts.metrics.metrics.mql>`: Mean Quantile Loss - :func:`QR <darts.metrics.metrics.qr>`: Quantile Risk + - :func:`MIW <darts.metrics.metrics.miw>`: Mean Interval Width - Per time step: - :func:`QL <darts.metrics.metrics.ql>`: Quantile Loss + - :func:`IW <darts.metrics.metrics.iw>`: Interval Width For Dynamic Time Warping (DTW) (aggregated over time): - :func:`DTW <darts.metrics.metrics.dtw_metric>`: Dynamic Time Warping Metric @@ -57,11 +62,13 @@ coefficient_of_variation, dtw_metric, err, + iw, mae, mape, marre, mase, merr, + miw, mql, mse, msse, @@ -90,6 +97,7 @@ se, sle, sse, + iw, } __all__ = [ @@ -120,4 +128,6 @@ "sle", "smape", "sse", + "iw", + "miw", ] diff --git a/darts/metrics/metrics.py b/darts/metrics/metrics.py index 40a7e6e15e..1c667a7eb3 100644 --- a/darts/metrics/metrics.py +++ b/darts/metrics/metrics.py @@ -8,19 +8,27 @@ import inspect from functools import wraps from inspect import signature -from typing import Callable, List, Optional, Sequence, Tuple, Union +from typing import Any, Callable, List, Optional, Sequence, Tuple, Union import numpy as np +import pandas as pd from darts import TimeSeries from darts.dataprocessing import dtw from darts.logging import get_logger, raise_log -from darts.utils import _build_tqdm_iterator, _parallel_apply, n_steps_between from darts.utils.ts_utils import SeriesType, get_series_seq_type, series2seq +from darts.utils.utils import ( + _build_tqdm_iterator, + _parallel_apply, + likelihood_component_names, + n_steps_between, + quantile_names, +) logger = get_logger(__name__) TIME_AX = 0 COMP_AX = 1 +SMPL_AX = 2 # Note: for new metrics added to this module to be able to leverage the two decorators, it is required both having # the `actual_series` and `pred_series` parameters, and not having other ``Sequence`` as args (since these decorators @@ 
-33,6 +41,51 @@ ] +def interval_support(func) -> Callable[..., METRIC_OUTPUT_TYPE]: + """ + This decorator adds support for quantile interval metrics with sanity checks, processing, and extraction of + quantiles from the intervals. + """ + + @wraps(func) + def wrapper_interval_support(*args, **kwargs): + q = kwargs.get("q") + if q is not None: + raise_log( + ValueError( + "`q` is not supported for quantile interval metrics; use `q_interval` instead." + ), + logger=logger, + ) + q_interval = kwargs.get("q_interval") + if q_interval is None: + raise_log( + ValueError("Quantile interval metrics require setting `q_interval`."), + logger=logger, + ) + if isinstance(q_interval, tuple): + q_interval = [q_interval] + q_interval = np.array(q_interval) + if q_interval.ndim != 2 or q_interval.shape[1] != 2: + raise_log( + ValueError( + "`q_interval` must be a tuple (float, float) or a sequence of tuples (float, float)." + ), + logger=logger, + ) + if not np.all(q_interval[:, 1] - q_interval[:, 0] > 0): + raise_log( + ValueError( + "all intervals in `q_interval` must be tuples of (lower q, upper q) with `lower q < upper q`. " + f"Received `q_interval={q_interval}`" + ), + logger=logger, + ) + kwargs["q_interval"] = q_interval + kwargs["q"] = np.sort(np.unique(q_interval)) + return func(*args, **kwargs) + + return wrapper_interval_support + + def multi_ts_support(func) -> Callable[..., METRIC_OUTPUT_TYPE]: """ This decorator further adapts the metrics that took as input two (or three for scaled metrics with `insample`) @@ -60,12 +113,7 @@ def wrapper_multi_ts_support(*args, **kwargs): params = signature(func).parameters n_jobs = kwargs.pop("n_jobs", params["n_jobs"].default) - if not isinstance(n_jobs, int): - raise_log(ValueError("n_jobs must be an integer"), logger=logger) - verbose = kwargs.pop("verbose", params["verbose"].default) - if not isinstance(verbose, bool): - raise_log(ValueError("verbose must be a bool"), logger=logger) # sanity check reduction functions _ = _get_reduction( @@ -131,6 +179,38 @@ def wrapper_multi_ts_support(*args, **kwargs): num_series_in_args += int("insample" not in kwargs) kwargs.pop("insample", 0) + # handle `q` (quantile) parameter for probabilistic (or quantile) forecasts + if "q" in params: + # convert `q` to tuple of (quantile values, optional quantile component names) + q = kwargs.get("q", params["q"].default) + q_comp_names = None + if q is None: + kwargs["q"] = None + else: + if isinstance(q, tuple): + q, q_comp_names = q + if isinstance(q, float): + q = np.array([q]) + else: + q = np.array(q) + + if not np.all(q[1:] - q[:-1] > 0.0): + raise_log( + ValueError( + "`q` must be of type `float`, or a sequence of floats in increasing order with unique values only. " + f"Received `q={q}`." + ), + logger=logger, + ) + if not (np.all(q >= 0.0) and np.all(q <= 1.0)): + raise_log( + ValueError( + f"All `q` values must be in the range `(>=0,<=1)`. Received `q={q}`." + ), + logger=logger, + ) + kwargs["q"] = (q, q_comp_names) + iterator = _build_tqdm_iterator( iterable=zip(*input_series), verbose=verbose, @@ -187,20 +267,61 @@ def wrapper_multivariate_support(*args, **kwargs) -> METRIC_OUTPUT_TYPE: pred_series = args[1] num_series_in_args = 2 - if actual_series.width != pred_series.width: - raise_log( - ValueError( - f"Mismatch between number of components in `actual_series` " - f"(n={actual_series.width}) and `pred_series` (n={pred_series.width}." 
- ), - logger=logger, - ) + q, q_comp_names = kwargs.get("q"), None + if q is None: + # without quantiles, the number of components must match + if actual_series.n_components != pred_series.n_components: + raise_log( + ValueError( + f"Mismatch between number of components in `actual_series` " + f"(n={actual_series.width}) and `pred_series` (n={pred_series.width})." + ), + logger=logger, + ) + # compute median for stochastic predictions + if pred_series.is_stochastic: + q = np.array([0.5]) + else: + # `q` is required to be a tuple (handled by `multi_ts_support` wrapper) + if not isinstance(q, tuple) or len(q) != 2: + raise_log( + ValueError( + "`q` must be a tuple of `(np.ndarray, Optional[pd.Index])` " + "with (quantile values, optional quantile component names). " + f"Received `q={q}`." + ), + logger=logger, + ) + q, q_comp_names = q + if not pred_series.is_stochastic: + # quantile component names are required if the predictions are not stochastic (as for stochastic + # predictions, the quantiles can be retrieved from the sample dimension for each component) + if q_comp_names is None: + q_comp_names = pd.Index( + likelihood_component_names( + components=actual_series.components, + parameter_names=quantile_names(q=q), + ) + ) + if not q_comp_names.isin(pred_series.components).all(): + raise_log( + ValueError( + f"Computing a metric with quantile(s) `q={q}` is only supported for probabilistic " + f"`pred_series` (num samples > 1) or `pred_series` containing the predicted " + f"quantiles as columns / components. Either pass a probabilistic `pred_series` or " + f"a series containing the expected quantile components: {q_comp_names.tolist()} " + ), + logger=logger, + ) + + if "q" in params: + kwargs["q"] = (q, q_comp_names) # handle `insample` parameters for scaled metrics input_series = (actual_series, pred_series) if "insample" in params: insample = args[2] - if actual_series.width != insample.width: + if actual_series.n_components != insample.n_components: raise_log( ValueError( f"Mismatch between number of components in `actual_series` " @@ -212,15 +333,18 @@ def wrapper_multivariate_support(*args, **kwargs) -> METRIC_OUTPUT_TYPE: num_series_in_args += 1 vals = func(*input_series, *args[num_series_in_args:], **kwargs) - if not 1 <= len(vals.shape) <= 2: + # bring vals to shape (n_time, n_comp, n_quantile) + if not 2 <= len(vals.shape) <= 3: raise_log( ValueError( - "Metric output must have 1 dimension for aggregated metrics (e.g. `mae()`, ...), " - "or 2 dimension for time dependent metrics (e.g. `ae()`, ...)" + "Metric output must have 2 dimensions (n components, n quantiles) " + "for aggregated metrics (e.g. `mae()`, ...), " + "or 3 dimensions (n times, n components, n quantiles) " + "for time dependent metrics (e.g. 
`ae()`, ...)" ), logger=logger, ) - elif len(vals.shape) == 1: + if len(vals.shape) == 2: vals = np.expand_dims(vals, TIME_AX) time_reduction = _get_reduction( @@ -231,6 +355,7 @@ def wrapper_multivariate_support(*args, **kwargs) -> METRIC_OUTPUT_TYPE: sanity_check=False, ) if time_reduction is not None: + # -> (1, n_comp, n_quantile) vals = np.expand_dims(time_reduction(vals, axis=TIME_AX), axis=TIME_AX) component_reduction = _get_reduction( @@ -241,40 +366,71 @@ def wrapper_multivariate_support(*args, **kwargs) -> METRIC_OUTPUT_TYPE: sanity_check=False, ) if component_reduction is not None: - vals = np.expand_dims(component_reduction(vals, axis=COMP_AX), axis=COMP_AX) + # -> (*, n_quantile) + vals = component_reduction(vals, axis=COMP_AX) + else: + # -> (*, n_comp * n_quantile), with order [c0_q0, c0_q1, ... c1_q0, c1_q1, ...] + vals = vals.reshape(vals.shape[0], -1) return vals return wrapper_multivariate_support def _get_values( - vals: np.ndarray, stochastic_quantile: Optional[float] = 0.5 + vals: np.ndarray, + vals_components: pd.Index, + actual_components: pd.Index, + q: Optional[Tuple[Sequence[float], Optional[pd.Index]]] = None, ) -> np.ndarray: """ - Returns a deterministic or probabilistic numpy array from the values of a time series. - For stochastic input values, return either all sample values with (stochastic_quantile=None) or the quantile sample - value with (stochastic_quantile {>=0,<=1}) + Returns a deterministic or probabilistic numpy array from the values of a time series of shape + (times, components, samples / quantiles). + To extract quantile (sample) values from quantile or stochastic `vals`, use `q`. + + Parameters + ---------- + vals + A numpy array with the values of a TimeSeries (actual values or predictions). + vals_components + The components of the `vals` TimeSeries. + actual_components + The components of the actual TimeSeries. + q + Optionally, for stochastic or quantile series/values, return deterministic quantile values. + If not `None`, must be a tuple with (quantile values, + `None` if `pred_series` is stochastic else the quantile component names). """ - if vals.shape[2] == 1: # deterministic - out = vals[:, :, 0] - else: # stochastic - if stochastic_quantile is None: - out = vals - else: - out = np.quantile(vals, stochastic_quantile, axis=2) - return out + # return values as is (times, components, samples) + if q is None: + return vals + + q, q_names = q + if vals.shape[SMPL_AX] == 1: # deterministic (or quantile components) input + if q_names is not None: + # `q_names` are the component names of the predicted quantile parameters + # we extract the relevant quantile components with shape (times, components * quantiles) + vals = vals[:, vals_components.get_indexer(q_names)] + # rearrange into (times, components, quantiles) + vals = vals.reshape((len(vals), len(actual_components), -1)) + return vals + + # probabilistic input + # compute multiple quantiles for all times and components; with shape: (quantiles, times, components) + out = np.quantile(vals, q, axis=SMPL_AX) + # rearrange into (times, components, quantiles) + return out.transpose((1, 2, 0)) def _get_values_or_raise( series_a: TimeSeries, series_b: TimeSeries, intersect: bool, - stochastic_quantile: Optional[float] = 0.5, + q: Optional[Tuple[Sequence[float], Optional[pd.Index]]] = None, remove_nan_union: bool = False, is_insample: bool = False, ) -> Tuple[np.ndarray, np.ndarray]: """Returns the processed numpy values of two time series. 
Processing can be customized with arguments - `intersect, stochastic_quantile, remove_nan_union`. + `intersect, q, remove_nan_union`. Parameters ---------- @@ -285,9 +441,10 @@ def _get_values_or_raise( A deterministic or stochastic ``TimeSeries`` instance (the predictions `pred_series`). intersect A boolean for whether to only consider the time intersection between `series_a` and `series_b` - stochastic_quantile - Optionally, for stochastic predicted series, return either all sample values with (`stochastic_quantile=None`) - or any deterministic quantile sample values by setting `stochastic_quantile=quantile` {>=0,<=1}. + q + Optionally, for predicted stochastic or quantile series, return deterministic quantile values. + If not `None`, must be a tuple with (quantile values, + `None` if `pred_series` is stochastic else the quantile component names). remove_nan_union By setting `remove_non_union` to True, sets all values from `series_a` and `series_b` to `np.nan` at indices where any of the two series contain a NaN value. Only effective when `is_insample=False`. @@ -299,16 +456,6 @@ def _get_values_or_raise( ValueError If `is_insample=False` and the two time series do not have at least a partially overlapping time index. """ - - if not series_a.width == series_b.width: - raise_log( - ValueError("The two time series must have the same number of components"), - logger=logger, - ) - - if not isinstance(intersect, bool): - raise_log(ValueError("The intersect parameter must be a bool"), logger=logger) - make_copy = False if not is_insample: # get the time intersection and values of the two series (corresponds to `actual_series` and `pred_series` @@ -319,15 +466,12 @@ def _get_values_or_raise( vals_a_common = series_a.slice_intersect_values(series_b, copy=make_copy) vals_b_common = series_b.slice_intersect_values(series_a, copy=make_copy) - if not len(vals_a_common) == len(vals_b_common): - raise_log( - ValueError( - "The two time series must have at least a partially overlapping time index." 
- ), - logger=logger, - ) - - vals_b_det = _get_values(vals_b_common, stochastic_quantile=stochastic_quantile) + vals_b = _get_values( + vals=vals_b_common, + vals_components=series_b.components, + actual_components=series_a.components, + q=q, + ) else: # for `insample` series we extract only values up until before start of `pred_series` # find how many steps `insample` overlaps into `series_b` @@ -347,36 +491,67 @@ def _get_values_or_raise( ) end = end or None vals_a_common = series_a.all_values(copy=make_copy)[:end] - vals_b_det = None - vals_a_det = _get_values(vals_a_common, stochastic_quantile=stochastic_quantile) + vals_b = None + vals_a = _get_values( + vals=vals_a_common, + vals_components=series_a.components, + actual_components=series_a.components, + q=([0.5], None), + ) if not remove_nan_union or is_insample: - return vals_a_det, vals_b_det + return vals_a, vals_b - b_is_deterministic = bool(len(vals_b_det.shape) == 2) - if b_is_deterministic: - isnan_mask = np.logical_or(np.isnan(vals_a_det), np.isnan(vals_b_det)) - isnan_mask_pred = isnan_mask - else: - isnan_mask = np.logical_or( - np.isnan(vals_a_det), np.isnan(vals_b_det).any(axis=2) - ) - isnan_mask_pred = np.repeat( - np.expand_dims(isnan_mask, axis=-1), vals_b_det.shape[2], axis=2 - ) - return np.where(isnan_mask, np.nan, vals_a_det), np.where( - isnan_mask_pred, np.nan, vals_b_det + isnan_mask = np.expand_dims( + np.logical_or(np.isnan(vals_a), np.isnan(vals_b)).any(axis=SMPL_AX), axis=-1 + ) + isnan_mask_pred = np.repeat(isnan_mask, vals_b.shape[SMPL_AX], axis=SMPL_AX) + return np.where(isnan_mask, np.nan, vals_a), np.where( + isnan_mask_pred, np.nan, vals_b ) +def _get_quantile_intervals( + vals: np.ndarray, + q: Tuple[Sequence[float], Any], + q_interval: Optional[np.ndarray] = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Returns the lower and upper bound values from `vals` for all quantile intervals in `q_interval`. + + Parameters + ---------- + vals + A numpy array with predicted quantile values of shape (n times, n components, n quantiles). + q + A tuple with (quantile values, any). + q_interval + A numpy array with the lower and upper quantile interval bounds of shape (n intervals, 2). + """ + q, _ = q + # find index of every `q_interval` value in `q`; we have guarantees from support wrappers: + # - `q` has increasing order + # - `vals` has same order as `q` in dim 3 (quantile dim) + # - `q_interval` holds (lower q, upper q) in that order + q_idx = np.searchsorted(q, q_interval.flatten()).reshape(q_interval.shape) + return vals[:, :, q_idx[:, 0]], vals[:, :, q_idx[:, 1]] + + def _get_wrapped_metric( - func: Callable[..., METRIC_OUTPUT_TYPE], + func: Callable[..., METRIC_OUTPUT_TYPE], n_wrappers: int = 2 ) -> Callable[..., METRIC_OUTPUT_TYPE]: """Returns the inner metric function `func` which bypasses the decorators `multi_ts_support` and `multivariate_support`. It significantly decreases process time compared to calling `func` directly. Only use this to compute a pre-defined metric within the scope of another metric. 
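To make the quantile-interval lookup above concrete, here is a small standalone sketch of the same `np.searchsorted` arithmetic (illustrative values only, not part of the diff):

```python
import numpy as np

# sorted quantiles, as guaranteed by the support wrappers
q = np.array([0.05, 0.1, 0.5, 0.9, 0.95])
# two quantile intervals: (0.1, 0.9) and (0.05, 0.95)
q_interval = np.array([[0.1, 0.9], [0.05, 0.95]])
# predicted quantile values with shape (n times, n components, n quantiles)
vals = np.arange(2 * 1 * 5, dtype=float).reshape((2, 1, 5))

# position of each interval bound inside `q`
q_idx = np.searchsorted(q, q_interval.flatten()).reshape(q_interval.shape)
lower, upper = vals[:, :, q_idx[:, 0]], vals[:, :, q_idx[:, 1]]
print(upper - lower)  # interval widths, shape (n times, n components, n intervals)
```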
""" - return func.__wrapped__.__wrapped__ + if not 2 <= n_wrappers <= 3: + raise_log( + NotImplementedError("Only 2-3 wrappers are currently supported"), + logger=logger, + ) + if n_wrappers == 2: + return func.__wrapped__.__wrapped__ + else: + return func.__wrapped__.__wrapped__.__wrapped__ def _get_reduction( @@ -471,6 +646,7 @@ def err( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -484,8 +660,9 @@ def err( .. math:: y_t - \\hat{y}_t - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` are stochastic (contains several samples) or quantile predictions, use parameter `q` to + specify on which quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -496,6 +673,8 @@ def err( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -543,7 +722,11 @@ def err( """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=False + actual_series, + pred_series, + intersect, + remove_nan_union=False, + q=q, ) return y_true - y_pred @@ -555,6 +738,7 @@ def merr( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -567,8 +751,9 @@ def merr( .. math:: \\frac{1}{T}\\sum_{t=1}^T{(y_t - \\hat{y}_t)} - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` are stochastic (contains several samples) or quantile predictions, use parameter `q` to + specify on which quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -579,6 +764,8 @@ def merr( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -620,6 +807,7 @@ def merr( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -632,6 +820,7 @@ def ae( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -645,8 +834,9 @@ def ae( .. math:: |y_t - \\hat{y}_t| - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -657,6 +847,8 @@ def ae( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -704,7 +896,11 @@ def ae( """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=False + actual_series, + pred_series, + intersect, + remove_nan_union=False, + q=q, ) return np.abs(y_true - y_pred) @@ -716,6 +912,7 @@ def mae( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -728,8 +925,9 @@ def mae( .. math:: \\frac{1}{T}\\sum_{t=1}^T{|y_t - \\hat{y}_t|} - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -740,6 +938,8 @@ def mae( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -781,6 +981,7 @@ def mae( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -795,6 +996,7 @@ def ase( m: int = 1, intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -816,8 +1018,9 @@ def ase( .. math:: E_m = MAE(y_{m:t_p}, y_{0:t_p - m}). - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -835,6 +1038,8 @@ def ase( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -895,6 +1100,7 @@ def ase( actual_series, pred_series, intersect, + q=q, ) return errors / error_scale @@ -908,6 +1114,7 @@ def mase( m: int = 1, intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -928,8 +1135,9 @@ def mase( .. math:: E_m = MAE(y_{m:t_p}, y_{0:t_p - m}). - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -947,6 +1155,8 @@ def mase( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1000,6 +1210,7 @@ def mase( insample, m=m, intersect=intersect, + q=q, ), axis=TIME_AX, ) @@ -1012,6 +1223,7 @@ def se( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -1025,8 +1237,9 @@ def se( .. math:: (y_t - \\hat{y}_t)^2. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1037,6 +1250,8 @@ def se( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -1084,7 +1299,11 @@ """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=False + actual_series, + pred_series, + intersect, + remove_nan_union=False, + q=q, ) return (y_true - y_pred) ** 2 @@ -1096,6 +1315,7 @@ def mse( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1108,8 +1328,9 @@ def mse( .. math:: \\frac{1}{T}\\sum_{t=1}^T{(y_t - \\hat{y}_t)^2}. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1120,6 +1341,8 @@ def mse( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1161,6 +1384,7 @@ def mse( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -1175,6 +1399,7 @@ def sse( m: int = 1, intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -1196,8 +1421,9 @@ def sse( .. math:: E_m = MSE(y_{m:t_p}, y_{0:t_p - m}). - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1215,6 +1441,8 @@ def sse( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -1275,6 +1503,7 @@ def sse( actual_series, pred_series, intersect, + q=q, ) return errors / error_scale @@ -1288,6 +1517,7 @@ def msse( m: int = 1, intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1308,8 +1538,9 @@ def msse( .. math:: E_m = MSE(y_{m:t_p}, y_{0:t_p - m}). - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1327,6 +1558,8 @@ def msse( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1380,6 +1613,7 @@ def msse( insample, m=m, intersect=intersect, + q=q, ), axis=TIME_AX, ) @@ -1392,6 +1626,7 @@ def rmse( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1404,8 +1639,9 @@ def rmse( .. math:: \\sqrt{\\frac{1}{T}\\sum_{t=1}^T{(y_t - \\hat{y}_t)^2}} - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1416,6 +1652,8 @@ def rmse( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -1457,6 +1695,7 @@ def rmse( actual_series, pred_series, intersect, + q=q, ) ) @@ -1470,6 +1709,7 @@ def rmsse( m: int = 1, intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1490,8 +1730,9 @@ def rmsse( .. math:: E_m = RMSE(y_{m:t_p}, y_{0:t_p - m}). - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1509,6 +1750,8 @@ def rmsse( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1560,6 +1803,7 @@ def rmsse( actual_series, pred_series, intersect, + q=q, ) return errors / error_scale @@ -1571,6 +1815,7 @@ def sle( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -1586,8 +1831,9 @@ def sle( using the natural logarithm. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1598,6 +1844,8 @@ def sle( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -1645,7 +1893,11 @@ """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=False + actual_series, + pred_series, + intersect, + remove_nan_union=False, + q=q, ) y_true, y_pred = np.log(y_true + 1), np.log(y_pred + 1) return (y_true - y_pred) ** 2 @@ -1658,6 +1910,7 @@ def rmsle( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1672,8 +1925,9 @@ def rmsle( using the natural logarithm. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1684,6 +1938,8 @@ def rmsle( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1726,6 +1982,7 @@ def rmsle( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -1739,6 +1996,7 @@ def ape( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -1755,8 +2013,9 @@ def ape( Note that it will raise a `ValueError` if :math:`y_t = 0` for some :math:`t`. Consider using the Absolute Scaled Error (:func:`~darts.metrics.metrics.ase`) in these cases. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1767,6 +2026,8 @@ def ape( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -1819,7 +2080,11 @@ """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=False + actual_series, + pred_series, + intersect, + remove_nan_union=False, + q=q, ) if not (y_true != 0).all(): raise_log( @@ -1838,6 +2103,7 @@ def mape( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -1853,8 +2119,9 @@ def mape( Note that it will raise a `ValueError` if :math:`y_t = 0` for some :math:`t`. Consider using the Mean Absolute Scaled Error (:func:`~darts.metrics.metrics.mase`) in these cases. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1865,6 +2132,8 @@ def mape( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -1912,6 +2181,7 @@ def mape( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -1924,6 +2194,7 @@ def sape( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -1941,8 +2212,9 @@ def sape( Note that it will raise a `ValueError` if :math:`\\left| y_t \\right| + \\left| \\hat{y}_t \\right| = 0` for some :math:`t`. Consider using the Absolute Scaled Error (:func:`~darts.metrics.metrics.ase`) in these cases. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -1953,6 +2225,8 @@ def sape( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a @@ -2005,7 +2279,11 @@ def sape( """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=True + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, ) if not np.logical_or(y_true != 0, y_pred != 0).all(): raise_log( @@ -2024,6 +2302,7 @@ def smape( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2042,8 +2321,9 @@ def smape( for some :math:`t`. Consider using the Mean Absolute Scaled Error (:func:`~darts.metrics.metrics.mase`) in these cases. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2054,6 +2334,8 @@ def smape( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. 
It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2101,6 +2383,7 @@ def smape( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -2113,6 +2396,7 @@ def ope( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2126,8 +2410,9 @@ def ope( .. math:: 100 \\cdot \\left| \\frac{\\sum_{t=1}^{T}{y_t} - \\sum_{t=1}^{T}{\\hat{y}_t}}{\\sum_{t=1}^{T}{y_t}} \\right|. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2138,6 +2423,8 @@ def ope( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2181,7 +2468,11 @@ """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=True + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, ) y_true_sum, y_pred_sum = ( np.nansum(y_true, axis=TIME_AX), @@ -2204,6 +2495,7 @@ def arre( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -2217,8 +2509,9 @@ def arre( .. math:: 100 \\cdot \\left| \\frac{y_t - \\hat{y}_t} {\\max_t{y_t} - \\min_t{y_t}} \\right| - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` is stochastic (contains several samples) or a quantile prediction, use parameter `q` to + specify the quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2229,6 +2522,8 @@ def arre( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. time_reduction Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. 
The function takes as input a ``np.ndarray`` and a @@ -2281,7 +2576,11 @@ def arre( """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=True + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, ) y_max, y_min = np.nanmax(y_true, axis=TIME_AX), np.nanmin(y_true, axis=TIME_AX) if not (y_max > y_min).all(): @@ -2303,6 +2602,7 @@ def marre( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2316,8 +2616,9 @@ def marre( .. math:: 100 \\cdot \\frac{1}{T} \\sum_{t=1}^{T} {\\left| \\frac{y_t - \\hat{y}_t} {\\max_t{y_t} - \\min_t{y_t}} \\right|} - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` are stochastic (contains several samples) or quantile predictions, use parameter `q` to + specify on which quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2328,6 +2629,8 @@ def marre( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2372,6 +2675,7 @@ def marre( actual_series, pred_series, intersect, + q=q, ), axis=TIME_AX, ) @@ -2384,6 +2688,7 @@ def r2_score( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2400,8 +2705,9 @@ def r2_score( This metric is not symmetric. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` are stochastic (contains several samples) or quantile predictions, use parameter `q` to + specify on which quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2412,6 +2718,8 @@ def r2_score( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2453,7 +2761,11 @@ def r2_score( .. 
[1] https://en.wikipedia.org/wiki/Coefficient_of_determination """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=True + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, ) ss_errors = np.nansum((y_true - y_pred) ** 2, axis=TIME_AX) y_hat = np.nanmean(y_true, axis=TIME_AX) @@ -2468,6 +2780,7 @@ def coefficient_of_variation( pred_series: Union[TimeSeries, Sequence[TimeSeries]], intersect: bool = True, *, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2483,8 +2796,9 @@ def coefficient_of_variation( where :math:`RMSE` is the Root Mean Squared Error (:func:`~darts.metrics.metrics.rmse`), and :math:`\\bar{y}` is the average of :math:`y` over all time steps. - If any of the series is stochastic (containing several samples), :math:`\\hat{y}_t` is the median over all samples - for time step :math:`t`. + If :math:`\\hat{y}_t` are stochastic (contains several samples) or quantile predictions, use parameter `q` to + specify on which quantile(s) to compute the metric on. By default, it uses the median 0.5 quantile + (over all samples, or, if given, the quantile prediction itself). Parameters ---------- @@ -2495,6 +2809,8 @@ def coefficient_of_variation( intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + Optionally, the quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2533,7 +2849,11 @@ def coefficient_of_variation( """ y_true, y_pred = _get_values_or_raise( - actual_series, pred_series, intersect, remove_nan_union=True + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, ) # not calling rmse as y_true and y_pred are np.ndarray return ( @@ -2629,9 +2949,9 @@ def dtw_metric( def qr( actual_series: Union[TimeSeries, Sequence[TimeSeries]], pred_series: Union[TimeSeries, Sequence[TimeSeries]], - q: float = 0.5, intersect: bool = True, *, + q: Union[float, List[float], Tuple[np.ndarray, pd.Index]] = 0.5, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2660,11 +2980,11 @@ def qr( The (sequence of) actual series. pred_series The (sequence of) predicted series. - q - The quantile (float [0, 1]) of interest for the risk evaluation. intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + The quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. 
The function takes as input a ``np.ndarray`` and a @@ -2713,18 +3033,19 @@ def qr( actual_series, pred_series, intersect, - stochastic_quantile=None, + q=None, remove_nan_union=True, ) z_true = np.nansum(z_true, axis=TIME_AX) z_hat = np.nansum( z_hat, axis=TIME_AX ) # aggregate all individual sample realizations + # quantile loss + q, _ = q z_hat_rho = np.quantile( z_hat, q=q, axis=1 - ) # get the quantile from aggregated samples + ).T # get the quantile from aggregated samples - # quantile loss errors = z_true - z_hat_rho losses = 2 * np.maximum((q - 1) * errors, q * errors) return losses / z_true @@ -2735,9 +3056,9 @@ def ql( actual_series: Union[TimeSeries, Sequence[TimeSeries]], pred_series: Union[TimeSeries, Sequence[TimeSeries]], - q: float = 0.5, intersect: bool = True, *, + q: Union[float, List[float], Tuple[np.ndarray, pd.Index]] = 0.5, time_reduction: Optional[Callable[..., np.ndarray]] = None, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, @@ -2747,7 +3068,8 @@ def ql( """Quantile Loss (QL). Also known as Pinball Loss. QL is a metric that quantifies the accuracy of a specific quantile :math:`q` from the - predicted value distribution of a stochastic/probabilistic `pred_series` containing N samples. + predicted deterministic quantiles, or from the value distribution of a stochastic/probabilistic `pred_series` + containing N samples. QL computes the quantile of all sample values and the loss per time step. @@ -2756,7 +3078,7 @@ def ql( .. math:: 2 \\max((q - 1) (y_t - \\hat{y}_{t,q}), q (y_t - \\hat{y}_{t,q})), - where :math:`\\hat{y}_{t,q}` is the quantile :math:`q` of all predicted sample values at time :math:`t`. + where :math:`\\hat{y}_{t,q}` is the quantile value :math:`q` (of all predicted quantiles or samples) at time :math:`t`. The factor `2` makes the loss more interpretable, as for `q=0.5` the loss is identical to the Absolute Error (:func:`~darts.metrics.metrics.ae`). @@ -2766,11 +3088,11 @@ def ql( The (sequence of) actual series. pred_series The (sequence of) predicted series. - q - The quantile (float [0, 1]) of interest for the loss. intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + The quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2816,22 +3138,14 @@ def ql( List[np.ndarray] Same as for type `np.ndarray` but for a sequence of series. """ - if not pred_series.is_stochastic: - raise_log( - ValueError( - "quantile/pinball loss (ql) should only be computed for " - "stochastic predicted TimeSeries."
- ), - logger=logger, - ) - y_true, y_pred = _get_values_or_raise( actual_series, pred_series, intersect, - stochastic_quantile=q, + q=q, remove_nan_union=True, ) + q, _ = q errors = y_true - y_pred losses = 2.0 * np.maximum((q - 1) * errors, q * errors) return losses @@ -2842,9 +3156,9 @@ def mql( actual_series: Union[TimeSeries, Sequence[TimeSeries]], pred_series: Union[TimeSeries, Sequence[TimeSeries]], - q: float = 0.5, intersect: bool = True, *, + q: Union[float, List[float], Tuple[np.ndarray, pd.Index]] = 0.5, component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, n_jobs: int = 1, @@ -2853,7 +3167,8 @@ def mql( """Mean Quantile Loss (MQL). Also known as Pinball Loss. QL is a metric that quantifies the accuracy of a specific quantile :math:`q` from the - predicted value distribution of a stochastic/probabilistic `pred_series` containing N samples. + predicted deterministic quantiles, or from the value distribution of a stochastic/probabilistic `pred_series` + containing N samples. MQL first computes the quantile of all sample values and the loss per time step, and then takes the mean over the time axis. @@ -2863,7 +3178,7 @@ def mql( .. math:: 2 \\frac{1}{T}\\sum_{t=1}^T{\\max((q - 1) (y_t - \\hat{y}_{t,q}), q (y_t - \\hat{y}_{t,q}))}, - where :math:`\\hat{y}_{t,q}` is the quantile :math:`q` of all predicted sample values at time :math:`t`. + where :math:`\\hat{y}_{t,q}` is the quantile value :math:`q` (of all predicted quantiles or samples) at time :math:`t`. The factor `2` makes the loss more interpretable, as for `q=0.5` the loss is identical to the Mean Absolute Error (:func:`~darts.metrics.metrics.mae`). @@ -2873,11 +3188,11 @@ def mql( The (sequence of) actual series. pred_series The (sequence of) predicted series. - q - The quantile (float [0, 1]) of interest for the loss. intersect For time series that are overlapping in time without having the same time index, setting `True` will consider the values only over their common time interval (intersection in time). + q + The quantile (float [0, 1]) or list of quantiles of interest to compute the metric on. component_reduction Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a @@ -2923,3 +3238,192 @@ def mql( ), axis=TIME_AX, ) + + +@interval_support +@multi_ts_support +@multivariate_support +def iw( + actual_series: Union[TimeSeries, Sequence[TimeSeries]], + pred_series: Union[TimeSeries, Sequence[TimeSeries]], + intersect: bool = True, + *, + q_interval: Union[Tuple[float, float], Sequence[Tuple[float, float]]] = None, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, + time_reduction: Optional[Callable[..., np.ndarray]] = None, + component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, + series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, + n_jobs: int = 1, + verbose: bool = False, +) -> METRIC_OUTPUT_TYPE: + """Interval Width (IW). + + IW gives the width of predicted quantile intervals. + + For the true series :math:`y` and predicted stochastic or quantile series :math:`\\hat{y}` of length :math:`T`, + it is computed per component/column, quantile interval, and time step + :math:`t` as: + + ..
math:: \\hat{y}_{t,qh} - \\hat{y}_{t,ql} + + where :math:`\\hat{y}_{t,qh}` are the upper bound quantile values (of all predicted quantiles or samples) at time + :math:`t`, and :math:`\\hat{y}_{t,ql}` are the lower bound quantile values. + + Parameters + ---------- + actual_series + The (sequence of) actual series. + pred_series + The (sequence of) predicted series. + intersect + For time series that are overlapping in time without having the same time index, setting `True` + will consider the values only over their common time interval (intersection in time). + q_interval + The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or a sequence of tuples + (multiple intervals) with elements (low quantile, high quantile). + q + Quantiles `q` are not supported by this metric; use `q_interval` instead. + component_reduction + Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` + of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a + parameter named `axis`, and returns the reduced array. The `axis` receives value `1` corresponding to the + component axis. If `None`, will return a metric per component. + time_reduction + Optionally, a function to aggregate the metrics over the time axis. It must reduce a `np.ndarray` + of shape `(t, c)` to a `np.ndarray` of shape `(c,)`. The function takes as input a ``np.ndarray`` and a + parameter named `axis`, and returns the reduced array. The `axis` receives value `0` corresponding to the + time axis. If `None`, will return a metric per time step. + series_reduction + Optionally, a function to aggregate the metrics over multiple series. It must reduce a `np.ndarray` + of shape `(s, t, c)` to a `np.ndarray` of shape `(t, c)`. The function takes as input a ``np.ndarray`` and a + parameter named `axis`, and returns the reduced array. The `axis` receives value `0` corresponding to the + series axis. For example with `np.nanmean`, will return the average over all series metrics. If `None`, will + return a metric per component. + n_jobs + The number of jobs to run in parallel. Parallel jobs are created only when a ``Sequence[TimeSeries]`` is + passed as input, parallelising operations regarding different ``TimeSeries``. Defaults to `1` + (sequential). Setting the parameter to `-1` means using all the available processors. + verbose + Optionally, whether to print operations progress + + Returns + ------- + float + A single metric score for: + + - single univariate series. + - single multivariate series with `component_reduction`. + - a sequence (list) of uni/multivariate series with `series_reduction`, `component_reduction` and + `time_reduction`. + np.ndarray + A numpy array of metric scores. The array has shape (n time steps, n components) without time + and component reductions. For: + + - single multivariate series and at least `component_reduction=None`. + - single uni/multivariate series and at least `time_reduction=None`. + - a sequence of uni/multivariate series including `series_reduction` and at least one of + `component_reduction=None` or `time_reduction=None`. + List[float] + Same as for type `float` but for a sequence of series. + List[np.ndarray] + Same as for type `np.ndarray` but for a sequence of series.
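A short usage sketch for the interval metrics with toy random data (interval bounds arbitrary); `iw` keeps the time dimension by default, while `miw` (defined further below) aggregates it:

```python
import numpy as np
from darts import TimeSeries
from darts.metrics import iw, miw

np.random.seed(42)
actual = TimeSeries.from_values(np.random.random((10, 1, 1)))
pred = TimeSeries.from_values(np.random.random((10, 1, 100)))  # 100 samples

iw(actual, pred, q_interval=(0.1, 0.9))                 # shape (10,): one width per time step
miw(actual, pred, q_interval=[(0.1, 0.5), (0.5, 0.9)])  # shape (2,): mean width per interval
```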
+ """ + y_true, y_pred = _get_values_or_raise( + actual_series, + pred_series, + intersect, + remove_nan_union=True, + q=q, + ) + y_pred_lo, y_pred_hi = _get_quantile_intervals(y_pred, q=q, q_interval=q_interval) + return y_pred_hi - y_pred_lo + + +@interval_support +@multi_ts_support +@multivariate_support +def miw( + actual_series: Union[TimeSeries, Sequence[TimeSeries]], + pred_series: Union[TimeSeries, Sequence[TimeSeries]], + intersect: bool = True, + *, + q_interval: Union[Tuple[float, float], Sequence[Tuple[float, float]]] = None, + q: Optional[Union[float, List[float], Tuple[np.ndarray, pd.Index]]] = None, + component_reduction: Optional[Callable[[np.ndarray], float]] = np.nanmean, + series_reduction: Optional[Callable[[np.ndarray], Union[float, np.ndarray]]] = None, + n_jobs: int = 1, + verbose: bool = False, +) -> METRIC_OUTPUT_TYPE: + """Mean Interval Width (IL). + + IL gives the width of predicted quantile intervals aggregated over time. + + For the true series :math:`y` and predicted stochastic or quantile series :math:`\\hat{y}` of length :math:`T`, + it is computed per component/column, quantile interval, and time step + :math:`t` as: + + .. math:: \\frac{1}{T}\\sum_{t=1}^T{\\hat{y}_{t,qh} - \\hat{y}_{t,ql}} + + where :math:`\\hat{y}_{t,qh}` are the upper bound quantile values (of all predicted quantiles or samples) at time + :math:`t`, and :math:`\\hat{y}_{t,ql}` are the lower bound quantile values. + + Parameters + ---------- + actual_series + The (sequence of) actual series. + pred_series + The (sequence of) predicted series. + intersect + For time series that are overlapping in time without having the same time index, setting `True` + will consider the values only over their common time interval (intersection in time). + q_interval + The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence tuples + (multiple intervals) with elements (low quantile, high quantile). + q + Quantiles `q` not supported by this metric; use `q_interval` instead. + component_reduction + Optionally, a function to aggregate the metrics over the component/column axis. It must reduce a `np.ndarray` + of shape `(t, c)` to a `np.ndarray` of shape `(t,)`. The function takes as input a ``np.ndarray`` and a + parameter named `axis`, and returns the reduced array. The `axis` receives value `1` corresponding to the + component axis. If `None`, will return a metric per component. + series_reduction + Optionally, a function to aggregate the metrics over multiple series. It must reduce a `np.ndarray` + of shape `(s, t, c)` to a `np.ndarray` of shape `(t, c)` The function takes as input a ``np.ndarray`` and a + parameter named `axis`, and returns the reduced array. The `axis` receives value `0` corresponding to the + series axis. For example with `np.nanmean`, will return the average over all series metrics. If `None`, will + return a metric per component. + n_jobs + The number of jobs to run in parallel. Parallel jobs are created only when a ``Sequence[TimeSeries]`` is + passed as input, parallelising operations regarding different ``TimeSeries``. Defaults to `1` + (sequential). Setting the parameter to `-1` means using all the available processors. + verbose + Optionally, whether to print operations progress + + Returns + ------- + float + A single metric score for: + + - single univariate series. + - single multivariate series with `component_reduction`. + - sequence (list) of uni/multivariate series with `series_reduction` and `component_reduction`. 
+ np.ndarray + A numpy array of metric scores. The array has shape (n components,) without component reduction. For: + + - single multivariate series and at least `component_reduction=None`. + - sequence of uni/multivariate series including `series_reduction` and `component_reduction=None`. + List[float] + Same as for type `float` but for a sequence of series. + List[np.ndarray] + Same as for type `np.ndarray` but for a sequence of series. + """ + return np.nanmean( + _get_wrapped_metric(iw, n_wrappers=3)( + actual_series, + pred_series, + intersect, + q=q, + q_interval=q_interval, + ), + axis=TIME_AX, + ) diff --git a/darts/models/forecasting/block_rnn_model.py b/darts/models/forecasting/block_rnn_model.py index ebed1b4a44..d6f401c50f 100644 --- a/darts/models/forecasting/block_rnn_model.py +++ b/darts/models/forecasting/block_rnn_model.py @@ -30,6 +30,7 @@ def __init__( nr_params: int, num_layers_out_fc: Optional[List] = None, dropout: float = 0.0, + activation: str = "ReLU", **kwargs, ): """This class allows to create custom block RNN modules that can later be used with Darts' @@ -63,6 +64,8 @@ def __init__( This network connects the last hidden layer of the PyTorch RNN module to the output. dropout The fraction of neurons that are dropped in all-but-last RNN layers. + activation + The name of the activation function to be applied between the layers of the fully connected network. **kwargs all parameters required for :class:`darts.models.forecasting.pl_forecasting_module.PLForecastingModule` base class. @@ -77,6 +80,7 @@ def __init__( self.nr_params = nr_params self.num_layers_out_fc = [] if num_layers_out_fc is None else num_layers_out_fc self.dropout = dropout + self.activation = activation self.out_len = self.output_chunk_length @io_processor @@ -105,6 +109,7 @@ class _BlockRNNModule(CustomBlockRNNModule): def __init__( self, name: str, + activation: Optional[str] = None, **kwargs, ): """PyTorch module implementing a block RNN to be used in `BlockRNNModel`. @@ -116,6 +121,7 @@ def __init__( This module uses an RNN to encode the input sequence, and subsequently uses a fully connected network as the decoder which takes as input the last hidden state of the encoder RNN. + Optionally, a non-linear activation function can be applied between the layers of the fully connected network. The final output of the decoder is a sequence of length `output_chunk_length`. In this sense, the `_BlockRNNModule` produces 'blocks' of forecasts at a time (which is different from `_RNNModule` used by the `RNNModel`). @@ -124,6 +130,9 @@ def __init__( ---------- name The name of the specific PyTorch RNN module ("RNN", "GRU" or "LSTM"). + activation + The name of the activation function to be applied between the layers of the fully connected network. + Options include "ReLU", "Sigmoid", "Tanh", or None for no activation. Default: None. **kwargs all parameters required for the :class:`darts.models.forecasting.CustomBlockRNNModule` base class. 
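Before the module changes below, a minimal sketch of the new `activation` argument in practice (all hyperparameter values arbitrary); per the implementation, the activation is inserted between the fully connected decoder layers only, never after the final layer:

```python
from darts.models import BlockRNNModel
from darts.utils.timeseries_generation import sine_timeseries

series = sine_timeseries(length=100)

model = BlockRNNModel(
    input_chunk_length=12,
    output_chunk_length=6,
    model="LSTM",
    hidden_fc_sizes=[64, 32],  # two hidden decoder layers -> activation applied twice
    activation="Tanh",         # any torch.nn activation name, e.g. "ReLU", "Sigmoid"
    n_epochs=1,
)
model.fit(series)
forecast = model.predict(n=6)
```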
@@ -155,10 +164,15 @@ def __init__( # to the output of desired length last = self.hidden_dim feats = [] - for feature in self.num_layers_out_fc + [ - self.out_len * self.target_size * self.nr_params - ]: + for index, feature in enumerate( + self.num_layers_out_fc + [self.out_len * self.target_size * self.nr_params] + ): feats.append(nn.Linear(last, feature)) + + # Add activation only between layers, but not on the final layer + if activation and index < len(self.num_layers_out_fc): + activation_function = getattr(nn, activation)() + feats.append(activation_function) last = feature self.fc = nn.Sequential(*feats) @@ -195,6 +209,7 @@ def __init__( n_rnn_layers: int = 1, hidden_fc_sizes: Optional[List] = None, dropout: float = 0.0, + activation: str = "ReLU", **kwargs, ): """Block Recurrent Neural Network Model (RNNs). @@ -243,6 +258,9 @@ def __init__( Sizes of hidden layers connecting the last hidden layer of the RNN module to the output, if any. dropout Fraction of neurons afected by Dropout. + activation + The name of a torch.nn activation function to be applied between the layers of the fully connected network. + Default: "ReLU". **kwargs Optional arguments to initialize the pytorch_lightning.Module, pytorch_lightning.Trainer, and Darts' :class:`TorchForecastingModel`. @@ -435,6 +453,7 @@ def encode_year(idx): self.hidden_dim = hidden_dim self.n_rnn_layers = n_rnn_layers self.dropout = dropout + self.activation = activation @property def supports_multivariate(self) -> bool: @@ -464,6 +483,15 @@ def _create_model(self, train_sample: Tuple[torch.Tensor]) -> torch.nn.Module: num_layers=self.n_rnn_layers, num_layers_out_fc=hidden_fc_sizes, dropout=self.dropout, + activation=self.activation, **self.pl_module_params, **kwargs, ) + + def _check_ckpt_parameters(self, tfm_save): + # new parameters were added that will break loading weights + new_params = ["activation"] + for param in new_params: + if param not in tfm_save.model_params: + tfm_save.model_params[param] = "ReLU" + super()._check_ckpt_parameters(tfm_save) diff --git a/darts/models/forecasting/forecasting_model.py b/darts/models/forecasting/forecasting_model.py index 5a5dc7a738..ee271ceed6 100644 --- a/darts/models/forecasting/forecasting_model.py +++ b/darts/models/forecasting/forecasting_model.py @@ -57,7 +57,12 @@ get_single_series, series2seq, ) -from darts.utils.utils import generate_index +from darts.utils.utils import ( + generate_index, + likelihood_component_names, + quantile_interval_names, + quantile_names, +) logger = get_logger(__name__) @@ -754,6 +759,7 @@ def historical_forecasts( Default: ``False`` enable_optimization Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. fit_kwargs Additional arguments passed to the model `fit()` method. predict_kwargs @@ -1193,6 +1199,8 @@ def backtest( reduction: Union[Callable[..., float], None] = np.mean, verbose: bool = False, show_warnings: bool = True, + predict_likelihood_parameters: bool = False, + enable_optimization: bool = True, metric_kwargs: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None, fit_kwargs: Optional[Dict[str, Any]] = None, predict_kwargs: Optional[Dict[str, Any]] = None, @@ -1312,6 +1320,13 @@ def backtest( Whether to print progress. show_warnings Whether to show warnings related to parameters `start`, and `train_length`. + predict_likelihood_parameters + If set to `True`, the model predicts the parameters of its likelihood instead of the target.
Only + supported for probabilistic models with `likelihood="quantile"`, `num_samples = 1` and + `n<=output_chunk_length`. Default: ``False``. + enable_optimization + Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``. metric_kwargs Additional arguments passed to `metric()`, such as `'n_jobs'` for parallelization, `'component_reduction'` for reducing the component wise metrics, seasonality `'m'` for scaled metrics, etc. Will pass arguments to @@ -1343,9 +1358,9 @@ def backtest( An numpy array of backtest scores. For single series and one of: - a single `metric` function, `historical_forecasts` generated with `last_points_only=False` - and backtest `reduction=None`. The output has shape (n forecasts,). + and backtest `reduction=None`. The output has shape (n forecasts, *). - multiple `metric` functions and `historical_forecasts` generated with `last_points_only=False`. - The output has shape (n metrics,) when using a backtest `reduction`, and (n metrics, n forecasts) + The output has shape (*, n metrics) when using a backtest `reduction`, and (n forecasts, *, n metrics) when `reduction=None` - multiple uni/multivariate series including `series_reduction` and at least one of `component_reduction=None` or `time_reduction=None` for "per time step metrics" @@ -1391,6 +1406,8 @@ def backtest( last_points_only=last_points_only, verbose=verbose, show_warnings=show_warnings, + predict_likelihood_parameters=predict_likelihood_parameters, + enable_optimization=enable_optimization, fit_kwargs=fit_kwargs, predict_kwargs=predict_kwargs, sample_weight=sample_weight, @@ -1491,22 +1508,24 @@ def __getitem__(self, index) -> TimeSeries: # get errors for each input `series` backtest_list = [] for i in range(len(cum_len) - 1): - # errors_series with shape `(n metrics, n series specific historical forecasts)` + # errors_series with shape `(n metrics, n series specific historical forecasts, *)` errors_series = errors[:, cum_len[i] : cum_len[i + 1]] if reduction is not None: - # shape `(n metrics, n forecasts)` -> `(n metrics,)` + # shape `(n metrics, n forecasts, *)` -> `(n metrics, *)` errors_series = reduction(errors_series, axis=1) elif last_points_only: - # shape `(n metrics, n forecasts = 1)` -> `(n metrics,)` + # shape `(n metrics, n forecasts = 1, *)` -> `(n metrics, *)` errors_series = errors_series[:, 0] if len(metric) == 1: # shape `(n metrics, *)` -> `(*,)` errors_series = errors_series[0] - elif not last_points_only and reduction is None: + else: # shape `(n metrics, *)` -> `(*, n metrics)` - errors_series = errors_series.T + errors_series = errors_series.transpose( + tuple(i for i in range(1, errors_series.ndim)) + (0,) + ) backtest_list.append(errors_series) return ( @@ -1831,6 +1850,8 @@ def residuals( metric: METRIC_TYPE = metrics.err, verbose: bool = False, show_warnings: bool = True, + predict_likelihood_parameters: bool = False, + enable_optimization: bool = True, metric_kwargs: Optional[Dict[str, Any]] = None, fit_kwargs: Optional[Dict[str, Any]] = None, predict_kwargs: Optional[Dict[str, Any]] = None, @@ -1949,6 +1970,13 @@ def residuals( Whether to print progress. show_warnings Whether to show warnings related to parameters `start`, and `train_length`. + predict_likelihood_parameters + If set to `True`, the model predicts the parameters of its likelihood instead of the target. Only + supported for probabilistic models with `likelihood="quantile"`, `num_samples = 1` and + `n<=output_chunk_length`. Default: ``False``. + enable_optimization + Whether to use the optimized version of historical_forecasts when supported and available. + Default: ``True``.
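To make the two parameters above concrete, a minimal `backtest()` sketch (toy linear series; all hyperparameter values arbitrary) combining the new `enable_optimization` flag with quantile `metric_kwargs`:

```python
from darts.metrics import mae
from darts.models import LinearRegressionModel
from darts.utils.timeseries_generation import linear_timeseries

series = linear_timeseries(length=30)
model = LinearRegressionModel(
    lags=3,
    output_chunk_length=2,
    likelihood="quantile",
    quantiles=[0.1, 0.5, 0.9],
)
model.fit(series)

# backtest the probabilistic forecasts on several quantiles at once,
# using the optimized historical forecasts under the hood
scores = model.backtest(
    series,
    start=0.5,
    forecast_horizon=2,
    retrain=False,
    num_samples=100,
    enable_optimization=True,
    metric=mae,
    metric_kwargs={"q": [0.1, 0.5, 0.9]},
)  # array of shape (3,), one score per quantile
```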
metric_kwargs Additional arguments passed to `metric()`, such as `'n_jobs'` for parallelization, `'m'` for scaled metrics, etc. Will pass arguments only if they are present in the corresponding metric signature. Ignores @@ -2004,6 +2032,8 @@ def residuals( last_points_only=last_points_only, verbose=verbose, show_warnings=show_warnings, + predict_likelihood_parameters=predict_likelihood_parameters, + enable_optimization=enable_optimization, fit_kwargs=fit_kwargs, predict_kwargs=predict_kwargs, overlap_end=False, @@ -2035,9 +2065,10 @@ def residuals( residuals = [[res] for res in residuals] # sanity check residual output + q, q_interval = metric_kwargs.get("q"), metric_kwargs.get("q_interval") try: - res, fc = residuals[0][0], historical_forecasts[0][0] - _ = np.reshape(res, (len(fc), fc.n_components, 1)) + series_, res, fc = series[0], residuals[0][0], historical_forecasts[0][0] + _ = np.reshape(res, (len(fc), -1, 1)) except Exception as err: raise_log( ValueError( @@ -2051,13 +2082,47 @@ def residuals( # process residuals residuals_out = [] - for fc_list, res_list in zip(historical_forecasts, residuals): + for series_, fc_list, res_list in zip(series, historical_forecasts, residuals): res_list_out = [] + if q is not None: + q = [q] if isinstance(q, float) else q + # multi-quantile metrics yield more components + comp_names = likelihood_component_names( + components=series_.components, + parameter_names=quantile_names(q), + ) + # `q` and `q_interval` are mutually exclusive + elif q_interval is not None: + # multi-quantile metrics yield more components + q_interval = ( + [q_interval] if isinstance(q_interval, tuple) else q_interval + ) + comp_names = likelihood_component_names( + components=series_.components, + parameter_names=quantile_interval_names(q_interval), + ) + else: + comp_names = None for fc, res in zip(fc_list, res_list): - # make sure all residuals have shape (n time steps, n components, n samples=1) + # make sure all residuals have shape (n time steps, n components * n quantiles, n samples=1) if len(res.shape) != 3: - res = np.reshape(res, (len(fc), fc.n_components, 1)) - res_list_out.append(res if values_only else fc.with_values(res)) + res = np.reshape(res, (len(fc), -1, 1)) + if values_only: + res = res + elif (q is None and q_interval is None) and res.shape[ + 1 + ] == fc.n_components: + res = fc.with_values(res) + else: + # quantile (interval) metrics create a different number of components; + # create new series with unknown components + res = TimeSeries.from_times_and_values( + times=fc._time_index, + values=res, + columns=comp_names, + ) + res_list_out.append(res) + residuals_out.append(res_list_out) # if required, reduce to `series` input type diff --git a/darts/models/forecasting/regression_model.py b/darts/models/forecasting/regression_model.py index d22a23329e..649a7b1432 100644 --- a/darts/models/forecasting/regression_model.py +++ b/darts/models/forecasting/regression_model.py @@ -56,7 +56,11 @@ ) from darts.utils.multioutput import MultiOutputRegressor from darts.utils.ts_utils import get_single_series, seq2series, series2seq -from darts.utils.utils import _check_quantiles +from darts.utils.utils import ( + _check_quantiles, + likelihood_component_names, + quantile_names, +) logger = get_logger(__name__) @@ -1139,10 +1143,18 @@ def predict( # same for covariate matrices for cov_type, data in covariate_matrices.items():
covariate_matrices[cov_type] = np.repeat(data, num_samples, axis=0) + + # for concatenating target with predictions (or quantile parameters) + if predict_likelihood_parameters and self.likelihood is not None: + # with `multi_models=False`, the predictions are concatenated with the past target, even if `n<=ocl` + # to make things work, we just append the first predicted parameter (it will never be accessed) + sample_slice = slice(0, None, self.num_parameters) + else: + sample_slice = slice(None) + # prediction predictions = [] last_step_shift = 0 - # t_pred indicates the number of time steps after the first prediction for t_pred in range(0, n, step): # in case of autoregressive forecast `(t_pred > 0)` and if `n` is not a round multiple of `step`, @@ -1153,7 +1165,9 @@ # concatenate previous iteration forecasts if "target" in self.lags and predictions: - series_matrix = np.concatenate([series_matrix, predictions[-1]], axis=1) + series_matrix = np.concatenate( + [series_matrix, predictions[-1][:, :, sample_slice]], axis=1 + ) # extract and concatenate lags from target and covariates series X = _create_lagged_data_autoregression( @@ -1668,17 +1682,15 @@ def _quantiles_generate_components_names( ) -> List[str]: return self._likelihood_generate_components_names( input_series, - [f"q{quantile:.2f}" for quantile in self._model_container.keys()], + quantile_names(q=self._model_container.keys()), ) def _likelihood_generate_components_names( self, input_series: TimeSeries, parameter_names: List[str] ) -> List[str]: - return [ - f"{tgt_name}_{param_n}" - for tgt_name in input_series.components - for param_n in parameter_names - ] + return likelihood_component_names( + components=input_series.components, parameter_names=parameter_names + ) class _QuantileModelContainer(OrderedDict): diff --git a/darts/tests/metrics/test_metrics.py b/darts/tests/metrics/test_metrics.py index 40754c0bfe..df7a820bf5 100644 --- a/darts/tests/metrics/test_metrics.py +++ b/darts/tests/metrics/test_metrics.py @@ -7,8 +7,9 @@ import pytest import sklearn.metrics -from darts import TimeSeries +from darts import TimeSeries, concatenate from darts.metrics import metrics +from darts.utils.utils import likelihood_component_names, quantile_names def sklearn_mape(*args, **kwargs): @@ -65,6 +66,19 @@ def metric_rmsle(y_true, y_pred, **kwargs): ) +def metric_iw(y_true, y_pred, q_interval=None, **kwargs): + # this test assumes `y_pred` are stochastic values + if isinstance(q_interval, tuple): + q_interval = [q_interval] + q_interval = np.array(q_interval) + q_lo = q_interval[:, 0] + q_hi = q_interval[:, 1] + y_pred_lo = np.quantile(y_pred, q_lo, axis=2).transpose(1, 2, 0) + y_pred_hi = np.quantile(y_pred, q_hi, axis=2).transpose(1, 2, 0) + res = y_pred_hi - y_pred_lo + return res.reshape(len(y_pred), -1) + + class TestMetrics: np.random.seed(42) pd_train = pd.Series( @@ -963,6 +977,15 @@ def test_arre(self, config): self.helper_test_nan(metric, **kwargs) self.helper_test_non_aggregate(metric, is_aggregate) + with pytest.raises(ValueError) as exc: + _ = metric( + TimeSeries.from_values(np.ones((3, 1, 1))), + TimeSeries.from_values(np.ones((3, 1, 1))), + ) + assert str(exc.value).startswith( + "The difference between the max and min values must " + ) + @pytest.mark.parametrize( "metric", [ @@ -1034,10 +1057,10 @@ def test_sape(self, config): "config", [ (metrics.ase, False, {"time_reduction": np.nanmean}), - # (metrics.sse, False, {"time_reduction": np.nanmean}), - # (metrics.mase, True, {}), - # (metrics.msse, True, {}), - 
# (metrics.rmsse, True, {}), + (metrics.sse, False, {"time_reduction": np.nanmean}), + (metrics.mase, True, {}), + (metrics.msse, True, {}), + (metrics.rmsse, True, {}), ], ) def test_scaled_errors(self, config): @@ -1110,6 +1133,9 @@ def test_scaled_errors(self, config): assert str(err.value).startswith( "The `insample` series must start before the `pred_series`" ) + # wrong number of components + with pytest.raises(ValueError): + metric(self.series1, self.series2, insample.stack(insample)) # multi-ts, second series is not a TimeSeries with pytest.raises(ValueError): metric([self.series1] * 2, self.series2, [insample] * 2) @@ -1157,6 +1183,22 @@ def test_rho_risk(self): np.testing.assert_array_almost_equal(metrics.qr(s1, s12_stochastic, q=0.0), 0.0) np.testing.assert_array_almost_equal(metrics.qr(s2, s12_stochastic, q=1.0), 0.0) + # preds must be probabilistic + q_names = likelihood_component_names( + self.series1.components, + quantile_names([0.5]), + ) + with pytest.raises(ValueError) as exc: + metrics.qr( + self.series1, + self.series1.with_columns_renamed(self.series1.components, q_names), + q=0.5, + ) + assert ( + str(exc.value) + == "quantile risk (qr) should only be computed for stochastic predicted TimeSeries." + ) + @pytest.mark.parametrize( "config", [ @@ -1241,7 +1283,7 @@ def test_metrics_arguments(self): # should fail if kwargs are passed as args, because of the "*" with pytest.raises(TypeError): - metrics.r2_score(series00, series11, False, np.mean) + metrics.r2_score(series00, series11, False, 0.5, np.mean) def test_multiple_ts_rmse(self): # simple test @@ -1539,3 +1581,549 @@ def helper_test_non_aggregate(self, metric, is_aggregate, val_exp=None): if val_exp is not None: assert (res == -1.0).all() + + @pytest.mark.parametrize( + "config", + list( + itertools.product( + [ + # time dependent but with time reduction + metrics.err, + metrics.ae, + metrics.se, + metrics.sle, + metrics.ase, + metrics.sse, + metrics.ape, + metrics.sape, + metrics.arre, + metrics.ql, + # time aggregates + metrics.merr, + metrics.mae, + metrics.mse, + metrics.rmse, + metrics.rmsle, + metrics.mase, + metrics.msse, + metrics.rmsse, + metrics.mape, + metrics.smape, + metrics.ope, + metrics.marre, + metrics.r2_score, + metrics.coefficient_of_variation, + metrics.mql, + ], + [True, False], # univariate series + [True, False], # single series + ) + ), + ) + def test_metric_quantiles(self, config): + """Test output types and shapes for time aggregated metrics with quantiles: + for single and multiple univariate or multivariate series, in combination + with different component and series reduction functions.""" + np.random.seed(42) + metric, is_univar, is_single = config + params = inspect.signature(metric).parameters + + n_comp = 1 if is_univar else 2 + + qs_all = [0.1, 0.5, 0.8] + components = [str(i) for i in range(n_comp)] + + series_vals = np.random.random((10, n_comp, 1)) + + pred_prob_vals = np.random.random((10, n_comp, 100)) + + pred_vals_qs = [] + for i in range(n_comp): + pred_vals_qs.append( + np.quantile(pred_prob_vals[:, [i]], qs_all, axis=2).transpose(1, 0, 2) + ) + pred_vals_qs = np.concatenate(pred_vals_qs, axis=1) + pred_components = likelihood_component_names( + components=components, parameter_names=quantile_names(q=qs_all) + ) + + series = TimeSeries.from_values(series_vals, columns=components) + series_q_exp = concatenate( + [series[comp] for comp in components for _ in qs_all], axis=1 + ) + pred_prob = TimeSeries.from_values(pred_prob_vals, columns=components) + pred_qs = 
TimeSeries.from_values(pred_vals_qs, columns=pred_components) + insample = series.shift(-len(series)) + insample_q_exp = concatenate( + [insample[comp] for comp in components for _ in qs_all], axis=1 + ) + shape_time = (len(pred_qs),) if "time_reduction" in params else tuple() + + if not is_single: + series = [series] * 2 + series_q_exp = [series_q_exp] * 2 + pred_prob = [pred_prob] * 2 + pred_qs = [pred_qs] * 2 + insample = [insample] * 2 + insample_q_exp = [insample_q_exp] * 2 + + kwargs = {"actual_series": series} + if "insample" in params: + kwargs["insample"] = insample + + def check_res( + pred_prob_, pred_qs_, shape_exp, series_reduction=None, **test_kwargs + ): + res_prob = metric( + pred_series=pred_prob_, + series_reduction=series_reduction, + **kwargs, + **test_kwargs, + ) + res_qs = metric( + pred_series=pred_qs_, + series_reduction=series_reduction, + **kwargs, + **test_kwargs, + ) + if is_single or series_reduction is not None: + res_prob = [res_prob] + res_qs = [res_qs] + if series_reduction is None and not is_single: + assert len(res_prob) == len(res_qs) == len(pred_prob_) + + for res_p, res_q in zip(res_prob, res_qs): + assert res_p.shape == res_q.shape == shape_exp + np.testing.assert_array_almost_equal(res_p, res_q) + + check_res(pred_prob, pred_qs, shape_time, q=0.1) + # one quantile as list + check_res(pred_prob, pred_qs, shape_time, q=[0.1]) + # multiple quantiles + check_res(pred_prob, pred_qs, shape_time + (2,), q=[0.1, 0.8]) + # all quantiles + check_res(pred_prob, pred_qs, shape_time + (3,), q=[0.1, 0.5, 0.8]) + qs = [0.1, 0.8] + # component and series reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(qs),), + q=qs, + component_reduction=np.mean, + series_reduction=np.mean, + ) + # no component reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(qs) * n_comp,), + q=qs, + component_reduction=None, + series_reduction=np.mean, + ) + # no series reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(qs),), + q=qs, + component_reduction=np.mean, + series_reduction=None, + ) + # no series and component reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(qs) * n_comp,), + q=qs, + component_reduction=None, + series_reduction=None, + ) + + # check that we get identical results as when computing each quantile component against the actual + # target component directly + kwargs_direct = copy.deepcopy(kwargs) + q_direct = {} + if metric.__name__ not in ["ql", "mql"]: + kwargs_direct["actual_series"] = series_q_exp + if "insample" in params: + kwargs_direct["insample"] = insample_q_exp + else: + q_direct["q"] = qs_all + kwargs_direct["actual_series"] = series + + res_direct = metric( + pred_series=pred_qs, component_reduction=None, **kwargs_direct, **q_direct + ) + res_qs = metric( + pred_series=pred_qs, + component_reduction=None, + q=qs_all, + **kwargs, + ) + np.testing.assert_array_almost_equal(res_direct, res_qs) + + def test_invalid_quantiles(self): + np.random.seed(42) + series_a = TimeSeries.from_values(np.random.random((10, 2, 1))) + series_b = TimeSeries.from_values(np.random.random((10, 2, 10))) + + # unsorted quantiles + with pytest.raises(ValueError) as exc: + _ = metrics.mae(series_a, series_b, q=[0.2, 0.1]) + assert "a sequence of increasing order" in str(exc.value) + + # non-unique values metrics + with pytest.raises(ValueError) as exc: + _ = metrics.mae(series_a, series_b, q=[0.2, 0.2]) + assert "with unique values only" in str(exc.value) + + # q > 1 + with pytest.raises(ValueError) as exc: + _ = 
metrics.mae(series_a, series_b, q=[0.2, 1.01]) + assert "must be in the range `(>=0,<=1)`" in str(exc.value) + + # q < 0 + with pytest.raises(ValueError) as exc: + _ = metrics.mae(series_a, series_b, q=[-0.01, 0.2]) + assert "must be in the range `(>=0,<=1)`" in str(exc.value) + + # but sorted, unique, and valid quantiles work + _ = metrics.mae(series_a, series_b, q=[0.0, 0.5, 1.0]) + + def test_quantile_as_tuple(self): + """Test that `q` as tuple (list of quantiles, quantile component names) gives same results as `q` + as quantile values list.""" + np.random.seed(42) + q = [0.25, 0.75] + + series_a = TimeSeries.from_values(np.random.random((10, 2, 1))) + q_names = pd.Index( + likelihood_component_names(series_a.components, quantile_names(q)) + ) + series_b = TimeSeries.from_values(np.random.random((10, 4, 1)), columns=q_names) + + np.testing.assert_array_almost_equal( + metrics.mae(series_a, series_b, q=(q, q_names)), + metrics.mae(series_a, series_b, q=q), + ) + + def test_custom_metric_wrong_output_shape(self): + """Test that custom metrics must have correct output dim.""" + + @metrics.multi_ts_support + @metrics.multivariate_support + def custom_metric( + actual_series, + pred_series, + intersect=True, + *, + q=None, + time_reduction=None, + component_reduction=np.nanmean, + series_reduction=None, + n_jobs=1, + verbose=False, + out_ndim=1, + ): + return np.ones(tuple(1 for _ in range(out_ndim))) + + for ndim in [1, 4]: + with pytest.raises(ValueError) as exc: + custom_metric(self.series1, self.series2, out_ndim=ndim) + assert str(exc.value).startswith( + "Metric output must have 2 dimensions (n components, n quantiles) for aggregated metrics" + ) + for ndim in [2, 3]: + _ = custom_metric(self.series1, self.series2, out_ndim=ndim) + + def test_wrong_error_scale(self): + with pytest.raises(ValueError) as exc: + _ = metrics._get_error_scale( + self.series1.shift(-len(self.series1)), + self.series1, + m=1, + metric="wrong_metric", + ) + assert str(exc.value).startswith("unknown `metric=wrong_metric`") + + @pytest.mark.parametrize( + "config", + [ + # only time dependent quantile interval metrics + (metrics.iw, metric_iw), + ], + ) + def test_metric_quantile_interval_accuracy(self, config): + """Test output types and shapes for time dependent metrics with quantile intervals: + for single and multiple univariate or multivariate series, in combination + with different component and series reduction functions.""" + np.random.seed(42) + metric, metric_ref = config + n_comp = 2 + components = [str(i) for i in range(n_comp)] + series_vals = np.random.random((10, n_comp, 1)) + pred_prob_vals = np.random.random((10, n_comp, 100)) + series = TimeSeries.from_values(series_vals, columns=components) + pred_prob = TimeSeries.from_values(pred_prob_vals, columns=components) + + def check_ref(**test_kwargs): + res_prob = metric( + actual_series=series, + pred_series=pred_prob, + series_reduction=None, + component_reduction=None, + time_reduction=None, + **test_kwargs, + ) + res_ref = metric_ref( + y_true=series.all_values(), + y_pred=pred_prob.all_values(), + **test_kwargs, + ) + np.testing.assert_array_almost_equal(res_prob, res_ref) + + # one interval as tuple + check_ref(q_interval=(0.1, 0.5)) + # one interval in list + check_ref(q_interval=[(0.1, 0.5)]) + # multiple intervals + check_ref(q_interval=[(0.1, 0.5), (0.5, 0.8)]) + + @pytest.mark.parametrize( + "config", + list( + itertools.product( + [ + # time dependent but with time reduction + metrics.iw, + metrics.miw, + ], + [True, False], # univariate 
series + [True, False], # single series + ) + ), + ) + def test_metric_quantile_interval(self, config): + """Test output types and shapes for time aggregated metrics with quantile intervals: + for single and multiple univariate or multivariate series, in combination + with different component and series reduction functions.""" + np.random.seed(42) + metric, is_univar, is_single = config + params = inspect.signature(metric).parameters + + n_comp = 1 if is_univar else 2 + + qs_all = [0.1, 0.5, 0.8] + components = [str(i) for i in range(n_comp)] + + series_vals = np.random.random((10, n_comp, 1)) + pred_prob_vals = np.random.random((10, n_comp, 100)) + + pred_vals_qs = [] + for i in range(n_comp): + pred_vals_qs.append( + np.quantile(pred_prob_vals[:, [i]], qs_all, axis=2).transpose(1, 0, 2) + ) + pred_vals_qs = np.concatenate(pred_vals_qs, axis=1) + pred_components = likelihood_component_names( + components=components, parameter_names=quantile_names(q=qs_all) + ) + + series = TimeSeries.from_values(series_vals, columns=components) + pred_prob = TimeSeries.from_values(pred_prob_vals, columns=components) + pred_qs = TimeSeries.from_values(pred_vals_qs, columns=pred_components) + shape_time = (len(pred_qs),) if "time_reduction" in params else tuple() + + if not is_single: + series = [series] * 2 + pred_prob = [pred_prob] * 2 + pred_qs = [pred_qs] * 2 + + kwargs = {"actual_series": series} + + def check_res( + pred_prob_, pred_qs_, shape_exp, series_reduction=None, **test_kwargs + ): + res_prob = metric( + actual_series=series, + pred_series=pred_prob_, + series_reduction=series_reduction, + **test_kwargs, + ) + res_qs = metric( + actual_series=series, + pred_series=pred_qs_, + series_reduction=series_reduction, + **test_kwargs, + ) + if is_single or series_reduction is not None: + res_prob = [res_prob] + res_qs = [res_qs] + if series_reduction is None and not is_single: + assert len(res_prob) == len(res_qs) == len(pred_prob_) + + for res_p, res_q in zip(res_prob, res_qs): + assert res_p.shape == res_q.shape == shape_exp + np.testing.assert_array_almost_equal(res_p, res_q) + return res_qs + + # one interval as tuple + res = check_res(pred_prob, pred_qs, shape_time, q_interval=(0.1, 0.5)) + # one interval in list + res2 = check_res(pred_prob, pred_qs, shape_time, q_interval=[(0.1, 0.5)]) + np.testing.assert_array_almost_equal(res, res2) + # multiple intervals + check_res( + pred_prob, pred_qs, shape_time + (2,), q_interval=[(0.1, 0.5), (0.5, 0.8)] + ) + # all intervals + check_res( + pred_prob, + pred_qs, + shape_time + (3,), + q_interval=[(0.1, 0.5), (0.5, 0.8), (0.1, 0.8)], + ) + q_intervals = [(0.1, 0.5), (0.5, 0.8)] + # component and series reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(q_intervals),), + q_interval=q_intervals, + component_reduction=np.mean, + series_reduction=np.mean, + ) + # no component reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(q_intervals) * n_comp,), + q_interval=q_intervals, + component_reduction=None, + series_reduction=np.mean, + ) + # no series reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(q_intervals),), + q_interval=q_intervals, + component_reduction=np.mean, + series_reduction=None, + ) + # no series and component reduction + check_res( + pred_prob, + pred_qs, + shape_time + (len(q_intervals) * n_comp,), + q_interval=q_intervals, + component_reduction=None, + series_reduction=None, + ) + + # check that we get identical results as when computing intervals separately (on the time aggregated case) + if 
"time_reduction" in params: + kwargs["time_reduction"] = np.mean + res_lo = metric( + pred_series=pred_qs, + component_reduction=None, + q_interval=(0.1, 0.5), + **kwargs, + ) + res_hi = metric( + pred_series=pred_qs, + component_reduction=None, + q_interval=(0.5, 0.8), + **kwargs, + ) + res_multi = metric( + pred_series=pred_qs, + component_reduction=None, + q_interval=[(0.1, 0.5), (0.5, 0.8)], + **kwargs, + ) + if is_single: + res_lo = [res_lo] + res_hi = [res_hi] + res_multi = [res_multi] + res_lo_hi = [] + for res_lo_, res_hi_ in zip(res_lo, res_hi): + if res_lo_.ndim == 0: + res_lo_ = np.expand_dims(res_lo_, -1) + res_hi_ = np.expand_dims(res_hi_, -1) + res_lo_hi_ = np.concatenate([res_lo_, res_hi_]) + else: + res_lo_hi_ = np.concatenate( + [(res_lo_[i], res_hi_[i]) for i in range(n_comp)], + ) + res_lo_hi.append(res_lo_hi_) + np.testing.assert_array_almost_equal(res_lo_hi, res_multi) + + def test_invalid_quantile_intervals(self): + np.random.seed(42) + series_a = TimeSeries.from_values(np.random.random((10, 2, 1))) + series_b = TimeSeries.from_values(np.random.random((10, 2, 10))) + + # q not supported + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q=[0.2]) + assert str(exc.value).startswith( + "`q` is not supported for quantile interval metrics" + ) + + # no quantile interval + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=None) + assert str(exc.value).startswith( + "Quantile interval metrics require setting `q_interval`." + ) + + # invalid interval type + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=0.6) + assert ( + str(exc.value) + == "`q_interval` must be a tuple (float, float) or a sequence of tuples (float, float)." + ) + + # invalid tuple length + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=(0.1, 0.2, 0.3)) + assert ( + str(exc.value) + == "`q_interval` must be a tuple (float, float) or a sequence of tuples (float, float)." + ) + + # one tuple has invalid length invalid tuple length (raises a numpy error) + with pytest.raises(ValueError): + _ = metrics.iw(series_a, series_b, q_interval=[(0.1, 0.2), (0.2, 0.3, 0.4)]) + + # interval upper bound too high + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=(0.1, 1.1)) + assert str(exc.value).startswith( + "All `q` values must be in the range `(>=0,<=1)`." + ) + + # interval lower bound too low + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=(-0.01, 0.1)) + assert str(exc.value).startswith( + "All `q` values must be in the range `(>=0,<=1)`." 
+ ) + + # lower interval equal to higher interval + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=(0.2, 0.2)) + assert str(exc.value).startswith( + "all intervals in `q_interval` must be tuples of (lower q, upper q)" + ) + + # lower interval higher than higher interval + with pytest.raises(ValueError) as exc: + _ = metrics.iw(series_a, series_b, q_interval=(0.3, 0.2)) + assert str(exc.value).startswith( + "all intervals in `q_interval` must be tuples of (lower q, upper q)" + ) diff --git a/darts/tests/models/forecasting/test_backtesting.py b/darts/tests/models/forecasting/test_backtesting.py index 0f6336043a..60f1b3e525 100644 --- a/darts/tests/models/forecasting/test_backtesting.py +++ b/darts/tests/models/forecasting/test_backtesting.py @@ -26,6 +26,7 @@ from darts.utils.timeseries_generation import linear_timeseries as lt from darts.utils.timeseries_generation import random_walk_timeseries as rt from darts.utils.timeseries_generation import sine_timeseries as st +from darts.utils.utils import generate_index logger = get_logger(__name__) @@ -1252,6 +1253,179 @@ def time_reduced_metric(*args, **kwargs): for bt in bt_list: np.testing.assert_array_almost_equal(bt, bt_expected) + @pytest.mark.parametrize( + "config", + itertools.product( + [ + [metrics.mae], # mae does not support time_reduction + [metrics.mae, metrics.ae], # ae supports time_reduction + [metrics.miw], # quantile interval metric + [metrics.miw, metrics.iw], + ], + [True, False], # last_points_only + ), + ) + def test_metric_quantiles_lpo(self, config): + """Tests backtest with quantile and quantile interval metrics from expected probabilistic or quantile + historical forecasts.""" + metric, lpo = config + is_interval_metric = metric[0].__name__ == "miw" + + q = [0.05, 0.5, 0.60, 0.95] + q_interval = [(0.05, 0.50), (0.50, 0.60), (0.60, 0.95), (0.05, 0.60)] + + y = lt(length=20) + y = y.stack(y + 1.0) + hfc = TimeSeries.from_times_and_values( + times=generate_index(start=y.start_time() + 10 * y.freq, length=10), + values=np.random.random((10, 1, 100)), + ) + hfc = hfc.stack(hfc + 1.0) + y = [y, y] + if lpo: + hfc = [hfc, hfc] + else: + hfc = [[hfc, hfc], [hfc]] + + metric_kwargs = [{"component_reduction": np.median}] + if not is_interval_metric: + metric_kwargs[0]["q"] = q + else: + metric_kwargs[0]["q_interval"] = q_interval + if len(metric) > 1: + # give metric specific kwargs + metric_kwargs2 = { + "component_reduction": np.median, + "time_reduction": np.mean, + } + if not is_interval_metric: + metric_kwargs2["q"] = q + else: + metric_kwargs2["q_interval"] = q_interval + metric_kwargs.append(metric_kwargs2) + + model = NaiveDrift() + + bts = model.backtest( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=None, + metric_kwargs=metric_kwargs, + ) + assert isinstance(bts, list) and len(bts) == 2 + if lpo: + bts = [[bt] for bt in bts] + # `ae` with time and component reduction is equal to `mae` with component reduction + hfc_single = hfc[0][0] if not lpo else hfc[0] + q_kwargs = {"q": q} if not is_interval_metric else {"q_interval": q_interval} + bt_expected = metric[0]( + y[0], hfc_single, component_reduction=np.median, **q_kwargs + ) + shape_expected = (len(q),) + if len(metric) > 1: + bt_expected = np.concatenate([bt_expected[:, None]] * 2, axis=1) + shape_expected += (len(metric),) + for bt_list in bts: + for bt in bt_list: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + + bts = model.backtest( 
+ series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=np.mean, + metric_kwargs=metric_kwargs, + ) + assert isinstance(bts, list) and len(bts) == 2 + for bt in bts: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + + def time_reduced_metric(*args, **kwargs): + metric_f = metrics.iw if is_interval_metric else metrics.ae + return metric_f(*args, **kwargs, time_reduction=np.mean) + + # check that single kwargs can be used for all metrics if params are supported + metric = [metric[0], time_reduced_metric] + bts = model.backtest( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=None, + metric_kwargs=metric_kwargs[0], + ) + assert isinstance(bts, list) and len(bts) == 2 + if lpo: + bts = [[bt] for bt in bts] + # `ae` / `miw` with time and component reduction is equal to `mae` / `miw` with component reduction + bt_expected = metric[0]( + y[0], hfc_single, component_reduction=np.median, **q_kwargs + ) + bt_expected = np.concatenate([bt_expected[:, None]] * 2, axis=1) + shape_expected = (len(q), len(metric)) + for bt_list in bts: + for bt in bt_list: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + + bts = model.backtest( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=np.mean, + metric_kwargs=metric_kwargs[0], + ) + assert isinstance(bts, list) and len(bts) == 2 + for bt in bts: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + + # without component reduction + metric_kwargs = {"component_reduction": None} + if not is_interval_metric: + metric_kwargs["q"] = q + else: + metric_kwargs["q_interval"] = q_interval + bts = model.backtest( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=None, + metric_kwargs=metric_kwargs, + ) + assert isinstance(bts, list) and len(bts) == 2 + if lpo: + bts = [[bt] for bt in bts] + + # `ae` / `iw` with time and no component reduction is equal to `mae` / `miw` without component reduction + bt_expected = metric[0](y[0], hfc_single, **metric_kwargs) + bt_expected = np.concatenate([bt_expected[:, None]] * 2, axis=1) + shape_expected = (len(q) * y[0].width, len(metric)) + for bt_list in bts: + for bt in bt_list: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + + bts = model.backtest( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + reduction=np.mean, + metric_kwargs=metric_kwargs, + ) + assert isinstance(bts, list) and len(bts) == 2 + for bt in bts: + assert bt.shape == shape_expected + np.testing.assert_array_almost_equal(bt, bt_expected) + @pytest.mark.parametrize( "config", product([True, False], [True, False]), diff --git a/darts/tests/models/forecasting/test_block_RNN.py b/darts/tests/models/forecasting/test_block_RNN.py index 827f19a7e5..23a213394a 100644 --- a/darts/tests/models/forecasting/test_block_RNN.py +++ b/darts/tests/models/forecasting/test_block_RNN.py @@ -85,31 +85,46 @@ def test_creation(self): model1.fit(self.series) preds1 = model1.predict(n=3) - # can create from a custom class itself + # can create from valid module name with ReLU activation model2 = BlockRNNModel( input_chunk_length=1, output_chunk_length=1, - model=ModuleValid1, + model="RNN", + activation="ReLU", + hidden_fc_sizes=[10], n_epochs=1, random_state=42, **tfm_kwargs, ) model2.fit(self.series) preds2 = 
model2.predict(n=3)
- np.testing.assert_array_equal(preds1.all_values(), preds2.all_values())
+ assert preds1.values().shape == preds2.values().shape
+
+ # can create from a custom class itself
model3 = BlockRNNModel(
input_chunk_length=1,
output_chunk_length=1,
- model=ModuleValid2,
+ model=ModuleValid1,
n_epochs=1,
random_state=42,
**tfm_kwargs,
)
model3.fit(self.series)
- preds3 = model2.predict(n=3)
- assert preds3.all_values().shape == preds2.all_values().shape
- assert preds3.time_index.equals(preds2.time_index)
+ preds3 = model3.predict(n=3)
+ np.testing.assert_array_equal(preds1.all_values(), preds3.all_values())
+
+ model4 = BlockRNNModel(
+ input_chunk_length=1,
+ output_chunk_length=1,
+ model=ModuleValid2,
+ n_epochs=1,
+ random_state=42,
+ **tfm_kwargs,
+ )
+ model4.fit(self.series)
+ preds4 = model4.predict(n=3)
+ assert preds4.all_values().shape == preds3.all_values().shape
+ assert preds4.time_index.equals(preds3.time_index)
def test_fit(self, tmpdir_module):
# Test basic fit()
diff --git a/darts/tests/models/forecasting/test_historical_forecasts.py b/darts/tests/models/forecasting/test_historical_forecasts.py
index 5281261ad2..5a4d201cb3 100644
--- a/darts/tests/models/forecasting/test_historical_forecasts.py
+++ b/darts/tests/models/forecasting/test_historical_forecasts.py
@@ -2314,6 +2314,93 @@ def test_predict_likelihood_parameters(self, model_type):
)
assert len(hist_fc) == n + 1
+ @pytest.mark.parametrize(
+ "config",
+ product(
+ [False, True], # last_points_only
+ [True, False], # multi_models
+ [1, 2, 3], # horizon
+ ),
+ )
+ def test_probabilistic_optimized_hist_fc_regression(self, config):
+ """Tests optimized probabilistic historical forecasts for regression models."""
+ np.random.seed(42)
+ lpo, multi_models, n = config
+ ocl = 2
+ q = [0.05, 0.50, 0.95]
+
+ y = tg.linear_timeseries(length=20)
+ y = y.stack(y + 1.0)
+ y = [y, y]
+
+ icl = 3
+ model = LinearRegressionModel(
+ lags=icl,
+ output_chunk_length=ocl,
+ likelihood="quantile",
+ quantiles=q,
+ multi_models=multi_models,
+ )
+ model.fit(y)
+ # probabilistic forecasts non-optimized
+ hfcs_no_opt = model.historical_forecasts(
+ series=y,
+ forecast_horizon=n,
+ last_points_only=lpo,
+ retrain=False,
+ enable_optimization=False,
+ num_samples=1000,
+ stride=n,
+ )
+ # probabilistic forecasts optimized
+ hfcs_opt = model.historical_forecasts(
+ series=y,
+ forecast_horizon=n,
+ last_points_only=lpo,
+ retrain=False,
+ enable_optimization=True,
+ num_samples=1000,
+ stride=n,
+ )
+ if n <= ocl:
+ # quantile forecasts optimized
+ hfcs_opt_q = model.historical_forecasts(
+ series=y,
+ forecast_horizon=n,
+ last_points_only=lpo,
+ retrain=False,
+ enable_optimization=True,
+ predict_likelihood_parameters=True,
+ stride=n,
+ )
+ if lpo:
+ q_med = hfcs_opt_q[0].components[1::3].tolist()
+ else:
+ q_med = hfcs_opt_q[0][0].components[1::3].tolist()
+ hfcs_opt_q = (
+ [concatenate(hfc) for hfc in hfcs_opt_q]
+ if hfcs_opt_q is not None
+ else hfcs_opt_q
+ )
+ hfcs_opt_q = (
+ [hfc[q_med] for hfc in hfcs_opt_q]
+ if hfcs_opt_q is not None
+ else hfcs_opt_q
+ )
+ else:
+ hfcs_opt_q = [None] * len(hfcs_opt)
+
+ if not lpo:
+ hfcs_opt = [concatenate(hfc) for hfc in hfcs_opt]
+ hfcs_no_opt = [concatenate(hfc) for hfc in hfcs_no_opt]
+
+ for hfc_opt, mean_opt_q, hfc_no_opt in zip(hfcs_opt, hfcs_opt_q, hfcs_no_opt):
+ mean_opt = hfc_opt.all_values().mean(axis=2)
+ mean_no_opt = hfc_no_opt.all_values().mean(axis=2)
+ assert np.abs(mean_opt - mean_no_opt).max() < 0.1
+ if mean_opt_q is not None:
+ assert np.abs(mean_opt -
mean_opt_q.values()).max() < 0.1 + @pytest.mark.parametrize( "model_type,enable_optimization", product(["regression", "torch"], [True, False]), diff --git a/darts/tests/models/forecasting/test_probabilistic_models.py b/darts/tests/models/forecasting/test_probabilistic_models.py index eaae6c0c9f..d576dac8ee 100644 --- a/darts/tests/models/forecasting/test_probabilistic_models.py +++ b/darts/tests/models/forecasting/test_probabilistic_models.py @@ -94,6 +94,23 @@ ), ] +xgb_test_params = { + "n_estimators": 1, + "max_depth": 1, + "max_leaves": 1, +} +lgbm_test_params = { + "n_estimators": 1, + "max_depth": 1, + "num_leaves": 2, + "verbosity": -1, +} +cb_test_params = { + "iterations": 1, + "depth": 1, + "verbose": -1, +} + if TORCH_AVAILABLE: models_cls_kwargs_errs += [ ( @@ -294,15 +311,17 @@ def helper_test_probabilistic_forecast_accuracy(self, model, err, ts, noisy_ts): @pytest.mark.parametrize( "config", itertools.product( - [(LinearRegressionModel, False), (XGBModel, False)] - + ([(LightGBMModel, False)] if lgbm_available else []) - + ([(CatBoostModel, True)] if cb_available else []), - [1, 3], + [(LinearRegressionModel, False, {}), (XGBModel, False, xgb_test_params)] + + ([(LightGBMModel, False, lgbm_test_params)] if lgbm_available else []) + + ([(CatBoostModel, True, cb_test_params)] if cb_available else []), + [1, 3], # n components [ "quantile", "poisson", "gaussian", - ], + ], # likelihood + [True, False], # multi models + [1, 2], # horizon ), ) def test_predict_likelihood_parameters_regression_models(self, config): @@ -312,7 +331,13 @@ def test_predict_likelihood_parameters_regression_models(self, config): Note: values are not tested as it would be too time consuming """ - (model_cls, supports_gaussian), n_comp, likelihood = config + ( + (model_cls, supports_gaussian, model_kwargs), + n_comp, + likelihood, + multi_models, + horizon, + ) = config seed = 142857 n_times, n_samples = 100, 1 @@ -340,10 +365,17 @@ def test_predict_likelihood_parameters_regression_models(self, config): else: assert False, f"unknown likelihood {likelihood}" - model = model_cls(lags=3, random_state=seed, **lkl["kwargs"]) + model = model_cls( + lags=3, + output_chunk_length=2, + random_state=seed, + **lkl["kwargs"], + multi_models=multi_models, + **model_kwargs, + ) model.fit(lkl["ts"]) pred_lkl_params = model.predict( - n=1, num_samples=1, predict_likelihood_parameters=True + n=horizon, num_samples=1, predict_likelihood_parameters=True ) if n_comp == 1: assert lkl["expected"].shape == pred_lkl_params.values()[0].shape, ( @@ -352,7 +384,7 @@ def test_predict_likelihood_parameters_regression_models(self, config): ) else: assert ( - 1, + horizon, len(lkl["expected"]) * n_comp, 1, ) == pred_lkl_params.all_values().shape, ( diff --git a/darts/tests/models/forecasting/test_residuals.py b/darts/tests/models/forecasting/test_residuals.py index 0e975240c4..40d15ebf0e 100644 --- a/darts/tests/models/forecasting/test_residuals.py +++ b/darts/tests/models/forecasting/test_residuals.py @@ -1,15 +1,23 @@ import itertools import numpy as np +import pandas as pd import pytest import darts.metrics as metrics +from darts import TimeSeries, concatenate from darts.datasets import AirPassengersDataset from darts.logging import get_logger from darts.models import LinearRegressionModel, NaiveDrift, NaiveSeasonal from darts.tests.models.forecasting.test_regression_models import dummy_timeseries from darts.utils.timeseries_generation import constant_timeseries as ct from darts.utils.timeseries_generation import linear_timeseries as 
lt +from darts.utils.utils import ( + generate_index, + likelihood_component_names, + quantile_interval_names, + quantile_names, +) logger = get_logger(__name__) @@ -612,3 +620,260 @@ def test_sample_weight(self, config): for res_nw, res_w in zip(res_non_weighted, res_weighted): with pytest.raises(AssertionError): np.testing.assert_array_almost_equal(res_w, res_nw) + + @pytest.mark.parametrize( + "config", + itertools.product( + [metrics.ae, metrics.iw], # quantile (interval) metrics + [True, False], # last_points_only + [False, True], # from stochastic predictions (or predicted quantiles) + ), + ) + def test_residuals_with_quantiles_metrics(self, config): + """Tests residuals with quantile metrics from expected probabilistic or quantile historical forecasts.""" + metric, lpo, stochastic_pred = config + is_interval_metric = metric.__name__ == "iw" + + # multi-quantile metrics yield more components + q = [0.05, 0.50, 0.60, 0.95] + q_interval = [(0.05, 0.50), (0.50, 0.60), (0.60, 0.95), (0.05, 0.60)] + + y = lt(length=20) + y = y.stack(y + 1.0) + if not is_interval_metric: + q_comp_names_expected = pd.Index( + likelihood_component_names( + components=y.components, + parameter_names=quantile_names(q), + ) + ) + else: + q_comp_names_expected = pd.Index( + likelihood_component_names( + components=y.components, + parameter_names=quantile_interval_names(q_interval), + ) + ) + # historical forecasts + vals = np.random.random((10, 1, 100)) + if not stochastic_pred: + vals = np.quantile(vals, q, axis=2).transpose((1, 0, 2)) + comp_names = pd.Index( + likelihood_component_names( + components=y.components, + parameter_names=quantile_names(q=q), + ) + ) + else: + comp_names = y.components + vals = np.concatenate([vals, vals + 1], axis=1) + hfc = TimeSeries.from_times_and_values( + times=generate_index(start=y.start_time() + 10 * y.freq, length=10), + values=vals, + columns=comp_names, + ) + + y = [y, y] + if lpo: + hfc = [hfc, hfc] + else: + hfc = [[hfc, hfc], [hfc]] + + metric_kwargs = {"component_reduction": None} + if not is_interval_metric: + metric_kwargs["q"] = q + else: + metric_kwargs["q_interval"] = q_interval + + model = NaiveDrift() + + # return TimeSeries + bts = model.residuals( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + metric_kwargs=metric_kwargs, + ) + assert isinstance(bts, list) and len(bts) == 2 + if lpo: + bts = [[bt] for bt in bts] + + # `ae` with time and component reduction is equal to `mae` with component reduction + hfc_single = hfc[0][0] if not lpo else hfc[0] + bt_expected = metric(y[0], hfc_single, **metric_kwargs) + shape_expected = (len(hfc_single), len(q) * y[0].n_components) + for bt_list in bts: + for bt in bt_list: + assert bt.shape[:2] == shape_expected + assert bt.components.equals(q_comp_names_expected) + np.testing.assert_array_almost_equal(bt.values(), bt_expected) + + # values only + bts = model.residuals( + series=y, + historical_forecasts=hfc, + metric=metric, + last_points_only=lpo, + metric_kwargs=metric_kwargs, + values_only=True, + ) + assert isinstance(bts, list) and len(bts) == 2 + if lpo: + bts = [[bt] for bt in bts] + + # `ae` with time and component reduction is equal to `mae` with component reduction + for bt_list in bts: + for bt in bt_list: + assert bt.shape[:2] == shape_expected + np.testing.assert_array_almost_equal(bt[:, :, 0], bt_expected) + + @pytest.mark.parametrize( + "config", + list( + itertools.product( + [metrics.ae, metrics.iw], # quantile (interval) metrics + [True, False], # last_points_only + ) 
+ ),
+ )
+ def test_quantiles_from_model(self, config):
+ """Tests that residuals from a quantile regression model work for both direct likelihood parameter prediction
+ and sampled prediction, given the correct metric kwargs."""
+ metric, lpo = config
+
+ is_interval_metric = metric.__name__ == "iw"
+
+ # multi-quantile metrics yield more components
+ q = [0.05, 0.50, 0.95]
+ q_interval = [(0.05, 0.50), (0.50, 0.95), (0.05, 0.95)]
+
+ y = lt(length=20)
+ y = y.stack(y + 1.0)
+ if not is_interval_metric:
+ q_comp_names_expected = pd.Index(
+ likelihood_component_names(
+ components=y.components,
+ parameter_names=quantile_names(q),
+ )
+ )
+ else:
+ q_comp_names_expected = pd.Index(
+ likelihood_component_names(
+ components=y.components,
+ parameter_names=quantile_interval_names(q_interval),
+ )
+ )
+ y = [y, y]
+ metric_kwargs = {"component_reduction": None}
+ if not is_interval_metric:
+ metric_kwargs["q"] = q
+ else:
+ metric_kwargs["q_interval"] = q_interval
+
+ icl = 3
+ model = LinearRegressionModel(
+ lags=icl, output_chunk_length=1, likelihood="quantile", quantiles=q
+ )
+ model.fit(y)
+
+ # quantile forecasts
+ bts = model.residuals(
+ series=y,
+ forecast_horizon=1,
+ metric=metric,
+ last_points_only=lpo,
+ metric_kwargs=metric_kwargs,
+ predict_likelihood_parameters=True,
+ retrain=False,
+ )
+ assert isinstance(bts, list) and len(bts) == 2
+ if not lpo:
+ bts = [concatenate(bt, axis=0) for bt in bts]
+
+ # `ae` with time and component reduction is equal to `mae` with component reduction
+ shape_expected = (len(y[0]) - icl, len(q) * y[0].n_components)
+ for bt in bts:
+ assert bt.shape[:2] == shape_expected
+ assert bt.components.equals(q_comp_names_expected)
+
+ # probabilistic forecasts
+ bts_prob = model.residuals(
+ series=y,
+ forecast_horizon=1,
+ metric=metric,
+ last_points_only=lpo,
+ metric_kwargs=metric_kwargs,
+ predict_likelihood_parameters=False,
+ num_samples=1000,
+ retrain=False,
+ )
+ assert isinstance(bts_prob, list) and len(bts_prob) == 2
+ if not lpo:
+ bts_prob = [concatenate(bt, axis=0) for bt in bts_prob]
+ for bt_p, bt_q in zip(bts_prob, bts):
+ assert bt_p.shape == bt_q.shape
+ assert bt_p.components.equals(bt_q.components)
+ # check that the results are similar
+ assert np.abs(bt_p.all_values() - bt_q.all_values()).max() < 0.1
+
+ # single quantile
+ q_single = 0.05
+ q_interval_single = (0.05, 0.50)
+ metric_kwargs = {"component_reduction": None}
+ if not is_interval_metric:
+ metric_kwargs["q"] = q_single
+ else:
+ metric_kwargs["q_interval"] = q_interval_single
+ bts = model.residuals(
+ series=y,
+ forecast_horizon=1,
+ metric=metric,
+ last_points_only=lpo,
+ metric_kwargs=metric_kwargs,
+ predict_likelihood_parameters=True,
+ retrain=False,
+ )
+ assert isinstance(bts, list) and len(bts) == 2
+ if not lpo:
+ bts = [concatenate(bt, axis=0) for bt in bts]
+
+ # `ae` with time and component reduction is equal to `mae` with component reduction
+ shape_expected = (len(y[0]) - icl, y[0].n_components)
+ for bt in bts:
+ assert bt.shape[:2] == shape_expected
+ assert bt.components.equals(
+ pd.Index(
+ likelihood_component_names(
+ y[0].components,
+ parameter_names=(
+ quantile_names([q_single])
+ if not is_interval_metric
+ else quantile_interval_names([q_interval_single])
+ ),
+ )
+ )
+ )
+
+ # wrong quantile
+ q_wrong = [0.99]
+ q_interval_wrong = (0.05, 0.99)
+ metric_kwargs = {"component_reduction": None}
+ if not is_interval_metric:
+ metric_kwargs["q"] = q_wrong
+ else:
+ metric_kwargs["q_interval"] = q_interval_wrong
+ with pytest.raises(ValueError) as exc:
+ _ = model.residuals(
+ series=y,
+ forecast_horizon=1,
+ metric=metric,
+ last_points_only=lpo,
+ metric_kwargs=metric_kwargs,
+ predict_likelihood_parameters=True,
+ retrain=False,
+ )
+ assert str(exc.value).startswith(
+ f"Computing a metric with quantile(s) "
+ f"`q={'[0.99]' if not is_interval_metric else '[0.05 0.99]'}` is only supported"
+ )
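The residuals tests above all exercise the same user-facing pattern: quantile and quantile-interval metrics are selected entirely through `metric` and `metric_kwargs`, either from direct quantile predictions (`predict_likelihood_parameters=True`) or from sampled forecasts. A minimal sketch of that usage, distilled from the test code above; the series length, horizon, and quantile choices are illustrative only:

```python
import darts.metrics as metrics
from darts.models import LinearRegressionModel
from darts.utils.timeseries_generation import linear_timeseries

series = linear_timeseries(length=40)
model = LinearRegressionModel(
    lags=3,
    output_chunk_length=1,
    likelihood="quantile",
    quantiles=[0.05, 0.50, 0.95],
)
model.fit(series)

# per-quantile absolute-error residuals, computed directly from the predicted
# quantiles (no sampling); `q` is forwarded to `metrics.ae` via `metric_kwargs`
res = model.residuals(
    series=series,
    forecast_horizon=1,
    retrain=False,
    metric=metrics.ae,
    metric_kwargs={"q": [0.05, 0.95], "component_reduction": None},
    predict_likelihood_parameters=True,
)
```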
diff --git a/darts/tests/test_timeseries.py b/darts/tests/test_timeseries.py
index bd5e1b1562..0fc6f577fb 100644
--- a/darts/tests/test_timeseries.py
+++ b/darts/tests/test_timeseries.py
@@ -11,7 +11,7 @@
 from darts import TimeSeries, concatenate
 from darts.utils.timeseries_generation import constant_timeseries, linear_timeseries
-from darts.utils.utils import freqs, generate_index
+from darts.utils.utils import expand_arr, freqs, generate_index
 class TestTimeSeries:
@@ -762,6 +762,9 @@ def helper_test_prepend_values(test_case, test_series: TimeSeries):
 assert test_series.time_index.equals(prepended_sq.time_index)
 assert prepended_sq.components.equals(test_series.components)
+ # component and sample dimension should match
+ assert prepended._xa.shape[1:] == test_series._xa.shape[1:]
+
 def test_slice(self):
 TestTimeSeries.helper_test_slice(self, self.series1)
@@ -797,18 +800,112 @@ def test_append(self):
 assert appended.time_index.equals(expected_idx)
 assert appended.components.equals(series_1.components)
- def test_append_values(self):
- TestTimeSeries.helper_test_append_values(self, self.series1)
- # Check `append_values` deals with `RangeIndex` series correctly:
- series = linear_timeseries(start=1, length=5, freq=2)
- appended = series.append_values(np.ones((2, 1, 1)))
- expected_vals = np.concatenate(
- [series.all_values(), np.ones((2, 1, 1))], axis=0
+ @pytest.mark.parametrize(
+ "config",
+ itertools.product(
+ [
+ ( # univariate array
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ np.array([0, 1]).reshape((2, 1, 1)),
+ ),
+ ( # multivariate array
+ np.array([0, 1, 2, 3, 4, 5]).reshape((3, 2, 1)),
+ np.array([0, 1, 2, 3]).reshape((2, 2, 1)),
+ ),
+ ( # empty array
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ np.array([]).reshape((0, 1, 1)),
+ ),
+ (
+ # wrong number of components
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ np.array([0, 1, 2, 3]).reshape((2, 2, 1)),
+ ),
+ (
+ # wrong number of samples
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ np.array([0, 1, 2, 3]).reshape((2, 1, 2)),
+ ),
+ ( # univariate list with times
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ [0, 1],
+ ),
+ ( # univariate list with times and components
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ [[0], [1]],
+ ),
+ ( # univariate list with times, components and samples
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ [[[0]], [[1]]],
+ ),
+ ( # multivar with list has wrong shape
+ np.array([0, 1, 2, 3]).reshape((2, 2, 1)),
+ [[1, 2], [3, 4]],
+ ),
+ ( # list with wrong number of components
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ [[1, 2], [3, 4]],
+ ),
+ ( # list with wrong number of samples
+ np.array([0, 1, 2]).reshape((3, 1, 1)),
+ [[[0, 1]], [[1, 2]]],
+ ),
+ ( # multivar input but list has wrong shape
+ np.array([0, 1, 2, 3]).reshape((2, 2, 1)),
+ [1, 2],
+ ),
+ ],
+ [True, False],
+ ["append_values", "prepend_values"],
+ ),
+ )
+ def test_append_and_prepend_values(self, config):
+ (series_vals, vals), is_datetime, method = config
+ start = "20240101" if is_datetime else 1
+ series_idx = generate_index(
+ start=start, length=len(series_vals), name="some_name"
 )
- expected_idx = pd.RangeIndex(start=1, stop=15, step=2)
+ series = TimeSeries.from_times_and_values(
+ times=series_idx,
+ values=series_vals,
+ )
+
+ # expand if it's a list
+ vals_arr = np.array(vals) if isinstance(vals, list) else vals
+ vals_arr = expand_arr(vals_arr, ndim=3)
+
+ ts_method = getattr(TimeSeries, method)
+
+ if vals_arr.shape[1:] != series_vals.shape[1:]:
+ with pytest.raises(ValueError) as exc:
+ _ = ts_method(series, vals)
+ assert str(exc.value).startswith(
+ "The (expanded) values must have the same number of components and samples"
+ )
+ return
+
+ appended = ts_method(series, vals)
+
+ if method == "append_values":
+ expected_vals = np.concatenate([series_vals, vals_arr], axis=0)
+ expected_idx = generate_index(
+ start=series.start_time(),
+ length=len(series_vals) + len(vals),
+ freq=series.freq,
+ )
+ else:
+ expected_vals = np.concatenate([vals_arr, series_vals], axis=0)
+ expected_idx = generate_index(
+ end=series.end_time(),
+ length=len(series_vals) + len(vals),
+ freq=series.freq,
+ )
+
 assert np.allclose(appended.all_values(), expected_vals)
 assert appended.time_index.equals(expected_idx)
 assert appended.components.equals(series.components)
+ assert appended._xa.shape[1:] == series._xa.shape[1:]
+ assert appended.time_index.name == series.time_index.name
 def test_prepend(self):
 TestTimeSeries.helper_test_prepend(self, self.series1)
@@ -824,19 +921,6 @@ def test_prepend(self):
 assert prepended.time_index.equals(expected_idx)
 assert prepended.components.equals(series_1.components)
- def test_prepend_values(self):
- TestTimeSeries.helper_test_prepend_values(self, self.series1)
- # Check `prepend_values` deals with `RangeIndex` series correctly:
- series = linear_timeseries(start=1, length=5, freq=2)
- prepended = series.prepend_values(np.ones((2, 1, 1)))
- expected_vals = np.concatenate(
- [np.ones((2, 1, 1)), series.all_values()], axis=0
- )
- expected_idx = pd.RangeIndex(start=-3, stop=11, step=2)
- assert np.allclose(prepended.all_values(), expected_vals)
- assert prepended.time_index.equals(expected_idx)
- assert prepended.components.equals(series.components)
-
 @pytest.mark.parametrize(
 "config",
 [
diff --git a/darts/tests/test_timeseries_static_covariates.py b/darts/tests/test_timeseries_static_covariates.py
index 895ef5b686..07590eb82f 100644
--- a/darts/tests/test_timeseries_static_covariates.py
+++ b/darts/tests/test_timeseries_static_covariates.py
@@ -104,33 +104,22 @@ def test_ts_from_x(self, tmpdir_module):
 ts, TimeSeries.from_json(ts_json, static_covariates=ts.static_covariates)
 )
- def test_from_group_dataframe(self):
- # checks that the time_index is of RangeIndex type when the time_col is a(n) (unsorted) list and/or a Rangeindex
+ @pytest.mark.parametrize("index_type", ["int", "dt", "str"])
+ def test_from_group_dataframe(self, index_type):
+ """Tests correct extraction of TimeSeries groups from a long DataFrame with an unsorted (time/integer) index."""
 group = ["a", "a", "a", "b", "b", "b"]
 values = np.arange(len(group))
- # for time as a unsorted list
- time = [2, 1, 0, 0, 1, 2]
- df = pd.DataFrame({
- "group": group,
- "time": time,
- "x": values,
- })
- ts = TimeSeries.from_group_dataframe(df, group_cols="group", time_col="time")
-
- # check the time index
- assert ts[0].time_index.equals(pd.RangeIndex(3))
- assert ts[1].time_index.equals(pd.RangeIndex(3))
+ if index_type == "int":
+ index_expected = pd.RangeIndex(3)
+ time = [2, 1, 0, 0, 1, 2]
+ else:
+ index_expected = pd.date_range("2024-01-01", periods=3)
+ time = index_expected[::-1].append(index_expected)
+ if index_type == "str":
+ time = time.astype(str)
- # check the values
- assert (ts[0].values().flatten() == [values[2], values[1], values[0]]).all()
- assert (ts[1].values().flatten() == [values[3], values[4], values[5]]).all()
-
- # for time as Timestamps
- time = pd.DatetimeIndex(
- [pd.Timestamp("20240103") - pd.Timedelta(i, "d") for i in range(3)]
- + [pd.Timestamp("20240101") + pd.Timedelta(i, "d") for i in range(3)]
- )
+ # create a df with unsorted time
 df = pd.DataFrame({
 "group": group,
 "time": time,
@@ -139,16 +128,8 @@
 ts = TimeSeries.from_group_dataframe(df, group_cols="group", time_col="time")
 # check the time index
- assert ts[0].time_index.equals(
- pd.DatetimeIndex([
- pd.Timestamp("20240101") + pd.Timedelta(i, "d") for i in range(3)
- ])
- )
- assert ts[1].time_index.equals(
- pd.DatetimeIndex([
- pd.Timestamp("20240101") + pd.Timedelta(i, "d") for i in range(3)
- ])
- )
+ assert ts[0].time_index.equals(index_expected)
+ assert ts[1].time_index.equals(index_expected)
 # check the values
 assert (ts[0].values().flatten() == [values[2], values[1], values[0]]).all()
diff --git a/darts/tests/utils/test_utils.py b/darts/tests/utils/test_utils.py
index 809bf84bf5..d629851cea 100644
--- a/darts/tests/utils/test_utils.py
+++ b/darts/tests/utils/test_utils.py
@@ -7,7 +7,15 @@
 from darts.utils import _with_sanity_checks
 from darts.utils.missing_values import extract_subseries
 from darts.utils.ts_utils import retain_period_common_to_all
-from darts.utils.utils import freqs, generate_index, n_steps_between
+from darts.utils.utils import (
+ expand_arr,
+ freqs,
+ generate_index,
+ likelihood_component_names,
+ n_steps_between,
+ quantile_interval_names,
+ quantile_names,
+)
 class TestUtils:
@@ -418,6 +426,25 @@ def test_generate_index_with_end_length(self, config):
 assert idx[0] == expected_start
 assert idx[-1] == expected_start + (n_steps - 1) * freq
+ @pytest.mark.parametrize(
+ "config",
+ [
+ ("2000-01-01", None),
+ (None, "2000-01-03"),
+ ("2000-01-01", "2000-01-03"),
+ ],
+ )
+ def test_generate_index_with_string(self, config):
+ """Test that index generation with strings as start or end gives the same results as with pandas Timestamps."""
+ start, end = config
+ length = 3 if (start is None or end is None) else None
+ idx = generate_index(start=start, end=end, length=length)
+
+ start_ts = pd.Timestamp(start) if start is not None else start
+ end_ts = pd.Timestamp(end) if end is not None else end
+ idx_expected = generate_index(start=start_ts, end=end_ts, length=length)
+ assert idx.equals(idx_expected)
+
 @pytest.mark.parametrize(
 "config",
 [
@@ -539,3 +566,68 @@ def test_n_steps_between(self, config):
 assert n_steps == expected_n_steps
 n_steps_reversed = n_steps_between(end=start, start=end, freq=freq)
 assert n_steps_reversed == -expected_n_steps
+
+ @pytest.mark.parametrize(
+ "config",
+ [
+ (np.array([0, 1, 2]), (3, 1, 1)),
+ (np.array([[0], [1], [2]]), (3, 1, 1)),
+ (np.array([[[0]], [[1]], [[2]]]), (3, 1, 1)),
+ (np.array([[0, 1], [2, 3], [3, 4]]), (3, 2, 1)),
+ (np.array([[[0], [1]], [[1], [2]], [[3], [4]]]), (3, 2, 1)),
+ (
+ np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]], [[8, 9], [10, 11]]]),
+ (3, 2, 2),
+ ),
+ ],
+ )
+ def test_expand_arr(self, config):
+ """Tests array expansion to 3D."""
+ arr, shape_expected = config
+
+ if len(arr.shape) == 1:
+ arr_expected = arr[:, None, None]
+ elif len(arr.shape) == 2:
+ arr_expected = arr[:, :, None]
+ else:
+ arr_expected = arr
+
+ arr = expand_arr(arr, ndim=3)
+ assert arr.shape == shape_expected
+ np.testing.assert_array_almost_equal(arr, arr_expected)
+
+ def test_likelihood_component_names(self):
+ names = likelihood_component_names(["a", "b"], ["1", "2", "3"])
+ assert names == ["a_1", "a_2", "a_3", "b_1", "b_2", "b_3"]
+
+ assert (
+ likelihood_component_names(pd.Index(["a", "b"]), ["1", "2", "3"]) == names
+ )
+
+ @pytest.mark.parametrize(
+ "config",
+ [
+ (0.25, "a_q0.25"),
+ (0.2501, "a_q0.25"),
+ ([0.25], ["a_q0.25"]),
+ ([0.25, 0.75], ["a_q0.25", "a_q0.75"]),
+ ],
+ )
+ def test_quantile_names(self, config):
+ q, names_expected = config
+ names = quantile_names(q, "a")
+ assert names == names_expected
+
+ @pytest.mark.parametrize(
+ "config",
+ [
+ ((0.25, 0.5), "a_q0.25_q0.50"),
+ ((0.2501, 0.4999), "a_q0.25_q0.50"),
+ ([(0.25, 0.5)], ["a_q0.25_q0.50"]),
+ ([(0.25, 0.50), (0.6, 0.75)], ["a_q0.25_q0.50", "a_q0.60_q0.75"]),
+ ],
+ )
+ def test_quantile_interval_names(self, config):
+ q, names_expected = config
+ names = quantile_interval_names(q, "a")
+ assert names == names_expected
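Both utility helpers exercised above are small enough to summarize inline. A short usage sketch, with the expected values taken directly from the assertions in the tests above:

```python
import numpy as np
import pandas as pd

from darts.utils.utils import expand_arr, generate_index

# date strings are now accepted and treated like pd.Timestamp
# (datetime indexes default to daily frequency)
idx = generate_index(start="2000-01-01", end="2000-01-03")
assert idx.equals(pd.date_range("2000-01-01", "2000-01-03", freq="D"))

# missing trailing dimensions are expanded with singleton axes up to `ndim`
arr = expand_arr(np.array([0, 1, 2]), ndim=3)
assert arr.shape == (3, 1, 1)  # (time, component, sample)
```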
diff --git a/darts/timeseries.py b/darts/timeseries.py
index 5f7878eb56..f6fe161464 100644
--- a/darts/timeseries.py
+++ b/darts/timeseries.py
@@ -53,7 +53,11 @@
 from darts.logging import get_logger, raise_if, raise_if_not, raise_log
 from darts.utils import _build_tqdm_iterator, _parallel_apply
-from darts.utils.utils import expand_arr, generate_index, n_steps_between
+from darts.utils.utils import (
+ expand_arr,
+ generate_index,
+ n_steps_between,
+)
 try:
 from typing import Literal
@@ -70,6 +74,7 @@
 # dimension names in the DataArray
 # the "time" one can be different, if it has a name in the underlying Series/DataFrame.
 DIMS = ("time", "component", "sample")
+AXES = {"time": 0, "component": 1, "sample": 2}
 VALID_INDEX_TYPES = (pd.DatetimeIndex, pd.RangeIndex)
 STATIC_COV_TAG = "static_covariates"
@@ -866,7 +871,13 @@ def from_group_dataframe(
 df = df[static_cov_cols + extract_value_cols + extract_time_col]
 if time_col:
- df = df.set_index(df[time_col])
+ if np.issubdtype(df[time_col].dtype, object) or np.issubdtype(
+ df[time_col].dtype, np.datetime64
+ ):
+ df.index = pd.DatetimeIndex(df[time_col])
+ df = df.drop(columns=time_col)
+ else:
+ df = df.set_index(time_col)
 if df.index.is_monotonic_increasing:
 logger.warning(
@@ -876,7 +887,7 @@
 )
 # sort on entire `df` to avoid having to sort individually later on
- if not df.index.is_monotonic_increasing:
+ else:
 df = df.sort_index()
 groups = df.groupby(group_cols[0] if len(group_cols) == 1 else group_cols)
@@ -1351,15 +1362,20 @@ def bottom_level_series(self) -> Optional[List[Self]]:
 else None
 )
+ @property
+ def shape(self) -> Tuple[int, int, int]:
+ """The shape of the series (n_timesteps, n_components, n_samples)."""
+ return self._xa.shape
+
 @property
 def n_samples(self) -> int:
 """Number of samples contained in the series."""
- return len(self._xa.sample)
+ return self.shape[AXES["sample"]]
 @property
 def n_components(self) -> int:
 """Number of components (dimensions) contained in the series."""
- return len(self._xa.component)
+ return self.shape[AXES["component"]]
 @property
 def width(self) -> int:
@@ -1369,12 +1385,12 @@ def width(self) -> int:
 @property
 def n_timesteps(self) -> int:
 """Number of time steps in the series."""
- return len(self._time_index)
+ return self.shape[AXES["time"]]
 @property
 def is_deterministic(self) -> bool:
 """Whether this series is deterministic."""
- return self.n_samples == 1
+ return self.shape[AXES["sample"]] == 1
 @property
 def is_stochastic(self) -> bool:
@@ -1389,7 +1405,7 @@ def is_probabilistic(self) -> bool:
 @property
 def 
is_univariate(self) -> bool: """Whether this series is univariate.""" - return self.n_components == 1 + return self.shape[AXES["component"]] == 1 @property def freq(self) -> Union[pd.DateOffset, int]: @@ -2849,10 +2865,10 @@ def append(self, other: Self) -> Self: "Both series must have the same number of components.", logger, ) - if self._has_datetime_index: + if len(self) > 0 and len(other) > 0: raise_if_not( other.start_time() == self.end_time() + self.freq, - "Appended TimeSeries must start one time step after current one.", + "Appended TimeSeries must start one (time) step after current one.", logger, ) @@ -2886,17 +2902,28 @@ def append_values(self, values: np.ndarray) -> Self: TimeSeries A new TimeSeries with the new values appended """ - if self._has_datetime_index: - idx = pd.DatetimeIndex( - [self.end_time() + i * self._freq for i in range(1, len(values) + 1)], - freq=self._freq, - ) - else: - idx = pd.RangeIndex( - start=self.end_time() + self._freq, - stop=self.end_time() + (len(values) + 1) * self._freq, - step=self._freq, + if len(values) == 0: + return self.copy() + + values = np.array(values) if not isinstance(values, np.ndarray) else values + values = expand_arr(values, ndim=len(DIMS)) + if not values.shape[1:] == self._xa.values.shape[1:]: + raise_log( + ValueError( + f"The (expanded) values must have the same number of components and samples " + f"(second and third dims) as the series to append to. " + f"Received shape: {values.shape}, expected: {self._xa.values.shape}" + ), + logger=logger, ) + + idx = generate_index( + start=self.end_time() + self.freq, + length=len(values), + freq=self.freq, + name=self._time_index.name, + ) + return self.append( self.__class__.from_times_and_values( values=values, @@ -2945,22 +2972,28 @@ def prepend_values(self, values: np.ndarray) -> Self: TimeSeries A new TimeSeries with the new values prepended. """ + if len(values) == 0: + return self.copy() - if self._has_datetime_index: - idx = pd.DatetimeIndex( - [ - self.start_time() - i * self._freq - for i in reversed(range(1, len(values) + 1)) - ], - freq=self._freq, - ) - else: - idx = pd.RangeIndex( - self.start_time() - self.freq * len(values), - self.start_time(), - step=self.freq, + values = np.array(values) if not isinstance(values, np.ndarray) else values + values = expand_arr(values, ndim=len(DIMS)) + if not values.shape[1:] == self._xa.values.shape[1:]: + raise_log( + ValueError( + f"The (expanded) values must have the same number of components and samples " + f"(second and third dims) as the series to prepend to. 
" + f"Received shape: {values.shape}, expected: {self._xa.values.shape}" + ), + logger=logger, ) + idx = generate_index( + end=self.start_time() - self.freq, + length=len(values), + freq=self.freq, + name=self._time_index.name, + ) + return self.prepend( self.__class__.from_times_and_values( values=values, @@ -4164,8 +4197,6 @@ def plot( central_series = comp.mean(dim=DIMS[2]) alpha = kwargs["alpha"] if "alpha" in kwargs else None - if not self.is_deterministic: - kwargs["alpha"] = 1 if custom_labels: label_to_use = label[i] else: @@ -4177,15 +4208,18 @@ def plot( label_to_use = f"{label}_{comp_name}" kwargs["label"] = label_to_use + kwargs_central = deepcopy(kwargs) + if not self.is_deterministic: + kwargs_central["alpha"] = 1 if central_series.shape[0] > 1: - p = central_series.plot(*args, ax=ax, **kwargs) + p = central_series.plot(*args, ax=ax, **kwargs_central) # empty TimeSeries elif central_series.shape[0] == 0: p = ax.plot( [], [], *args, - **kwargs, + **kwargs_central, ) ax.set_xlabel(self.time_index.name) else: @@ -4194,7 +4228,7 @@ def plot( central_series.values[0], "o", *args, - **kwargs, + **kwargs_central, ) color_used = p[0].get_color() if default_formatting else None diff --git a/darts/utils/historical_forecasts/optimized_historical_forecasts_regression.py b/darts/utils/historical_forecasts/optimized_historical_forecasts_regression.py index c480d69c57..eeef59f04b 100644 --- a/darts/utils/historical_forecasts/optimized_historical_forecasts_regression.py +++ b/darts/utils/historical_forecasts/optimized_historical_forecasts_regression.py @@ -141,18 +141,23 @@ def _optimized_historical_forecasts_last_points_only( ) # forecast has shape ((forecastable_index_length-1)*num_samples, k, n_component) # where k = output_chunk length if multi_models, 1 otherwise - - # reshape into (forecasted indexes, n_components, n_samples), components are interleaved - forecast = forecast.reshape(X.shape[0], -1, num_samples) + # reshape into (forecasted indexes, output_chunk_length, n_components, n_samples) + forecast = np.moveaxis( + forecast.reshape( + X.shape[0], + num_samples, + model.output_chunk_length if model.multi_models else 1, + -1, + ), + 1, + -1, + ) # extract the last sub-model forecast for each component if model.multi_models: - forecast = forecast[ - :, - (forecast_horizon - 1) * len(forecast_components) : (forecast_horizon) - * len(forecast_components), - :, - ] + forecast = forecast[:, forecast_horizon - 1] + else: + forecast = forecast[:, 0] forecasts_list.append( TimeSeries.from_times_and_values( @@ -237,7 +242,6 @@ def _optimized_historical_forecasts_all_points( # Additional shift, to account for the model output_chunk_length shift_start = 0 - # shift_end = 0 if model.output_chunk_length > 1: # used to convert the shift into the appropriate unit unit = freq if series_.has_datetime_index else 1 @@ -296,26 +300,27 @@ def _optimized_historical_forecasts_all_points( predict_likelihood_parameters=predict_likelihood_parameters, **kwargs, ) - - # reshape and stride the forecast into (forecastable_index, forecast_horizon, n_components, num_samples) - if model.multi_models: - # forecast has shape ((forecastable_index_length-1)*num_samples, output_chunk_length, n_component) - # and the components are interleaved - forecast = forecast.reshape( + # forecast has shape ((forecastable_index_length-1)*num_samples, k, n_component) + # where k = output_chunk length if multi_models, 1 otherwise + # reshape into (forecasted indexes, output_chunk_length, n_components, n_samples) + forecast = 
np.moveaxis( + forecast.reshape( X.shape[0], - model.output_chunk_length, - len(forecast_components), num_samples, - ) + model.output_chunk_length if model.multi_models else 1, + -1, + ), + 1, + -1, + ) + + if model.multi_models: forecast = forecast[::stride, :forecast_horizon] else: - # forecast has shape ((forecastable_index_length-1)*num_samples, 1, n_component) - # and the components are interleaved - forecast = forecast.reshape(X.shape[0], -1, num_samples) - - # forecasts depend on lagged data only, output_chunk_length is reconstitued by applying a sliding window + # entire forecast horizon is given by multiple (previous) forecasts -> apply sliding window forecast = sliding_window_view( - forecast, (forecast_horizon, len(forecast_components), num_samples) + forecast[:, 0], + (forecast_horizon, len(forecast_components), num_samples), ) # apply stride, remove the last windows, slice output_chunk_length to keep forecast_horizon values diff --git a/darts/utils/likelihood_models.py b/darts/utils/likelihood_models.py index 6d4d7fd1a1..75b7d86f07 100644 --- a/darts/utils/likelihood_models.py +++ b/darts/utils/likelihood_models.py @@ -60,7 +60,11 @@ from darts.logging import raise_if_not # TODO: Table on README listing distribution, possible priors and wiki article -from darts.utils.utils import _check_quantiles +from darts.utils.utils import ( + _check_quantiles, + likelihood_component_names, + quantile_names, +) MIN_CAUCHY_GAMMA_SAMPLING = 1e-100 @@ -1354,11 +1358,10 @@ def _params_from_output(self, model_output: torch.Tensor) -> None: def likelihood_components_names(self, input_series: TimeSeries) -> List[str]: """Each component have their own quantiles""" - return [ - f"{tgt_name}_q{quantile:.2f}" - for tgt_name in input_series.components - for quantile in self.quantiles - ] + return likelihood_component_names( + components=input_series.components, + parameter_names=quantile_names(self.quantiles), + ) def simplified_name(self) -> str: return "quantile" diff --git a/darts/utils/utils.py b/darts/utils/utils.py index 643c0655f1..c7c2480215 100644 --- a/darts/utils/utils.py +++ b/darts/utils/utils.py @@ -6,7 +6,7 @@ from enum import Enum from functools import wraps from inspect import Parameter, getcallargs, signature -from typing import Callable, Iterator, List, Optional, Tuple, TypeVar, Union +from typing import Callable, Iterator, List, Optional, Sequence, Tuple, TypeVar, Union import numpy as np import pandas as pd @@ -67,6 +67,66 @@ class ModelMode(Enum): } +def likelihood_component_names( + components: Union[pd.Index, List[str]], parameter_names: List[str] +): + """Generates formatted likelihood parameter names for components and parameter names. + + The order of the returned names is: `[comp1_param_1, ... comp1_param_n, ..., comp_n_param_n]`. + + Parameters + ---------- + components + A sequence of component names to add to the beginning of the returned names. + parameter_names + A sequence of likelihood parameter names to add to the end of the returned names. + """ + return [ + f"{tgt_name}_{param_n}" + for tgt_name in components + for param_n in parameter_names + ] + + +def quantile_names(q: Union[float, List[float]], component: Optional[str] = None): + """Generates formatted quantile names, optionally added to a component name. + + Parameters + ---------- + q + A float or list of floats with the quantiles to generate the names for. + component + Optionally, a component name to add to the beginning of the quantile names. 
+ """ + # predicted quantile text format + comp = f"{component}_" if component is not None else "" + if isinstance(q, float): + return f"{comp}q{q:.2f}" + else: + return [f"{comp}q{q_i:.2f}" for q_i in q] + + +def quantile_interval_names( + q_interval: Union[Tuple[float, float], Sequence[Tuple[float, float]]], + component: Optional[str] = None, +): + """Generates formatted quantile interval names, optionally added to a component name. + + Parameters + ---------- + q_interval + A tuple or multiple tuples with the (lower bound, upper bound) of the quantile intervals. + component + Optionally, a component name to add to the beginning of the quantile names. + """ + # predicted quantile text format + comp = f"{component}_" if component is not None else "" + if isinstance(q_interval, tuple): + return f"{comp}q{q_interval[0]:.2f}_q{q_interval[1]:.2f}" + else: + return [f"{comp}q{q_lo:.2f}_q{q_hi:.2f}" for q_lo, q_hi in q_interval] + + def _build_tqdm_iterator(iterable, verbose, **kwargs): """ Build an iterable, possibly using tqdm (either in notebook or regular mode) @@ -429,8 +489,8 @@ def n_steps_between( def generate_index( - start: Optional[Union[pd.Timestamp, int]] = None, - end: Optional[Union[pd.Timestamp, int]] = None, + start: Optional[Union[pd.Timestamp, str, int]] = None, + end: Optional[Union[pd.Timestamp, str, int]] = None, length: Optional[int] = None, freq: Union[str, int, pd.DateOffset] = None, name: str = None, @@ -441,7 +501,7 @@ def generate_index( Parameters ---------- start - The start of the returned index. If a pandas Timestamp is passed, the index will be a pandas + The start of the returned index. If a pandas Timestamp or a date string is passed, the index will be a pandas DatetimeIndex. If an integer is passed, the index will be a pandas RangeIndex index. Works only with either `length` or `end`. end @@ -477,6 +537,9 @@ def generate_index( logger, ) + start = pd.Timestamp(start) if isinstance(start, str) else start + end = pd.Timestamp(end) if isinstance(end, str) else end + if isinstance(start, pd.Timestamp) or isinstance(end, pd.Timestamp): freq = "D" if freq is None else freq freq = pd.tseries.frequencies.to_offset(freq) if isinstance(freq, str) else freq