diff --git a/dev/index.html b/dev/index.html index 168920a..48ca5a9 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,14 +1,16 @@ -FractalDimensions.jl · FractalDimensions.jl

FractalDimensions.jl

FractalDimensionsModule

FractalDimensions.jl

CI codecov Package Downloads

A Julia package that estimates various definitions of fractal dimension from data. It can be used as a standalone package, or as part of DynamicalSystems.jl.

To install it, run import Pkg; Pkg.add("FractalDimensions").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

Previously, this package was part of ChaosTools.jl.

Citation

If you use this package in a publication, please cite the paper below:

@ARTICLE{FractalDimensions.jl,
-  title     = "Estimating the fractal dimension: a comparative review and open
-               source implementations",
-  author    = "Datseris, George and Kottlarz, Inga and Braun, Anton P and
-               Parlitz, Ulrich",
-  publisher = "arXiv",
-  year      =  2021,
-  doi = {10.48550/ARXIV.2109.05937},
-  url = {https://arxiv.org/abs/2109.05937},
-}
source

Introduction

Note

This package is accompanying a review paper on estimating the fractal dimension: https://arxiv.org/abs/2109.05937. The paper is continuing the discussion of chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.

There are numerous methods that one can use to calculate a so-called "dimension" of a dataset which in the context of dynamical systems is called the Fractal dimension. One way to do this is to estimate the scaling behaviour of some quantity as a size/scale increases. In the Fractal dimension example below, one finds the scaling of the correlation sum versus a ball radius. In this case, it approximately holds $ \log(C) \approx \Delta\log(\varepsilon) $ for radius $\varepsilon$. The scaling of many other quantities can be estimated as well, such as the generalized entropy, the Higuchi length, or others provided here.

To actually find $\Delta$, one needs to find a linearly scaling region in the graph $\log(C)$ vs. $\log(\varepsilon)$ and estimate its slope. Hence, identifying a linear region is central to estimating a fractal dimension. That is why, the section Linear scaling regions is of central importance for this documentation.

Fractal dimension example

In this simplest example we will calculate the fractal dimension of the chaotic attractor of the Hénon map (for default parameters). For this example, we will generate the data on the spot:

using DynamicalSystemsBase # for simulating dynamical systems
+FractalDimensions.jl · FractalDimensions.jl

FractalDimensions.jl

FractalDimensionsModule

FractalDimensions.jl

CI codecov Package Downloads

A Julia package that estimates various definitions of fractal dimension from data. It can be used as a standalone package, or as part of DynamicalSystems.jl.

To install it, run import Pkg; Pkg.add("FractalDimensions").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

Previously, this package was part of ChaosTools.jl.

Publication

FractalDimensions.jl is used in a review article comparing various estimators for fractal dimensions. The paper is likely a relevant read if you are interested in the package. And if you use the package, please cite the paper.

@article{FractalDimensions.jl,
+  doi = {10.1063/5.0160394},
+  url = {https://doi.org/10.1063/5.0160394},
+  year = {2023},
+  month = oct,
+  publisher = {{AIP} Publishing},
+  volume = {33},
+  number = {10},
+  author = {George Datseris and Inga Kottlarz and Anton P. Braun and Ulrich Parlitz},
+  title = {Estimating fractal dimensions: A comparative review and open source implementations},
+  journal = {Chaos: An Interdisciplinary Journal of Nonlinear Science}
+}
source

Introduction

Note

This package is accompanying a review paper on estimating the fractal dimension: https://arxiv.org/abs/2109.05937. The paper is continuing the discussion of chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.

There are numerous methods that one can use to calculate a so-called "dimension" of a dataset which in the context of dynamical systems is called the Fractal dimension. One way to do this is to estimate the scaling behaviour of some quantity as a size/scale increases. In the Fractal dimension example below, one finds the scaling of the correlation sum versus a ball radius. In this case, it approximately holds $ \log(C) \approx \Delta\log(\varepsilon) $ for radius $\varepsilon$. The scaling of many other quantities can be estimated as well, such as the generalized entropy, the Higuchi length, or others provided here.

To actually find $\Delta$, one needs to find a linearly scaling region in the graph $\log(C)$ vs. $\log(\varepsilon)$ and estimate its slope. Hence, identifying a linear region is central to estimating a fractal dimension. That is why, the section Linear scaling regions is of central importance for this documentation.

Fractal dimension example

In this simplest example we will calculate the fractal dimension of the chaotic attractor of the Hénon map (for default parameters). For this example, we will generate the data on the spot:

using DynamicalSystemsBase # for simulating dynamical systems
 using CairoMakie           # for plotting
 
 henon_rule(x, p, n) = SVector(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
@@ -50,15 +52,46 @@
 for r in lrs
     scatterlines!(ax, xs[r], ys[r])
 end
-fig

The LargestLinearRegion method finds, and computes the slope of, the largest region:

Δ = slopefit(xs, ys, LargestLinearRegion())
(1.2318376178087478, 1.2233720116518771, 1.2403032239656184)

This result is an approximation of a fractal dimension.

The whole above pipeline we went through is bundled in grassberger_proccacia_dim. Similar work is done by generalized_dim and many other functions.

Be wary when using `xxxxx_dim`

As stated clearly by the documentation strings, all pre-made dimension estimating functions (ending in _dim) perform a lot of automated steps, each having its own heuristic choices for function default values. They are more like convenient bundles with on-average good defaults, rather than precise functions. You should be careful when considering the validity of the returned number!

Linear scaling regions

FractalDimensions.slopefitFunction
slopefit(x, y [, t::SLopeFit]; kw...) → s, s05, s95

Fit a linear scaling region in the curve of the two AbstractVectors y versus x using t as the estimation method. Return the estimated slope, as well as the confidence intervals for it.

The methods t that can be used for the estimation are:

The keyword ignore_saturation = true ignores saturation that (sometimes) happens at the start and end of the curve y(x), where the curve flattens. The keyword sat_threshold = 0.01 decides what saturation is: while abs(y[i]-y[i+1]) < sat_threshold we are in a saturation regime. Said differently, slopes with value sat_threshold/dx with dx = x[i+1] - x[i] are neglected.

The keyword ci = 0.95 specifies which quantile (and the 1 - quantile) the confidence interval values are returned at, and by defualt it is 95% (and hence also 5%).

source
FractalDimensions.LinearRegressionType
LinearRegression <: SLopeFit
-LinearRegression()

Standard linear regression fit to all available data. Estimation of the confidence intervals is based om the standard error of the slope following a T-distribution, see:

https://stattrek.com/regression/slope-confidence-interval

source
FractalDimensions.LargestLinearRegionType
LargestLinearRegion <: SlopeFit
-LargestLinearRegion(; dxi::Int = 1, tol = 0.25)

Identify regions where the curve y(x) is linear, by scanning the x-axis every dxi indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15] and so on if dxi=5).

If the slope (calculated via linear regression) of a region of width dxi is approximatelly equal to that of the previous region, within relative tolerance tol and absolute tolerance 0, then these two regions belong to the same linear region.

The largest such region is then used to estimate the slope via standard linear regression of all points belonging to the largest linear region. "Largest" here means the region that covers the more extent along the x-axis.

Use linear_regions if you wish to obtain the decomposition into linear regions.

source
FractalDimensions.linear_regionsFunction
linear_regions(x, y; dxi, tol) → lrs, tangents

Apply the algorithm described by LargestLinearRegion, and return the indices of x that correspond to the linear regions, lrs, and the tangents at each region (obtained via a second linear regression at each accumulated region). lrs is hence a vector of UnitRanges.

source
FractalDimensions.AllSlopesDistributionType
AllSlopesDistribution <: SlopeFit
-AllSlopesDistribution()

Estimate a slope by computing the distribution of all possible slopes that can be estimated from the curve y(x), according to the method by [Deshmukh2021]. The returned slope is the distribution mean and the confidence intervals are simply the corresponding quantiles of the distribution.

Not implemented yet, the method is here as a placeholder.

source
FractalDimensions.estimate_boxsizesFunction
estimate_boxsizes(X::AbstractStateSpaceSet; kwargs...) → εs

Return k exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k), that are a good estimate for sizes ε that are used in calculating a Fractal Dimension. It is strongly recommended to standardize input dataset before using this function.

Let d₋ be the minimum pair-wise distance in X, d₋ = dminimum_pairwise_distance(X). Let d₊ be the average total length of X, d₊ = mean(ma - mi) with mi, ma = minmaxima(X). Then lower = log(base, d₋) and upper = log(base, d₊). Because by default w=1, z=-1, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.

Keywords

  • w = 1, z = -1, k = 16 : as explained above.
  • base = MathConstants.e : the base used in the log function.
  • warning = true: Print some warnings for bad estimates.
  • autoexpand = true: If the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by setting w -= we and z -= ze. You can set different default values to the keywords we = w, ze = z.
source
FractalDimensions.minimum_pairwise_distanceFunction
minimum_pairwise_distance(X::StateSpaceSet, kdtree = dimension(X) < 10, metric = Euclidean())

Return min_d, min_pair: the minimum pairwise distance of all points in the dataset, and the corresponding point pair. The third argument is a switch of whether to use KDTrees or a brute force search.

source

Generalized (entropy) dimension

Based on the definition of the Generalized entropy (genentropy), one can calculate an appropriate dimension, called generalized dimension:

FractalDimensions.generalized_dimFunction
generalized_dim(X::StateSpaceSet [, sizes]; q = 1, base = 2) -> Δ_q

Return the q order generalized dimension of X, by calculating its histogram-based Rényi entropy for each ε ∈ sizes.

The case of q = 0 is often called "capacity" or "box-counting" dimension, while q = 1 is the "information" dimension.

Description

The returned dimension is approximated by the (inverse) power law exponent of the scaling of the Renyi entropy $H_q$, versus the box size ε, where ε ∈ sizes:

\[H_q \approx -\Delta_q\log_{b}(\varepsilon)\]

$H_q$ is calculated using ComplexityMeasures: Renyi, ValueHistogram, entropy, i.e., by doing a histogram of the data with a given box size.

Calling this function performs a lot of automated steps:

  1. A vector of box sizes is decided by calling sizes = estimate_boxsizes(dataset), if sizes is not given.
  2. For each element of sizes the appropriate entropy is calculated as
    H = [entropy(Renyi(; q, base), ValueHistogram(ε), data) for ε ∈ sizes]
    Let x = -log.(sizes).
  3. The curve H(x) is decomposed into linear regions, using slopefit(x, h)[1].
  4. The biggest linear region is chosen, and a fit for the slope of that region is performed using the function linear_region, which does a simple linear regression fit using linreg. This slope is the return value of generalized_dim.

By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result. The source code of this function is only 3 lines of code.

source
FractalDimensions.molteno_dimFunction
molteno_dim(X::AbstractStateSpaceSet; k0::Int = 10, q = 1.0, base = 2)

Return an estimate of the generalized_dim of X using the algorithm by [Molteno1993]. This function is a simple utilization of the probabilities estimated by molteno_boxing so see that function for more details. Here the entropy of the probabilities is computed at each size, and a line is fitted in the entropy vs log(size) graph, just like in generalized_dim.

source
FractalDimensions.molteno_boxingFunction
molteno_boxing(X::AbstractStateSpaceSet; k0::Int = 10) → (probs, εs)

Distribute X into boxes whose size is halved in each step, according to the algorithm by [Molteno1993]. Division stops if the average number of points per filled box falls below the threshold k0.

Return probs, a vector of Probabilities of finding points in boxes for different box sizes, and the corresponding box sizes εs. These outputs are used in molteno_dim.

Description

Project the data onto the whole interval of numbers that is covered by UInt64. The projected data is distributed into boxes whose size decreases by factor 2 in each step. For each box that contains more than one point 2^D new boxes are created where D is the dimension of the data.

The process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below k0. The box sizes εs are calculated and returned together with the probs.

This algorithm is faster than the traditional approach of using ValueHistogram(ε::Real), but it is only suited for low dimensional data since it divides each box into 2^D new boxes if D is the dimension. For large D this leads to low numbers of box divisions before the threshold is passed and the divison stops. This results to a low number of data points to fit the dimension to and thereby a poor estimate.

source

Correlation sum based dimension

FractalDimensions.grassberger_proccacia_dimFunction
grassberger_proccacia_dim(X::AbstractStateSpaceSet, εs = estimate_boxsizes(data); kwargs...)

Use the method of Grassberger and Proccacia[Grassberger1983], and the correction by Theiler[Theiler1986], to estimate the correlation dimension Δ_C of X.

This function does something extremely simple:

cm = correlationsum(data, εs; kwargs...)
-Δ_C = slopefit(rs, ys)(log2.(sizes), log2.(cm))[1]

i.e. it calculates correlationsum for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε).

See correlationsum for the available keywords. See also takens_best_estimate, boxassisted_correlation_dim.

source
FractalDimensions.correlationsumFunction
correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)

Calculate the q-order correlation sum of X (StateSpaceSet or timeseries) for a given radius ε and norm. They keyword show_progress = true can be used to display a progress bar for large X.

correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)

If εs is a vector, C_q is calculated for each ε ∈ εs more efficiently. Multithreading is also enabled over the available threads (Threads.nthreads()). The function boxed_correlationsum is typically faster if the dimension of X is small and if maximum(εs) is smaller than the size of X.

Keyword arguments

  • q = 2: order of the correlation sum
  • norm = Euclidean(): distance norm
  • w = 0: Theiler window
  • show_progress = true: display a progress bar

Description

The correlation sum is defined as follows for q=2:

\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} +fig

The LargestLinearRegion method finds, and computes the slope of, the largest region:

Δ = slopefit(xs, ys, LargestLinearRegion())
(1.2318376178087478, 1.2233720116518771, 1.2403032239656184)

This result is an approximation of a fractal dimension.

The whole above pipeline we went through is bundled in grassberger_proccacia_dim. Similar work is done by generalized_dim and many other functions.

Be wary when using `xxxxx_dim`

As stated clearly by the documentation strings, all pre-made dimension estimating functions (ending in _dim) perform a lot of automated steps, each having its own heuristic choices for function default values. They are more like convenient bundles with on-average good defaults, rather than precise functions. You should be careful when considering the validity of the returned number!

Linear scaling regions

FractalDimensions.slopefitFunction
slopefit(x, y [, t::SLopeFit]; kw...) → s, s05, s95

Fit a linear scaling region in the curve of the two AbstractVectors y versus x using t as the estimation method. Return the estimated slope, as well as the confidence intervals for it.

The methods t that can be used for the estimation are:

The keyword ignore_saturation = true ignores saturation that (sometimes) happens at the start and end of the curve y(x), where the curve flattens. The keyword sat_threshold = 0.01 decides what saturation is: while abs(y[i]-y[i+1]) < sat_threshold we are in a saturation regime. Said differently, slopes with value sat_threshold/dx with dx = x[i+1] - x[i] are neglected.

The keyword ci = 0.95 specifies which quantile (and the 1 - quantile) the confidence interval values are returned at, and by defualt it is 95% (and hence also 5%).

source
FractalDimensions.LinearRegressionType
LinearRegression <: SlopeFit
+LinearRegression()

Standard linear regression fit to all available data. Estimation of the confidence intervals is based om the standard error of the slope following a T-distribution, see:

https://stattrek.com/regression/slope-confidence-interval

source
FractalDimensions.LargestLinearRegionType
LargestLinearRegion <: SlopeFit
+LargestLinearRegion(; dxi::Int = 1, tol = 0.25)

Identify regions where the curve y(x) is linear, by scanning the x-axis every dxi indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15] and so on if dxi=5).

If the slope (calculated via linear regression) of a region of width dxi is approximatelly equal to that of the previous region, within relative tolerance tol and absolute tolerance 0, then these two regions belong to the same linear region.

The largest such region is then used to estimate the slope via standard linear regression of all points belonging to the largest linear region. "Largest" here means the region that covers the more extent along the x-axis.

Use linear_regions if you wish to obtain the decomposition into linear regions.

source
FractalDimensions.linear_regionsFunction
linear_regions(x, y; dxi, tol) → lrs, tangents

Apply the algorithm described by LargestLinearRegion, and return the indices of x that correspond to the linear regions, lrs, and the tangents at each region (obtained via a second linear regression at each accumulated region). lrs is hence a vector of UnitRanges.

source
FractalDimensions.AllSlopesDistributionType
AllSlopesDistribution <: SlopeFit
+AllSlopesDistribution()

Estimate a slope by computing the distribution of all possible slopes that can be estimated from the curve y(x), according to the method by (Deshmukh et al., 2021). The returned slope is the distribution mean and the confidence intervals are simply the corresponding quantiles of the distribution.

Not implemented yet, the method is here as a placeholder.

source
FractalDimensions.estimate_boxsizesFunction
estimate_boxsizes(X::AbstractStateSpaceSet; kwargs...) → εs

Return k exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k), that are a good estimate for sizes ε that are used in calculating a Fractal Dimension. It is strongly recommended to standardize input dataset before using this function.

Let d₋ be the minimum pair-wise distance in X, d₋ = dminimum_pairwise_distance(X). Let d₊ be the average total length of X, d₊ = mean(ma - mi) with mi, ma = minmaxima(X). Then lower = log(base, d₋) and upper = log(base, d₊). Because by default w=1, z=-1, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.

Keywords

  • w = 1, z = -1, k = 16 : as explained above.
  • base = MathConstants.e : the base used in the log function.
  • warning = true: Print some warnings for bad estimates.
  • autoexpand = true: If the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by setting w -= we and z -= ze. You can set different default values to the keywords we = w, ze = z.
source
FractalDimensions.minimum_pairwise_distanceFunction
minimum_pairwise_distance(X::StateSpaceSet, kdtree = dimension(X) < 10, metric = Euclidean())

Return min_d, min_pair: the minimum pairwise distance of all points in the dataset, and the corresponding point pair. The third argument is a switch of whether to use KDTrees or a brute force search.

source

Generalized (entropy) dimension

Based on the definition of the Generalized entropy (genentropy), one can calculate an appropriate dimension, called generalized dimension:

FractalDimensions.generalized_dimFunction
generalized_dim(X::StateSpaceSet [, sizes]; q = 1, base = 2) -> Δ_q

Return the q order generalized dimension of X, by calculating its histogram-based Rényi entropy for each ε ∈ sizes.

The case of q = 0 is often called "capacity" or "box-counting" dimension, while q = 1 is the "information" dimension.

Description

The returned dimension is approximated by the (inverse) power law exponent of the scaling of the Renyi entropy $H_q$, versus the box size ε, where ε ∈ sizes:

\[H_q \approx -\Delta_q\log_{b}(\varepsilon)\]

$H_q$ is calculated using ComplexityMeasures: Renyi, ValueHistogram, entropy, i.e., by doing a histogram of the data with a given box size.

Calling this function performs a lot of automated steps:

  1. A vector of box sizes is decided by calling sizes = estimate_boxsizes(dataset), if sizes is not given.
  2. For each element of sizes the appropriate entropy is calculated as
    H = [entropy(Renyi(; q, base), ValueHistogram(ε), data) for ε ∈ sizes]
    Let x = -log.(sizes).
  3. The curve H(x) is decomposed into linear regions, using slopefit(x, h)[1].
  4. The biggest linear region is chosen, and a fit for the slope of that region is performed using the function linear_region, which does a simple linear regression fit using linreg. This slope is the return value of generalized_dim.

By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result. The source code of this function is only 3 lines of code.

This approach to estimating the fractal dimension has been used (to our knowledge) for the first time in (Russell et al., 1980).

source
FractalDimensions.molteno_dimFunction
molteno_dim(X::AbstractStateSpaceSet; k0::Int = 10, q = 1.0, base = 2)

Return an estimate of the generalized_dim of X using the algorithm by (Molteno, 1993). This function is a simple utilization of the probabilities estimated by molteno_boxing so see that function for more details. Here the entropy of the probabilities is computed at each size, and a line is fitted in the entropy vs log(size) graph, just like in generalized_dim.

source
FractalDimensions.molteno_boxingFunction
molteno_boxing(X::AbstractStateSpaceSet; k0::Int = 10) → (probs, εs)

Distribute X into boxes whose size is halved in each step, according to the algorithm by (Molteno, 1993). Division stops if the average number of points per filled box falls below the threshold k0.

Return probs, a vector of Probabilities of finding points in boxes for different box sizes, and the corresponding box sizes εs. These outputs are used in molteno_dim.

Description

Project the data onto the whole interval of numbers that is covered by UInt64. The projected data is distributed into boxes whose size decreases by factor 2 in each step. For each box that contains more than one point 2^D new boxes are created where D is the dimension of the data.

The process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below k0. The box sizes εs are calculated and returned together with the probs.

This algorithm is faster than the traditional approach of using ValueHistogram(ε::Real), but it is only suited for low dimensional data since it divides each box into 2^D new boxes if D is the dimension. For large D this leads to low numbers of box divisions before the threshold is passed and the divison stops. This results to a low number of data points to fit the dimension to and thereby a poor estimate.

source

Correlation sum based dimension

FractalDimensions.grassberger_proccacia_dimFunction
grassberger_proccacia_dim(X::AbstractStateSpaceSet, εs = estimate_boxsizes(data); kwargs...)

Use the method of Grassberger and Proccacia (Grassberger and Procaccia, 1983), and the correction by (Theiler, 1986), to estimate the correlation dimension Δ_C of X.

This function does something extremely simple:

cm = correlationsum(data, εs; kwargs...)
+Δ_C = slopefit(rs, ys)(log2.(sizes), log2.(cm))[1]

i.e. it calculates correlationsum for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε).

See correlationsum for the available keywords. See also takens_best_estimate, boxassisted_correlation_dim.

source
FractalDimensions.correlationsumFunction
correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)

Calculate the q-order correlation sum of X (StateSpaceSet or timeseries) for a given radius ε and norm. They keyword show_progress = true can be used to display a progress bar for large X.

correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)

If εs is a vector, C_q is calculated for each ε ∈ εs more efficiently. Multithreading is also enabled over the available threads (Threads.nthreads()). The function boxed_correlationsum is typically faster if the dimension of X is small and if maximum(εs) is smaller than the size of X.

Keyword arguments

  • q = 2: order of the correlation sum
  • norm = Euclidean(): distance norm
  • w = 0: Theiler window
  • show_progress = true: display a progress bar

Description

The correlation sum is defined as follows for q=2:

\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} B(||X_i - X_j|| < \epsilon)\]

for as follows for q≠2

\[C_q(\epsilon) = \left[ \sum_{i=1}^{N} \alpha_i -\left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\right]^{1/(q-1)}\]

where

\[\alpha_i = 1 / (N (\max(N-w, i) - \min(w + 1, i))^{(q-1)})\]

with $N$ the length of X and $B$ gives 1 if its argument is true. w is the Theiler window.

See the article of Grassberger for the general definition [Grassberger2007] and the book "Nonlinear Time Series Analysis" [Kantz2003], Ch. 6, for a discussion around choosing best values for w, and Ch. 11.3 for the explicit definition of the q-order correlationsum. Note that the formula in 11.3 is incorrect, but corrected here, indices are adapted to take advantage of all available points and also note that we immediatelly exponentiate $C_q$ to $1/(q-1)$, so that it scales exponentially as $C_q \propto \varepsilon ^\Delta_q$ versus the size $\varepsilon$.

source

Box-assisted version

FractalDimensions.boxassisted_correlation_dimFunction
boxassisted_correlation_dim(X::AbstractStateSpaceSet; kwargs...)

Use the box-assisted optimizations of [Bueno2007] to estimate the correlation dimension Δ_C of X.

This function does something extremely simple:

εs, Cs = boxed_correlationsum(X; kwargs...)
-slopefit(log2.(εs), log2.(Cs))[1]

and hence see boxed_correlationsum for more information and available keywords.

source
FractalDimensions.boxed_correlationsumFunction
boxed_correlationsum(X::AbstractStateSpaceSet, εs, r0 = maximum(εs); kwargs...) → Cs

Estimate the correlationsum for each size ε ∈ εs using an optimized algorithm that first distributes data into boxes of size r0, and then computes the correlation sum for each box and each neighboring box of each box. This method is much faster than correlationsum, provided that the box size r0 is significantly smaller than the attractor length. Good choices for r0 are estimate_r0_buenoorovio or estimate_r0_theiler.

boxed_correlationsum(X::AbstractStateSpaceSet; kwargs...) → εs, Cs

In this method the minimum inter-point distance and estimate_r0_buenoorovio of X are used to estimate suitable εs for the calculation, which are also returned.

Keyword arguments

  • q = 2 : The order of the correlation sum.
  • P = 2 : The prism dimension.
  • w = 0 : The Theiler window.
  • show_progress = false : Whether to display a progress bar for the calculation.
  • norm = Euclidean() : Distance norm.

Description

C_q(ε) is calculated for every ε ∈ εs and each of the boxes to then be summed up afterwards. The method of splitting the data into boxes was implemented according to Theiler[Theiler1987]. w is the Theiler window. P is the prism dimension. If P is unequal to the dimension of the data, only the first P dimensions are considered for the box distribution (this is called the prism-assisted version). By default P is 2, which is the version suggested by [Bueno2007]. Alternative for P is the prismdim_theiler. Note that only when P = dimension(X) the boxed version is guaranteed to be exact to the original correlationsum. For any other P, some point pairs that should have been included may be skipped due to having smaller distance in the remaining dimensions, but larger distance in the first P dimensions.

source
FractalDimensions.estimate_r0_buenoorovioFunction
estimate_r0_buenoorovio(X::AbstractStateSpaceSet, P = 2) → r0, ε0

Estimate a reasonable size for boxing X, proposed by Bueno-Orovio and Pérez-García[Bueno2007], before calculating the correlation dimension as presented by Theiler[Theiler1983]. Return the size r0 and the minimum interpoint distance ε0 in the data.

If instead of boxes, prisms are chosen everything stays the same but P is the dimension of the prism. To do so the dimension ν is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N points where N is the number of total data points. An effective size of the attractor is calculated by boxing a small subset of size N/10 into boxes of sidelength r_ℓ and counting the number of filled boxes η_ℓ.

\[\ell = r_\ell \eta_\ell ^{1/\nu}\]

The optimal number of filled boxes η_opt is calculated by minimising the number of calculations.

\[\eta_\textrm{opt} = N^{2/3}\cdot \frac{3^\nu - 1}{3^P - 1}^{1/2}.\]

P is the dimension of the data or the number of edges on the prism that don't span the whole dataset.

Then the optimal boxsize $r_0$ computes as

\[r_0 = \ell / \eta_\textrm{opt}^{1/\nu}.\]

source
FractalDimensions.estimate_r0_theilerFunction
estimate_r0_theiler(X::AbstractStateSpaceSet) → r0, ε0

Estimate a reasonable size for boxing the data X before calculating the boxed_correlationsum proposed by Theiler[Theiler1987]. Return the boxing size r0 and minimum inter-point distance in X, ε0.

To do so the dimension is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N points where N is the number of total data points. Then the optimal boxsize $r_0$ computes as

\[r_0 = R (2/N)^{1/\nu}\]

where $R$ is the size of the chaotic attractor and $\nu$ is the estimated dimension.

source

Fixed mass correlation sum

FractalDimensions.fixedmass_correlation_dimFunction
fixedmass_correlation_dim(X [, max_j]; kwargs...)

Use the fixed mass algorithm for computing the correlation sum, and use the result to compute the correlation dimension Δ_M of X.

This function does something extremely simple:

rs, ys = fixedmass_correlationsum(X, args...; kwargs...)
-slopefit(rs, ys)[1]
source
FractalDimensions.fixedmass_correlationsumFunction
fixedmass_correlationsum(X [, max_j]; metric = Euclidean(), M = length(X)) → rs, ys

A fixed mass algorithm for the calculation of the correlationsum, and subsequently a fractal dimension $\Delta$, with max_j the maximum number of neighbours that should be considered for the calculation.

By default max_j = clamp(N*(N-1)/2, 5, 32) with N the data length.

Keyword arguments

  • M defines the number of points considered for the averaging of distances, randomly subsampling them from X.
  • metric = Euclidean() is the distance metric.
  • start_j = 4 computes the equation below starting from j = start_j. Typically the first j values have not converged to the correct scaling of the fractal dimension.

Description

"Fixed mass" algorithms mean that instead of trying to find all neighboring points within a radius, one instead tries to find the max radius containing j points. A correlation sum is obtained with this constrain, and equivalently the mean radius containing k points. Based on this, one can calculate $\Delta$ approximating the information dimension. The implementation here is due to to [Grassberger1988], which defines

\[Ψ(j) - \log N \sim \Delta \times \overline{\log \left( r_{(j)}\right)}\]

where $\Psi(j) = \frac{\text{d} \log Γ(j)}{\text{d} j}$ is the digamma function, rs = $\overline{\log \left( r_{(j)}\right)}$ is the mean logarithm of a radius containing j neighboring points, and ys = $\Psi(j) - \log N$ ($N$ is the length of the data). The amount of neighbors found $j$ range from 2 to max_j. The numbers are also converted to base $2$ from base $e$.

$\Delta$ can be computed by using linear_region(rs, ys).

source

Takens best estimate

FractalDimensions.takens_best_estimate_dimFunction
takens_best_estimate_dim(X, εmax, metric = Chebyshev(), εmin = 0)

Use the "Takens' best estimate" [Takens1985][Theiler1988] method for estimating the correlation dimension.

The original formula is

\[\Delta_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]

where $C$ is the correlationsum and $\epsilon_\text{max}$ is an upper cutoff. Here we use the later expression

\[\Delta_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]

where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].

According to [Borovkova1999], introducing a lower cutoff εmin can make the algorithm more stable (no divergence), this option is given but defaults to zero.

If X comes from a delay coordinates embedding of a timseries x, a recommended value for $\epsilon_\text{max}$ is std(x)/4.

You may also use

Δ_C, Δu_C, Δl_C = FractalDimensions.takens_best_estimate(args...)

to obtain the upper and lower 95% confidence intervals. The intervals are estimated from the log-likelihood function by finding the values of Δ_C where the function has fallen by 2 from its maximum, see e.g. [Barlow] chapter 5.3.

source

Kaplan-Yorke dimension

FractalDimensions.kaplanyorke_dimFunction
kaplanyorke_dim(λs::AbstractVector)

Calculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension[Kaplan1970] from the given Lyapunov exponents λs.

Description

The Kaplan-Yorke dimension is simply the point where cumsum(λs) becomes zero (interpolated):

\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]

If the sum of the exponents never becomes negative the function will return the length of the input vector.

Useful in combination with lyapunovspectrum from ChaosTools.jl.

source

Higuchi dimension

FractalDimensions.higuchi_dimFunction
higuchi_dim(x::AbstractVector [, ks])

Estimate the Higuchi dimension[Higuchi1988] of the graph of x.

Description

The Higuchi dimension is a number Δ ∈ [1, 2] that quantifies the roughness of the graph of the function x(t), assuming here that x is equi-sampled, like in the original paper.

The method estimates how the length of the graph increases as a function of the indices difference (which, in this context, is equivalent with differences in t). Specifically, we calculate the average length versus k as

\[L_m(k) = \frac{N-1}{\lfloor \frac{N-m}{k} floor k^2} +\left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\right]^{1/(q-1)}\]

where

\[\alpha_i = 1 / (N (\max(N-w, i) - \min(w + 1, i))^{(q-1)})\]

with $N$ the length of X and $B$ gives 1 if its argument is true. w is the Theiler window.

See the article of Grassberger for the general definition (Grassberger, 2007) and the book "Nonlinear Time Series Analysis" (Kantz and Schreiber, 2003), Ch. 6, for a discussion around choosing best values for w, and Ch. 11.3 for the explicit definition of the q-order correlationsum. Note that the formula in 11.3 is incorrect, but corrected here, indices are adapted to take advantage of all available points and also note that we immediatelly exponentiate $C_q$ to $1/(q-1)$, so that it scales exponentially as $C_q \propto \varepsilon ^\Delta_q$ versus the size $\varepsilon$.

source

Box-assisted version

FractalDimensions.boxed_correlationsumFunction
boxed_correlationsum(X::AbstractStateSpaceSet, εs, r0 = maximum(εs); kwargs...) → Cs

Estimate the correlationsum for each size ε ∈ εs using an optimized algorithm that first distributes data into boxes of size r0, and then computes the correlation sum for each box and each neighboring box of each box. This method is much faster than correlationsum, provided that the box size r0 is significantly smaller than the attractor length. Good choices for r0 are estimate_r0_buenoorovio or estimate_r0_theiler.

boxed_correlationsum(X::AbstractStateSpaceSet; kwargs...) → εs, Cs

In this method the minimum inter-point distance and estimate_r0_buenoorovio of X are used to estimate suitable εs for the calculation, which are also returned.

Keyword arguments

  • q = 2 : The order of the correlation sum.
  • P = 2 : The prism dimension.
  • w = 0 : The Theiler window.
  • show_progress = false : Whether to display a progress bar for the calculation.
  • norm = Euclidean() : Distance norm.

Description

C_q(ε) is calculated for every ε ∈ εs and each of the boxes to then be summed up afterwards. The method of splitting the data into boxes was implemented according to (Theiler, 1987). w is the Theiler window. P is the prism dimension. If P is unequal to the dimension of the data, only the first P dimensions are considered for the box distribution (this is called the prism-assisted version). By default P is 2, which is the version suggested by [Bueno2007]. Alternative for P is the prismdim_theiler. Note that only when P = dimension(X) the boxed version is guaranteed to be exact to the original correlationsum. For any other P, some point pairs that should have been included may be skipped due to having smaller distance in the remaining dimensions, but larger distance in the first P dimensions.

source
FractalDimensions.estimate_r0_buenoorovioFunction
estimate_r0_buenoorovio(X::AbstractStateSpaceSet, P = 2) → r0, ε0

Estimate a reasonable size for boxing X, proposed by Bueno-Orovio and Pérez-García (Bueno-Orovio and P{é}rez-Garc{ı́}a, 2007), before calculating the correlation dimension as presented by Theiler1983. Return the size r0 and the minimum interpoint distance ε0 in the data.

If instead of boxes, prisms are chosen everything stays the same but P is the dimension of the prism. To do so the dimension ν is estimated by running the algorithm by Grassberger and Procaccia (Grassberger and Procaccia, 1983) with √N points where N is the number of total data points. An effective size of the attractor is calculated by boxing a small subset of size N/10 into boxes of sidelength r_ℓ and counting the number of filled boxes η_ℓ.

\[\ell = r_\ell \eta_\ell ^{1/\nu}\]

The optimal number of filled boxes η_opt is calculated by minimising the number of calculations.

\[\eta_\textrm{opt} = N^{2/3}\cdot \frac{3^\nu - 1}{3^P - 1}^{1/2}.\]

P is the dimension of the data or the number of edges on the prism that don't span the whole dataset.

Then the optimal boxsize $r_0$ computes as

\[r_0 = \ell / \eta_\textrm{opt}^{1/\nu}.\]

source
FractalDimensions.estimate_r0_theilerFunction
estimate_r0_theiler(X::AbstractStateSpaceSet) → r0, ε0

Estimate a reasonable size for boxing the data X before calculating the boxed_correlationsum proposed by (Theiler, 1987). Return the boxing size r0 and minimum inter-point distance in X, ε0.

To do so the dimension is estimated by running the algorithm by Grassberger and Procaccia (Grassberger and Procaccia, 1983) with √N points where N is the number of total data points. Then the optimal boxsize $r_0$ computes as

\[r_0 = R (2/N)^{1/\nu}\]

where $R$ is the size of the chaotic attractor and $\nu$ is the estimated dimension.

source

Fixed mass correlation sum

FractalDimensions.fixedmass_correlation_dimFunction
fixedmass_correlation_dim(X [, max_j]; kwargs...)

Use the fixed mass algorithm for computing the correlation sum, and use the result to compute the correlation dimension Δ_M of X.

This function does something extremely simple:

rs, ys = fixedmass_correlationsum(X, args...; kwargs...)
+slopefit(rs, ys)[1]
source
FractalDimensions.fixedmass_correlationsumFunction
fixedmass_correlationsum(X [, max_j]; metric = Euclidean(), M = length(X)) → rs, ys

A fixed mass algorithm for the calculation of the correlationsum, and subsequently a fractal dimension $\Delta$, with max_j the maximum number of neighbours that should be considered for the calculation.

By default max_j = clamp(N*(N-1)/2, 5, 32) with N the data length.

Keyword arguments

  • M defines the number of points considered for the averaging of distances, randomly subsampling them from X.
  • metric = Euclidean() is the distance metric.
  • start_j = 4 computes the equation below starting from j = start_j. Typically the first j values have not converged to the correct scaling of the fractal dimension.

Description

"Fixed mass" algorithms mean that instead of trying to find all neighboring points within a radius, one instead tries to find the max radius containing j points. A correlation sum is obtained with this constrain, and equivalently the mean radius containing k points. Based on this, one can calculate $\Delta$ approximating the information dimension. The implementation here is due to to (Grassberger, 1988), which defines

\[Ψ(j) - \log N \sim \Delta \times \overline{\log \left( r_{(j)}\right)}\]

where $\Psi(j) = \frac{\text{d} \log Γ(j)}{\text{d} j}$ is the digamma function, rs = $\overline{\log \left( r_{(j)}\right)}$ is the mean logarithm of a radius containing j neighboring points, and ys = $\Psi(j) - \log N$ ($N$ is the length of the data). The amount of neighbors found $j$ range from 2 to max_j. The numbers are also converted to base $2$ from base $e$.

$\Delta$ can be computed by using linear_region(rs, ys).

source

Takens best estimate

FractalDimensions.takens_best_estimate_dimFunction
takens_best_estimate_dim(X, εmax, metric = Chebyshev(), εmin = 0)

Use the "Takens' best estimate" [Takens1985][Theiler1988] method for estimating the correlation dimension.

The original formula is

\[\Delta_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]

where $C$ is the correlationsum and $\epsilon_\text{max}$ is an upper cutoff. Here we use the later expression

\[\Delta_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]

where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].

According to [Borovkova1999], introducing a lower cutoff εmin can make the algorithm more stable (no divergence), this option is given but defaults to zero.

If X comes from a delay coordinates embedding of a timseries x, a recommended value for $\epsilon_\text{max}$ is std(x)/4.

You may also use

Δ_C, Δu_C, Δl_C = FractalDimensions.takens_best_estimate(args...)

to obtain the upper and lower 95% confidence intervals. The intervals are estimated from the log-likelihood function by finding the values of Δ_C where the function has fallen by 2 from its maximum, see e.g. [Barlow] chapter 5.3.

source

Kaplan-Yorke dimension

FractalDimensions.kaplanyorke_dimFunction
kaplanyorke_dim(λs::AbstractVector)

Calculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension (Kaplan and Yorke, 1979) from the given Lyapunov exponents λs.

Description

The Kaplan-Yorke dimension is simply the point where cumsum(λs) becomes zero (interpolated):

\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]

If the sum of the exponents never becomes negative the function will return the length of the input vector.

Useful in combination with lyapunovspectrum from ChaosTools.jl.

source

Higuchi dimension

FractalDimensions.higuchi_dimFunction
higuchi_dim(x::AbstractVector [, ks])

Estimate the Higuchi dimension (Higuchi, 1988) of the graph of x.

Description

The Higuchi dimension is a number Δ ∈ [1, 2] that quantifies the roughness of the graph of the function x(t), assuming here that x is equi-sampled, like in the original paper.

The method estimates how the length of the graph increases as a function of the indices difference (which, in this context, is equivalent with differences in t). Specifically, we calculate the average length versus k as

\[L_m(k) = \frac{N-1}{\lfloor \frac{N-m}{k} floor k^2} \sum_{i=1}^{\lfloor \frac{N-m}{k} \rfloor} |X_N(m+ik)-X_N(m+(i-1)k)| \\ -L(k) = \frac{1}{k} \sum_{m=1}^k L_m(k)\]

and then use linear_region in -log2.(k) vs log2.(L) as per usual when computing a fractal dimension.

The algorithm chooses default ks to be exponentially spaced in base-2, up to at most 2^8. A user can provide their own ks as a second argument otherwise.

Use FractalDimensions.higuchi_length(x, ks) to obtain $L(k)$ directly.

source

Extreme value value theory dimension

FractalDimensions.extremevaltheory_dims_persistencesFunction
extremevaltheory_dims_persistences(x::AbstractStateSpaceSet, p::Real; kwargs...)

Return the local dimensions Δloc and the persistences θloc for each point in the given set for quantile probability p, according to the estimation done via extreme value theory [Lucarini2016]. The computation is parallelized to available threads (Threads.nthreads()).

See also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.

Keyword arguments

  • show_progress = true: displays a progress bar.
  • estimator = :mm: how to estimate the σ parameter of the Generalized Pareto Distribution. The local fractal dimension is 1/σ. The possible values are: :exp, :mm, as in estimate_gpd_parameters.
  • compute_persistence = true: whether to aso compute local persistences θloc (also called extremal index). If false, θloc are NaNs.
  • allocate_matrix = false: If true, the code calls a method that attempts to allocate an N×N matrix (N = length(X)) that stores the pairwise Euclidean distances. This method is faster due to optimizations of Distances.pairwise but will error if the computer does not have enough available memory for the matrix allocation.

Description

For each state space point $\mathbf{x}_i$ in X we compute $g_i = -\log(||\mathbf{x}_i - \mathbf{x}_j|| ) \; \forall j = 1, \ldots, N$ with $||\cdot||$ the Euclidean distance. Next, we choose an extreme quantile probability $p$ (e.g., 0.99) for the distribution of $g_i$. We compute $g_p$ as the $p$-th quantile of $g_i$. Then, we collect the exceedances of $g_i$, defined as $E_i = \{ g_i - g_p: g_i \ge g_p \}$, i.e., all values of $g_i$ larger or equal to $g_p$, also shifted by $g_p$. There are in total $n = N(1-q)$ values in $E_i$. According to extreme value theory, in the limit $N \to \infty$ the values $E_i$ follow a two-parameter Generalized Pareto Distribution (GPD) with parameters $\sigma,\xi$ (the third parameter $\mu$ of the GPD is zero due to the positive-definite construction of $E$). Within this extreme value theory approach, the local dimension $\Delta^{(E)}_i$ assigned to state space point $\textbf{x}_i$ is given by the inverse of the $\sigma$ parameter of the GPD fit to the data[Lucarini2012], $\Delta^{(E)}_i = 1/\sigma$. $\sigma$ is estimated according to the estimator keyword.

source
FractalDimensions.estimate_gpd_parametersFunction
estimate_gpd_parameters(X::AbstractVector{<:Real}, estimator::Symbol = :mm)

Estimate and return the parameters σ, ξ of a Generalized Pareto Distribution fit to X, assuming that minimum(X) ≥ 0 and hence the parameter μ is 0 (if not, simply shift X by its minimum), according to the methods provided in [Flavio2023].

Optionally choose the estimator, which can be:

  • :exp: Assume the distribution is exponential instead of GP and get σ from mean of X and set ξ = 0.
  • mm: Standing for "method of moments", estimants are given by

    \[\xi = (\bar{x}^2/s^2 - 1)/2, \quad \sigma = \bar{x}(\bar{x}^2/s^2 + 1)/2\]

    with $\bar{x}$ the sample mean and $s^2$ the sample variance. This estimator only exists if the true distribution ξ value is < 0.5.
source
FractalDimensions.extremevaltheory_gpdfit_pvaluesFunction
extremevaltheory_gpdfit_pvalues(X, p; kw...)

Return various computed quantities that may quantify the significance of the results of extremevaltheory_dims_persistences(X, p; kw...), terms of quantifying how well a Generalized Pareto Distribution (GPD) describes exceedences in the input data.

Keyword arguments

  • show_progress, estimator as in extremevaltheory_dims_persistences
  • TestType = ApproximateOneSampleKSTest: the test type to use. It can be ApproximateOneSampleKSTest, ExactOneSampleKSTest, CramerVonMises. We noticed that OneSampleADTest sometimes yielded nonsensical results: all p-values were equal and were very small ≈ 1e-6.
  • nbins = round(Int, length(X)*(1-p)/20): number of bins to use when computing the histogram of the exceedances for computing the NRMSE. The default value will use equally spaced bins that are equal to the length of the exceedances divided by 20.

Description

The function computes the exceedances $E_i$ for each point $x_i \in X$ as in extremevaltheory_dims_persistences. It returns 5 quantities, all being vectors of length length(X):

  • Es, all exceedences, as a vector of vectors.
  • sigmas, xis the fitted σ, ξ to the GPD fits for each exceedance
  • nrmses the normalized root mean square distance of the fitted GPD to the histogram of the exceedances
  • pvalues the pvalues of a statistical test of the appropriateness of the GPD fit

The output nrmses quantifies the distance between the fitted GPD and the empirical histogram of the exceedances. It is computed as

\[NRMSE = \sqrt{\frac{\sum{(P_j - G_j)^2}{\sum{(P_j - U)^2}}\]

where $P_j$ the empirical (observed) probability at bin $j$, $G_j$ the fitted GPD probability at the midpoint of bin $j$, and $U$ same as $G_j$ but for the uniform distribution. The divisor of the equation normalizes the expression, so that the error of the empirical distribution is normalized to the error of the empirical distribution with fitting it with the uniform distribution. It is expected that NRMSE < 1. The smaller it is, the better the data are approximated by GPD versus uniform distribution.

The output pvalues is a vector of p-values. pvalues[i] corresponds to the p-value of the hypothesis: "The exceedences around point X[i] are sampled from a GPD" versus the alternative hypothesis that they are not. To extract the p-values, we perform a one-sample hypothesis via HypothesisTests.jl to the fitted GPD. Very small p-values then indicate that the hypothesis should be rejected and the data are not well described by a GPD. This can be an indication that we do not have enough data, or that we choose too high of a quantile probability p, or that the data are not suitable in general. This p-value based method for significance has been used in [Faranda2017], but it is unclear precisely how it was used.

For more details on how these quantities may quantify significance, see our review paper.

source

Theiler window

The Theiler window is a concept that is useful when finding neighbors in a dataset that is coming from the sampling of a continuous dynamical system. Itt tries to eliminate spurious "correlations" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory. Typically a good choice for w coincides with the choice an optimal delay time, see DelayEmbeddings.estimate_delay, for any of the timeseries of the dataset.

For more details, see Chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.

StateSpaceSet reference

StateSpaceSets.StateSpaceSetType
StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}

A dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length D. Each point is represented by SVector{D, T}. The data are a standard Julia Vector{SVector}, and can be obtained with vec(ssset::StateSpaceSet). Typically the order of points in the set is the time direction, but it doesn't have to be.

When indexed with 1 index, StateSpaceSet is like a vector of points. When indexed with 2 indices it behaves like a matrix that has each of the columns be the timeseries of each of the variables. When iterated over, it iterates over its contained points. See description of indexing below for more.

StateSpaceSet also supports almost all sensible vector operations like append!, push!, hcat, eachrow, among others.

Description of indexing

In the following let i, j be integers, typeof(X) <: AbstractStateSpaceSet and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for performance benefits make v2 an SVector{Int}).

  • X[i] == X[i, :] gives the ith point (returns an SVector)
  • X[v1] == X[v1, :], returns a StateSpaceSet with the points in those indices.
  • X[:, j] gives the jth variable timeseries (or collection), as Vector
  • X[v1, v2], X[:, v2] returns a StateSpaceSet with the appropriate entries (first indices being "time"/point index, while second being variables)
  • X[i, j] value of the jth variable, at the ith timepoint

Use Matrix(ssset) or StateSpaceSet(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like StateSpaceSet(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.

StateSpaceSets.standardizeFunction
standardize(d::StateSpaceSet) → r

Create a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.

standardize(x::AbstractVector{<:Real}) = (x - mean(x))/std(x)
+L(k) = \frac{1}{k} \sum_{m=1}^k L_m(k)\]

and then use linear_region in -log2.(k) vs log2.(L) as per usual when computing a fractal dimension.

The algorithm chooses default ks to be exponentially spaced in base-2, up to at most 2^8. A user can provide their own ks as a second argument otherwise.

Use FractalDimensions.higuchi_length(x, ks) to obtain $L(k)$ directly.

source

Extreme value value theory dimension

FractalDimensions.extremevaltheory_dimFunction
extremevaltheory_dim(X::StateSpaceSet, p::Real; kwargs...) → Δ

Convenience syntax that returns the mean of the local dimensions of extremevaltheory_dims_persistences, which approximates a fractal dimension of X using extreme value theory and quantile probability p.

See also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.

source
FractalDimensions.extremevaltheory_dims_persistencesFunction
extremevaltheory_dims_persistences(x::AbstractStateSpaceSet, p::Real; kwargs...)

Return the local dimensions Δloc and the persistences θloc for each point in the given set for quantile probability p, according to the estimation done via extreme value theory (Lucarini et al., 2016). The computation is parallelized to available threads (Threads.nthreads()).

See also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.

Keyword arguments

  • show_progress = true: displays a progress bar.
  • estimator = :mm: how to estimate the σ parameter of the Generalized Pareto Distribution. The local fractal dimension is 1/σ. The possible values are: :exp, :mm, as in estimate_gpd_parameters.
  • compute_persistence = true: whether to aso compute local persistences θloc (also called extremal index). If false, θloc are NaNs.
  • allocate_matrix = false: If true, the code calls a method that attempts to allocate an N×N matrix (N = length(X)) that stores the pairwise Euclidean distances. This method is faster due to optimizations of Distances.pairwise but will error if the computer does not have enough available memory for the matrix allocation.

Description

For each state space point $\mathbf{x}_i$ in X we compute $g_i = -\log(||\mathbf{x}_i - \mathbf{x}_j|| ) \; \forall j = 1, \ldots, N$ with $||\cdot||$ the Euclidean distance. Next, we choose an extreme quantile probability $p$ (e.g., 0.99) for the distribution of $g_i$. We compute $g_p$ as the $p$-th quantile of $g_i$. Then, we collect the exceedances of $g_i$, defined as $E_i = \{ g_i - g_p: g_i \ge g_p \}$, i.e., all values of $g_i$ larger or equal to $g_p$, also shifted by $g_p$. There are in total $n = N(1-q)$ values in $E_i$. According to extreme value theory, in the limit $N \to \infty$ the values $E_i$ follow a two-parameter Generalized Pareto Distribution (GPD) with parameters $\sigma,\xi$ (the third parameter $\mu$ of the GPD is zero due to the positive-definite construction of $E$). Within this extreme value theory approach, the local dimension $\Delta^{(E)}_i$ assigned to state space point $\textbf{x}_i$ is given by the inverse of the $\sigma$ parameter of the GPD fit to the data[Lucarini2012], $\Delta^{(E)}_i = 1/\sigma$. $\sigma$ is estimated according to the estimator keyword.

A more precise description of this process is given in the review paper (Datseris et al., 2023).

source
FractalDimensions.extremevaltheory_local_dim_persistenceFunction
extremevaltheory_local_dim_persistence(X::StateSpaceSet, ζ, p::Real; kw...)

Return the local values Δ, θ of the fractal dimension and persistence of X around a state space point ζ. p and kw are as in extremevaltheory_dims_persistences.

source
FractalDimensions.extremal_index_suevegesFunction
extremal_index_sueveges(y::AbstractVector, p)

Compute the extremal index θ of y through the Süveges formula for quantile probability p, using the algorithm of (Süveges, 2007).

source
FractalDimensions.estimate_gpd_parametersFunction
estimate_gpd_parameters(X::AbstractVector{<:Real}, estimator::Symbol)

Estimate and return the parameters σ, ξ of a Generalized Pareto Distribution fit to X, assuming that minimum(X) ≥ 0 and hence the parameter μ is 0 (if not, simply shift X by its minimum), according to the methods provided in Pons2023.

The estimator can be:

  • :exp: Assume the distribution is exponential instead of GP and get σ from mean of X and set ξ = 0.
  • mm: Standing for "method of moments", estimants are given by

    \[\xi = (\bar{x}^2/s^2 - 1)/2, \quad \sigma = \bar{x}(\bar{x}^2/s^2 + 1)/2\]

    with $\bar{x}$ the sample mean and $s^2$ the sample variance. This estimator only exists if the true distribution ξ value is < 0.5.
source
FractalDimensions.extremevaltheory_gpdfit_pvaluesFunction
extremevaltheory_gpdfit_pvalues(X, p; kw...)

Return various computed quantities that may quantify the significance of the results of extremevaltheory_dims_persistences(X, p; kw...), terms of quantifying how well a Generalized Pareto Distribution (GPD) describes exceedences in the input data.

Keyword arguments

  • show_progress, estimator as in extremevaltheory_dims_persistences
  • TestType = ApproximateOneSampleKSTest: the test type to use. It can be ApproximateOneSampleKSTest, ExactOneSampleKSTest, CramerVonMises. We noticed that OneSampleADTest sometimes yielded nonsensical results: all p-values were equal and were very small ≈ 1e-6.
  • nbins = round(Int, length(X)*(1-p)/20): number of bins to use when computing the histogram of the exceedances for computing the NRMSE. The default value will use equally spaced bins that are equal to the length of the exceedances divided by 20.

Description

The function computes the exceedances $E_i$ for each point $x_i \in X$ as in extremevaltheory_dims_persistences. It returns 5 quantities, all being vectors of length length(X):

  • Es, all exceedences, as a vector of vectors.
  • sigmas, xis the fitted σ, ξ to the GPD fits for each exceedance
  • nrmses the normalized root mean square distance of the fitted GPD to the histogram of the exceedances
  • pvalues the pvalues of a statistical test of the appropriateness of the GPD fit

The output nrmses quantifies the distance between the fitted GPD and the empirical histogram of the exceedances. It is computed as

\[NRMSE = \sqrt{\frac{\sum{(P_j - G_j)^2}{\sum{(P_j - U)^2}}\]

where $P_j$ the empirical (observed) probability at bin $j$, $G_j$ the fitted GPD probability at the midpoint of bin $j$, and $U$ same as $G_j$ but for the uniform distribution. The divisor of the equation normalizes the expression, so that the error of the empirical distribution is normalized to the error of the empirical distribution with fitting it with the uniform distribution. It is expected that NRMSE < 1. The smaller it is, the better the data are approximated by GPD versus uniform distribution.

The output pvalues is a vector of p-values. pvalues[i] corresponds to the p-value of the hypothesis: "The exceedences around point X[i] are sampled from a GPD" versus the alternative hypothesis that they are not. To extract the p-values, we perform a one-sample hypothesis via HypothesisTests.jl to the fitted GPD. Very small p-values then indicate that the hypothesis should be rejected and the data are not well described by a GPD. This can be an indication that we do not have enough data, or that we choose too high of a quantile probability p, or that the data are not suitable in general. This p-value based method for significance has been used in [Faranda2017], but it is unclear precisely how it was used.

For more details on how these quantities may quantify significance, see our review paper.

source

Theiler window

The Theiler window is a concept that is useful when finding neighbors in a dataset that is coming from the sampling of a continuous dynamical system. Itt tries to eliminate spurious "correlations" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory. Typically a good choice for w coincides with the choice an optimal delay time, see DelayEmbeddings.estimate_delay, for any of the timeseries of the dataset.

For more details, see Chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.

StateSpaceSet reference

StateSpaceSets.StateSpaceSetType
StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}

A dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length D. Each point is represented by SVector{D, T}. The data are a standard Julia Vector{SVector}, and can be obtained with vec(ssset::StateSpaceSet). Typically the order of points in the set is the time direction, but it doesn't have to be.

When indexed with 1 index, StateSpaceSet is like a vector of points. When indexed with 2 indices it behaves like a matrix that has each of the columns be the timeseries of each of the variables. When iterated over, it iterates over its contained points. See description of indexing below for more.

StateSpaceSet also supports almost all sensible vector operations like append!, push!, hcat, eachrow, among others.

Description of indexing

In the following let i, j be integers, typeof(X) <: AbstractStateSpaceSet and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for performance benefits make v2 an SVector{Int}).

  • X[i] == X[i, :] gives the ith point (returns an SVector)
  • X[v1] == X[v1, :], returns a StateSpaceSet with the points in those indices.
  • X[:, j] gives the jth variable timeseries (or collection), as Vector
  • X[v1, v2], X[:, v2] returns a StateSpaceSet with the appropriate entries (first indices being "time"/point index, while second being variables)
  • X[i, j] value of the jth variable, at the ith timepoint

Use Matrix(ssset) or StateSpaceSet(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like StateSpaceSet(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.

StateSpaceSets.standardizeFunction
standardize(d::StateSpaceSet) → r

Create a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.

standardize(x::AbstractVector{<:Real}) = (x - mean(x))/std(x)

References

diff --git a/dev/search/index.html b/dev/search/index.html index e6d5770..f0c07a7 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · FractalDimensions.jl

Loading search...

    +Search · FractalDimensions.jl

    Loading search...

      diff --git a/dev/search_index.js b/dev/search_index.js index d8d7609..b39c141 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"#FractalDimensions.jl","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"FractalDimensions","category":"page"},{"location":"#FractalDimensions","page":"FractalDimensions.jl","title":"FractalDimensions","text":"FractalDimensions.jl\n\n(Image: ) (Image: ) (Image: ) (Image: CI) (Image: codecov) (Image: Package Downloads)\n\nA Julia package that estimates various definitions of fractal dimension from data. It can be used as a standalone package, or as part of DynamicalSystems.jl.\n\nTo install it, run import Pkg; Pkg.add(\"FractalDimensions\").\n\nAll further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.\n\nPreviously, this package was part of ChaosTools.jl.\n\nCitation\n\nIf you use this package in a publication, please cite the paper below:\n\n@ARTICLE{FractalDimensions.jl,\n title = \"Estimating the fractal dimension: a comparative review and open\n source implementations\",\n author = \"Datseris, George and Kottlarz, Inga and Braun, Anton P and\n Parlitz, Ulrich\",\n publisher = \"arXiv\",\n year = 2021,\n doi = {10.48550/ARXIV.2109.05937},\n url = {https://arxiv.org/abs/2109.05937},\n}\n\n\n\n\n\n","category":"module"},{"location":"#Introduction","page":"FractalDimensions.jl","title":"Introduction","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"note: Note\nThis package is accompanying a review paper on estimating the fractal dimension: https://arxiv.org/abs/2109.05937. The paper is continuing the discussion of chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"There are numerous methods that one can use to calculate a so-called \"dimension\" of a dataset which in the context of dynamical systems is called the Fractal dimension. One way to do this is to estimate the scaling behaviour of some quantity as a size/scale increases. In the Fractal dimension example below, one finds the scaling of the correlation sum versus a ball radius. In this case, it approximately holds $ \\log(C) \\approx \\Delta\\log(\\varepsilon) $ for radius varepsilon. The scaling of many other quantities can be estimated as well, such as the generalized entropy, the Higuchi length, or others provided here.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"To actually find Delta, one needs to find a linearly scaling region in the graph log(C) vs. log(varepsilon) and estimate its slope. Hence, identifying a linear region is central to estimating a fractal dimension. That is why, the section Linear scaling regions is of central importance for this documentation.","category":"page"},{"location":"#Fractal-dimension-example","page":"FractalDimensions.jl","title":"Fractal dimension example","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"In this simplest example we will calculate the fractal dimension of the chaotic attractor of the Hénon map (for default parameters). For this example, we will generate the data on the spot:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using DynamicalSystemsBase # for simulating dynamical systems\nusing CairoMakie # for plotting\n\nhenon_rule(x, p, n) = SVector(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])\nu0 = zeros(2)\np0 = [1.4, 0.3]\nhenon = DeterministicIteratedMap(henon_rule, u0, p0)\n\nX, t = trajectory(henon, 20_000; Ttr = 100)\nscatter(X[:, 1], X[:, 2]; color = (\"black\", 0.01), markersize = 4)","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"instead of simulating the set X we could load it from disk, e.g., if there was a text file with two columns as x and y coordinates, we would load it as","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using DelimitedFiles\nfile = \"path/to/file.csv\"\nM = readdlm(file) # here `M` is a metrix with two columns\nX = StateSpaceSet(M) # important to convert to a state space set","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"After we have X, we can start computing a fractal dimension and for this example we will use the correlationsum. Our goal is to compute the correlation sum of X for many different sizes/radii ε. This is as simple as","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using FractalDimensions\nες = 2 .^ (-15:0.5:5) # semi-random guess\nCs = correlationsum(X, ες; show_progress = false)","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"For a fractal set X dynamical systems theory says that there should be an exponential relationship between the correlation sum and the sizes:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"xs = log2.(ες)\nys = log2.(Cs)\nscatterlines(xs, ys; axis = (ylabel = L\"\\log(C_2)\", xlabel = L\"\\log (\\epsilon)\"))","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The slope of the linear scaling region of the above plot is the fractal dimension (based on the correlation sum).","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Given that we see the plot, we can estimate where the linear scaling region starts and ends. This is generally done using LargestLinearRegion in slopefit. But first, let's visualize what the method does, as it uses linear_regions.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"lrs, slopes = linear_regions(xs, ys, tol = 0.25)\nfig = Figure()\nax = Axis(fig[1,1]; ylabel = L\"\\log(C_2)\", xlabel = L\"\\log (\\epsilon)\")\nfor r in lrs\n scatterlines!(ax, xs[r], ys[r])\nend\nfig","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The LargestLinearRegion method finds, and computes the slope of, the largest region:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Δ = slopefit(xs, ys, LargestLinearRegion())","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"This result is an approximation of a fractal dimension.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The whole above pipeline we went through is bundled in grassberger_proccacia_dim. Similar work is done by generalized_dim and many other functions.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"danger: Be wary when using `xxxxx_dim`\nAs stated clearly by the documentation strings, all pre-made dimension estimating functions (ending in _dim) perform a lot of automated steps, each having its own heuristic choices for function default values. They are more like convenient bundles with on-average good defaults, rather than precise functions. You should be careful when considering the validity of the returned number!","category":"page"},{"location":"#Linear-scaling-regions","page":"FractalDimensions.jl","title":"Linear scaling regions","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"slopefit\nLinearRegression\nLargestLinearRegion\nlinear_regions\nAllSlopesDistribution\nestimate_boxsizes\nminimum_pairwise_distance","category":"page"},{"location":"#FractalDimensions.slopefit","page":"FractalDimensions.jl","title":"FractalDimensions.slopefit","text":"slopefit(x, y [, t::SLopeFit]; kw...) → s, s05, s95\n\nFit a linear scaling region in the curve of the two AbstractVectors y versus x using t as the estimation method. Return the estimated slope, as well as the confidence intervals for it.\n\nThe methods t that can be used for the estimation are:\n\nLinearRegression\nLargestLinearRegion (default)\nAllSlopesDistribution\n\nThe keyword ignore_saturation = true ignores saturation that (sometimes) happens at the start and end of the curve y(x), where the curve flattens. The keyword sat_threshold = 0.01 decides what saturation is: while abs(y[i]-y[i+1]) < sat_threshold we are in a saturation regime. Said differently, slopes with value sat_threshold/dx with dx = x[i+1] - x[i] are neglected.\n\nThe keyword ci = 0.95 specifies which quantile (and the 1 - quantile) the confidence interval values are returned at, and by defualt it is 95% (and hence also 5%).\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.LinearRegression","page":"FractalDimensions.jl","title":"FractalDimensions.LinearRegression","text":"LinearRegression <: SLopeFit\nLinearRegression()\n\nStandard linear regression fit to all available data. Estimation of the confidence intervals is based om the standard error of the slope following a T-distribution, see:\n\nhttps://stattrek.com/regression/slope-confidence-interval\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.LargestLinearRegion","page":"FractalDimensions.jl","title":"FractalDimensions.LargestLinearRegion","text":"LargestLinearRegion <: SlopeFit\nLargestLinearRegion(; dxi::Int = 1, tol = 0.25)\n\nIdentify regions where the curve y(x) is linear, by scanning the x-axis every dxi indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15] and so on if dxi=5).\n\nIf the slope (calculated via linear regression) of a region of width dxi is approximatelly equal to that of the previous region, within relative tolerance tol and absolute tolerance 0, then these two regions belong to the same linear region.\n\nThe largest such region is then used to estimate the slope via standard linear regression of all points belonging to the largest linear region. \"Largest\" here means the region that covers the more extent along the x-axis.\n\nUse linear_regions if you wish to obtain the decomposition into linear regions.\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.linear_regions","page":"FractalDimensions.jl","title":"FractalDimensions.linear_regions","text":"linear_regions(x, y; dxi, tol) → lrs, tangents\n\nApply the algorithm described by LargestLinearRegion, and return the indices of x that correspond to the linear regions, lrs, and the tangents at each region (obtained via a second linear regression at each accumulated region). lrs is hence a vector of UnitRanges.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.AllSlopesDistribution","page":"FractalDimensions.jl","title":"FractalDimensions.AllSlopesDistribution","text":"AllSlopesDistribution <: SlopeFit\nAllSlopesDistribution()\n\nEstimate a slope by computing the distribution of all possible slopes that can be estimated from the curve y(x), according to the method by [Deshmukh2021]. The returned slope is the distribution mean and the confidence intervals are simply the corresponding quantiles of the distribution.\n\nNot implemented yet, the method is here as a placeholder.\n\n[Deshmukh2021]: Deshmukh et al., Toward automated extraction and characterization of scaling regions in dynamical systems, Chaos 31, 123102 (2021).\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.estimate_boxsizes","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_boxsizes","text":"estimate_boxsizes(X::AbstractStateSpaceSet; kwargs...) → εs\n\nReturn k exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k), that are a good estimate for sizes ε that are used in calculating a Fractal Dimension. It is strongly recommended to standardize input dataset before using this function.\n\nLet d₋ be the minimum pair-wise distance in X, d₋ = dminimum_pairwise_distance(X). Let d₊ be the average total length of X, d₊ = mean(ma - mi) with mi, ma = minmaxima(X). Then lower = log(base, d₋) and upper = log(base, d₊). Because by default w=1, z=-1, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.\n\nKeywords\n\nw = 1, z = -1, k = 16 : as explained above.\nbase = MathConstants.e : the base used in the log function.\nwarning = true: Print some warnings for bad estimates.\nautoexpand = true: If the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by setting w -= we and z -= ze. You can set different default values to the keywords we = w, ze = z.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.minimum_pairwise_distance","page":"FractalDimensions.jl","title":"FractalDimensions.minimum_pairwise_distance","text":"minimum_pairwise_distance(X::StateSpaceSet, kdtree = dimension(X) < 10, metric = Euclidean())\n\nReturn min_d, min_pair: the minimum pairwise distance of all points in the dataset, and the corresponding point pair. The third argument is a switch of whether to use KDTrees or a brute force search.\n\n\n\n\n\n","category":"function"},{"location":"#Generalized-(entropy)-dimension","page":"FractalDimensions.jl","title":"Generalized (entropy) dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Based on the definition of the Generalized entropy (genentropy), one can calculate an appropriate dimension, called generalized dimension:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"generalized_dim\nmolteno_dim\nmolteno_boxing","category":"page"},{"location":"#FractalDimensions.generalized_dim","page":"FractalDimensions.jl","title":"FractalDimensions.generalized_dim","text":"generalized_dim(X::StateSpaceSet [, sizes]; q = 1, base = 2) -> Δ_q\n\nReturn the q order generalized dimension of X, by calculating its histogram-based Rényi entropy for each ε ∈ sizes.\n\nThe case of q = 0 is often called \"capacity\" or \"box-counting\" dimension, while q = 1 is the \"information\" dimension.\n\nDescription\n\nThe returned dimension is approximated by the (inverse) power law exponent of the scaling of the Renyi entropy H_q, versus the box size ε, where ε ∈ sizes:\n\nH_q approx -Delta_qlog_b(varepsilon)\n\nH_q is calculated using ComplexityMeasures: Renyi, ValueHistogram, entropy, i.e., by doing a histogram of the data with a given box size.\n\nCalling this function performs a lot of automated steps:\n\nA vector of box sizes is decided by calling sizes = estimate_boxsizes(dataset), if sizes is not given.\nFor each element of sizes the appropriate entropy is calculated as\nH = [entropy(Renyi(; q, base), ValueHistogram(ε), data) for ε ∈ sizes]\nLet x = -log.(sizes).\nThe curve H(x) is decomposed into linear regions, using slopefit(x, h)[1].\nThe biggest linear region is chosen, and a fit for the slope of that region is performed using the function linear_region, which does a simple linear regression fit using linreg. This slope is the return value of generalized_dim.\n\nBy doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result. The source code of this function is only 3 lines of code.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.molteno_dim","page":"FractalDimensions.jl","title":"FractalDimensions.molteno_dim","text":"molteno_dim(X::AbstractStateSpaceSet; k0::Int = 10, q = 1.0, base = 2)\n\nReturn an estimate of the generalized_dim of X using the algorithm by [Molteno1993]. This function is a simple utilization of the probabilities estimated by molteno_boxing so see that function for more details. Here the entropy of the probabilities is computed at each size, and a line is fitted in the entropy vs log(size) graph, just like in generalized_dim.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.molteno_boxing","page":"FractalDimensions.jl","title":"FractalDimensions.molteno_boxing","text":"molteno_boxing(X::AbstractStateSpaceSet; k0::Int = 10) → (probs, εs)\n\nDistribute X into boxes whose size is halved in each step, according to the algorithm by [Molteno1993]. Division stops if the average number of points per filled box falls below the threshold k0.\n\nReturn probs, a vector of Probabilities of finding points in boxes for different box sizes, and the corresponding box sizes εs. These outputs are used in molteno_dim.\n\nDescription\n\nProject the data onto the whole interval of numbers that is covered by UInt64. The projected data is distributed into boxes whose size decreases by factor 2 in each step. For each box that contains more than one point 2^D new boxes are created where D is the dimension of the data.\n\nThe process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below k0. The box sizes εs are calculated and returned together with the probs.\n\nThis algorithm is faster than the traditional approach of using ValueHistogram(ε::Real), but it is only suited for low dimensional data since it divides each box into 2^D new boxes if D is the dimension. For large D this leads to low numbers of box divisions before the threshold is passed and the divison stops. This results to a low number of data points to fit the dimension to and thereby a poor estimate.\n\n[Molteno1993]: Molteno, T. C. A., Fast O(N) box-counting algorithm for estimating dimensions. Phys. Rev. E 48, R3263(R) (1993)\n\n\n\n\n\n","category":"function"},{"location":"#Correlation-sum-based-dimension","page":"FractalDimensions.jl","title":"Correlation sum based dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"grassberger_proccacia_dim\ncorrelationsum","category":"page"},{"location":"#FractalDimensions.grassberger_proccacia_dim","page":"FractalDimensions.jl","title":"FractalDimensions.grassberger_proccacia_dim","text":"grassberger_proccacia_dim(X::AbstractStateSpaceSet, εs = estimate_boxsizes(data); kwargs...)\n\nUse the method of Grassberger and Proccacia[Grassberger1983], and the correction by Theiler[Theiler1986], to estimate the correlation dimension Δ_C of X.\n\nThis function does something extremely simple:\n\ncm = correlationsum(data, εs; kwargs...)\nΔ_C = slopefit(rs, ys)(log2.(sizes), log2.(cm))[1]\n\ni.e. it calculates correlationsum for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε).\n\nSee correlationsum for the available keywords. See also takens_best_estimate, boxassisted_correlation_dim.\n\n[Grassberger1983]: Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983) \n\n[Theiler1986]: Theiler, Spurious dimension from correlation algorithms applied to limited time-series data. Physical Review A, 34\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.correlationsum","text":"correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)\n\nCalculate the q-order correlation sum of X (StateSpaceSet or timeseries) for a given radius ε and norm. They keyword show_progress = true can be used to display a progress bar for large X.\n\ncorrelationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)\n\nIf εs is a vector, C_q is calculated for each ε ∈ εs more efficiently. Multithreading is also enabled over the available threads (Threads.nthreads()). The function boxed_correlationsum is typically faster if the dimension of X is small and if maximum(εs) is smaller than the size of X.\n\nKeyword arguments\n\nq = 2: order of the correlation sum\nnorm = Euclidean(): distance norm\nw = 0: Theiler window\nshow_progress = true: display a progress bar\n\nDescription\n\nThe correlation sum is defined as follows for q=2:\n\nC_2(epsilon) = frac2(N-w)(N-w-1)sum_i=1^Nsum_j=1+w+i^N\nB(X_i - X_j epsilon)\n\nfor as follows for q≠2\n\nC_q(epsilon) = left sum_i=1^N alpha_i\nleftsum_ji-j w B(X_i - X_j epsilon)right^q-1right^1(q-1)\n\nwhere\n\nalpha_i = 1 (N (max(N-w i) - min(w + 1 i))^(q-1))\n\nwith N the length of X and B gives 1 if its argument is true. w is the Theiler window.\n\nSee the article of Grassberger for the general definition [Grassberger2007] and the book \"Nonlinear Time Series Analysis\" [Kantz2003], Ch. 6, for a discussion around choosing best values for w, and Ch. 11.3 for the explicit definition of the q-order correlationsum. Note that the formula in 11.3 is incorrect, but corrected here, indices are adapted to take advantage of all available points and also note that we immediatelly exponentiate C_q to 1(q-1), so that it scales exponentially as C_q propto varepsilon ^Delta_q versus the size varepsilon.\n\n[Grassberger2007]: Peter Grassberger (2007) Grassberger-Procaccia algorithm. Scholarpedia, 2(5):3043.\n\n[Kantz2003]: Kantz, H., & Schreiber, T. (2003). Nonlinear Time Series Analysis, Cambridge University Press.\n\n\n\n\n\n","category":"function"},{"location":"#Box-assisted-version","page":"FractalDimensions.jl","title":"Box-assisted version","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"boxassisted_correlation_dim\nboxed_correlationsum\nprismdim_theiler\nestimate_r0_buenoorovio\nestimate_r0_theiler","category":"page"},{"location":"#FractalDimensions.boxassisted_correlation_dim","page":"FractalDimensions.jl","title":"FractalDimensions.boxassisted_correlation_dim","text":"boxassisted_correlation_dim(X::AbstractStateSpaceSet; kwargs...)\n\nUse the box-assisted optimizations of [Bueno2007] to estimate the correlation dimension Δ_C of X.\n\nThis function does something extremely simple:\n\nεs, Cs = boxed_correlationsum(X; kwargs...)\nslopefit(log2.(εs), log2.(Cs))[1]\n\nand hence see boxed_correlationsum for more information and available keywords.\n\n[Bueno2007]: Bueno-Orovio and Pérez-García, Enhanced box and prism assisted algorithms for computing the correlation dimension. Chaos Solitons & Fractrals, 34(5) \n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.boxed_correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.boxed_correlationsum","text":"boxed_correlationsum(X::AbstractStateSpaceSet, εs, r0 = maximum(εs); kwargs...) → Cs\n\nEstimate the correlationsum for each size ε ∈ εs using an optimized algorithm that first distributes data into boxes of size r0, and then computes the correlation sum for each box and each neighboring box of each box. This method is much faster than correlationsum, provided that the box size r0 is significantly smaller than the attractor length. Good choices for r0 are estimate_r0_buenoorovio or estimate_r0_theiler.\n\nboxed_correlationsum(X::AbstractStateSpaceSet; kwargs...) → εs, Cs\n\nIn this method the minimum inter-point distance and estimate_r0_buenoorovio of X are used to estimate suitable εs for the calculation, which are also returned.\n\nKeyword arguments\n\nq = 2 : The order of the correlation sum.\nP = 2 : The prism dimension.\nw = 0 : The Theiler window.\nshow_progress = false : Whether to display a progress bar for the calculation.\nnorm = Euclidean() : Distance norm.\n\nDescription\n\nC_q(ε) is calculated for every ε ∈ εs and each of the boxes to then be summed up afterwards. The method of splitting the data into boxes was implemented according to Theiler[Theiler1987]. w is the Theiler window. P is the prism dimension. If P is unequal to the dimension of the data, only the first P dimensions are considered for the box distribution (this is called the prism-assisted version). By default P is 2, which is the version suggested by [Bueno2007]. Alternative for P is the prismdim_theiler. Note that only when P = dimension(X) the boxed version is guaranteed to be exact to the original correlationsum. For any other P, some point pairs that should have been included may be skipped due to having smaller distance in the remaining dimensions, but larger distance in the first P dimensions.\n\n[Theiler1987]: Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.prismdim_theiler","page":"FractalDimensions.jl","title":"FractalDimensions.prismdim_theiler","text":"prismdim_theiler(X)\n\nAn algorithm to find the ideal choice of a prism dimension for boxed_correlationsum using Theiler's original suggestion.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_r0_buenoorovio","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_r0_buenoorovio","text":"estimate_r0_buenoorovio(X::AbstractStateSpaceSet, P = 2) → r0, ε0\n\nEstimate a reasonable size for boxing X, proposed by Bueno-Orovio and Pérez-García[Bueno2007], before calculating the correlation dimension as presented by Theiler[Theiler1983]. Return the size r0 and the minimum interpoint distance ε0 in the data.\n\nIf instead of boxes, prisms are chosen everything stays the same but P is the dimension of the prism. To do so the dimension ν is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N points where N is the number of total data points. An effective size ℓ of the attractor is calculated by boxing a small subset of size N/10 into boxes of sidelength r_ℓ and counting the number of filled boxes η_ℓ.\n\nell = r_ell eta_ell ^1nu\n\nThe optimal number of filled boxes η_opt is calculated by minimising the number of calculations.\n\neta_textrmopt = N^23cdot frac3^nu - 13^P - 1^12\n\nP is the dimension of the data or the number of edges on the prism that don't span the whole dataset.\n\nThen the optimal boxsize r_0 computes as\n\nr_0 = ell eta_textrmopt^1nu\n\n[Bueno2007]: Bueno-Orovio and Pérez-García, Enhanced box and prism assisted algorithms for computing the correlation dimension. Chaos Solitons & Fractrals, 34(5) \n\n[Theiler1987]: Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36\n\n[Grassberger1983]: Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983) \n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_r0_theiler","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_r0_theiler","text":"estimate_r0_theiler(X::AbstractStateSpaceSet) → r0, ε0\n\nEstimate a reasonable size for boxing the data X before calculating the boxed_correlationsum proposed by Theiler[Theiler1987]. Return the boxing size r0 and minimum inter-point distance in X, ε0.\n\nTo do so the dimension is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N points where N is the number of total data points. Then the optimal boxsize r_0 computes as\n\nr_0 = R (2N)^1nu\n\nwhere R is the size of the chaotic attractor and nu is the estimated dimension.\n\n[Theiler1987]: Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36\n\n[Grassberger1983]: Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983) \n\n\n\n\n\n","category":"function"},{"location":"#Fixed-mass-correlation-sum","page":"FractalDimensions.jl","title":"Fixed mass correlation sum","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"fixedmass_correlation_dim\nfixedmass_correlationsum","category":"page"},{"location":"#FractalDimensions.fixedmass_correlation_dim","page":"FractalDimensions.jl","title":"FractalDimensions.fixedmass_correlation_dim","text":"fixedmass_correlation_dim(X [, max_j]; kwargs...)\n\nUse the fixed mass algorithm for computing the correlation sum, and use the result to compute the correlation dimension Δ_M of X.\n\nThis function does something extremely simple:\n\nrs, ys = fixedmass_correlationsum(X, args...; kwargs...)\nslopefit(rs, ys)[1]\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.fixedmass_correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.fixedmass_correlationsum","text":"fixedmass_correlationsum(X [, max_j]; metric = Euclidean(), M = length(X)) → rs, ys\n\nA fixed mass algorithm for the calculation of the correlationsum, and subsequently a fractal dimension Delta, with max_j the maximum number of neighbours that should be considered for the calculation.\n\nBy default max_j = clamp(N*(N-1)/2, 5, 32) with N the data length.\n\nKeyword arguments\n\nM defines the number of points considered for the averaging of distances, randomly subsampling them from X.\nmetric = Euclidean() is the distance metric.\nstart_j = 4 computes the equation below starting from j = start_j. Typically the first j values have not converged to the correct scaling of the fractal dimension.\n\nDescription\n\n\"Fixed mass\" algorithms mean that instead of trying to find all neighboring points within a radius, one instead tries to find the max radius containing j points. A correlation sum is obtained with this constrain, and equivalently the mean radius containing k points. Based on this, one can calculate Delta approximating the information dimension. The implementation here is due to to [Grassberger1988], which defines\n\nΨ(j) - log N sim Delta times overlinelog left( r_(j)right)\n\nwhere Psi(j) = fractextd log Γ(j)textd j is the digamma function, rs = overlinelog left( r_(j)right) is the mean logarithm of a radius containing j neighboring points, and ys = Psi(j) - log N (N is the length of the data). The amount of neighbors found j range from 2 to max_j. The numbers are also converted to base 2 from base e.\n\nDelta can be computed by using linear_region(rs, ys).\n\n[Grassberger1988]: Peter Grassberger (1988) Finite sample Corrections to Entropy and Dimension Estimates, Physics Letters A 128(6-7)\n\n\n\n\n\n","category":"function"},{"location":"#Takens-best-estimate","page":"FractalDimensions.jl","title":"Takens best estimate","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"takens_best_estimate_dim","category":"page"},{"location":"#FractalDimensions.takens_best_estimate_dim","page":"FractalDimensions.jl","title":"FractalDimensions.takens_best_estimate_dim","text":"takens_best_estimate_dim(X, εmax, metric = Chebyshev(), εmin = 0)\n\nUse the \"Takens' best estimate\" [Takens1985][Theiler1988] method for estimating the correlation dimension.\n\nThe original formula is\n\nDelta_C approx fracC(epsilon_textmax)int_0^epsilon_textmax(C(epsilon) epsilon) depsilon\n\nwhere C is the correlationsum and epsilon_textmax is an upper cutoff. Here we use the later expression\n\nDelta_C approx - frac1etaquad eta = frac1(N-1)^*sum_i j^*log(X_i - X_j epsilon_textmax)\n\nwhere the sum happens for all i j so that i j and X_i - X_j epsilon_textmax. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].\n\nAccording to [Borovkova1999], introducing a lower cutoff εmin can make the algorithm more stable (no divergence), this option is given but defaults to zero.\n\nIf X comes from a delay coordinates embedding of a timseries x, a recommended value for epsilon_textmax is std(x)/4.\n\nYou may also use\n\nΔ_C, Δu_C, Δl_C = FractalDimensions.takens_best_estimate(args...)\n\nto obtain the upper and lower 95% confidence intervals. The intervals are estimated from the log-likelihood function by finding the values of Δ_C where the function has fallen by 2 from its maximum, see e.g. [Barlow] chapter 5.3.\n\n[Takens1985]: Takens, On the numerical determination of the dimension of an attractor, in: B.H.W. Braaksma, B.L.J.F. Takens (Eds.), Dynamical Systems and Bifurcations, in: Lecture Notes in Mathematics, Springer, Berlin, 1985, pp. 99–106.\n\n[Theiler1988]: Theiler, Lacunarity in a best estimator of fractal dimension. Physics Letters A, 133(4–5)\n\n[Borovkova1999]: Borovkova et al., Consistency of the Takens estimator for the correlation dimension. The Annals of Applied Probability, 9, 05 1999.\n\n[Barlow]: Barlow, R., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences. Vol 29. John Wiley & Sons, 1993\n\n\n\n\n\n","category":"function"},{"location":"#Kaplan-Yorke-dimension","page":"FractalDimensions.jl","title":"Kaplan-Yorke dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"kaplanyorke_dim","category":"page"},{"location":"#FractalDimensions.kaplanyorke_dim","page":"FractalDimensions.jl","title":"FractalDimensions.kaplanyorke_dim","text":"kaplanyorke_dim(λs::AbstractVector)\n\nCalculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension[Kaplan1970] from the given Lyapunov exponents λs.\n\nDescription\n\nThe Kaplan-Yorke dimension is simply the point where cumsum(λs) becomes zero (interpolated):\n\n D_KY = k + fracsum_i=1^k lambda_ilambda_k+1quad k = max_j left sum_i=1^j lambda_i 0 right\n\nIf the sum of the exponents never becomes negative the function will return the length of the input vector.\n\nUseful in combination with lyapunovspectrum from ChaosTools.jl.\n\n[Kaplan1970]: J. Kaplan & J. Yorke, Chaotic behavior of multidimensional difference equations, Lecture Notes in Mathematics vol. 730, Springer (1979)\n\n\n\n\n\n","category":"function"},{"location":"#Higuchi-dimension","page":"FractalDimensions.jl","title":"Higuchi dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"higuchi_dim","category":"page"},{"location":"#FractalDimensions.higuchi_dim","page":"FractalDimensions.jl","title":"FractalDimensions.higuchi_dim","text":"higuchi_dim(x::AbstractVector [, ks])\n\nEstimate the Higuchi dimension[Higuchi1988] of the graph of x.\n\nDescription\n\nThe Higuchi dimension is a number Δ ∈ [1, 2] that quantifies the roughness of the graph of the function x(t), assuming here that x is equi-sampled, like in the original paper.\n\nThe method estimates how the length of the graph increases as a function of the indices difference (which, in this context, is equivalent with differences in t). Specifically, we calculate the average length versus k as\n\nL_m(k) = fracN-1lfloor fracN-mk \rfloor k^2\nsum_i=1^lfloor fracN-mk rfloor X_N(m+ik)-X_N(m+(i-1)k) \n\nL(k) = frac1k sum_m=1^k L_m(k)\n\nand then use linear_region in -log2.(k) vs log2.(L) as per usual when computing a fractal dimension.\n\nThe algorithm chooses default ks to be exponentially spaced in base-2, up to at most 2^8. A user can provide their own ks as a second argument otherwise.\n\nUse FractalDimensions.higuchi_length(x, ks) to obtain L(k) directly.\n\n[Higuchi1988]: Higuchi, Approach to an irregular time series on the basis of the fractal theory, Physica D: Nonlinear Phenomena (1988)\n\n\n\n\n\n","category":"function"},{"location":"#Extreme-value-value-theory-dimension","page":"FractalDimensions.jl","title":"Extreme value value theory dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"extremevaltheory_dim\nextremevaltheory_dims_persistences\nextremevaltheory_local_dim_persistence\nextremal_index_sueveges\nestimate_gpd_parameters\nextremevaltheory_gpdfit_pvalues","category":"page"},{"location":"#FractalDimensions.extremevaltheory_dim","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_dim","text":"extremevaltheory_dim(X::StateSpaceSet, p::Real; kwargs...) → Δ\n\nConvenience syntax that returns the mean of the local dimensions of extremevaltheory_dims_persistences, which approximates a fractal dimension of X using extreme value theory and quantile probability p.\n\nSee also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_dims_persistences","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_dims_persistences","text":"extremevaltheory_dims_persistences(x::AbstractStateSpaceSet, p::Real; kwargs...)\n\nReturn the local dimensions Δloc and the persistences θloc for each point in the given set for quantile probability p, according to the estimation done via extreme value theory [Lucarini2016]. The computation is parallelized to available threads (Threads.nthreads()).\n\nSee also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.\n\nKeyword arguments\n\nshow_progress = true: displays a progress bar.\nestimator = :mm: how to estimate the σ parameter of the Generalized Pareto Distribution. The local fractal dimension is 1/σ. The possible values are: :exp, :mm, as in estimate_gpd_parameters.\ncompute_persistence = true: whether to aso compute local persistences θloc (also called extremal index). If false, θloc are NaNs.\nallocate_matrix = false: If true, the code calls a method that attempts to allocate an N×N matrix (N = length(X)) that stores the pairwise Euclidean distances. This method is faster due to optimizations of Distances.pairwise but will error if the computer does not have enough available memory for the matrix allocation.\n\nDescription\n\nFor each state space point mathbfx_i in X we compute g_i = -log(mathbfx_i - mathbfx_j ) forall j = 1 ldots N with cdot the Euclidean distance. Next, we choose an extreme quantile probability p (e.g., 0.99) for the distribution of g_i. We compute g_p as the p-th quantile of g_i. Then, we collect the exceedances of g_i, defined as E_i = g_i - g_p g_i ge g_p , i.e., all values of g_i larger or equal to g_p, also shifted by g_p. There are in total n = N(1-q) values in E_i. According to extreme value theory, in the limit N to infty the values E_i follow a two-parameter Generalized Pareto Distribution (GPD) with parameters sigmaxi (the third parameter mu of the GPD is zero due to the positive-definite construction of E). Within this extreme value theory approach, the local dimension Delta^(E)_i assigned to state space point textbfx_i is given by the inverse of the sigma parameter of the GPD fit to the data[Lucarini2012], Delta^(E)_i = 1sigma. sigma is estimated according to the estimator keyword.\n\n[Lucarini2016]: Lucarini et al., Extremes and Recurrence in Dynamical Systems \n\n[Lucarini2012]: Lucarini et al., Universal Behaviour of Extreme Value Statistics for Selected Observables of Dynamical Systems, Journal of Statistical Physics, 147(1), 63–73. et al., [Physica D 400 132143\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_local_dim_persistence","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_local_dim_persistence","text":"extremevaltheory_local_dim_persistence(X::StateSpaceSet, ζ, p::Real; kw...)\n\nReturn the local values Δ, θ of the fractal dimension and persistence of X around a state space point ζ. p and kw are as in extremevaltheory_dims_persistences.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremal_index_sueveges","page":"FractalDimensions.jl","title":"FractalDimensions.extremal_index_sueveges","text":"extremal_index_sueveges(y::AbstractVector, p)\n\nCompute the extremal index θ of y through the Süveges formula for quantile probability p.\n\n[Süveges2007]: Süveges. 2007. Likelihood estimation of the extremal index. Extremes, 10.1-2, 41-55, doi: 10.1007/s10687-007-0034-2\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_gpd_parameters","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_gpd_parameters","text":"estimate_gpd_parameters(X::AbstractVector{<:Real}, estimator::Symbol = :mm)\n\nEstimate and return the parameters σ, ξ of a Generalized Pareto Distribution fit to X, assuming that minimum(X) ≥ 0 and hence the parameter μ is 0 (if not, simply shift X by its minimum), according to the methods provided in [Flavio2023].\n\nOptionally choose the estimator, which can be:\n\n:exp: Assume the distribution is exponential instead of GP and get σ from mean of X and set ξ = 0.\nmm: Standing for \"method of moments\", estimants are given by\nxi = (barx^2s^2 - 1)2 quad sigma = barx(barx^2s^2 + 1)2\nwith barx the sample mean and s^2 the sample variance. This estimator only exists if the true distribution ξ value is < 0.5.\n\n[Flavio2023]: Flavio et al., Stability of attractor local dimension estimates in non-Axiom A dynamical systems, preprint\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_gpdfit_pvalues","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_gpdfit_pvalues","text":"extremevaltheory_gpdfit_pvalues(X, p; kw...)\n\nReturn various computed quantities that may quantify the significance of the results of extremevaltheory_dims_persistences(X, p; kw...), terms of quantifying how well a Generalized Pareto Distribution (GPD) describes exceedences in the input data.\n\nKeyword arguments\n\nshow_progress, estimator as in extremevaltheory_dims_persistences\nTestType = ApproximateOneSampleKSTest: the test type to use. It can be ApproximateOneSampleKSTest, ExactOneSampleKSTest, CramerVonMises. We noticed that OneSampleADTest sometimes yielded nonsensical results: all p-values were equal and were very small ≈ 1e-6.\nnbins = round(Int, length(X)*(1-p)/20): number of bins to use when computing the histogram of the exceedances for computing the NRMSE. The default value will use equally spaced bins that are equal to the length of the exceedances divided by 20.\n\nDescription\n\nThe function computes the exceedances E_i for each point x_i in X as in extremevaltheory_dims_persistences. It returns 5 quantities, all being vectors of length length(X):\n\nEs, all exceedences, as a vector of vectors.\nsigmas, xis the fitted σ, ξ to the GPD fits for each exceedance\nnrmses the normalized root mean square distance of the fitted GPD to the histogram of the exceedances\npvalues the pvalues of a statistical test of the appropriateness of the GPD fit\n\nThe output nrmses quantifies the distance between the fitted GPD and the empirical histogram of the exceedances. It is computed as\n\nNRMSE = sqrtfracsum(P_j - G_j)^2sum(P_j - U)^2\n\nwhere P_j the empirical (observed) probability at bin j, G_j the fitted GPD probability at the midpoint of bin j, and U same as G_j but for the uniform distribution. The divisor of the equation normalizes the expression, so that the error of the empirical distribution is normalized to the error of the empirical distribution with fitting it with the uniform distribution. It is expected that NRMSE < 1. The smaller it is, the better the data are approximated by GPD versus uniform distribution.\n\nThe output pvalues is a vector of p-values. pvalues[i] corresponds to the p-value of the hypothesis: \"The exceedences around point X[i] are sampled from a GPD\" versus the alternative hypothesis that they are not. To extract the p-values, we perform a one-sample hypothesis via HypothesisTests.jl to the fitted GPD. Very small p-values then indicate that the hypothesis should be rejected and the data are not well described by a GPD. This can be an indication that we do not have enough data, or that we choose too high of a quantile probability p, or that the data are not suitable in general. This p-value based method for significance has been used in [Faranda2017], but it is unclear precisely how it was used.\n\nFor more details on how these quantities may quantify significance, see our review paper.\n\n[Faranda2017]: Faranda et al. (2017), Dynamical proxies of North Atlantic predictability and extremes, Scientific Reports, 7\n\n\n\n\n\n","category":"function"},{"location":"#Theiler-window","page":"FractalDimensions.jl","title":"Theiler window","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The Theiler window is a concept that is useful when finding neighbors in a dataset that is coming from the sampling of a continuous dynamical system. Itt tries to eliminate spurious \"correlations\" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory. Typically a good choice for w coincides with the choice an optimal delay time, see DelayEmbeddings.estimate_delay, for any of the timeseries of the dataset.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"For more details, see Chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.","category":"page"},{"location":"#StateSpaceSet-reference","page":"FractalDimensions.jl","title":"StateSpaceSet reference","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"StateSpaceSet\nstandardize","category":"page"},{"location":"#StateSpaceSets.StateSpaceSet","page":"FractalDimensions.jl","title":"StateSpaceSets.StateSpaceSet","text":"StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}\n\nA dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length D. Each point is represented by SVector{D, T}. The data are a standard Julia Vector{SVector}, and can be obtained with vec(ssset::StateSpaceSet). Typically the order of points in the set is the time direction, but it doesn't have to be.\n\nWhen indexed with 1 index, StateSpaceSet is like a vector of points. When indexed with 2 indices it behaves like a matrix that has each of the columns be the timeseries of each of the variables. When iterated over, it iterates over its contained points. See description of indexing below for more.\n\nStateSpaceSet also supports almost all sensible vector operations like append!, push!, hcat, eachrow, among others.\n\nDescription of indexing\n\nIn the following let i, j be integers, typeof(X) <: AbstractStateSpaceSet and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for performance benefits make v2 an SVector{Int}).\n\nX[i] == X[i, :] gives the ith point (returns an SVector)\nX[v1] == X[v1, :], returns a StateSpaceSet with the points in those indices.\nX[:, j] gives the jth variable timeseries (or collection), as Vector\nX[v1, v2], X[:, v2] returns a StateSpaceSet with the appropriate entries (first indices being \"time\"/point index, while second being variables)\nX[i, j] value of the jth variable, at the ith timepoint\n\nUse Matrix(ssset) or StateSpaceSet(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like StateSpaceSet(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.\n\n\n\n\n\n","category":"type"},{"location":"#StateSpaceSets.standardize","page":"FractalDimensions.jl","title":"StateSpaceSets.standardize","text":"standardize(d::StateSpaceSet) → r\n\nCreate a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.\n\n\n\n\n\nstandardize(x::AbstractVector{<:Real}) = (x - mean(x))/std(x)\n\n\n\n\n\n","category":"function"}] +[{"location":"#FractalDimensions.jl","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"FractalDimensions","category":"page"},{"location":"#FractalDimensions","page":"FractalDimensions.jl","title":"FractalDimensions","text":"FractalDimensions.jl\n\n(Image: ) (Image: ) (Image: ) (Image: CI) (Image: codecov) (Image: Package Downloads)\n\nA Julia package that estimates various definitions of fractal dimension from data. It can be used as a standalone package, or as part of DynamicalSystems.jl.\n\nTo install it, run import Pkg; Pkg.add(\"FractalDimensions\").\n\nAll further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.\n\nPreviously, this package was part of ChaosTools.jl.\n\nPublication\n\nFractalDimensions.jl is used in a review article comparing various estimators for fractal dimensions. The paper is likely a relevant read if you are interested in the package. And if you use the package, please cite the paper.\n\n@article{FractalDimensions.jl,\n doi = {10.1063/5.0160394},\n url = {https://doi.org/10.1063/5.0160394},\n year = {2023},\n month = oct,\n publisher = {{AIP} Publishing},\n volume = {33},\n number = {10},\n author = {George Datseris and Inga Kottlarz and Anton P. Braun and Ulrich Parlitz},\n title = {Estimating fractal dimensions: A comparative review and open source implementations},\n journal = {Chaos: An Interdisciplinary Journal of Nonlinear Science}\n}\n\n\n\n\n\n","category":"module"},{"location":"#Introduction","page":"FractalDimensions.jl","title":"Introduction","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"note: Note\nThis package is accompanying a review paper on estimating the fractal dimension: https://arxiv.org/abs/2109.05937. The paper is continuing the discussion of chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"There are numerous methods that one can use to calculate a so-called \"dimension\" of a dataset which in the context of dynamical systems is called the Fractal dimension. One way to do this is to estimate the scaling behaviour of some quantity as a size/scale increases. In the Fractal dimension example below, one finds the scaling of the correlation sum versus a ball radius. In this case, it approximately holds $ \\log(C) \\approx \\Delta\\log(\\varepsilon) $ for radius varepsilon. The scaling of many other quantities can be estimated as well, such as the generalized entropy, the Higuchi length, or others provided here.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"To actually find Delta, one needs to find a linearly scaling region in the graph log(C) vs. log(varepsilon) and estimate its slope. Hence, identifying a linear region is central to estimating a fractal dimension. That is why, the section Linear scaling regions is of central importance for this documentation.","category":"page"},{"location":"#Fractal-dimension-example","page":"FractalDimensions.jl","title":"Fractal dimension example","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"In this simplest example we will calculate the fractal dimension of the chaotic attractor of the Hénon map (for default parameters). For this example, we will generate the data on the spot:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using DynamicalSystemsBase # for simulating dynamical systems\nusing CairoMakie # for plotting\n\nhenon_rule(x, p, n) = SVector(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])\nu0 = zeros(2)\np0 = [1.4, 0.3]\nhenon = DeterministicIteratedMap(henon_rule, u0, p0)\n\nX, t = trajectory(henon, 20_000; Ttr = 100)\nscatter(X[:, 1], X[:, 2]; color = (\"black\", 0.01), markersize = 4)","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"instead of simulating the set X we could load it from disk, e.g., if there was a text file with two columns as x and y coordinates, we would load it as","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using DelimitedFiles\nfile = \"path/to/file.csv\"\nM = readdlm(file) # here `M` is a metrix with two columns\nX = StateSpaceSet(M) # important to convert to a state space set","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"After we have X, we can start computing a fractal dimension and for this example we will use the correlationsum. Our goal is to compute the correlation sum of X for many different sizes/radii ε. This is as simple as","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"using FractalDimensions\nες = 2 .^ (-15:0.5:5) # semi-random guess\nCs = correlationsum(X, ες; show_progress = false)","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"For a fractal set X dynamical systems theory says that there should be an exponential relationship between the correlation sum and the sizes:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"xs = log2.(ες)\nys = log2.(Cs)\nscatterlines(xs, ys; axis = (ylabel = L\"\\log(C_2)\", xlabel = L\"\\log (\\epsilon)\"))","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The slope of the linear scaling region of the above plot is the fractal dimension (based on the correlation sum).","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Given that we see the plot, we can estimate where the linear scaling region starts and ends. This is generally done using LargestLinearRegion in slopefit. But first, let's visualize what the method does, as it uses linear_regions.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"lrs, slopes = linear_regions(xs, ys, tol = 0.25)\nfig = Figure()\nax = Axis(fig[1,1]; ylabel = L\"\\log(C_2)\", xlabel = L\"\\log (\\epsilon)\")\nfor r in lrs\n scatterlines!(ax, xs[r], ys[r])\nend\nfig","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The LargestLinearRegion method finds, and computes the slope of, the largest region:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Δ = slopefit(xs, ys, LargestLinearRegion())","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"This result is an approximation of a fractal dimension.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The whole above pipeline we went through is bundled in grassberger_proccacia_dim. Similar work is done by generalized_dim and many other functions.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"danger: Be wary when using `xxxxx_dim`\nAs stated clearly by the documentation strings, all pre-made dimension estimating functions (ending in _dim) perform a lot of automated steps, each having its own heuristic choices for function default values. They are more like convenient bundles with on-average good defaults, rather than precise functions. You should be careful when considering the validity of the returned number!","category":"page"},{"location":"#Linear-scaling-regions","page":"FractalDimensions.jl","title":"Linear scaling regions","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"slopefit\nLinearRegression\nLargestLinearRegion\nlinear_regions\nAllSlopesDistribution\nestimate_boxsizes\nminimum_pairwise_distance","category":"page"},{"location":"#FractalDimensions.slopefit","page":"FractalDimensions.jl","title":"FractalDimensions.slopefit","text":"slopefit(x, y [, t::SLopeFit]; kw...) → s, s05, s95\n\nFit a linear scaling region in the curve of the two AbstractVectors y versus x using t as the estimation method. Return the estimated slope, as well as the confidence intervals for it.\n\nThe methods t that can be used for the estimation are:\n\nLinearRegression\nLargestLinearRegion (default)\nAllSlopesDistribution\n\nThe keyword ignore_saturation = true ignores saturation that (sometimes) happens at the start and end of the curve y(x), where the curve flattens. The keyword sat_threshold = 0.01 decides what saturation is: while abs(y[i]-y[i+1]) < sat_threshold we are in a saturation regime. Said differently, slopes with value sat_threshold/dx with dx = x[i+1] - x[i] are neglected.\n\nThe keyword ci = 0.95 specifies which quantile (and the 1 - quantile) the confidence interval values are returned at, and by defualt it is 95% (and hence also 5%).\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.LinearRegression","page":"FractalDimensions.jl","title":"FractalDimensions.LinearRegression","text":"LinearRegression <: SlopeFit\nLinearRegression()\n\nStandard linear regression fit to all available data. Estimation of the confidence intervals is based om the standard error of the slope following a T-distribution, see:\n\nhttps://stattrek.com/regression/slope-confidence-interval\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.LargestLinearRegion","page":"FractalDimensions.jl","title":"FractalDimensions.LargestLinearRegion","text":"LargestLinearRegion <: SlopeFit\nLargestLinearRegion(; dxi::Int = 1, tol = 0.25)\n\nIdentify regions where the curve y(x) is linear, by scanning the x-axis every dxi indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15] and so on if dxi=5).\n\nIf the slope (calculated via linear regression) of a region of width dxi is approximatelly equal to that of the previous region, within relative tolerance tol and absolute tolerance 0, then these two regions belong to the same linear region.\n\nThe largest such region is then used to estimate the slope via standard linear regression of all points belonging to the largest linear region. \"Largest\" here means the region that covers the more extent along the x-axis.\n\nUse linear_regions if you wish to obtain the decomposition into linear regions.\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.linear_regions","page":"FractalDimensions.jl","title":"FractalDimensions.linear_regions","text":"linear_regions(x, y; dxi, tol) → lrs, tangents\n\nApply the algorithm described by LargestLinearRegion, and return the indices of x that correspond to the linear regions, lrs, and the tangents at each region (obtained via a second linear regression at each accumulated region). lrs is hence a vector of UnitRanges.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.AllSlopesDistribution","page":"FractalDimensions.jl","title":"FractalDimensions.AllSlopesDistribution","text":"AllSlopesDistribution <: SlopeFit\nAllSlopesDistribution()\n\nEstimate a slope by computing the distribution of all possible slopes that can be estimated from the curve y(x), according to the method by (Deshmukh et al., 2021). The returned slope is the distribution mean and the confidence intervals are simply the corresponding quantiles of the distribution.\n\nNot implemented yet, the method is here as a placeholder.\n\n\n\n\n\n","category":"type"},{"location":"#FractalDimensions.estimate_boxsizes","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_boxsizes","text":"estimate_boxsizes(X::AbstractStateSpaceSet; kwargs...) → εs\n\nReturn k exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k), that are a good estimate for sizes ε that are used in calculating a Fractal Dimension. It is strongly recommended to standardize input dataset before using this function.\n\nLet d₋ be the minimum pair-wise distance in X, d₋ = dminimum_pairwise_distance(X). Let d₊ be the average total length of X, d₊ = mean(ma - mi) with mi, ma = minmaxima(X). Then lower = log(base, d₋) and upper = log(base, d₊). Because by default w=1, z=-1, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.\n\nKeywords\n\nw = 1, z = -1, k = 16 : as explained above.\nbase = MathConstants.e : the base used in the log function.\nwarning = true: Print some warnings for bad estimates.\nautoexpand = true: If the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by setting w -= we and z -= ze. You can set different default values to the keywords we = w, ze = z.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.minimum_pairwise_distance","page":"FractalDimensions.jl","title":"FractalDimensions.minimum_pairwise_distance","text":"minimum_pairwise_distance(X::StateSpaceSet, kdtree = dimension(X) < 10, metric = Euclidean())\n\nReturn min_d, min_pair: the minimum pairwise distance of all points in the dataset, and the corresponding point pair. The third argument is a switch of whether to use KDTrees or a brute force search.\n\n\n\n\n\n","category":"function"},{"location":"#Generalized-(entropy)-dimension","page":"FractalDimensions.jl","title":"Generalized (entropy) dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"Based on the definition of the Generalized entropy (genentropy), one can calculate an appropriate dimension, called generalized dimension:","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"generalized_dim\nmolteno_dim\nmolteno_boxing","category":"page"},{"location":"#FractalDimensions.generalized_dim","page":"FractalDimensions.jl","title":"FractalDimensions.generalized_dim","text":"generalized_dim(X::StateSpaceSet [, sizes]; q = 1, base = 2) -> Δ_q\n\nReturn the q order generalized dimension of X, by calculating its histogram-based Rényi entropy for each ε ∈ sizes.\n\nThe case of q = 0 is often called \"capacity\" or \"box-counting\" dimension, while q = 1 is the \"information\" dimension.\n\nDescription\n\nThe returned dimension is approximated by the (inverse) power law exponent of the scaling of the Renyi entropy H_q, versus the box size ε, where ε ∈ sizes:\n\nH_q approx -Delta_qlog_b(varepsilon)\n\nH_q is calculated using ComplexityMeasures: Renyi, ValueHistogram, entropy, i.e., by doing a histogram of the data with a given box size.\n\nCalling this function performs a lot of automated steps:\n\nA vector of box sizes is decided by calling sizes = estimate_boxsizes(dataset), if sizes is not given.\nFor each element of sizes the appropriate entropy is calculated as\nH = [entropy(Renyi(; q, base), ValueHistogram(ε), data) for ε ∈ sizes]\nLet x = -log.(sizes).\nThe curve H(x) is decomposed into linear regions, using slopefit(x, h)[1].\nThe biggest linear region is chosen, and a fit for the slope of that region is performed using the function linear_region, which does a simple linear regression fit using linreg. This slope is the return value of generalized_dim.\n\nBy doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result. The source code of this function is only 3 lines of code.\n\nThis approach to estimating the fractal dimension has been used (to our knowledge) for the first time in (Russell et al., 1980).\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.molteno_dim","page":"FractalDimensions.jl","title":"FractalDimensions.molteno_dim","text":"molteno_dim(X::AbstractStateSpaceSet; k0::Int = 10, q = 1.0, base = 2)\n\nReturn an estimate of the generalized_dim of X using the algorithm by (Molteno, 1993). This function is a simple utilization of the probabilities estimated by molteno_boxing so see that function for more details. Here the entropy of the probabilities is computed at each size, and a line is fitted in the entropy vs log(size) graph, just like in generalized_dim.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.molteno_boxing","page":"FractalDimensions.jl","title":"FractalDimensions.molteno_boxing","text":"molteno_boxing(X::AbstractStateSpaceSet; k0::Int = 10) → (probs, εs)\n\nDistribute X into boxes whose size is halved in each step, according to the algorithm by (Molteno, 1993). Division stops if the average number of points per filled box falls below the threshold k0.\n\nReturn probs, a vector of Probabilities of finding points in boxes for different box sizes, and the corresponding box sizes εs. These outputs are used in molteno_dim.\n\nDescription\n\nProject the data onto the whole interval of numbers that is covered by UInt64. The projected data is distributed into boxes whose size decreases by factor 2 in each step. For each box that contains more than one point 2^D new boxes are created where D is the dimension of the data.\n\nThe process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below k0. The box sizes εs are calculated and returned together with the probs.\n\nThis algorithm is faster than the traditional approach of using ValueHistogram(ε::Real), but it is only suited for low dimensional data since it divides each box into 2^D new boxes if D is the dimension. For large D this leads to low numbers of box divisions before the threshold is passed and the divison stops. This results to a low number of data points to fit the dimension to and thereby a poor estimate.\n\n\n\n\n\n","category":"function"},{"location":"#Correlation-sum-based-dimension","page":"FractalDimensions.jl","title":"Correlation sum based dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"grassberger_proccacia_dim\ncorrelationsum","category":"page"},{"location":"#FractalDimensions.grassberger_proccacia_dim","page":"FractalDimensions.jl","title":"FractalDimensions.grassberger_proccacia_dim","text":"grassberger_proccacia_dim(X::AbstractStateSpaceSet, εs = estimate_boxsizes(data); kwargs...)\n\nUse the method of Grassberger and Proccacia (Grassberger and Procaccia, 1983), and the correction by (Theiler, 1986), to estimate the correlation dimension Δ_C of X.\n\nThis function does something extremely simple:\n\ncm = correlationsum(data, εs; kwargs...)\nΔ_C = slopefit(rs, ys)(log2.(sizes), log2.(cm))[1]\n\ni.e. it calculates correlationsum for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε).\n\nSee correlationsum for the available keywords. See also takens_best_estimate, boxassisted_correlation_dim.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.correlationsum","text":"correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)\n\nCalculate the q-order correlation sum of X (StateSpaceSet or timeseries) for a given radius ε and norm. They keyword show_progress = true can be used to display a progress bar for large X.\n\ncorrelationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)\n\nIf εs is a vector, C_q is calculated for each ε ∈ εs more efficiently. Multithreading is also enabled over the available threads (Threads.nthreads()). The function boxed_correlationsum is typically faster if the dimension of X is small and if maximum(εs) is smaller than the size of X.\n\nKeyword arguments\n\nq = 2: order of the correlation sum\nnorm = Euclidean(): distance norm\nw = 0: Theiler window\nshow_progress = true: display a progress bar\n\nDescription\n\nThe correlation sum is defined as follows for q=2:\n\nC_2(epsilon) = frac2(N-w)(N-w-1)sum_i=1^Nsum_j=1+w+i^N\nB(X_i - X_j epsilon)\n\nfor as follows for q≠2\n\nC_q(epsilon) = left sum_i=1^N alpha_i\nleftsum_ji-j w B(X_i - X_j epsilon)right^q-1right^1(q-1)\n\nwhere\n\nalpha_i = 1 (N (max(N-w i) - min(w + 1 i))^(q-1))\n\nwith N the length of X and B gives 1 if its argument is true. w is the Theiler window.\n\nSee the article of Grassberger for the general definition (Grassberger, 2007) and the book \"Nonlinear Time Series Analysis\" (Kantz and Schreiber, 2003), Ch. 6, for a discussion around choosing best values for w, and Ch. 11.3 for the explicit definition of the q-order correlationsum. Note that the formula in 11.3 is incorrect, but corrected here, indices are adapted to take advantage of all available points and also note that we immediatelly exponentiate C_q to 1(q-1), so that it scales exponentially as C_q propto varepsilon ^Delta_q versus the size varepsilon.\n\n\n\n\n\n","category":"function"},{"location":"#Box-assisted-version","page":"FractalDimensions.jl","title":"Box-assisted version","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"boxassisted_correlation_dim\nboxed_correlationsum\nprismdim_theiler\nestimate_r0_buenoorovio\nestimate_r0_theiler","category":"page"},{"location":"#FractalDimensions.boxassisted_correlation_dim","page":"FractalDimensions.jl","title":"FractalDimensions.boxassisted_correlation_dim","text":"boxassisted_correlation_dim(X::AbstractStateSpaceSet; kwargs...)\n\nUse the box-assisted optimizations of (Bueno-Orovio and P{é}rez-Garc{ı́}a, 2007) to estimate the correlation dimension Δ_C of X.\n\nThis function does something extremely simple:\n\nεs, Cs = boxed_correlationsum(X; kwargs...)\nslopefit(log2.(εs), log2.(Cs))[1]\n\nand hence see boxed_correlationsum for more information and available keywords.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.boxed_correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.boxed_correlationsum","text":"boxed_correlationsum(X::AbstractStateSpaceSet, εs, r0 = maximum(εs); kwargs...) → Cs\n\nEstimate the correlationsum for each size ε ∈ εs using an optimized algorithm that first distributes data into boxes of size r0, and then computes the correlation sum for each box and each neighboring box of each box. This method is much faster than correlationsum, provided that the box size r0 is significantly smaller than the attractor length. Good choices for r0 are estimate_r0_buenoorovio or estimate_r0_theiler.\n\nboxed_correlationsum(X::AbstractStateSpaceSet; kwargs...) → εs, Cs\n\nIn this method the minimum inter-point distance and estimate_r0_buenoorovio of X are used to estimate suitable εs for the calculation, which are also returned.\n\nKeyword arguments\n\nq = 2 : The order of the correlation sum.\nP = 2 : The prism dimension.\nw = 0 : The Theiler window.\nshow_progress = false : Whether to display a progress bar for the calculation.\nnorm = Euclidean() : Distance norm.\n\nDescription\n\nC_q(ε) is calculated for every ε ∈ εs and each of the boxes to then be summed up afterwards. The method of splitting the data into boxes was implemented according to (Theiler, 1987). w is the Theiler window. P is the prism dimension. If P is unequal to the dimension of the data, only the first P dimensions are considered for the box distribution (this is called the prism-assisted version). By default P is 2, which is the version suggested by [Bueno2007]. Alternative for P is the prismdim_theiler. Note that only when P = dimension(X) the boxed version is guaranteed to be exact to the original correlationsum. For any other P, some point pairs that should have been included may be skipped due to having smaller distance in the remaining dimensions, but larger distance in the first P dimensions.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.prismdim_theiler","page":"FractalDimensions.jl","title":"FractalDimensions.prismdim_theiler","text":"prismdim_theiler(X)\n\nAn algorithm to find the ideal choice of a prism dimension for boxed_correlationsum using Theiler's original suggestion.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_r0_buenoorovio","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_r0_buenoorovio","text":"estimate_r0_buenoorovio(X::AbstractStateSpaceSet, P = 2) → r0, ε0\n\nEstimate a reasonable size for boxing X, proposed by Bueno-Orovio and Pérez-García (Bueno-Orovio and P{é}rez-Garc{ı́}a, 2007), before calculating the correlation dimension as presented by Theiler1983. Return the size r0 and the minimum interpoint distance ε0 in the data.\n\nIf instead of boxes, prisms are chosen everything stays the same but P is the dimension of the prism. To do so the dimension ν is estimated by running the algorithm by Grassberger and Procaccia (Grassberger and Procaccia, 1983) with √N points where N is the number of total data points. An effective size ℓ of the attractor is calculated by boxing a small subset of size N/10 into boxes of sidelength r_ℓ and counting the number of filled boxes η_ℓ.\n\nell = r_ell eta_ell ^1nu\n\nThe optimal number of filled boxes η_opt is calculated by minimising the number of calculations.\n\neta_textrmopt = N^23cdot frac3^nu - 13^P - 1^12\n\nP is the dimension of the data or the number of edges on the prism that don't span the whole dataset.\n\nThen the optimal boxsize r_0 computes as\n\nr_0 = ell eta_textrmopt^1nu\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_r0_theiler","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_r0_theiler","text":"estimate_r0_theiler(X::AbstractStateSpaceSet) → r0, ε0\n\nEstimate a reasonable size for boxing the data X before calculating the boxed_correlationsum proposed by (Theiler, 1987). Return the boxing size r0 and minimum inter-point distance in X, ε0.\n\nTo do so the dimension is estimated by running the algorithm by Grassberger and Procaccia (Grassberger and Procaccia, 1983) with √N points where N is the number of total data points. Then the optimal boxsize r_0 computes as\n\nr_0 = R (2N)^1nu\n\nwhere R is the size of the chaotic attractor and nu is the estimated dimension.\n\n\n\n\n\n","category":"function"},{"location":"#Fixed-mass-correlation-sum","page":"FractalDimensions.jl","title":"Fixed mass correlation sum","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"fixedmass_correlation_dim\nfixedmass_correlationsum","category":"page"},{"location":"#FractalDimensions.fixedmass_correlation_dim","page":"FractalDimensions.jl","title":"FractalDimensions.fixedmass_correlation_dim","text":"fixedmass_correlation_dim(X [, max_j]; kwargs...)\n\nUse the fixed mass algorithm for computing the correlation sum, and use the result to compute the correlation dimension Δ_M of X.\n\nThis function does something extremely simple:\n\nrs, ys = fixedmass_correlationsum(X, args...; kwargs...)\nslopefit(rs, ys)[1]\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.fixedmass_correlationsum","page":"FractalDimensions.jl","title":"FractalDimensions.fixedmass_correlationsum","text":"fixedmass_correlationsum(X [, max_j]; metric = Euclidean(), M = length(X)) → rs, ys\n\nA fixed mass algorithm for the calculation of the correlationsum, and subsequently a fractal dimension Delta, with max_j the maximum number of neighbours that should be considered for the calculation.\n\nBy default max_j = clamp(N*(N-1)/2, 5, 32) with N the data length.\n\nKeyword arguments\n\nM defines the number of points considered for the averaging of distances, randomly subsampling them from X.\nmetric = Euclidean() is the distance metric.\nstart_j = 4 computes the equation below starting from j = start_j. Typically the first j values have not converged to the correct scaling of the fractal dimension.\n\nDescription\n\n\"Fixed mass\" algorithms mean that instead of trying to find all neighboring points within a radius, one instead tries to find the max radius containing j points. A correlation sum is obtained with this constrain, and equivalently the mean radius containing k points. Based on this, one can calculate Delta approximating the information dimension. The implementation here is due to to (Grassberger, 1988), which defines\n\nΨ(j) - log N sim Delta times overlinelog left( r_(j)right)\n\nwhere Psi(j) = fractextd log Γ(j)textd j is the digamma function, rs = overlinelog left( r_(j)right) is the mean logarithm of a radius containing j neighboring points, and ys = Psi(j) - log N (N is the length of the data). The amount of neighbors found j range from 2 to max_j. The numbers are also converted to base 2 from base e.\n\nDelta can be computed by using linear_region(rs, ys).\n\n\n\n\n\n","category":"function"},{"location":"#Takens-best-estimate","page":"FractalDimensions.jl","title":"Takens best estimate","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"takens_best_estimate_dim","category":"page"},{"location":"#FractalDimensions.takens_best_estimate_dim","page":"FractalDimensions.jl","title":"FractalDimensions.takens_best_estimate_dim","text":"takens_best_estimate_dim(X, εmax, metric = Chebyshev(), εmin = 0)\n\nUse the \"Takens' best estimate\" [Takens1985][Theiler1988] method for estimating the correlation dimension.\n\nThe original formula is\n\nDelta_C approx fracC(epsilon_textmax)int_0^epsilon_textmax(C(epsilon) epsilon) depsilon\n\nwhere C is the correlationsum and epsilon_textmax is an upper cutoff. Here we use the later expression\n\nDelta_C approx - frac1etaquad eta = frac1(N-1)^*sum_i j^*log(X_i - X_j epsilon_textmax)\n\nwhere the sum happens for all i j so that i j and X_i - X_j epsilon_textmax. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].\n\nAccording to [Borovkova1999], introducing a lower cutoff εmin can make the algorithm more stable (no divergence), this option is given but defaults to zero.\n\nIf X comes from a delay coordinates embedding of a timseries x, a recommended value for epsilon_textmax is std(x)/4.\n\nYou may also use\n\nΔ_C, Δu_C, Δl_C = FractalDimensions.takens_best_estimate(args...)\n\nto obtain the upper and lower 95% confidence intervals. The intervals are estimated from the log-likelihood function by finding the values of Δ_C where the function has fallen by 2 from its maximum, see e.g. [Barlow] chapter 5.3.\n\n[Takens1985]: Takens, On the numerical determination of the dimension of an attractor, in: B.H.W. Braaksma, B.L.J.F. Takens (Eds.), Dynamical Systems and Bifurcations, in: Lecture Notes in Mathematics, Springer, Berlin, 1985, pp. 99–106.\n\n[Theiler1988]: Theiler, Lacunarity in a best estimator of fractal dimension. Physics Letters A, 133(4–5)\n\n[Borovkova1999]: Borovkova et al., Consistency of the Takens estimator for the correlation dimension. The Annals of Applied Probability, 9, 05 1999.\n\n[Barlow]: Barlow, R., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences. Vol 29. John Wiley & Sons, 1993\n\n\n\n\n\n","category":"function"},{"location":"#Kaplan-Yorke-dimension","page":"FractalDimensions.jl","title":"Kaplan-Yorke dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"kaplanyorke_dim","category":"page"},{"location":"#FractalDimensions.kaplanyorke_dim","page":"FractalDimensions.jl","title":"FractalDimensions.kaplanyorke_dim","text":"kaplanyorke_dim(λs::AbstractVector)\n\nCalculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension (Kaplan and Yorke, 1979) from the given Lyapunov exponents λs.\n\nDescription\n\nThe Kaplan-Yorke dimension is simply the point where cumsum(λs) becomes zero (interpolated):\n\n D_KY = k + fracsum_i=1^k lambda_ilambda_k+1quad k = max_j left sum_i=1^j lambda_i 0 right\n\nIf the sum of the exponents never becomes negative the function will return the length of the input vector.\n\nUseful in combination with lyapunovspectrum from ChaosTools.jl.\n\n\n\n\n\n","category":"function"},{"location":"#Higuchi-dimension","page":"FractalDimensions.jl","title":"Higuchi dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"higuchi_dim","category":"page"},{"location":"#FractalDimensions.higuchi_dim","page":"FractalDimensions.jl","title":"FractalDimensions.higuchi_dim","text":"higuchi_dim(x::AbstractVector [, ks])\n\nEstimate the Higuchi dimension (Higuchi, 1988) of the graph of x.\n\nDescription\n\nThe Higuchi dimension is a number Δ ∈ [1, 2] that quantifies the roughness of the graph of the function x(t), assuming here that x is equi-sampled, like in the original paper.\n\nThe method estimates how the length of the graph increases as a function of the indices difference (which, in this context, is equivalent with differences in t). Specifically, we calculate the average length versus k as\n\nL_m(k) = fracN-1lfloor fracN-mk \rfloor k^2\nsum_i=1^lfloor fracN-mk rfloor X_N(m+ik)-X_N(m+(i-1)k) \n\nL(k) = frac1k sum_m=1^k L_m(k)\n\nand then use linear_region in -log2.(k) vs log2.(L) as per usual when computing a fractal dimension.\n\nThe algorithm chooses default ks to be exponentially spaced in base-2, up to at most 2^8. A user can provide their own ks as a second argument otherwise.\n\nUse FractalDimensions.higuchi_length(x, ks) to obtain L(k) directly.\n\n\n\n\n\n","category":"function"},{"location":"#Extreme-value-value-theory-dimension","page":"FractalDimensions.jl","title":"Extreme value value theory dimension","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"extremevaltheory_dim\nextremevaltheory_dims_persistences\nextremevaltheory_local_dim_persistence\nextremal_index_sueveges\nestimate_gpd_parameters\nextremevaltheory_gpdfit_pvalues","category":"page"},{"location":"#FractalDimensions.extremevaltheory_dim","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_dim","text":"extremevaltheory_dim(X::StateSpaceSet, p::Real; kwargs...) → Δ\n\nConvenience syntax that returns the mean of the local dimensions of extremevaltheory_dims_persistences, which approximates a fractal dimension of X using extreme value theory and quantile probability p.\n\nSee also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_dims_persistences","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_dims_persistences","text":"extremevaltheory_dims_persistences(x::AbstractStateSpaceSet, p::Real; kwargs...)\n\nReturn the local dimensions Δloc and the persistences θloc for each point in the given set for quantile probability p, according to the estimation done via extreme value theory (Lucarini et al., 2016). The computation is parallelized to available threads (Threads.nthreads()).\n\nSee also extremevaltheory_gpdfit_pvalues for obtaining confidence on the results.\n\nKeyword arguments\n\nshow_progress = true: displays a progress bar.\nestimator = :mm: how to estimate the σ parameter of the Generalized Pareto Distribution. The local fractal dimension is 1/σ. The possible values are: :exp, :mm, as in estimate_gpd_parameters.\ncompute_persistence = true: whether to aso compute local persistences θloc (also called extremal index). If false, θloc are NaNs.\nallocate_matrix = false: If true, the code calls a method that attempts to allocate an N×N matrix (N = length(X)) that stores the pairwise Euclidean distances. This method is faster due to optimizations of Distances.pairwise but will error if the computer does not have enough available memory for the matrix allocation.\n\nDescription\n\nFor each state space point mathbfx_i in X we compute g_i = -log(mathbfx_i - mathbfx_j ) forall j = 1 ldots N with cdot the Euclidean distance. Next, we choose an extreme quantile probability p (e.g., 0.99) for the distribution of g_i. We compute g_p as the p-th quantile of g_i. Then, we collect the exceedances of g_i, defined as E_i = g_i - g_p g_i ge g_p , i.e., all values of g_i larger or equal to g_p, also shifted by g_p. There are in total n = N(1-q) values in E_i. According to extreme value theory, in the limit N to infty the values E_i follow a two-parameter Generalized Pareto Distribution (GPD) with parameters sigmaxi (the third parameter mu of the GPD is zero due to the positive-definite construction of E). Within this extreme value theory approach, the local dimension Delta^(E)_i assigned to state space point textbfx_i is given by the inverse of the sigma parameter of the GPD fit to the data[Lucarini2012], Delta^(E)_i = 1sigma. sigma is estimated according to the estimator keyword.\n\nA more precise description of this process is given in the review paper (Datseris et al., 2023).\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_local_dim_persistence","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_local_dim_persistence","text":"extremevaltheory_local_dim_persistence(X::StateSpaceSet, ζ, p::Real; kw...)\n\nReturn the local values Δ, θ of the fractal dimension and persistence of X around a state space point ζ. p and kw are as in extremevaltheory_dims_persistences.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremal_index_sueveges","page":"FractalDimensions.jl","title":"FractalDimensions.extremal_index_sueveges","text":"extremal_index_sueveges(y::AbstractVector, p)\n\nCompute the extremal index θ of y through the Süveges formula for quantile probability p, using the algorithm of (Süveges, 2007).\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.estimate_gpd_parameters","page":"FractalDimensions.jl","title":"FractalDimensions.estimate_gpd_parameters","text":"estimate_gpd_parameters(X::AbstractVector{<:Real}, estimator::Symbol)\n\nEstimate and return the parameters σ, ξ of a Generalized Pareto Distribution fit to X, assuming that minimum(X) ≥ 0 and hence the parameter μ is 0 (if not, simply shift X by its minimum), according to the methods provided in Pons2023.\n\nThe estimator can be:\n\n:exp: Assume the distribution is exponential instead of GP and get σ from mean of X and set ξ = 0.\nmm: Standing for \"method of moments\", estimants are given by\nxi = (barx^2s^2 - 1)2 quad sigma = barx(barx^2s^2 + 1)2\nwith barx the sample mean and s^2 the sample variance. This estimator only exists if the true distribution ξ value is < 0.5.\n\n\n\n\n\n","category":"function"},{"location":"#FractalDimensions.extremevaltheory_gpdfit_pvalues","page":"FractalDimensions.jl","title":"FractalDimensions.extremevaltheory_gpdfit_pvalues","text":"extremevaltheory_gpdfit_pvalues(X, p; kw...)\n\nReturn various computed quantities that may quantify the significance of the results of extremevaltheory_dims_persistences(X, p; kw...), terms of quantifying how well a Generalized Pareto Distribution (GPD) describes exceedences in the input data.\n\nKeyword arguments\n\nshow_progress, estimator as in extremevaltheory_dims_persistences\nTestType = ApproximateOneSampleKSTest: the test type to use. It can be ApproximateOneSampleKSTest, ExactOneSampleKSTest, CramerVonMises. We noticed that OneSampleADTest sometimes yielded nonsensical results: all p-values were equal and were very small ≈ 1e-6.\nnbins = round(Int, length(X)*(1-p)/20): number of bins to use when computing the histogram of the exceedances for computing the NRMSE. The default value will use equally spaced bins that are equal to the length of the exceedances divided by 20.\n\nDescription\n\nThe function computes the exceedances E_i for each point x_i in X as in extremevaltheory_dims_persistences. It returns 5 quantities, all being vectors of length length(X):\n\nEs, all exceedences, as a vector of vectors.\nsigmas, xis the fitted σ, ξ to the GPD fits for each exceedance\nnrmses the normalized root mean square distance of the fitted GPD to the histogram of the exceedances\npvalues the pvalues of a statistical test of the appropriateness of the GPD fit\n\nThe output nrmses quantifies the distance between the fitted GPD and the empirical histogram of the exceedances. It is computed as\n\nNRMSE = sqrtfracsum(P_j - G_j)^2sum(P_j - U)^2\n\nwhere P_j the empirical (observed) probability at bin j, G_j the fitted GPD probability at the midpoint of bin j, and U same as G_j but for the uniform distribution. The divisor of the equation normalizes the expression, so that the error of the empirical distribution is normalized to the error of the empirical distribution with fitting it with the uniform distribution. It is expected that NRMSE < 1. The smaller it is, the better the data are approximated by GPD versus uniform distribution.\n\nThe output pvalues is a vector of p-values. pvalues[i] corresponds to the p-value of the hypothesis: \"The exceedences around point X[i] are sampled from a GPD\" versus the alternative hypothesis that they are not. To extract the p-values, we perform a one-sample hypothesis via HypothesisTests.jl to the fitted GPD. Very small p-values then indicate that the hypothesis should be rejected and the data are not well described by a GPD. This can be an indication that we do not have enough data, or that we choose too high of a quantile probability p, or that the data are not suitable in general. This p-value based method for significance has been used in [Faranda2017], but it is unclear precisely how it was used.\n\nFor more details on how these quantities may quantify significance, see our review paper.\n\n[Faranda2017]: Faranda et al. (2017), Dynamical proxies of North Atlantic predictability and extremes, Scientific Reports, 7\n\n\n\n\n\n","category":"function"},{"location":"#Theiler-window","page":"FractalDimensions.jl","title":"Theiler window","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"The Theiler window is a concept that is useful when finding neighbors in a dataset that is coming from the sampling of a continuous dynamical system. Itt tries to eliminate spurious \"correlations\" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory. Typically a good choice for w coincides with the choice an optimal delay time, see DelayEmbeddings.estimate_delay, for any of the timeseries of the dataset.","category":"page"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"For more details, see Chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.","category":"page"},{"location":"#StateSpaceSet-reference","page":"FractalDimensions.jl","title":"StateSpaceSet reference","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"StateSpaceSet\nstandardize","category":"page"},{"location":"#StateSpaceSets.StateSpaceSet","page":"FractalDimensions.jl","title":"StateSpaceSets.StateSpaceSet","text":"StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}\n\nA dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length D. Each point is represented by SVector{D, T}. The data are a standard Julia Vector{SVector}, and can be obtained with vec(ssset::StateSpaceSet). Typically the order of points in the set is the time direction, but it doesn't have to be.\n\nWhen indexed with 1 index, StateSpaceSet is like a vector of points. When indexed with 2 indices it behaves like a matrix that has each of the columns be the timeseries of each of the variables. When iterated over, it iterates over its contained points. See description of indexing below for more.\n\nStateSpaceSet also supports almost all sensible vector operations like append!, push!, hcat, eachrow, among others.\n\nDescription of indexing\n\nIn the following let i, j be integers, typeof(X) <: AbstractStateSpaceSet and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for performance benefits make v2 an SVector{Int}).\n\nX[i] == X[i, :] gives the ith point (returns an SVector)\nX[v1] == X[v1, :], returns a StateSpaceSet with the points in those indices.\nX[:, j] gives the jth variable timeseries (or collection), as Vector\nX[v1, v2], X[:, v2] returns a StateSpaceSet with the appropriate entries (first indices being \"time\"/point index, while second being variables)\nX[i, j] value of the jth variable, at the ith timepoint\n\nUse Matrix(ssset) or StateSpaceSet(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like StateSpaceSet(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.\n\n\n\n\n\n","category":"type"},{"location":"#StateSpaceSets.standardize","page":"FractalDimensions.jl","title":"StateSpaceSets.standardize","text":"standardize(d::StateSpaceSet) → r\n\nCreate a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.\n\n\n\n\n\nstandardize(x::AbstractVector{<:Real}) = (x - mean(x))/std(x)\n\n\n\n\n\n","category":"function"},{"location":"#References","page":"FractalDimensions.jl","title":"References","text":"","category":"section"},{"location":"","page":"FractalDimensions.jl","title":"FractalDimensions.jl","text":"","category":"page"}] }