diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 90ea2ba0c9..79cb38c178 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-08-03T08:23:32","documenter_version":"1.5.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-08-09T18:43:59","documenter_version":"1.5.0"}}
\ No newline at end of file
diff --git a/dev/ecosystem/index.html b/dev/ecosystem/index.html
index 5e34e3bde6..8fcac343ea 100644
--- a/dev/ecosystem/index.html
+++ b/dev/ecosystem/index.html
@@ -3,4 +3,4 @@
   function gtag(){dataLayer.push(arguments);}
   gtag('js', new Date());
   gtag('config', 'UA-36890222-9', {'page_path': location.pathname + location.search + location.hash});
-</script><script data-outdated-warner src="../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../assets/documenter.js"></script><script src="../search_index.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-light.css" data-theme-name="documenter-light" 
data-theme-primary/><script src="../assets/themeswap.js"></script><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../"><img class="docs-light-only" src="../assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="../assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li><a class="tocitem" href="../">Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="../guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="../guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="../guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="../guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="../guide/training/training/">Training</a></li><li><a class="tocitem" href="../guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="../guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="../guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="../guide/performance/">Performance Tips</a></li></ul></li><li class="is-active"><a class="tocitem" href>Ecosystem</a><ul class="internal"><li><a class="tocitem" href="#Flux-models"><span>Flux models</span></a></li><li><a class="tocitem" href="#Tools-closely-associated-with-Flux"><span>Tools closely associated with Flux</span></a></li><li><a class="tocitem" href="#Differentiable-programming"><span>Differentiable programming</span></a></li><li><a class="tocitem" href="#Useful-miscellaneous-packages"><span>Useful miscellaneous packages</span></a></li><li><a class="tocitem" href="#Alternatives-to-Flux"><span>Alternatives to Flux</span></a></li></ul></li><li><span class="tocitem">Reference</span><ul><li><a 
class="tocitem" href="../reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="../reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="../reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="../reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="../reference/training/reference/">Training API</a></li><li><a class="tocitem" href="../reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="../reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" href="../reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="../reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="../reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="../reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="../reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="../reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="../reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../tutorials/linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="../tutorials/logistic_regression/">Logistic Regression</a></li><li><a class="tocitem" href="../tutorials/model_zoo/">Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" 
id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Ecosystem</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Ecosystem</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/ecosystem.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="The-Julia-Ecosystem-around-Flux"><a class="docs-heading-anchor" href="#The-Julia-Ecosystem-around-Flux">The Julia Ecosystem around Flux</a><a id="The-Julia-Ecosystem-around-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#The-Julia-Ecosystem-around-Flux" title="Permalink"></a></h1><p>One of the main strengths of Julia lies in an ecosystem of packages  globally providing a rich and consistent user experience.</p><p>This is a non-exhaustive list of Julia packages, nicely complementing <code>Flux</code> in typical machine learning and deep learning workflows. To add your project please send a <a href="https://github.com/FluxML/Flux.jl/pulls">PR</a>. 
See also academic work <a href="https://scholar.google.com/scholar?cites=9731162218836700005&amp;hl=en">citing Flux</a> or <a href="https://scholar.google.com/scholar?cites=11943854577624257878&amp;hl=en">citing Zygote</a>.</p><h2 id="Flux-models"><a class="docs-heading-anchor" href="#Flux-models">Flux models</a><a id="Flux-models-1"></a><a class="docs-heading-anchor-permalink" href="#Flux-models" title="Permalink"></a></h2><ul><li>Flux&#39;s <a href="https://github.com/FluxML/model-zoo">model-zoo</a> contains examples from many domains.</li></ul><h3 id="Computer-vision"><a class="docs-heading-anchor" href="#Computer-vision">Computer vision</a><a id="Computer-vision-1"></a><a class="docs-heading-anchor-permalink" href="#Computer-vision" title="Permalink"></a></h3><ul><li><a href="https://github.com/r3tex/ObjectDetector.jl">ObjectDetector.jl</a> provides ready-to-go image detection via YOLO.</li><li><a href="https://github.com/FluxML/Metalhead.jl">Metalhead.jl</a> includes many state-of-the-art computer vision models which can easily be used for transfer learning.</li><li><a href="https://github.com/DhairyaLGandhi/UNet.jl">UNet.jl</a> is a generic UNet implementation.</li></ul><h3 id="Natural-language-processing"><a class="docs-heading-anchor" href="#Natural-language-processing">Natural language processing</a><a id="Natural-language-processing-1"></a><a class="docs-heading-anchor-permalink" href="#Natural-language-processing" title="Permalink"></a></h3><ul><li><a href="https://github.com/chengchingwen/Transformers.jl">Transformers.jl</a> provides components for Transformer models for NLP, as well as providing several trained models out of the box.</li><li><a href="https://github.com/JuliaText/TextAnalysis.jl">TextAnalysis.jl</a> provides several NLP algorithms that use Flux models under the hood.</li></ul><h3 id="Reinforcement-learning"><a class="docs-heading-anchor" href="#Reinforcement-learning">Reinforcement learning</a><a id="Reinforcement-learning-1"></a><a 
class="docs-heading-anchor-permalink" href="#Reinforcement-learning" title="Permalink"></a></h3><ul><li><a href="https://github.com/jonathan-laurent/AlphaZero.jl">AlphaZero.jl</a> provides a generic, simple and fast implementation of Deepmind&#39;s AlphaZero algorithm.</li><li><a href="https://juliareinforcementlearning.org/">ReinforcementLearning.jl</a> offers a collection of tools for doing reinforcement learning research in Julia.</li></ul><h3 id="Graph-learning"><a class="docs-heading-anchor" href="#Graph-learning">Graph learning</a><a id="Graph-learning-1"></a><a class="docs-heading-anchor-permalink" href="#Graph-learning" title="Permalink"></a></h3><ul><li><a href="https://github.com/CarloLucibello/GraphNeuralNetworks.jl">GraphNeuralNetworks.jl</a> is a fresh, performant and flexible graph neural network library based on Flux.jl.</li><li><a href="https://github.com/FluxML/GeometricFlux.jl">GeometricFlux.jl</a> is the first graph neural network library for julia. </li><li><a href="https://github.com/SciML/NeuralOperators.jl">NeuralOperators.jl</a> enables training infinite dimensional PDEs by learning a continuous function instead of using the finite element method.</li><li><a href="https://github.com/corail-research/SeaPearl.jl">SeaPearl.jl</a> is a Constraint Programming solver that uses Reinforcement Learning based on graphs as input.</li></ul><h3 id="Time-series"><a class="docs-heading-anchor" href="#Time-series">Time series</a><a id="Time-series-1"></a><a class="docs-heading-anchor-permalink" href="#Time-series" title="Permalink"></a></h3><ul><li><a href="https://github.com/sdobber/FluxArchitectures.jl">FluxArchitectures.jl</a> is a collection of advanced network architectures for time series forecasting.</li></ul><h3 id="Robust-networks"><a class="docs-heading-anchor" href="#Robust-networks">Robust networks</a><a id="Robust-networks-1"></a><a class="docs-heading-anchor-permalink" href="#Robust-networks" title="Permalink"></a></h3><ul><li><a 
href="https://github.com/acfr/RobustNeuralNetworks.jl">RobustNeuralNetworks.jl</a> includes classes of neural networks that are constructed to naturally satisfy robustness constraints.</li></ul><hr/><h2 id="Tools-closely-associated-with-Flux"><a class="docs-heading-anchor" href="#Tools-closely-associated-with-Flux">Tools closely associated with Flux</a><a id="Tools-closely-associated-with-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Tools-closely-associated-with-Flux" title="Permalink"></a></h2><p>Utility tools you&#39;re unlikely to have met if you never used Flux!</p><h3 id="High-level-training-flows"><a class="docs-heading-anchor" href="#High-level-training-flows">High-level training flows</a><a id="High-level-training-flows-1"></a><a class="docs-heading-anchor-permalink" href="#High-level-training-flows" title="Permalink"></a></h3><ul><li><a href="https://github.com/FluxML/FastAI.jl">FastAI.jl</a> is a Julia port of Python&#39;s fast.ai library.</li><li><a href="https://github.com/FluxML/FluxTraining.jl">FluxTraining.jl</a> is a package for using and writing powerful, extensible training loops for deep learning models. It supports callbacks for many common use cases like hyperparameter scheduling, metrics tracking and logging, checkpointing, early stopping, and more. 
It powers training in FastAI.jl</li><li><a href="https://github.com/jondeuce/Ignite.jl">Ignite.jl</a> is a Julia port of the Python library <code>ignite</code> for simplifying neural network training and validation loops, using events and handlers.</li><li><a href="https://github.com/CarloLucibello/Tsunami.jl">Tsunami.jl</a> adds high-level ways to control training, parameter schedules &amp; logging, heavily inspired by <code>pytorch-lightning</code>.</li></ul><h3 id="Datasets"><a class="docs-heading-anchor" href="#Datasets">Datasets</a><a id="Datasets-1"></a><a class="docs-heading-anchor-permalink" href="#Datasets" title="Permalink"></a></h3><p>Commonly used machine learning datasets are provided by the following packages in the julia ecosystem:</p><ul><li><a href="https://github.com/JuliaML/MLDatasets.jl">MLDatasets.jl</a> focuses on downloading, unpacking, and accessing benchmark datasets.</li><li><a href="https://github.com/yuehhua/GraphMLDatasets.jl">GraphMLDatasets.jl</a>: a library for machine learning datasets on graph.</li></ul><h3 id="Plumbing"><a class="docs-heading-anchor" href="#Plumbing">Plumbing</a><a id="Plumbing-1"></a><a class="docs-heading-anchor-permalink" href="#Plumbing" title="Permalink"></a></h3><p>Tools to put data into the right order for creating a model.</p><ul><li><a href="https://github.com/Evizero/Augmentor.jl">Augmentor.jl</a> is a real-time library augmentation library for increasing the number of training images.</li><li><a href="https://github.com/lorenzoh/DataAugmentation.jl">DataAugmentation.jl</a> aims to make it easy to build stochastic, label-preserving augmentation pipelines for vision use cases involving images, keypoints and segmentation masks.</li><li><a href="https://github.com/JuliaML/MLUtils.jl">MLUtils.jl</a> (replaces <a href="https://github.com/JuliaML/MLDataUtils.jl">MLDataUtils.jl</a> and <a href="https://github.com/JuliaML/MLLabelUtils.jl">MLLabelUtils.jl</a>) is a library for processing Machine Learning 
datasets.</li></ul><h3 id="Parameters"><a class="docs-heading-anchor" href="#Parameters">Parameters</a><a id="Parameters-1"></a><a class="docs-heading-anchor-permalink" href="#Parameters" title="Permalink"></a></h3><ul><li><a href="https://github.com/darsnack/ParameterSchedulers.jl">ParameterSchedulers.jl</a> standard scheduling policies for machine learning.</li></ul><hr/><h2 id="Differentiable-programming"><a class="docs-heading-anchor" href="#Differentiable-programming">Differentiable programming</a><a id="Differentiable-programming-1"></a><a class="docs-heading-anchor-permalink" href="#Differentiable-programming" title="Permalink"></a></h2><p>Packages based on differentiable programming but not necessarily related to Machine Learning. </p><ul><li>The <a href="https://sciml.ai/">SciML</a> ecosystem uses Flux and Zygote to mix neural nets with differential equations, to get the best of black box and mechanistic modelling.</li><li><a href="https://github.com/SciML/DiffEqFlux.jl">DiffEqFlux.jl</a> provides tools for creating Neural Differential Equations.</li><li><a href="https://github.com/FluxML/Flux3D.jl">Flux3D.jl</a> shows off machine learning on 3D data.</li><li><a href="https://github.com/avik-pal/RayTracer.jl">RayTracer.jl</a> combines ML with computer vision via a differentiable renderer.</li><li><a href="https://github.com/tejank10/Duckietown.jl">Duckietown.jl</a> Differentiable Duckietown simulator.</li><li>The <a href="https://github.com/QuantumBFS/Yao.jl">Yao.jl</a> project uses Flux and Zygote for Quantum Differentiable Programming.</li><li><a href="https://github.com/Chemellia/AtomicGraphNets.jl">AtomicGraphNets.jl</a> enables learning graph based models on atomic systems used in chemistry.</li><li><a href="https://github.com/FluxML/DiffImages.jl">DiffImages.jl</a> differentiable computer vision modeling in Julia with the Images.jl ecosystem.</li></ul><h3 id="Probabilistic-programming"><a class="docs-heading-anchor" 
href="#Probabilistic-programming">Probabilistic programming</a><a id="Probabilistic-programming-1"></a><a class="docs-heading-anchor-permalink" href="#Probabilistic-programming" title="Permalink"></a></h3><ul><li><a href="https://github.com/TuringLang/Turing.jl">Turing.jl</a> extends Flux&#39;s differentiable programming capabilities to probabilistic programming.</li><li><a href="https://github.com/zenna/Omega.jl">Omega.jl</a> is a research project aimed at causal, higher-order probabilistic programming.</li><li><a href="https://github.com/willtebbutt/Stheno.jl">Stheno.jl</a> provides flexible Gaussian processes.</li></ul><h3 id="Statistics"><a class="docs-heading-anchor" href="#Statistics">Statistics</a><a id="Statistics-1"></a><a class="docs-heading-anchor-permalink" href="#Statistics" title="Permalink"></a></h3><ul><li><a href="https://github.com/joshday/OnlineStats.jl">OnlineStats.jl</a> provides single-pass algorithms for statistics.</li></ul><hr/><h2 id="Useful-miscellaneous-packages"><a class="docs-heading-anchor" href="#Useful-miscellaneous-packages">Useful miscellaneous packages</a><a id="Useful-miscellaneous-packages-1"></a><a class="docs-heading-anchor-permalink" href="#Useful-miscellaneous-packages" title="Permalink"></a></h2><p>Some useful and random packages!</p><ul><li><a href="https://github.com/rizalzaf/AdversarialPrediction.jl">AdversarialPrediction.jl</a> provides a way to easily optimise generic performance metrics in supervised learning settings using the <a href="https://arxiv.org/abs/1812.07526">Adversarial Prediction</a> framework.</li><li><a href="https://github.com/CTUAvastLab/Mill.jl">Mill.jl</a> helps to prototype flexible multi-instance learning models.</li><li><a href="https://github.com/JuliaML/MLMetrics.jl">MLMetrics.jl</a> is a utility for scoring models in data science and machine learning.</li><li><a href="https://github.com/FluxML/Torch.jl">Torch.jl</a> exposes torch in Julia.</li><li><a 
href="https://github.com/JuliaML/ValueHistories.jl">ValueHistories.jl</a> is a utility for efficient tracking of optimization histories, training curves or other information of arbitrary types and at arbitrarily spaced sampling times.</li><li><a href="https://github.com/slimgroup/InvertibleNetworks.jl/">InvertibleNetworks.jl</a> Building blocks for invertible neural networks in the Julia programming language.</li><li><a href="https://github.com/timholy/ProgressMeter.jl">ProgressMeter.jl</a> progress meters for long-running computations.</li><li><a href="https://github.com/PhilipVinc/TensorBoardLogger.jl">TensorBoardLogger.jl</a> easy peasy logging to <a href="https://www.tensorflow.org/tensorboard">tensorboard</a> in Julia</li><li><a href="https://github.com/carlobaldassi/ArgParse.jl">ArgParse.jl</a> is a package for parsing command-line arguments to Julia programs.</li><li><a href="https://github.com/mauro3/Parameters.jl">Parameters.jl</a> types with default field values, keyword constructors and (un-)pack macros.</li><li><a href="https://github.com/JuliaIO/BSON.jl">BSON.jl</a> is a package for working with the Binary JSON serialisation format.</li><li><a href="https://github.com/JuliaData/DataFrames.jl">DataFrames.jl</a> in-memory tabular data in Julia.</li><li><a href="https://github.com/JuliaDynamics/DrWatson.jl">DrWatson.jl</a> is a scientific project assistant software.</li></ul><p>This tight integration among Julia packages is shown in some of the examples in the <a href="https://github.com/FluxML/model-zoo">model-zoo</a> repository.</p><hr/><h2 id="Alternatives-to-Flux"><a class="docs-heading-anchor" href="#Alternatives-to-Flux">Alternatives to Flux</a><a id="Alternatives-to-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Alternatives-to-Flux" title="Permalink"></a></h2><p>Julia has several other libraries for making neural networks. 
</p><ul><li><p><a href="https://github.com/PumasAI/SimpleChains.jl">SimpleChains.jl</a> is focused on making small, simple, CPU-based, neural networks fast. Uses <a href="https://github.com/JuliaSIMD/LoopVectorization.jl">LoopVectorization.jl</a>. (Was <code>FastChain</code> in DiffEqFlux.jl) </p></li><li><p><a href="https://github.com/denizyuret/Knet.jl">Knet.jl</a> is a neural network library built around <a href="https://github.com/denizyuret/AutoGrad.jl">AutoGrad.jl</a>.</p></li><li><p><a href="https://github.com/avik-pal/Lux.jl">Lux.jl</a> (earlier ExplicitFluxLayers.jl) shares much of the design, use-case, and NNlib.jl / Optimisers.jl back-end of Flux. But instead of encapsulating all parameters within the model structure, it separates this into 3 components: a model, a tree of parameters, and a tree of model states.</p></li></ul><div class="admonition is-compat"><header class="admonition-header">Explicit or explicit?</header><div class="admonition-body"><p>Flux&#39;s <a href="../guide/training/training/#man-training">training docs</a> talk about changes from Zygote&#39;s implicit to explicit gradients, dictionary-like to tree-like structures. (See also <a href="https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1">Zygote&#39;s description</a> of these.) 
Lux also uses Zygote, but uses the word &quot;explicit&quot; to mean something unrelated, namely storing the tree of parameters (and of state) separately from the model.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../guide/performance/">« Performance Tips</a><a class="docs-footer-nextpage" href="../reference/models/layers/">Built-in Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+</script><script data-outdated-warner src="../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../assets/documenter.js"></script><script src="../search_index.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-light.css" data-theme-name="documenter-light" 
data-theme-primary/><script src="../assets/themeswap.js"></script><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../"><img class="docs-light-only" src="../assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="../assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li><a class="tocitem" href="../">Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="../guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="../guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="../guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="../guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="../guide/training/training/">Training</a></li><li><a class="tocitem" href="../guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="../guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="../guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="../guide/performance/">Performance Tips</a></li></ul></li><li class="is-active"><a class="tocitem" href>Ecosystem</a><ul class="internal"><li><a class="tocitem" href="#Flux-models"><span>Flux models</span></a></li><li><a class="tocitem" href="#Tools-closely-associated-with-Flux"><span>Tools closely associated with Flux</span></a></li><li><a class="tocitem" href="#Differentiable-programming"><span>Differentiable programming</span></a></li><li><a class="tocitem" href="#Useful-miscellaneous-packages"><span>Useful miscellaneous packages</span></a></li><li><a class="tocitem" href="#Alternatives-to-Flux"><span>Alternatives to Flux</span></a></li></ul></li><li><span class="tocitem">Reference</span><ul><li><a 
class="tocitem" href="../reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="../reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="../reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="../reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="../reference/training/reference/">Training API</a></li><li><a class="tocitem" href="../reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="../reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" href="../reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="../reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="../reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="../reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="../reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="../reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="../reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../tutorials/linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="../tutorials/logistic_regression/">Logistic Regression</a></li><li><a class="tocitem" href="../tutorials/model_zoo/">Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" 
id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Ecosystem</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Ecosystem</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/ecosystem.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="The-Julia-Ecosystem-around-Flux"><a class="docs-heading-anchor" href="#The-Julia-Ecosystem-around-Flux">The Julia Ecosystem around Flux</a><a id="The-Julia-Ecosystem-around-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#The-Julia-Ecosystem-around-Flux" title="Permalink"></a></h1><p>One of the main strengths of Julia lies in an ecosystem of packages  globally providing a rich and consistent user experience.</p><p>This is a non-exhaustive list of Julia packages, nicely complementing <code>Flux</code> in typical machine learning and deep learning workflows. To add your project please send a <a href="https://github.com/FluxML/Flux.jl/pulls">PR</a>. 
See also academic work <a href="https://scholar.google.com/scholar?cites=9731162218836700005&amp;hl=en">citing Flux</a> or <a href="https://scholar.google.com/scholar?cites=11943854577624257878&amp;hl=en">citing Zygote</a>.</p><h2 id="Flux-models"><a class="docs-heading-anchor" href="#Flux-models">Flux models</a><a id="Flux-models-1"></a><a class="docs-heading-anchor-permalink" href="#Flux-models" title="Permalink"></a></h2><ul><li>Flux&#39;s <a href="https://github.com/FluxML/model-zoo">model-zoo</a> contains examples from many domains.</li></ul><h3 id="Computer-vision"><a class="docs-heading-anchor" href="#Computer-vision">Computer vision</a><a id="Computer-vision-1"></a><a class="docs-heading-anchor-permalink" href="#Computer-vision" title="Permalink"></a></h3><ul><li><a href="https://github.com/r3tex/ObjectDetector.jl">ObjectDetector.jl</a> provides ready-to-go image detection via YOLO.</li><li><a href="https://github.com/FluxML/Metalhead.jl">Metalhead.jl</a> includes many state-of-the-art computer vision models which can easily be used for transfer learning.</li><li><a href="https://github.com/DhairyaLGandhi/UNet.jl">UNet.jl</a> is a generic UNet implementation.</li></ul><h3 id="Natural-language-processing"><a class="docs-heading-anchor" href="#Natural-language-processing">Natural language processing</a><a id="Natural-language-processing-1"></a><a class="docs-heading-anchor-permalink" href="#Natural-language-processing" title="Permalink"></a></h3><ul><li><a href="https://github.com/chengchingwen/Transformers.jl">Transformers.jl</a> provides components for Transformer models for NLP, as well as providing several trained models out of the box.</li><li><a href="https://github.com/JuliaText/TextAnalysis.jl">TextAnalysis.jl</a> provides several NLP algorithms that use Flux models under the hood.</li></ul><h3 id="Reinforcement-learning"><a class="docs-heading-anchor" href="#Reinforcement-learning">Reinforcement learning</a><a id="Reinforcement-learning-1"></a><a 
class="docs-heading-anchor-permalink" href="#Reinforcement-learning" title="Permalink"></a></h3><ul><li><a href="https://github.com/jonathan-laurent/AlphaZero.jl">AlphaZero.jl</a> provides a generic, simple and fast implementation of Deepmind&#39;s AlphaZero algorithm.</li><li><a href="https://juliareinforcementlearning.org/">ReinforcementLearning.jl</a> offers a collection of tools for doing reinforcement learning research in Julia.</li></ul><h3 id="Graph-learning"><a class="docs-heading-anchor" href="#Graph-learning">Graph learning</a><a id="Graph-learning-1"></a><a class="docs-heading-anchor-permalink" href="#Graph-learning" title="Permalink"></a></h3><ul><li><a href="https://github.com/CarloLucibello/GraphNeuralNetworks.jl">GraphNeuralNetworks.jl</a> is a fresh, performant and flexible graph neural network library based on Flux.jl.</li><li><a href="https://github.com/FluxML/GeometricFlux.jl">GeometricFlux.jl</a> is the first graph neural network library for julia. </li><li><a href="https://github.com/SciML/NeuralOperators.jl">NeuralOperators.jl</a> enables training infinite dimensional PDEs by learning a continuous function instead of using the finite element method.</li><li><a href="https://github.com/corail-research/SeaPearl.jl">SeaPearl.jl</a> is a Constraint Programming solver that uses Reinforcement Learning based on graphs as input.</li></ul><h3 id="Time-series"><a class="docs-heading-anchor" href="#Time-series">Time series</a><a id="Time-series-1"></a><a class="docs-heading-anchor-permalink" href="#Time-series" title="Permalink"></a></h3><ul><li><a href="https://github.com/sdobber/FluxArchitectures.jl">FluxArchitectures.jl</a> is a collection of advanced network architectures for time series forecasting.</li></ul><h3 id="Robust-networks"><a class="docs-heading-anchor" href="#Robust-networks">Robust networks</a><a id="Robust-networks-1"></a><a class="docs-heading-anchor-permalink" href="#Robust-networks" title="Permalink"></a></h3><ul><li><a 
href="https://github.com/acfr/RobustNeuralNetworks.jl">RobustNeuralNetworks.jl</a> includes classes of neural networks that are constructed to naturally satisfy robustness constraints.</li></ul><hr/><h2 id="Tools-closely-associated-with-Flux"><a class="docs-heading-anchor" href="#Tools-closely-associated-with-Flux">Tools closely associated with Flux</a><a id="Tools-closely-associated-with-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Tools-closely-associated-with-Flux" title="Permalink"></a></h2><p>Utility tools you&#39;re unlikely to have met if you never used Flux!</p><h3 id="High-level-training-flows"><a class="docs-heading-anchor" href="#High-level-training-flows">High-level training flows</a><a id="High-level-training-flows-1"></a><a class="docs-heading-anchor-permalink" href="#High-level-training-flows" title="Permalink"></a></h3><ul><li><a href="https://github.com/FluxML/FastAI.jl">FastAI.jl</a> is a Julia port of Python&#39;s fast.ai library.</li><li><a href="https://github.com/FluxML/FluxTraining.jl">FluxTraining.jl</a> is a package for using and writing powerful, extensible training loops for deep learning models. It supports callbacks for many common use cases like hyperparameter scheduling, metrics tracking and logging, checkpointing, early stopping, and more. 
It powers training in FastAI.jl</li><li><a href="https://github.com/jondeuce/Ignite.jl">Ignite.jl</a> is a Julia port of the Python library <code>ignite</code> for simplifying neural network training and validation loops, using events and handlers.</li><li><a href="https://github.com/CarloLucibello/Tsunami.jl">Tsunami.jl</a> adds high-level ways to control training, parameter schedules &amp; logging, heavily inspired by <code>pytorch-lightning</code>.</li></ul><h3 id="Datasets"><a class="docs-heading-anchor" href="#Datasets">Datasets</a><a id="Datasets-1"></a><a class="docs-heading-anchor-permalink" href="#Datasets" title="Permalink"></a></h3><p>Commonly used machine learning datasets are provided by the following packages in the julia ecosystem:</p><ul><li><a href="https://github.com/JuliaML/MLDatasets.jl">MLDatasets.jl</a> focuses on downloading, unpacking, and accessing benchmark datasets.</li><li><a href="https://github.com/yuehhua/GraphMLDatasets.jl">GraphMLDatasets.jl</a>: a library for machine learning datasets on graph.</li></ul><h3 id="Plumbing"><a class="docs-heading-anchor" href="#Plumbing">Plumbing</a><a id="Plumbing-1"></a><a class="docs-heading-anchor-permalink" href="#Plumbing" title="Permalink"></a></h3><p>Tools to put data into the right order for creating a model.</p><ul><li><a href="https://github.com/Evizero/Augmentor.jl">Augmentor.jl</a> is a real-time library augmentation library for increasing the number of training images.</li><li><a href="https://github.com/lorenzoh/DataAugmentation.jl">DataAugmentation.jl</a> aims to make it easy to build stochastic, label-preserving augmentation pipelines for vision use cases involving images, keypoints and segmentation masks.</li><li><a href="https://github.com/JuliaML/MLUtils.jl">MLUtils.jl</a> (replaces <a href="https://github.com/JuliaML/MLDataUtils.jl">MLDataUtils.jl</a> and <a href="https://github.com/JuliaML/MLLabelUtils.jl">MLLabelUtils.jl</a>) is a library for processing Machine Learning 
datasets.</li></ul><h3 id="Parameters"><a class="docs-heading-anchor" href="#Parameters">Parameters</a><a id="Parameters-1"></a><a class="docs-heading-anchor-permalink" href="#Parameters" title="Permalink"></a></h3><ul><li><a href="https://github.com/darsnack/ParameterSchedulers.jl">ParameterSchedulers.jl</a> standard scheduling policies for machine learning.</li></ul><hr/><h2 id="Differentiable-programming"><a class="docs-heading-anchor" href="#Differentiable-programming">Differentiable programming</a><a id="Differentiable-programming-1"></a><a class="docs-heading-anchor-permalink" href="#Differentiable-programming" title="Permalink"></a></h2><p>Packages based on differentiable programming but not necessarily related to Machine Learning. </p><ul><li>The <a href="https://sciml.ai/">SciML</a> ecosystem uses Flux and Zygote to mix neural nets with differential equations, to get the best of black box and mechanistic modelling.</li><li><a href="https://github.com/SciML/DiffEqFlux.jl">DiffEqFlux.jl</a> provides tools for creating Neural Differential Equations.</li><li><a href="https://github.com/FluxML/Flux3D.jl">Flux3D.jl</a> shows off machine learning on 3D data.</li><li><a href="https://github.com/avik-pal/RayTracer.jl">RayTracer.jl</a> combines ML with computer vision via a differentiable renderer.</li><li><a href="https://github.com/tejank10/Duckietown.jl">Duckietown.jl</a> Differentiable Duckietown simulator.</li><li>The <a href="https://github.com/QuantumBFS/Yao.jl">Yao.jl</a> project uses Flux and Zygote for Quantum Differentiable Programming.</li><li><a href="https://github.com/Chemellia/AtomicGraphNets.jl">AtomicGraphNets.jl</a> enables learning graph based models on atomic systems used in chemistry.</li><li><a href="https://github.com/FluxML/DiffImages.jl">DiffImages.jl</a> differentiable computer vision modeling in Julia with the Images.jl ecosystem.</li></ul><h3 id="Probabilistic-programming"><a class="docs-heading-anchor" 
href="#Probabilistic-programming">Probabilistic programming</a><a id="Probabilistic-programming-1"></a><a class="docs-heading-anchor-permalink" href="#Probabilistic-programming" title="Permalink"></a></h3><ul><li><a href="https://github.com/TuringLang/Turing.jl">Turing.jl</a> extends Flux&#39;s differentiable programming capabilities to probabilistic programming.</li><li><a href="https://github.com/zenna/Omega.jl">Omega.jl</a> is a research project aimed at causal, higher-order probabilistic programming.</li><li><a href="https://github.com/willtebbutt/Stheno.jl">Stheno.jl</a> provides flexible Gaussian processes.</li></ul><h3 id="Statistics"><a class="docs-heading-anchor" href="#Statistics">Statistics</a><a id="Statistics-1"></a><a class="docs-heading-anchor-permalink" href="#Statistics" title="Permalink"></a></h3><ul><li><a href="https://github.com/joshday/OnlineStats.jl">OnlineStats.jl</a> provides single-pass algorithms for statistics.</li></ul><hr/><h2 id="Useful-miscellaneous-packages"><a class="docs-heading-anchor" href="#Useful-miscellaneous-packages">Useful miscellaneous packages</a><a id="Useful-miscellaneous-packages-1"></a><a class="docs-heading-anchor-permalink" href="#Useful-miscellaneous-packages" title="Permalink"></a></h2><p>Some useful and random packages!</p><ul><li><a href="https://github.com/rizalzaf/AdversarialPrediction.jl">AdversarialPrediction.jl</a> provides a way to easily optimise generic performance metrics in supervised learning settings using the <a href="https://arxiv.org/abs/1812.07526">Adversarial Prediction</a> framework.</li><li><a href="https://github.com/CTUAvastLab/Mill.jl">Mill.jl</a> helps to prototype flexible multi-instance learning models.</li><li><a href="https://github.com/JuliaML/MLMetrics.jl">MLMetrics.jl</a> is a utility for scoring models in data science and machine learning.</li><li><a href="https://github.com/FluxML/Torch.jl">Torch.jl</a> exposes torch in Julia.</li><li><a 
href="https://github.com/JuliaML/ValueHistories.jl">ValueHistories.jl</a> is a utility for efficient tracking of optimization histories, training curves or other information of arbitrary types and at arbitrarily spaced sampling times.</li><li><a href="https://github.com/slimgroup/InvertibleNetworks.jl/">InvertibleNetworks.jl</a> Building blocks for invertible neural networks in the Julia programming language.</li><li><a href="https://github.com/timholy/ProgressMeter.jl">ProgressMeter.jl</a> progress meters for long-running computations.</li><li><a href="https://github.com/PhilipVinc/TensorBoardLogger.jl">TensorBoardLogger.jl</a> easy peasy logging to <a href="https://www.tensorflow.org/tensorboard">tensorboard</a> in Julia</li><li><a href="https://github.com/carlobaldassi/ArgParse.jl">ArgParse.jl</a> is a package for parsing command-line arguments to Julia programs.</li><li><a href="https://github.com/mauro3/Parameters.jl">Parameters.jl</a> types with default field values, keyword constructors and (un-)pack macros.</li><li><a href="https://github.com/JuliaIO/BSON.jl">BSON.jl</a> is a package for working with the Binary JSON serialisation format.</li><li><a href="https://github.com/JuliaData/DataFrames.jl">DataFrames.jl</a> in-memory tabular data in Julia.</li><li><a href="https://github.com/JuliaDynamics/DrWatson.jl">DrWatson.jl</a> is a scientific project assistant software.</li></ul><p>This tight integration among Julia packages is shown in some of the examples in the <a href="https://github.com/FluxML/model-zoo">model-zoo</a> repository.</p><hr/><h2 id="Alternatives-to-Flux"><a class="docs-heading-anchor" href="#Alternatives-to-Flux">Alternatives to Flux</a><a id="Alternatives-to-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Alternatives-to-Flux" title="Permalink"></a></h2><p>Julia has several other libraries for making neural networks. 
</p><ul><li><p><a href="https://github.com/PumasAI/SimpleChains.jl">SimpleChains.jl</a> is focused on making small, simple, CPU-based, neural networks fast. Uses <a href="https://github.com/JuliaSIMD/LoopVectorization.jl">LoopVectorization.jl</a>. (Was <code>FastChain</code> in DiffEqFlux.jl) </p></li><li><p><a href="https://github.com/denizyuret/Knet.jl">Knet.jl</a> is a neural network library built around <a href="https://github.com/denizyuret/AutoGrad.jl">AutoGrad.jl</a>.</p></li><li><p><a href="https://github.com/avik-pal/Lux.jl">Lux.jl</a> (earlier ExplicitFluxLayers.jl) shares much of the design, use-case, and NNlib.jl / Optimisers.jl back-end of Flux. But instead of encapsulating all parameters within the model structure, it separates this into 3 components: a model, a tree of parameters, and a tree of model states.</p></li></ul><div class="admonition is-compat"><header class="admonition-header">Explicit or explicit?</header><div class="admonition-body"><p>Flux&#39;s <a href="../guide/training/training/#man-training">training docs</a> talk about changes from Zygote&#39;s implicit to explicit gradients, dictionary-like to tree-like structures. (See also <a href="https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1">Zygote&#39;s description</a> of these.) 
Lux also uses Zygote, but uses the word &quot;explicit&quot; to mean something unrelated, namely storing the tree of parameters (and of state) separately from the model.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../guide/performance/">« Performance Tips</a><a class="docs-footer-nextpage" href="../reference/models/layers/">Built-in Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/gpu/index.html b/dev/guide/gpu/index.html
index cf7c008477..5f61b54351 100644
--- a/dev/guide/gpu/index.html
+++ b/dev/guide/gpu/index.html
@@ -169,10 +169,10 @@
 
 julia&gt; CUDA.device(dense_model.weight)
 CuDevice(1): GeForce RTX 2080 Ti
-</code></pre><p>Due to a limitation in <code>Metal.jl</code>, currently this kind of data movement across devices is only supported for <code>CUDA</code> and <code>AMDGPU</code> backends.</p><div class="admonition is-warning"><header class="admonition-header">Printing models after moving to a different device</header><div class="admonition-body"><p>Due to a limitation in how GPU packages currently work, printing models on the REPL after moving them to a GPU device which is different from the current device will lead to an error.</p></div></div><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AbstractDevice" href="#Flux.AbstractDevice"><code>Flux.AbstractDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Flux.AbstractDevice &lt;: Function</code></pre><p>An abstract type representing <code>device</code> objects for different GPU backends. The currently supported backends are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code>; the <code>&quot;CPU&quot;</code> backend is the fallback case when no GPU is available. 
GPU extensions of Flux define subtypes of this type.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L483-L488">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxCPUDevice" href="#Flux.FluxCPUDevice"><code>Flux.FluxCPUDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Flux.FluxCPUDevice &lt;: Flux.AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;CPU&quot;</code> backend for Flux. This is the fallback case when no GPU is available to Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L512-L516">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxCUDADevice" href="#Flux.FluxCUDADevice"><code>Flux.FluxCUDADevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxCUDADevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;CUDA&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L524-L528">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxAMDGPUDevice" 
href="#Flux.FluxAMDGPUDevice"><code>Flux.FluxAMDGPUDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxAMDGPUDevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;AMDGPU&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L533-L537">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxMetalDevice" href="#Flux.FluxMetalDevice"><code>Flux.FluxMetalDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxMetalDevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;Metal&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L542-L546">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.supported_devices" href="#Flux.supported_devices"><code>Flux.supported_devices</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Flux.supported_devices()</code></pre><p>Get all supported backends for Flux, in order of preference.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux;
+</code></pre><p>Due to a limitation in <code>Metal.jl</code>, currently this kind of data movement across devices is only supported for <code>CUDA</code> and <code>AMDGPU</code> backends.</p><div class="admonition is-warning"><header class="admonition-header">Printing models after moving to a different device</header><div class="admonition-body"><p>Due to a limitation in how GPU packages currently work, printing models on the REPL after moving them to a GPU device which is different from the current device will lead to an error.</p></div></div><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AbstractDevice" href="#Flux.AbstractDevice"><code>Flux.AbstractDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Flux.AbstractDevice &lt;: Function</code></pre><p>An abstract type representing <code>device</code> objects for different GPU backends. The currently supported backends are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code>; the <code>&quot;CPU&quot;</code> backend is the fallback case when no GPU is available. 
GPU extensions of Flux define subtypes of this type.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L483-L488">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxCPUDevice" href="#Flux.FluxCPUDevice"><code>Flux.FluxCPUDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Flux.FluxCPUDevice &lt;: Flux.AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;CPU&quot;</code> backend for Flux. This is the fallback case when no GPU is available to Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L512-L516">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxCUDADevice" href="#Flux.FluxCUDADevice"><code>Flux.FluxCUDADevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxCUDADevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;CUDA&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L524-L528">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxAMDGPUDevice" 
href="#Flux.FluxAMDGPUDevice"><code>Flux.FluxAMDGPUDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxAMDGPUDevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;AMDGPU&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L533-L537">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.FluxMetalDevice" href="#Flux.FluxMetalDevice"><code>Flux.FluxMetalDevice</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">FluxMetalDevice &lt;: AbstractDevice</code></pre><p>A type representing <code>device</code> objects for the <code>&quot;Metal&quot;</code> backend for Flux.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L542-L546">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.supported_devices" href="#Flux.supported_devices"><code>Flux.supported_devices</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Flux.supported_devices()</code></pre><p>Get all supported backends for Flux, in order of preference.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux;
 
 julia&gt; Flux.supported_devices()
-(&quot;CUDA&quot;, &quot;AMDGPU&quot;, &quot;Metal&quot;, &quot;CPU&quot;)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L557-L570">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.get_device" href="#Flux.get_device"><code>Flux.get_device</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Flux.get_device(; verbose=false)::Flux.AbstractDevice</code></pre><p>Returns a <code>device</code> object for the most appropriate backend for the current Julia session. </p><p>First, the function checks whether a backend preference has been set via the <a href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a> function. If so, an attempt is made to load this backend. If the corresponding trigger package has been loaded and the backend is functional, a <code>device</code> corresponding to the given backend is loaded. Otherwise, the backend is chosen automatically. To update the backend preference, use <a href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a>.</p><p>If there is no preference, then for each of the <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code> backends in the given order, this function checks whether the given backend has been loaded via the corresponding trigger package, and whether the backend is functional. If so, the <code>device</code> corresponding to the backend is returned. 
If no GPU backend is available, a <code>Flux.FluxCPUDevice</code> is returned.</p><p>If <code>verbose</code> is set to <code>true</code>, then the function prints informative log messages.</p><p><strong>Examples</strong></p><p>For the example given below, the backend preference was set to <code>&quot;AMDGPU&quot;</code> via the <a href="#Flux.gpu_backend!"><code>gpu_backend!</code></a> function.</p><pre><code class="language-julia-repl hljs">julia&gt; using Flux;
+(&quot;CUDA&quot;, &quot;AMDGPU&quot;, &quot;Metal&quot;, &quot;CPU&quot;)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L557-L570">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.get_device" href="#Flux.get_device"><code>Flux.get_device</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Flux.get_device(; verbose=false)::Flux.AbstractDevice</code></pre><p>Returns a <code>device</code> object for the most appropriate backend for the current Julia session. </p><p>First, the function checks whether a backend preference has been set via the <a href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a> function. If so, an attempt is made to load this backend. If the corresponding trigger package has been loaded and the backend is functional, a <code>device</code> corresponding to the given backend is loaded. Otherwise, the backend is chosen automatically. To update the backend preference, use <a href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a>.</p><p>If there is no preference, then for each of the <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code> backends in the given order, this function checks whether the given backend has been loaded via the corresponding trigger package, and whether the backend is functional. If so, the <code>device</code> corresponding to the backend is returned. 
If no GPU backend is available, a <code>Flux.FluxCPUDevice</code> is returned.</p><p>If <code>verbose</code> is set to <code>true</code>, then the function prints informative log messages.</p><p><strong>Examples</strong></p><p>For the example given below, the backend preference was set to <code>&quot;AMDGPU&quot;</code> via the <a href="#Flux.gpu_backend!"><code>gpu_backend!</code></a> function.</p><pre><code class="language-julia-repl hljs">julia&gt; using Flux;
 
 julia&gt; model = Dense(2 =&gt; 3)
 Dense(2 =&gt; 3)       # 9 parameters
@@ -212,7 +212,7 @@
 3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
   0.820013   0.527131
  -0.915589   0.549048
-  0.290744  -0.0592499</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L573-L636">source</a></section><section><div><pre><code class="language-julia hljs">Flux.get_device(backend::String, idx::Int = 0)::Flux.AbstractDevice</code></pre><p>Get a device object for a backend specified by the string <code>backend</code> and <code>idx</code>. The currently supported values of <code>backend</code> are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code> and <code>&quot;CPU&quot;</code>. <code>idx</code> must be an integer value between <code>0</code> and the number of available devices.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux, CUDA;
+  0.290744  -0.0592499</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L573-L636">source</a></section><section><div><pre><code class="language-julia hljs">Flux.get_device(backend::String, idx::Int = 0)::Flux.AbstractDevice</code></pre><p>Get a device object for a backend specified by the string <code>backend</code> and <code>idx</code>. The currently supported values of <code>backend</code> are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code> and <code>&quot;CPU&quot;</code>. <code>idx</code> must be an integer value between <code>0</code> and the number of available devices.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux, CUDA;
 
 julia&gt; CUDA.devices()
 CUDA.DeviceIterator() for 3 devices:
@@ -234,4 +234,4 @@
 
 julia&gt; cpu_device = Flux.get_device(&quot;CPU&quot;)
 (::Flux.FluxCPUDevice) (generic function with 1 method)
-</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L678-L711">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu_backend!" href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">gpu_backend!(backend::String)</code></pre><p>Set the GPU backend to <code>backend</code> in the <code>LocalPreferences.toml</code> file in you project directory.  After restarting Julia, the new backend will affect all subsequent calls to <a href="../../reference/models/functors/#Flux.gpu-Tuple{Any}"><code>gpu</code></a> and <a href="#Flux.get_device"><code>get_device</code></a>.</p><p>The supported backends are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L197-L204">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/recurrence/">« Recurrence</a><a class="docs-footer-nextpage" href="../saving/">Saving &amp; Loading »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label 
class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L678-L711">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu_backend!" href="#Flux.gpu_backend!"><code>Flux.gpu_backend!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">gpu_backend!(backend::String)</code></pre><p>Set the GPU backend to <code>backend</code> in the <code>LocalPreferences.toml</code> file in your project directory. After restarting Julia, the new backend will affect all subsequent calls to <a href="../../reference/models/functors/#Flux.gpu-Tuple{Any}"><code>gpu</code></a> and <a href="#Flux.get_device"><code>get_device</code></a>.</p><p>The supported backends are <code>&quot;CUDA&quot;</code>, <code>&quot;AMDGPU&quot;</code>, <code>&quot;Metal&quot;</code> and <code>&quot;CPU&quot;</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L197-L204">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/recurrence/">« Recurrence</a><a class="docs-footer-nextpage" href="../saving/">Saving &amp; Loading »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label 
class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/models/basics/index.html b/dev/guide/models/basics/index.html
index 7ee2332bb4..b87accf0ba 100644
--- a/dev/guide/models/basics/index.html
+++ b/dev/guide/models/basics/index.html
@@ -109,4 +109,4 @@
   return Affine(W, b)
 end
 
-Affine(3 =&gt; 1, bias=false) |&gt; gpu</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../overview/">« Fitting a Line</a><a class="docs-footer-nextpage" href="../custom_layers/">Custom Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+Affine(3 =&gt; 1, bias=false) |&gt; gpu</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../overview/">« Fitting a Line</a><a class="docs-footer-nextpage" href="../custom_layers/">Custom Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/models/custom_layers/index.html b/dev/guide/models/custom_layers/index.html
index 700788ac60..88d9732f4e 100644
--- a/dev/guide/models/custom_layers/index.html
+++ b/dev/guide/models/custom_layers/index.html
@@ -104,4 +104,4 @@
   # rms over all the mse
   ŷs = model(x)
   return sqrt(mean(Flux.mse(y, ŷ) for (y, ŷ) in zip(ys, ŷs)))
-end</code></pre><div class="admonition is-info"><header class="admonition-header">Note</header><div class="admonition-body"><p>This <code>Split</code> layer is available from the <a href="https://github.com/FluxML/Fluxperimental.jl">Fluxperimental.jl</a> package.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../basics/">« Gradients and Layers</a><a class="docs-footer-nextpage" href="../../training/training/">Training »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+end</code></pre><div class="admonition is-info"><header class="admonition-header">Note</header><div class="admonition-body"><p>This <code>Split</code> layer is available from the <a href="https://github.com/FluxML/Fluxperimental.jl">Fluxperimental.jl</a> package.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../basics/">« Gradients and Layers</a><a class="docs-footer-nextpage" href="../../training/training/">Training »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/models/overview/index.html b/dev/guide/models/overview/index.html
index 043d3f7fe2..0f0c514554 100644
--- a/dev/guide/models/overview/index.html
+++ b/dev/guide/models/overview/index.html
@@ -56,4 +56,4 @@
 
 julia&gt; y_test
 1×5 Matrix{Int64}:
- 26  30  34  38  42</code></pre><p>The predictions are good. Here&#39;s how we got there. </p><p>First, we gathered real-world data into the variables <code>x_train</code>, <code>y_train</code>, <code>x_test</code>, and <code>y_test</code>. The <code>x_*</code> data defines inputs, and the <code>y_*</code> data defines outputs. The <code>*_train</code> data is for training the model, and the <code>*_test</code> data is for verifying the model. Our data was based on the function <code>4x + 2</code>.</p><p>Then, we built a single input, single output predictive model, <code>predict = Dense(1 =&gt; 1)</code>. The initial predictions weren&#39;t accurate, because we had not trained the model yet.</p><p>After building the model, we trained it with <code>train!(loss, predict, data, opt)</code>. The loss function is first, followed by the model itself, the training data, and the <code>Descent</code> optimiser provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran the <code>train!</code> many times to finish the training process.</p><p>After we trained the model, we verified it with the test data to verify the results. </p><p>This overall flow represents how Flux works. 
Let&#39;s drill down a bit to understand what&#39;s going on inside the individual layers of Flux.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../quickstart/">« Quick Start</a><a class="docs-footer-nextpage" href="../basics/">Gradients and Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+ 26  30  34  38  42</code></pre><p>The predictions are good. Here&#39;s how we got there. </p><p>First, we gathered real-world data into the variables <code>x_train</code>, <code>y_train</code>, <code>x_test</code>, and <code>y_test</code>. The <code>x_*</code> data defines inputs, and the <code>y_*</code> data defines outputs. The <code>*_train</code> data is for training the model, and the <code>*_test</code> data is for verifying the model. Our data was based on the function <code>4x + 2</code>.</p><p>Then, we built a single input, single output predictive model, <code>predict = Dense(1 =&gt; 1)</code>. The initial predictions weren&#39;t accurate, because we had not trained the model yet.</p><p>After building the model, we trained it with <code>train!(loss, predict, data, opt)</code>. The loss function is first, followed by the model itself, the training data, and the <code>Descent</code> optimiser provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran <code>train!</code> many times to finish the training process.</p><p>After we trained the model, we checked its predictions against the test data. </p><p>This overall flow represents how Flux works. 
Let&#39;s drill down a bit to understand what&#39;s going on inside the individual layers of Flux.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../quickstart/">« Quick Start</a><a class="docs-footer-nextpage" href="../basics/">Gradients and Layers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/models/quickstart/index.html b/dev/guide/models/quickstart/index.html
index bc6d328dac..a6db7fa207 100644
--- a/dev/guide/models/quickstart/index.html
+++ b/dev/guide/models/quickstart/index.html
@@ -59,4 +59,4 @@
         y_hat = m(x)
         Flux.logitcrossentropy(y_hat, y)
     end
-end</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../../">« Welcome</a><a class="docs-footer-nextpage" href="../overview/">Fitting a Line »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+end</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../../">« Welcome</a><a class="docs-footer-nextpage" href="../overview/">Fitting a Line »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/models/recurrence/index.html b/dev/guide/models/recurrence/index.html
index 4dbed0c3ad..8d2bb0bcd5 100644
--- a/dev/guide/models/recurrence/index.html
+++ b/dev/guide/models/recurrence/index.html
@@ -99,4 +99,4 @@
 true</code></pre><p>In many situations, such as when dealing with a language model, the sentences in each batch are independent (i.e. the last item of the first sentence of the first batch is independent from the first item of the first sentence of the second batch), so we cannot handle the model as if each batch was the direct continuation of the previous one. To handle such situations, we need to reset the state of the model between each batch, which can be conveniently performed within the loss function:</p><pre><code class="language-julia hljs">function loss(x, y)
   Flux.reset!(m)
   sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))
-end</code></pre><p>A potential source of ambiguity with RNN in Flux can come from the different data layout compared to some common frameworks where data is typically a 3 dimensional array: <code>(features, seq length, samples)</code>. In Flux, those 3 dimensions are provided through a vector of seq length containing a matrix <code>(features, samples)</code>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../training/training/">« Training</a><a class="docs-footer-nextpage" href="../../gpu/">GPU Support »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+end</code></pre><p>A potential source of ambiguity with RNN in Flux can come from the different data layout compared to some common frameworks where data is typically a 3 dimensional array: <code>(features, seq length, samples)</code>. In Flux, those 3 dimensions are provided through a vector of seq length containing a matrix <code>(features, samples)</code>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../training/training/">« Training</a><a class="docs-footer-nextpage" href="../../gpu/">GPU Support »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/performance/index.html b/dev/guide/performance/index.html
index 6e4c5e52cb..58861e483a 100644
--- a/dev/guide/performance/index.html
+++ b/dev/guide/performance/index.html
@@ -14,4 +14,4 @@
 function loss_total(x_batch::Matrix, y_batch::Matrix)
     y_preds = model(x_batch)
     sum(loss.(y_preds, y_batch))
-end</code></pre><p>When doing this kind of concatenation use <code>reduce(hcat, xs)</code> rather than <code>hcat(xs...)</code>. This will avoid the splatting penalty, and will hit the optimised <code>reduce</code> method.</p><h2 id="Be-aware-of-GPU-memory-inefficiencies"><a class="docs-heading-anchor" href="#Be-aware-of-GPU-memory-inefficiencies">Be aware of GPU memory inefficiencies</a><a id="Be-aware-of-GPU-memory-inefficiencies-1"></a><a class="docs-heading-anchor-permalink" href="#Be-aware-of-GPU-memory-inefficiencies" title="Permalink"></a></h2><p>Currently, GPU memory is not handled as well as system memory. If your training loop is allocating significantly on the GPU, you can quickly fill your GPU memory and the piecemeal reclamation and shuffling of data between GPU and system memory can become extremely slow. If profiling shows that a significant portion of time is spent in the <code>gpu</code> function and your data sizes are not large, this may be the cause. Running an incremental garbage collection manually (<code>GC.gc(false)</code>) at regular intervals can keep your GPU memory free and responsive. 
See other tips for CUDA memory management <a href="https://cuda.juliagpu.org/stable/usage/memory/">here</a>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../saving/">« Saving &amp; Loading</a><a class="docs-footer-nextpage" href="../../ecosystem/">Ecosystem »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+end</code></pre><p>When doing this kind of concatenation use <code>reduce(hcat, xs)</code> rather than <code>hcat(xs...)</code>. This will avoid the splatting penalty, and will hit the optimised <code>reduce</code> method.</p><h2 id="Be-aware-of-GPU-memory-inefficiencies"><a class="docs-heading-anchor" href="#Be-aware-of-GPU-memory-inefficiencies">Be aware of GPU memory inefficiencies</a><a id="Be-aware-of-GPU-memory-inefficiencies-1"></a><a class="docs-heading-anchor-permalink" href="#Be-aware-of-GPU-memory-inefficiencies" title="Permalink"></a></h2><p>Currently, GPU memory is not handled as well as system memory. If your training loop is allocating significantly on the GPU, you can quickly fill your GPU memory and the piecemeal reclamation and shuffling of data between GPU and system memory can become extremely slow. If profiling shows that a significant portion of time is spent in the <code>gpu</code> function and your data sizes are not large, this may be the cause. Running an incremental garbage collection manually (<code>GC.gc(false)</code>) at regular intervals can keep your GPU memory free and responsive. 
See other tips for CUDA memory management <a href="https://cuda.juliagpu.org/stable/usage/memory/">here</a>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../saving/">« Saving &amp; Loading</a><a class="docs-footer-nextpage" href="../../ecosystem/">Ecosystem »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/saving/index.html b/dev/guide/saving/index.html
index e924e3718c..874994d835 100644
--- a/dev/guide/saving/index.html
+++ b/dev/guide/saving/index.html
@@ -59,4 +59,4 @@
 Chain(
   Dense(10 =&gt; 5, relu),                 # 55 parameters
   Dense(5 =&gt; 2),                        # 12 parameters
-)                   # Total: 4 arrays, 67 parameters, 524 bytes.</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>Saving models this way could lead to compatibility issues across Julia versions and across Flux versions if some of the Flux layers&#39; internals are changed. It is therefore not recommended for long-term storage; use <a href="../../reference/destructure/#Flux.state"><code>Flux.state</code></a> instead.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../gpu/">« GPU Support</a><a class="docs-footer-nextpage" href="../performance/">Performance Tips »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. 
Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+)                   # Total: 4 arrays, 67 parameters, 524 bytes.</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>Saving models this way could lead to compatibility issues across Julia versions and across Flux versions if some of the Flux layers&#39; internals are changed. It is therefore not recommended for long-term storage; use <a href="../../reference/destructure/#Flux.state"><code>Flux.state</code></a> instead.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../gpu/">« GPU Support</a><a class="docs-footer-nextpage" href="../performance/">Performance Tips »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. 
Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/guide/training/training/index.html b/dev/guide/training/training/index.html
index cd65104259..b9b17a849c 100644
--- a/dev/guide/training/training/index.html
+++ b/dev/guide/training/training/index.html
@@ -118,4 +118,4 @@
 train!(loss, bimodel, data, opt_state)
 
 # Un-freeze the entire model:
-Flux.thaw!(opt_state)</code></pre><p>While <code>adjust!</code> and <code>freeze!</code>/<code>thaw!</code> make temporary modifications to the optimiser state, permanently removing some fields of a new layer type from training is usually done when defining the layer, by calling for example <a href="../../../reference/models/functors/#Flux.@layer"><code>@layer</code></a><code>NewLayer trainable=(weight,)</code>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../models/custom_layers/">« Custom Layers</a><a class="docs-footer-nextpage" href="../../models/recurrence/">Recurrence »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+Flux.thaw!(opt_state)</code></pre><p>While <code>adjust!</code> and <code>freeze!</code>/<code>thaw!</code> make temporary modifications to the optimiser state, permanently removing some fields of a new layer type from training is usually done when defining the layer, by calling for example <a href="../../../reference/models/functors/#Flux.@layer"><code>@layer</code></a><code>NewLayer trainable=(weight,)</code>.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../models/custom_layers/">« Custom Layers</a><a class="docs-footer-nextpage" href="../../models/recurrence/">Recurrence »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/index.html b/dev/index.html
index b800eff5b4..e4b186e9eb 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -3,4 +3,4 @@
   function gtag(){dataLayer.push(arguments);}
   gtag('js', new Date());
   gtag('config', 'UA-36890222-9', {'page_path': location.pathname + location.search + location.hash});
-</script><script data-outdated-warner src="assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="assets/documenter.js"></script><script src="search_index.js"></script><script src="siteinfo.js"></script><script src="../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script 
src="assets/themeswap.js"></script><link href="assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href><img class="docs-light-only" src="assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li class="is-active"><a class="tocitem" href>Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="guide/training/training/">Training</a></li><li><a class="tocitem" href="guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="guide/performance/">Performance Tips</a></li></ul></li><li><a class="tocitem" href="ecosystem/">Ecosystem</a></li><li><span class="tocitem">Reference</span><ul><li><a class="tocitem" href="reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="reference/training/reference/">Training API</a></li><li><a class="tocitem" href="reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" 
href="reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="tutorials/linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="tutorials/logistic_regression/">Logistic Regression</a></li><li><a class="tocitem" href="tutorials/model_zoo/">Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Welcome</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Welcome</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/index.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a 
class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="Flux:-The-Julia-Machine-Learning-Library"><a class="docs-heading-anchor" href="#Flux:-The-Julia-Machine-Learning-Library">Flux: The Julia Machine Learning Library</a><a id="Flux:-The-Julia-Machine-Learning-Library-1"></a><a class="docs-heading-anchor-permalink" href="#Flux:-The-Julia-Machine-Learning-Library" title="Permalink"></a></h1><p>Flux is a library for machine learning. It comes &quot;batteries-included&quot; with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:</p><ul><li><strong>Doing the obvious thing</strong>. Flux has relatively few explicit APIs. Instead, writing down the mathematical form will work – and be fast.</li><li><strong>Extensible by default</strong>. Flux is written to be highly flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all <a href="https://github.com/FluxML/Flux.jl/tree/master/src">high-level Julia code</a>.</li><li><strong>Play nicely with others</strong>. Flux works well with unrelated Julia libraries from <a href="https://github.com/JuliaImages/Images.jl">images</a> to <a href="https://github.com/SciML/DifferentialEquations.jl">differential equation solvers</a>, rather than duplicating them.</li></ul><h3 id="Installation"><a class="docs-heading-anchor" href="#Installation">Installation</a><a id="Installation-1"></a><a class="docs-heading-anchor-permalink" href="#Installation" title="Permalink"></a></h3><p>Download <a href="https://julialang.org/downloads/">Julia 1.9</a> or later, preferably the current stable release. 
You can add Flux using Julia&#39;s package manager, by typing <code>] add Flux</code> in the Julia prompt.  For Nvidia GPU support, you will also need to install the <code>CUDA</code> and the <code>cuDNN</code> packages. For AMD GPU support, install the <code>AMDGPU</code> package. For acceleration on Apple Silicon, install the <code>Metal</code> package.</p><h3 id="Learning-Flux"><a class="docs-heading-anchor" href="#Learning-Flux">Learning Flux</a><a id="Learning-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Learning-Flux" title="Permalink"></a></h3><p>The <strong><a href="guide/models/quickstart/#man-quickstart">quick start</a></strong> page trains a simple neural network.</p><p>The rest of the <strong>guide</strong> provides a from-scratch introduction to Flux&#39;s take on models and how they work, starting with <a href="guide/models/overview/#man-overview">fitting a line</a>. Once you understand these docs, congratulations, you also understand <a href="https://github.com/FluxML/Flux.jl">Flux&#39;s source code</a>, which is intended to be concise, legible and a good reference for more advanced concepts.</p><p>There are some <strong>tutorials</strong> about building particular models. The <strong><a href="https://github.com/FluxML/model-zoo/">model zoo</a></strong> has starting points for many other common ones. 
And finally, the <strong><a href="ecosystem/">ecosystem page</a></strong> lists packages which define Flux models.</p><p>The <strong>reference</strong> section includes, beside Flux&#39;s own functions, those of some companion packages: <a href="https://github.com/FluxML/Zygote.jl">Zygote.jl</a> (automatic differentiation), <a href="https://github.com/FluxML/Optimisers.jl">Optimisers.jl</a> (training) and others.</p><h3 id="Community"><a class="docs-heading-anchor" href="#Community">Community</a><a id="Community-1"></a><a class="docs-heading-anchor-permalink" href="#Community" title="Permalink"></a></h3><p>Everyone is welcome to join our community on the <a href="https://discourse.julialang.org/">Julia discourse forum</a>, or the <a href="https://discourse.julialang.org/t/announcing-a-julia-slack/4866">slack chat</a> (channel #machine-learning). If you have questions or issues we&#39;ll try to help you out.</p><p>If you&#39;re interested in hacking on Flux, the <a href="https://github.com/FluxML/Flux.jl">source code</a> is open and easy to understand – it&#39;s all just the same Julia code you work with normally. 
You might be interested in our <a href="https://github.com/FluxML/Flux.jl/labels/good%20first%20issue">intro issues</a> to get started, or our <a href="https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md">contributing guide</a>.</p></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="guide/models/quickstart/">Quick Start »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+</script><script data-outdated-warner src="assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="assets/documenter.js"></script><script src="search_index.js"></script><script src="siteinfo.js"></script><script src="../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script 
src="assets/themeswap.js"></script><link href="assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href><img class="docs-light-only" src="assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li class="is-active"><a class="tocitem" href>Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="guide/training/training/">Training</a></li><li><a class="tocitem" href="guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="guide/performance/">Performance Tips</a></li></ul></li><li><a class="tocitem" href="ecosystem/">Ecosystem</a></li><li><span class="tocitem">Reference</span><ul><li><a class="tocitem" href="reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="reference/training/reference/">Training API</a></li><li><a class="tocitem" href="reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" 
href="reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="tutorials/linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="tutorials/logistic_regression/">Logistic Regression</a></li><li><a class="tocitem" href="tutorials/model_zoo/">Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Welcome</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Welcome</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/index.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a 
class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="Flux:-The-Julia-Machine-Learning-Library"><a class="docs-heading-anchor" href="#Flux:-The-Julia-Machine-Learning-Library">Flux: The Julia Machine Learning Library</a><a id="Flux:-The-Julia-Machine-Learning-Library-1"></a><a class="docs-heading-anchor-permalink" href="#Flux:-The-Julia-Machine-Learning-Library" title="Permalink"></a></h1><p>Flux is a library for machine learning. It comes &quot;batteries-included&quot; with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:</p><ul><li><strong>Doing the obvious thing</strong>. Flux has relatively few explicit APIs. Instead, writing down the mathematical form will work – and be fast.</li><li><strong>Extensible by default</strong>. Flux is written to be highly flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all <a href="https://github.com/FluxML/Flux.jl/tree/master/src">high-level Julia code</a>.</li><li><strong>Play nicely with others</strong>. Flux works well with unrelated Julia libraries from <a href="https://github.com/JuliaImages/Images.jl">images</a> to <a href="https://github.com/SciML/DifferentialEquations.jl">differential equation solvers</a>, rather than duplicating them.</li></ul><h3 id="Installation"><a class="docs-heading-anchor" href="#Installation">Installation</a><a id="Installation-1"></a><a class="docs-heading-anchor-permalink" href="#Installation" title="Permalink"></a></h3><p>Download <a href="https://julialang.org/downloads/">Julia 1.9</a> or later, preferably the current stable release. 
You can add Flux using Julia&#39;s package manager, by typing <code>] add Flux</code> in the Julia prompt.  For Nvidia GPU support, you will also need to install the <code>CUDA</code> and the <code>cuDNN</code> packages. For AMD GPU support, install the <code>AMDGPU</code> package. For acceleration on Apple Silicon, install the <code>Metal</code> package.</p><h3 id="Learning-Flux"><a class="docs-heading-anchor" href="#Learning-Flux">Learning Flux</a><a id="Learning-Flux-1"></a><a class="docs-heading-anchor-permalink" href="#Learning-Flux" title="Permalink"></a></h3><p>The <strong><a href="guide/models/quickstart/#man-quickstart">quick start</a></strong> page trains a simple neural network.</p><p>The rest of the <strong>guide</strong> provides a from-scratch introduction to Flux&#39;s take on models and how they work, starting with <a href="guide/models/overview/#man-overview">fitting a line</a>. Once you understand these docs, congratulations, you also understand <a href="https://github.com/FluxML/Flux.jl">Flux&#39;s source code</a>, which is intended to be concise, legible and a good reference for more advanced concepts.</p><p>There are some <strong>tutorials</strong> about building particular models. The <strong><a href="https://github.com/FluxML/model-zoo/">model zoo</a></strong> has starting points for many other common ones. 
And finally, the <strong><a href="ecosystem/">ecosystem page</a></strong> lists packages which define Flux models.</p><p>The <strong>reference</strong> section includes, beside Flux&#39;s own functions, those of some companion packages: <a href="https://github.com/FluxML/Zygote.jl">Zygote.jl</a> (automatic differentiation), <a href="https://github.com/FluxML/Optimisers.jl">Optimisers.jl</a> (training) and others.</p><h3 id="Community"><a class="docs-heading-anchor" href="#Community">Community</a><a id="Community-1"></a><a class="docs-heading-anchor-permalink" href="#Community" title="Permalink"></a></h3><p>Everyone is welcome to join our community on the <a href="https://discourse.julialang.org/">Julia discourse forum</a>, or the <a href="https://discourse.julialang.org/t/announcing-a-julia-slack/4866">slack chat</a> (channel #machine-learning). If you have questions or issues we&#39;ll try to help you out.</p><p>If you&#39;re interested in hacking on Flux, the <a href="https://github.com/FluxML/Flux.jl">source code</a> is open and easy to understand – it&#39;s all just the same Julia code you work with normally. 
You might be interested in our <a href="https://github.com/FluxML/Flux.jl/labels/good%20first%20issue">intro issues</a> to get started, or our <a href="https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md">contributing guide</a>.</p></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="guide/models/quickstart/">Quick Start »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/objects.inv b/dev/objects.inv
index f361a5491c..af88217e97 100644
Binary files a/dev/objects.inv and b/dev/objects.inv differ
diff --git a/dev/reference/data/mlutils/index.html b/dev/reference/data/mlutils/index.html
index 5c0c5c74e3..b65e548aa7 100644
--- a/dev/reference/data/mlutils/index.html
+++ b/dev/reference/data/mlutils/index.html
@@ -570,4 +570,4 @@
 julia&gt; zeros_like(x, Float64)
 2×2 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
  0.0  0.0
- 0.0  0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaML/MLUtils.jl/blob/v0.4.4/src/utils.jl#L566-L603">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../training/zygote/">« Gradients – Zygote.jl</a><a class="docs-footer-nextpage" href="../onehot/">OneHotArrays.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+ 0.0  0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaML/MLUtils.jl/blob/v0.4.4/src/utils.jl#L566-L603">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../training/zygote/">« Gradients – Zygote.jl</a><a class="docs-footer-nextpage" href="../onehot/">OneHotArrays.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/data/onehot/index.html b/dev/reference/data/onehot/index.html
index c3efd4beaf..8dd4a17c9c 100644
--- a/dev/reference/data/onehot/index.html
+++ b/dev/reference/data/onehot/index.html
@@ -73,4 +73,4 @@
  3  6  15  3  9  3  12  3  6  15  3</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/OneHotArrays.jl/blob/v0.2.5/src/onehot.jl#L50-L83">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="OneHotArrays.OneHotArray" href="#OneHotArrays.OneHotArray"><code>OneHotArrays.OneHotArray</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">OneHotArray{T, N, M, I} &lt;: AbstractArray{Bool, M}
 OneHotArray(indices, L)</code></pre><p>A one-hot <code>M</code>-dimensional array with <code>L</code> labels (i.e. <code>size(A, 1) == L</code> and <code>sum(A, dims=1) == 1</code>) stored as a compact <code>N == M-1</code>-dimensional array of indices.</p><p>Typically constructed by <a href="#OneHotArrays.onehot"><code>onehot</code></a> and <a href="#OneHotArrays.onehotbatch"><code>onehotbatch</code></a>. Parameter <code>I</code> is the type of the underlying storage, and <code>T</code> its eltype.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/OneHotArrays.jl/blob/v0.2.5/src/array.jl#L1-L10">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="OneHotArrays.OneHotVector" href="#OneHotArrays.OneHotVector"><code>OneHotArrays.OneHotVector</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">OneHotVector{T} = OneHotArray{T, 0, 1, T}
 OneHotVector(indices, L)</code></pre><p>A one-hot vector with <code>L</code> labels (i.e. <code>length(A) == L</code> and <code>count(A) == 1</code>) typically constructed by <a href="#OneHotArrays.onehot"><code>onehot</code></a>. Stored efficiently as a single index of type <code>T</code>, usually <code>UInt32</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/OneHotArrays.jl/blob/v0.2.5/src/array.jl#L23-L29">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="OneHotArrays.OneHotMatrix" href="#OneHotArrays.OneHotMatrix"><code>OneHotArrays.OneHotMatrix</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">OneHotMatrix{T, I} = OneHotArray{T, 1, 2, I}
-OneHotMatrix(indices, L)</code></pre><p>A one-hot matrix (with <code>L</code> labels) typically constructed using <a href="#OneHotArrays.onehotbatch"><code>onehotbatch</code></a>. Stored efficiently as a vector of indices with type <code>I</code> and eltype <code>T</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/OneHotArrays.jl/blob/v0.2.5/src/array.jl#L33-L39">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../mlutils/">« Batching Data – MLUtils.jl</a><a class="docs-footer-nextpage" href="../../models/nnlib/">Low-level Operations – NNlib.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+OneHotMatrix(indices, L)</code></pre><p>A one-hot matrix (with <code>L</code> labels) typically constructed using <a href="#OneHotArrays.onehotbatch"><code>onehotbatch</code></a>. Stored efficiently as a vector of indices with type <code>I</code> and eltype <code>T</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/OneHotArrays.jl/blob/v0.2.5/src/array.jl#L33-L39">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../mlutils/">« Batching Data – MLUtils.jl</a><a class="docs-footer-nextpage" href="../../models/nnlib/">Low-level Operations – NNlib.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/destructure/index.html b/dev/reference/destructure/index.html
index 8cd663a592..fcab366ad7 100644
--- a/dev/reference/destructure/index.html
+++ b/dev/reference/destructure/index.html
@@ -106,7 +106,7 @@
 L2 (generic function with 1 method)
 
 julia&gt; L2(m2) isa Float32
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L569-L611">source</a></section></article><h3 id="Save-and-Load"><a class="docs-heading-anchor" href="#Save-and-Load">Save and Load</a><a id="Save-and-Load-1"></a><a class="docs-heading-anchor-permalink" href="#Save-and-Load" title="Permalink"></a></h3><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.state" href="#Flux.state"><code>Flux.state</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">state(x)</code></pre><p>Return an object with the same nested structure as <code>x</code> according to <code>Functors.children</code>,  but made only of basic containers (e.g. named tuples, tuples, arrays, and dictionaries).</p><p>Besides trainable and non-trainable arrays, the state will contain leaf nodes that are not arrays, such as numbers, symbols, strings, and nothing values. The leaf types that end up in the state could increase in the future.</p><p>This method is particularly useful for saving and loading models,  since the state contain only simple data types that can be easily serialized.</p><p>The state can be passed to <a href="#Flux.loadmodel!"><code>loadmodel!</code></a> to restore the model.</p><p><strong>Examples</strong></p><p><strong>Copy the state into another model</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m1 = Chain(Dense(1, 2, tanh; init=ones), Dense(2, 1; init=ones));
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L569-L611">source</a></section></article><h3 id="Save-and-Load"><a class="docs-heading-anchor" href="#Save-and-Load">Save and Load</a><a id="Save-and-Load-1"></a><a class="docs-heading-anchor-permalink" href="#Save-and-Load" title="Permalink"></a></h3><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.state" href="#Flux.state"><code>Flux.state</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">state(x)</code></pre><p>Return an object with the same nested structure as <code>x</code> according to <code>Functors.children</code>,  but made only of basic containers (e.g. named tuples, tuples, arrays, and dictionaries).</p><p>Besides trainable and non-trainable arrays, the state will contain leaf nodes that are not arrays, such as numbers, symbols, strings, and nothing values. The leaf types that end up in the state could increase in the future.</p><p>This method is particularly useful for saving and loading models,  since the state contains only simple data types that can be easily serialized.</p><p>The state can be passed to <a href="#Flux.loadmodel!"><code>loadmodel!</code></a> to restore the model.</p><p><strong>Examples</strong></p><p><strong>Copy the state into another model</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m1 = Chain(Dense(1, 2, tanh; init=ones), Dense(2, 1; init=ones));
 
 julia&gt; s = Flux.state(m1)
 (layers = ((weight = [1.0; 1.0;;], bias = [0.0, 0.0], σ = ()), (weight = [1.0 1.0], bias = [0.0], σ = ())),)
@@ -132,7 +132,7 @@
 
 julia&gt; JLD2.jldsave(&quot;checkpoint.jld2&quot;, model_state = s)
 
-julia&gt; Flux.loadmodel!(m2, JLD2.load(&quot;checkpoint.jld2&quot;, &quot;model_state&quot;))</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/loading.jl#L112-L172">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.loadmodel!" href="#Flux.loadmodel!"><code>Flux.loadmodel!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">loadmodel!(dst, src)</code></pre><p>Copy all the parameters (trainable and non-trainable) from <code>src</code> into <code>dst</code>.</p><p>Recursively walks <code>dst</code> and <code>src</code> together using <a href="../models/functors/#Functors.children"><code>Functors.children</code></a>, and calling <code>copyto!</code> on parameter arrays or throwing an error when there is a mismatch. Non-array elements (such as activation functions) are not copied and need not match. Zero bias vectors and <code>bias=false</code> are considered equivalent (see extended help for more details).</p><p>See also <a href="#Flux.state"><code>Flux.state</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">julia&gt; dst = Chain(Dense(Flux.ones32(2, 5), Flux.ones32(2), tanh), Dense(2 =&gt; 1; bias = [1f0]))
+julia&gt; Flux.loadmodel!(m2, JLD2.load(&quot;checkpoint.jld2&quot;, &quot;model_state&quot;))</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/loading.jl#L112-L172">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.loadmodel!" href="#Flux.loadmodel!"><code>Flux.loadmodel!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">loadmodel!(dst, src)</code></pre><p>Copy all the parameters (trainable and non-trainable) from <code>src</code> into <code>dst</code>.</p><p>Recursively walks <code>dst</code> and <code>src</code> together using <a href="../models/functors/#Functors.children"><code>Functors.children</code></a>, and calling <code>copyto!</code> on parameter arrays or throwing an error when there is a mismatch. Non-array elements (such as activation functions) are not copied and need not match. Zero bias vectors and <code>bias=false</code> are considered equivalent (see extended help for more details).</p><p>See also <a href="#Flux.state"><code>Flux.state</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">julia&gt; dst = Chain(Dense(Flux.ones32(2, 5), Flux.ones32(2), tanh), Dense(2 =&gt; 1; bias = [1f0]))
 Chain(
   Dense(5 =&gt; 2, tanh),                  # 12 parameters
   Dense(2 =&gt; 1),                        # 3 parameters
@@ -149,7 +149,7 @@
 false
 
 julia&gt; iszero(dst[2].bias)
-true</code></pre><p><strong>Extended help</strong></p><p>Throws an error when:</p><ul><li><code>dst</code> and <code>src</code> do not share the same fields (at any level)</li><li>the sizes of leaf nodes are mismatched between <code>dst</code> and <code>src</code></li><li>copying non-array values to/from an array parameter (except inactive parameters described below)</li><li><code>dst</code> is a &quot;tied&quot; parameter (i.e. refers to another parameter) and loaded into multiple times with mismatched source values</li></ul><p>Inactive parameters can be encoded by using the boolean value <code>false</code> instead of an array. If <code>dst == false</code> and <code>src</code> is an all-zero array, no error will be raised (and no values copied); however, attempting to copy a non-zero array to an inactive parameter will throw an error. Likewise, copying a <code>src</code> value of <code>false</code> to any <code>dst</code> array is valid, but copying a <code>src</code> value of <code>true</code> will error.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/loading.jl#L39-L89">source</a></section></article><h3 id="KeyPath"><a class="docs-heading-anchor" href="#KeyPath">KeyPath</a><a id="KeyPath-1"></a><a class="docs-heading-anchor-permalink" href="#KeyPath" title="Permalink"></a></h3><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Functors.KeyPath" href="#Functors.KeyPath"><code>Functors.KeyPath</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">KeyPath(keys...)</code></pre><p>A type for representing a path of keys to a value in a nested structure. Can be constructed with a sequence of keys, or by concatenating other <code>KeyPath</code>s. 
Keys can be of type <code>Symbol</code>, <code>String</code>, or <code>Int</code>.</p><p>For custom types, access through symbol keys is assumed to be done with <code>getproperty</code>. For consistency, the method <code>Base.propertynames</code> is used to get the viable property names.</p><p>For string and integer keys instead, the access is done with <code>getindex</code>.</p><p>See also <a href="#Functors.getkeypath"><code>getkeypath</code></a>, <a href="#Functors.haskeypath"><code>haskeypath</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; kp = KeyPath(:b, 3)
+true</code></pre><p><strong>Extended help</strong></p><p>Throws an error when:</p><ul><li><code>dst</code> and <code>src</code> do not share the same fields (at any level)</li><li>the sizes of leaf nodes are mismatched between <code>dst</code> and <code>src</code></li><li>copying non-array values to/from an array parameter (except inactive parameters described below)</li><li><code>dst</code> is a &quot;tied&quot; parameter (i.e. refers to another parameter) and loaded into multiple times with mismatched source values</li></ul><p>Inactive parameters can be encoded by using the boolean value <code>false</code> instead of an array. If <code>dst == false</code> and <code>src</code> is an all-zero array, no error will be raised (and no values copied); however, attempting to copy a non-zero array to an inactive parameter will throw an error. Likewise, copying a <code>src</code> value of <code>false</code> to any <code>dst</code> array is valid, but copying a <code>src</code> value of <code>true</code> will error.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/loading.jl#L39-L89">source</a></section></article><h3 id="KeyPath"><a class="docs-heading-anchor" href="#KeyPath">KeyPath</a><a id="KeyPath-1"></a><a class="docs-heading-anchor-permalink" href="#KeyPath" title="Permalink"></a></h3><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Functors.KeyPath" href="#Functors.KeyPath"><code>Functors.KeyPath</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">KeyPath(keys...)</code></pre><p>A type for representing a path of keys to a value in a nested structure. Can be constructed with a sequence of keys, or by concatenating other <code>KeyPath</code>s. 
Keys can be of type <code>Symbol</code>, <code>String</code>, or <code>Int</code>.</p><p>For custom types, access through symbol keys is assumed to be done with <code>getproperty</code>. For consistency, the method <code>Base.propertynames</code> is used to get the viable property names.</p><p>For string and integer keys instead, the access is done with <code>getindex</code>.</p><p>See also <a href="#Functors.getkeypath"><code>getkeypath</code></a>, <a href="#Functors.haskeypath"><code>haskeypath</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; kp = KeyPath(:b, 3)
 KeyPath(:b, 3)
 
 julia&gt; KeyPath(:a, kp, :c, 4) # construct mixing keys and keypaths
@@ -196,4 +196,4 @@
 true
 
 julia&gt; haskeypath(x, KeyPath(:b, &quot;d&quot;, 4))
-false</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Functors.jl/blob/v0.4.12/src/keypath.jl#L132-L155">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../outputsize/">« Shape Inference</a><a class="docs-footer-nextpage" href="../training/callbacks/">Callback Helpers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+false</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Functors.jl/blob/v0.4.12/src/keypath.jl#L132-L155">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../outputsize/">« Shape Inference</a><a class="docs-footer-nextpage" href="../training/callbacks/">Callback Helpers »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/models/activation/index.html b/dev/reference/models/activation/index.html
index 27607fa70a..05a8bac091 100644
--- a/dev/reference/models/activation/index.html
+++ b/dev/reference/models/activation/index.html
@@ -17,7 +17,7 @@
            ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        
 
 julia&gt; celu(-10f0)
--0.9999546f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L505-L527">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">elu(x, α=1) = x &gt; 0 ? x : α * (exp(x) - 1)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">&quot;Fast and Accurate Deep Network Learning by Exponential Linear Units&quot;</a>. You can also specify the coefficient explicitly, e.g. <code>elu(x, 1)</code>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(elu, -2, 2, height=7)
+-0.9999546f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L505-L527">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">elu(x, α=1) = x &gt; 0 ? x : α * (exp(x) - 1)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">&quot;Fast and Accurate Deep Network Learning by Exponential Linear Units&quot;</a>. You can also specify the coefficient explicitly, e.g. <code>elu(x, 1)</code>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(elu, -2, 2, height=7)
            ┌────────────────────────────────────────┐       
          2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉│ elu(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠔⠊⠉⠀⠀⠀⠀│       
@@ -34,7 +34,7 @@
 -0.9999546f0
 
 julia&gt; elu(-10f0, 2)
--1.9999092f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L271-L298">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gelu" href="#NNlib.gelu"><code>NNlib.gelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">gelu(x) = 0.5x * (1 + tanh(√(2/π) * (x + 0.044715x^3)))</code></pre><p>Activation function from <a href="https://arxiv.org/abs/1606.08415">&quot;Gaussian Error Linear Units&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(gelu, -2, 2, height=7)
+-1.9999092f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L271-L298">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gelu" href="#NNlib.gelu"><code>NNlib.gelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">gelu(x) = 0.5x * (1 + tanh(√(2/π) * (x + 0.044715x^3)))</code></pre><p>Activation function from <a href="https://arxiv.org/abs/1606.08415">&quot;Gaussian Error Linear Units&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(gelu, -2, 2, height=7)
            ┌────────────────────────────────────────┐        
          2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠔⠊│ gelu(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠊⠁⠀⠀⠀│        
@@ -60,7 +60,7 @@
         -0.2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⠇⠀⠀⠀│         
              └────────────────────────────────────────┘         
              ⠀-5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀0⠀         
-             ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L303-L337">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardsigmoid" href="#NNlib.hardsigmoid"><code>NNlib.hardsigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardσ(x) = max(0, min(1, (x + 3) / 6))</code></pre><p>Piecewise linear approximation of <a href="#NNlib.sigmoid"><code>sigmoid</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardsigmoid, -5, 5, height=7)
+             ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L303-L337">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardsigmoid" href="#NNlib.hardsigmoid"><code>NNlib.hardsigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardσ(x) = max(0, min(1, (x + 3) / 6))</code></pre><p>Piecewise linear approximation of <a href="#NNlib.sigmoid"><code>sigmoid</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardsigmoid, -5, 5, height=7)
           ┌────────────────────────────────────────┐         
         1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⢀⡠⠖⠋⠉⠉⠉⠉⠉⠉⠉⠉│ hardσ(x)
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⣀⡤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│         
@@ -84,7 +84,7 @@
         0 │⣀⣀⣀⣀⣀⣀⣀⠤⠤⠤⠒⠊⠉⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│     
           └────────────────────────────────────────┘     
           ⠀-5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀5⠀     
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀     </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L61-L93">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardswish" href="#NNlib.hardswish"><code>NNlib.hardswish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardswish(x) = x * hardσ(x)</code></pre><p>Hard-Swish activation function. See <a href="https://arxiv.org/abs/1905.02244">&quot;Searching for MobileNetV3&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardswish, -2, 5, height = 7)
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀     </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L61-L93">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardswish" href="#NNlib.hardswish"><code>NNlib.hardswish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardswish(x) = x * hardσ(x)</code></pre><p>Hard-Swish activation function. See <a href="https://arxiv.org/abs/1905.02244">&quot;Searching for MobileNetV3&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardswish, -2, 5, height = 7)
            ┌────────────────────────────────────────┐             
          5 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠔⠒⠉│ hardswish(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡠⠔⠒⠉⠁⠀⠀⠀⠀│             
@@ -114,7 +114,7 @@
 
 julia&gt; hardswish.(-5:5)&#39;
 1×11 adjoint(::Vector{Float64}) with eltype Float64:
- -0.0  -0.0  -0.0  -0.333333  -0.333333  0.0  0.666667  1.66667  3.0  4.0  5.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L383-L422">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardtanh" href="#NNlib.hardtanh"><code>NNlib.hardtanh</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardtanh(x) = max(-1, min(1, x))</code></pre><p>Segment-wise linear approximation of <code>tanh</code>, much cheaper to compute. See <a href="https://ronan.collobert.com/pub/matos/2004_phdthesis_lip6.pdf">&quot;Large Scale Machine Learning&quot;</a>.</p><p>See also <a href="#NNlib.tanh_fast"><code>tanh_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardtanh, -2, 2, height=7)
+ -0.0  -0.0  -0.0  -0.333333  -0.333333  0.0  0.666667  1.66667  3.0  4.0  5.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L383-L422">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.hardtanh" href="#NNlib.hardtanh"><code>NNlib.hardtanh</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hardtanh(x) = max(-1, min(1, x))</code></pre><p>Segment-wise linear approximation of <code>tanh</code>, much cheaper to compute. See <a href="https://ronan.collobert.com/pub/matos/2004_phdthesis_lip6.pdf">&quot;Large Scale Machine Learning&quot;</a>.</p><p>See also <a href="#NNlib.tanh_fast"><code>tanh_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(hardtanh, -2, 2, height=7)
            ┌────────────────────────────────────────┐            
          1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⣀⠔⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉│ hardtanh(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⣀⡤⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│            
@@ -138,7 +138,7 @@
         -1 │⣀⣀⣀⡠⠤⠤⠤⠖⠒⠊⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
            └────────────────────────────────────────┘        
            ⠀-2⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀2⠀        
-           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L124-L158">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">leakyrelu(x, a=0.01) = max(a*x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p><pre><code class="language-julia hljs">julia&gt; lineplot(x -&gt; leakyrelu(x, 0.5), -2, 2, height=7)
+           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L124-L158">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">leakyrelu(x, a=0.01) = max(a*x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p><pre><code class="language-julia hljs">julia&gt; lineplot(x -&gt; leakyrelu(x, 0.5), -2, 2, height=7)
            ┌────────────────────────────────────────┐       
          2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉│ #42(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠔⠊⠉⠀⠀⠀⠀│       
@@ -155,7 +155,7 @@
 -2.0f0
 
 julia&gt; leakyrelu(-10f0, 0.02)
--0.5f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L184-L211">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.lisht" href="#NNlib.lisht"><code>NNlib.lisht</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">lisht(x) = x * tanh(x)</code></pre><p>Activation function from  <a href="https://arxiv.org/abs/1901.05894">&quot;LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent ...&quot;</a></p><pre><code class="nohighlight hljs">julia&gt; lineplot(lisht, -2, 2, height=7)
+-0.2f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L184-L211">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.lisht" href="#NNlib.lisht"><code>NNlib.lisht</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">lisht(x) = x * tanh(x)</code></pre><p>Activation function from <a href="https://arxiv.org/abs/1901.05894">&quot;LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent ...&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(lisht, -2, 2, height=7)
           ┌────────────────────────────────────────┐         
         2 │⠢⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔│ lisht(x)
           │⠀⠈⠑⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡤⠊⠁⠀│         
@@ -179,7 +179,7 @@
         0 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠓⠪⠷⣦⣄⣀⣀⣇⣀⣀⣤⠶⠕⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│           
           └────────────────────────────────────────┘           
           ⠀-2⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀2⠀           
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀           </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L427-L460">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logcosh" href="#NNlib.logcosh"><code>NNlib.logcosh</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logcosh(x)</code></pre><p>Return <code>log(cosh(x))</code> which is computed in a numerically stable way.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(logcosh, -5, 5, height=7)
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀           </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L427-L460">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logcosh" href="#NNlib.logcosh"><code>NNlib.logcosh</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logcosh(x)</code></pre><p>Return <code>log(cosh(x))</code> which is computed in a numerically stable way.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(logcosh, -5, 5, height=7)
           ┌────────────────────────────────────────┐           
         5 │⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ logcosh(x)
           │⠉⠢⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠋│           
@@ -190,7 +190,7 @@
         0 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠑⠢⢄⣀⣀⣇⣀⡠⠔⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│           
           └────────────────────────────────────────┘           
           ⠀-5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀5⠀           
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀           </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L638-L657">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsigmoid" href="#NNlib.logsigmoid"><code>NNlib.logsigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logσ(x)</code></pre><p>Return <code>log(σ(x))</code> which is computed in a numerically stable way.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(logsigmoid, -5, 5, height=7)
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀           </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L638-L657">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsigmoid" href="#NNlib.logsigmoid"><code>NNlib.logsigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logσ(x)</code></pre><p>Return <code>log(σ(x))</code> which is computed in a numerically stable way.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(logsigmoid, -5, 5, height=7)
            ┌────────────────────────────────────────┐        
          0 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡧⠤⠔⠒⠒⠒⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉│ logσ(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠊⠉⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
@@ -201,7 +201,7 @@
         -6 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
            └────────────────────────────────────────┘        
            ⠀-5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀5⠀        
-           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L100-L119">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.mish" href="#NNlib.mish"><code>NNlib.mish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">mish(x) = x * tanh(softplus(x))</code></pre><p>Activation function from <a href="https://arxiv.org/abs/1908.08681">&quot;Mish: A Self Regularized Non-Monotonic Neural Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(mish, -5, 5, height=7)
+           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L100-L119">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.mish" href="#NNlib.mish"><code>NNlib.mish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">mish(x) = x * tanh(softplus(x))</code></pre><p>Activation function from <a href="https://arxiv.org/abs/1908.08681">&quot;Mish: A Self Regularized Non-Monotonic Neural Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(mish, -5, 5, height=7)
            ┌────────────────────────────────────────┐        
          5 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠖⠋│ mish(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠒⠁⠀⠀⠀│        
@@ -212,7 +212,7 @@
         -1 │⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
            └────────────────────────────────────────┘        
            ⠀-5⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀5⠀        
-           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L662-L681">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(relu, -2, 2, height=7)
+           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L662-L681">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(relu, -2, 2, height=7)
           ┌────────────────────────────────────────┐        
         2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠋│ relu(x)
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠊⠁⠀⠀│        
@@ -223,7 +223,7 @@
         0 │⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣇⠔⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
           └────────────────────────────────────────┘        
           ⠀-2⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀2⠀        
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L161-L181">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.relu6" href="#NNlib.relu6"><code>NNlib.relu6</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">relu6(x) = min(max(0, x), 6)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function capped at 6. See <a href="https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf">&quot;Convolutional Deep Belief Networks&quot;</a> from CIFAR-10.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(relu6, -10, 10, height=7)
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L161-L181">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.relu6" href="#NNlib.relu6"><code>NNlib.relu6</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">relu6(x) = min(max(0, x), 6)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function capped at 6. See <a href="https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf">&quot;Convolutional Deep Belief Networks on CIFAR-10&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(relu6, -10, 10, height=7)
           ┌────────────────────────────────────────┐         
         6 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠎⠉⠉⠉⠉⠉⠉⠉⠉│ relu6(x)
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⢀⡔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀│         
@@ -234,7 +234,7 @@
         0 │⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⡧⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│         
           └────────────────────────────────────────┘         
           ⠀-10⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀10⠀         
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L216-L237">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.rrelu" href="#NNlib.rrelu"><code>NNlib.rrelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rrelu(x, lo=1/8, hi=1/3) = max(a*x, x)
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L216-L237">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.rrelu" href="#NNlib.rrelu"><code>NNlib.rrelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rrelu(x, lo=1/8, hi=1/3) = max(a*x, x)
 # where `a` is randomly sampled from uniform distribution `U(lo, hi)`</code></pre><p>Randomized Leaky Rectified Linear Unit activation function. See <a href="https://arxiv.org/abs/1505.00853">&quot;Empirical Evaluation of Rectified Activations&quot;</a>. You can also specify the bounds explicitly, e.g. <code>rrelu(x, 0.0, 1.0)</code>.</p><pre><code class="language-julia hljs">julia&gt; lineplot(rrelu, -20, 10, height=7)
             ┌────────────────────────────────────────┐         
          10 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋│ rrelu(x)
@@ -249,7 +249,7 @@
             ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         
 
 julia&gt; extrema(rrelu.(fill(-10f0, 1000)))
-(-3.3316886f0, -1.2548422f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L240-L265">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.selu" href="#NNlib.selu"><code>NNlib.selu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">selu(x) = λ * (x ≥ 0 ? x : α * (exp(x) - 1))
+(-3.3316886f0, -1.2548422f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L240-L265">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.selu" href="#NNlib.selu"><code>NNlib.selu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">selu(x) = λ * (x ≥ 0 ? x : α * (exp(x) - 1))
 
 λ ≈ 1.05070...
 α ≈ 1.67326...</code></pre><p>Scaled exponential linear units. See <a href="https://arxiv.org/abs/1706.02515">&quot;Self-Normalizing Neural Networks&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(selu, -3, 2, height=7)
@@ -266,7 +266,7 @@
            ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        
 
 julia&gt; selu(-10f0)
--1.7580194f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L463-L489">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.sigmoid" href="#NNlib.sigmoid"><code>NNlib.sigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function. Unicode <code>σ</code> can be entered as <code>\sigma</code> then tab, in many editors. The ascii name <code>sigmoid</code> is also exported.</p><p>See also <a href="#NNlib.sigmoid_fast"><code>sigmoid_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; using UnicodePlots
+-1.7580194f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L463-L489">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.sigmoid" href="#NNlib.sigmoid"><code>NNlib.sigmoid</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function. Unicode <code>σ</code> can be entered as <code>\sigma</code> then tab, in many editors. The ASCII name <code>sigmoid</code> is also exported.</p><p>See also <a href="#NNlib.sigmoid_fast"><code>sigmoid_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; using UnicodePlots
 
 julia&gt; lineplot(sigmoid, -5, 5, height=7)
           ┌────────────────────────────────────────┐     
@@ -282,14 +282,14 @@
           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀     
 
 julia&gt; sigmoid === σ
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L24-L53">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.sigmoid_fast" href="#NNlib.sigmoid_fast"><code>NNlib.sigmoid_fast</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">sigmoid_fast(x)</code></pre><p>This is a faster, and very slightly less accurate, version of <code>sigmoid</code>. For `x::Float32, perhaps 3 times faster, and maximum errors 2 eps instead of 1.</p><p>See also <a href="#NNlib.tanh_fast"><code>tanh_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; sigmoid(0.2f0)
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L24-L53">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.sigmoid_fast" href="#NNlib.sigmoid_fast"><code>NNlib.sigmoid_fast</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">sigmoid_fast(x)</code></pre><p>This is a faster, and very slightly less accurate, version of <code>sigmoid</code>. For <code>x::Float32</code>, it is perhaps 3 times faster, with a maximum error of 2 eps instead of 1.</p><p>See also <a href="#NNlib.tanh_fast"><code>tanh_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; sigmoid(0.2f0)
 0.54983395f0
 
 julia&gt; sigmoid_fast(0.2f0)
 0.54983395f0
 
 julia&gt; hardσ(0.2f0)
-0.53333336f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L807-L825">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softplus" href="#NNlib.softplus"><code>NNlib.softplus</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softplus(x) = log(exp(x) + 1)</code></pre><p>See <a href="http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf">&quot;Deep Sparse Rectifier Neural Networks&quot;</a>, JMLR 2011.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(softplus, -3, 3, height=7)
+0.53333336f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L807-L825">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softplus" href="#NNlib.softplus"><code>NNlib.softplus</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softplus(x) = log(exp(x) + 1)</code></pre><p>See <a href="http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf">&quot;Deep Sparse Rectifier Neural Networks&quot;</a>, JMLR 2011.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(softplus, -3, 3, height=7)
           ┌────────────────────────────────────────┐            
         4 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ softplus(x)
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠│            
@@ -316,7 +316,7 @@
           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀            
 
 julia&gt; softplus(16f0)
-16.0f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L600-L635">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softshrink" href="#NNlib.softshrink"><code>NNlib.softshrink</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softshrink(x, λ=0.5) =
+16.0f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L600-L635">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softshrink" href="#NNlib.softshrink"><code>NNlib.softshrink</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softshrink(x, λ=0.5) =
     (x ≥ λ ? x - λ : (-λ ≥ x ? x + λ : 0))</code></pre><p>See <a href="https://www.gabormelli.com/RKB/Softshrink_Activation_Function">&quot;Softshrink Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(softshrink, -2, 2, height=7)
            ┌────────────────────────────────────────┐              
          2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀│ softshrink(x)
@@ -344,7 +344,7 @@
            ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀
 
 julia&gt; softshrink.((-10f0, 10f0))
-(-9.5f0, 9.5f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L709-L745">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softsign" href="#NNlib.softsign"><code>NNlib.softsign</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softsign(x) = x / (1 + |x|)</code></pre><p>See <a href="http://www.iro.umontreal.ca/~lisa/publications2/index.php/attachments/single/205">&quot;Quadratic Polynomials Learn Better Image Features&quot;</a> (2009).</p><pre><code class="nohighlight hljs">julia&gt; lineplot(softsign, -5, 5, height=7)
+(-9.5f0, 9.5f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L709-L745">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softsign" href="#NNlib.softsign"><code>NNlib.softsign</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softsign(x) = x / (1 + |x|)</code></pre><p>See <a href="http://www.iro.umontreal.ca/~lisa/publications2/index.php/attachments/single/205">&quot;Quadratic Polynomials Learn Better Image Features&quot;</a> (2009).</p><pre><code class="nohighlight hljs">julia&gt; lineplot(softsign, -5, 5, height=7)
            ┌────────────────────────────────────────┐            
          1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣀⣀⣀⣀⠤⠤⠤⠤⠤│ softsign(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⣀⡤⠖⠒⠋⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀│            
@@ -374,7 +374,7 @@
 0.5f0
 
 julia&gt; softsign(100f0)
-0.990099f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L557-L595">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">swish(x) = x * σ(x)</code></pre><p>Self-gated activation function. See <a href="https://arxiv.org/abs/1710.05941">&quot;Swish: a Self-Gated Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(swish, -2, 2, height=7)
+0.990099f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L557-L595">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">swish(x) = x * σ(x)</code></pre><p>Self-gated activation function. See <a href="https://arxiv.org/abs/1710.05941">&quot;Swish: a Self-Gated Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(swish, -2, 2, height=7)
            ┌────────────────────────────────────────┐         
          2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤│ swish(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋⠁⠀│         
@@ -385,7 +385,7 @@
         -1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│         
            └────────────────────────────────────────┘         
            ⠀-2⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀2⠀         
-           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L360-L380">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.tanhshrink" href="#NNlib.tanhshrink"><code>NNlib.tanhshrink</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tanhshrink(x) = x - tanh(x)</code></pre><p>See <a href="https://www.gabormelli.com/RKB/Tanhshrink_Activation_Function">&quot;Tanhshrink Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(tanhshrink, -3, 3, height=7)
+           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L360-L380">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.tanhshrink" href="#NNlib.tanhshrink"><code>NNlib.tanhshrink</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tanhshrink(x) = x - tanh(x)</code></pre><p>See <a href="https://www.gabormelli.com/RKB/Tanhshrink_Activation_Function">&quot;Tanhshrink Activation Function&quot;</a>.</p><pre><code class="nohighlight hljs">julia&gt; lineplot(tanhshrink, -3, 3, height=7)
            ┌────────────────────────────────────────┐              
          3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ tanhshrink(x)
            │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡠⠤⠖⠊│              
@@ -399,14 +399,14 @@
            ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀              
 
 julia&gt; tanhshrink.((-10f0, 10f0))
-(-9.0f0, 9.0f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L684-L706">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.tanh_fast" href="#NNlib.tanh_fast"><code>NNlib.tanh_fast</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tanh_fast(x)</code></pre><p>This is a faster but slightly less accurate version of <code>tanh</code>.</p><p>Where Julia&#39;s <code>tanh</code> function has an error under 2 eps, this may be wrong by 5 eps, a reduction by less than one decimal digit. </p><p>For <code>x::Float32</code> this is usually about 10 times faster, with a smaller speedup for <code>x::Float64</code>. For any other number type, it just calls <code>tanh</code>.</p><p>See also <a href="#NNlib.sigmoid_fast"><code>sigmoid_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; tanh(0.5f0)
+(-9.0f0, 9.0f0)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L684-L706">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.tanh_fast" href="#NNlib.tanh_fast"><code>NNlib.tanh_fast</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tanh_fast(x)</code></pre><p>This is a faster but slightly less accurate version of <code>tanh</code>.</p><p>Where Julia&#39;s <code>tanh</code> function has an error under 2 eps, this may be wrong by 5 eps, a reduction by less than one decimal digit. </p><p>For <code>x::Float32</code> this is usually about 10 times faster, with a smaller speedup for <code>x::Float64</code>. For any other number type, it just calls <code>tanh</code>.</p><p>See also <a href="#NNlib.sigmoid_fast"><code>sigmoid_fast</code></a>.</p><pre><code class="nohighlight hljs">julia&gt; tanh(0.5f0)
 0.46211717f0
 
 julia&gt; tanh_fast(0.5f0)
 0.46211714f0
 
 julia&gt; hard_tanh(0.5f0)
-0.5f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L759-L783">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.trelu" href="#NNlib.trelu"><code>NNlib.trelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">trelu(x, theta=1) = x &gt; theta ? x : 0</code></pre><p>Threshold gated rectified linear activation function. See <a href="https://arxiv.org/abs/1402.3337">&quot;Zero-bias autoencoders and the benefits of co-adapting features&quot;</a></p><pre><code class="nohighlight hljs">julia&gt; lineplot(trelu, -2, 4, height=7)
+0.5f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L759-L783">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.trelu" href="#NNlib.trelu"><code>NNlib.trelu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">trelu(x, theta=1) = x &gt; theta ? x : 0</code></pre><p>Threshold gated rectified linear activation function. See <a href="https://arxiv.org/abs/1402.3337">&quot;Zero-bias autoencoders and the benefits of co-adapting features&quot;</a></p><pre><code class="nohighlight hljs">julia&gt; lineplot(trelu, -2, 4, height=7)
           ┌────────────────────────────────────────┐         
         4 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋│ trelu(x)
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠖⠋⠁⠀⠀⠀│         
@@ -417,7 +417,7 @@
         0 │⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣇⣀⣀⣀⣀⣀⣀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│         
           └────────────────────────────────────────┘         
           ⠀-2⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀4⠀         
-          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/activations.jl#L532-L552">source</a></section></article><h3 id="One-More"><a class="docs-heading-anchor" href="#One-More">One More</a><a id="One-More-1"></a><a class="docs-heading-anchor-permalink" href="#One-More" title="Permalink"></a></h3><p>Julia&#39;s <code>Base.Math</code> also provides <code>tanh</code>, which can be used as an activation function.</p><p>Note that many Flux layers will automatically replace this with <a href="#NNlib.tanh_fast"><code>NNlib.tanh_fast</code></a> when called, as Base&#39;s <code>tanh</code> is slow enough to sometimes be a bottleneck.</p><pre><code class="language-julia hljs">julia&gt; using UnicodePlots
+          ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀         </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/activations.jl#L532-L552">source</a></section></article><h3 id="One-More"><a class="docs-heading-anchor" href="#One-More">One More</a><a id="One-More-1"></a><a class="docs-heading-anchor-permalink" href="#One-More" title="Permalink"></a></h3><p>Julia&#39;s <code>Base.Math</code> also provides <code>tanh</code>, which can be used as an activation function.</p><p>Note that many Flux layers will automatically replace this with <a href="#NNlib.tanh_fast"><code>NNlib.tanh_fast</code></a> when called, as Base&#39;s <code>tanh</code> is slow enough to sometimes be a bottleneck.</p><pre><code class="language-julia hljs">julia&gt; using UnicodePlots
 
 julia&gt; lineplot(tanh, -3, 3, height=7)
            ┌────────────────────────────────────────┐        
@@ -430,4 +430,4 @@
         -1 │⣀⣀⣀⣀⣀⣀⣀⣀⣀⡤⠤⠔⠒⠉⠁⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│        
            └────────────────────────────────────────┘        
            ⠀-3⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀3⠀        
-           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../layers/">« Built-in Layers</a><a class="docs-footer-nextpage" href="../../utilities/">Weight Initialisation »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+           ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀x⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀        </code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../layers/">« Built-in Layers</a><a class="docs-footer-nextpage" href="../../utilities/">Weight Initialisation »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/models/functors/index.html b/dev/reference/models/functors/index.html
index 8c9b7f985a..0dd0aa4db0 100644
--- a/dev/reference/models/functors/index.html
+++ b/dev/reference/models/functors/index.html
@@ -23,7 +23,7 @@
   Dense(2 =&gt; 1, tanh),                  # 3 parameters
   Dense(1 =&gt; 1; bias=false),            # 1 parameters
   Dropout(0.4),
-)                   # Total: 3 arrays, 4 parameters, 224 bytes.</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/macro.jl#L2-L49">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Functors.@functor" href="#Functors.@functor"><code>Functors.@functor</code></a> — <span class="docstring-category">Macro</span></header><section><div><pre><code class="language-julia hljs">@functor T
+)                   # Total: 3 arrays, 4 parameters, 224 bytes.</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/macro.jl#L2-L49">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Functors.@functor" href="#Functors.@functor"><code>Functors.@functor</code></a> — <span class="docstring-category">Macro</span></header><section><div><pre><code class="language-julia hljs">@functor T
 @functor T (x,)</code></pre><p>Adds methods to <a href="#Functors.functor"><code>functor</code></a> allowing recursion into objects of type <code>T</code>, and reconstruction. Assumes that <code>T</code> has a constructor accepting all of its fields, which is true unless you have provided an inner constructor which does not.</p><p>By default all fields of <code>T</code> are considered <a href="#Functors.children"><code>children</code></a>; this can be restricted by providing a tuple of field names.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; struct Foo; x; y; end
 
 julia&gt; @functor Foo
@@ -193,7 +193,7 @@
 julia&gt; m.bias
 2-element Vector{Float32}:
  0.0
- 0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L157-L181">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu-Tuple{Any}" href="#Flux.gpu-Tuple{Any}"><code>Flux.gpu</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">gpu(m)</code></pre><p>Copies <code>m</code> to the current GPU device (using current GPU backend), if one is available. If no GPU is available, it does nothing (but prints a warning the first time).</p><p>On arrays, this calls CUDA&#39;s <code>cu</code>, which also changes arrays with Float64 elements to Float32 while copying them to the device (same for AMDGPU). To act on arrays within a struct, the struct type must be marked with <a href="#Functors.@functor"><code>@functor</code></a>.</p><p>Use <a href="#Flux.cpu"><code>cpu</code></a> to copy back to ordinary <code>Array</code>s. See also <a href="../../utilities/#Flux.f32"><code>f32</code></a> and <a href="../../utilities/#Flux.f16"><code>f16</code></a> to change element type only.</p><p>See the <a href="https://juliagpu.github.io/CUDA.jl/stable/usage/multigpu/">CUDA.jl docs</a>  to help identify the current device.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Dense(rand(2, 3))  # constructed with Float64 weight matrix
+ 0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L157-L181">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu-Tuple{Any}" href="#Flux.gpu-Tuple{Any}"><code>Flux.gpu</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">gpu(m)</code></pre><p>Copies <code>m</code> to the current GPU device (using current GPU backend), if one is available. If no GPU is available, it does nothing (but prints a warning the first time).</p><p>On arrays, this calls CUDA&#39;s <code>cu</code>, which also changes arrays with Float64 elements to Float32 while copying them to the device (same for AMDGPU). To act on arrays within a struct, the struct type must be marked with <a href="#Functors.@functor"><code>@functor</code></a>.</p><p>Use <a href="#Flux.cpu"><code>cpu</code></a> to copy back to ordinary <code>Array</code>s. See also <a href="../../utilities/#Flux.f32"><code>f32</code></a> and <a href="../../utilities/#Flux.f16"><code>f16</code></a> to change element type only.</p><p>See the <a href="https://juliagpu.github.io/CUDA.jl/stable/usage/multigpu/">CUDA.jl docs</a>  to help identify the current device.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Dense(rand(2, 3))  # constructed with Float64 weight matrix
 Dense(3 =&gt; 2)       # 8 parameters
 
 julia&gt; typeof(m.weight)
@@ -203,7 +203,7 @@
 Dense(3 =&gt; 2)       # 8 parameters
 
 julia&gt; typeof(m_gpu.weight)
-CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L226-L256">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu-Tuple{DataLoader}" href="#Flux.gpu-Tuple{DataLoader}"><code>Flux.gpu</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">gpu(data::DataLoader)
+CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L226-L256">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.gpu-Tuple{DataLoader}" href="#Flux.gpu-Tuple{DataLoader}"><code>Flux.gpu</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">gpu(data::DataLoader)
 cpu(data::DataLoader)</code></pre><p>Transforms a given <code>DataLoader</code> to apply <code>gpu</code> or <code>cpu</code> to each batch of data, when iterated over. (If no GPU is available, this does nothing.)</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; dl = Flux.DataLoader((x = ones(2,10), y=&#39;a&#39;:&#39;j&#39;), batchsize=3)
 4-element DataLoader(::NamedTuple{(:x, :y), Tuple{Matrix{Float64}, StepRange{Char, Int64}}}, batchsize=3)
   with first element:
@@ -223,4 +223,4 @@
  1.0  1.0  1.0</code></pre><p>For large datasets, this is preferred over moving all the data to the GPU before creating the <code>DataLoader</code>, like this:</p><pre><code class="language-julia-repl hljs">julia&gt; Flux.DataLoader((x = ones(2,10), y=2:11) |&gt; gpu, batchsize=3)
 4-element DataLoader(::NamedTuple{(:x, :y), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, UnitRange{Int64}}}, batchsize=3)
   with first element:
-  (; x = 2×3 CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, y = 3-element UnitRange{Int64})</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This only works if <code>gpu</code> is applied directly to the <code>DataLoader</code>. While <code>gpu</code> acts recursively on Flux models and many basic Julia structs, it will not work on (say) a tuple of <code>DataLoader</code>s.</p></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L414-L457">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../nnlib/">« Low-level Operations – NNlib.jl</a><a class="docs-footer-nextpage" href="../../../tutorials/linear_regression/">Linear Regression »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 
1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
+  (; x = 2×3 CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, y = 3-element UnitRange{Int64})</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This only works if <code>gpu</code> is applied directly to the <code>DataLoader</code>. While <code>gpu</code> acts recursively on Flux models and many basic Julia structs, it will not work on (say) a tuple of <code>DataLoader</code>s.</p></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L414-L457">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../nnlib/">« Low-level Operations – NNlib.jl</a><a class="docs-footer-nextpage" href="../../../tutorials/linear_regression/">Linear Regression »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 
1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
diff --git a/dev/reference/models/layers/index.html b/dev/reference/models/layers/index.html
index 47687fe8ef..ba03c7d3b2 100644
--- a/dev/reference/models/layers/index.html
+++ b/dev/reference/models/layers/index.html
@@ -23,7 +23,7 @@
 
 julia&gt; Flux.trainables(model2)  # no trainable bias
 1-element Vector{AbstractArray}:
- [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L112-L153">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Bilinear" href="#Flux.Bilinear"><code>Flux.Bilinear</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Bilinear((in1, in2) =&gt; out, σ=identity; bias=true, init=glorot_uniform)
+ [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L112-L153">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Bilinear" href="#Flux.Bilinear"><code>Flux.Bilinear</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Bilinear((in1, in2) =&gt; out, σ=identity; bias=true, init=glorot_uniform)
 Bilinear(W::AbstractArray, [bias, σ])</code></pre><p>Creates a layer which is fully connected between two inputs and the output, and otherwise similar to <a href="#Flux.Dense"><code>Dense</code></a>. Its output, given vectors <code>x</code> &amp; <code>y</code>, is another vector <code>z</code> with, for all <code>i ∈ 1:out</code>:</p><pre><code class="nohighlight hljs">z[i] = σ(x&#39; * W[i,:,:] * y + bias[i])</code></pre><p>If <code>x</code> and <code>y</code> are matrices, then each column of the output <code>z = B(x, y)</code> is of this form, with <code>B</code> the Bilinear layer.</p><p>If the second input <code>y</code> is not given, it is taken to be equal to <code>x</code>, i.e. <code>B(x) == B(x, x)</code></p><p>The two inputs may also be provided as a tuple, <code>B((x, y)) == B(x, y)</code>, which is accepted as the input to a <code>Chain</code>.</p><p>If the two input sizes are the same, <code>in1 == in2</code>, then you may write <code>Bilinear(in =&gt; out, σ)</code>.</p><p>The initialisation works as for <a href="#Flux.Dense"><code>Dense</code></a> layer, with <code>W = init(out, in1, in2)</code>. By default the bias vector is <code>zeros(Float32, out)</code>, option <code>bias=false</code> will switch off trainable bias. Either of these may be provided explicitly.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x, y = randn(Float32, 5, 32), randn(Float32, 5, 32);
 
 julia&gt; B = Flux.Bilinear((5, 5) =&gt; 7)
@@ -44,7 +44,7 @@
 (3, 32)
 
 julia&gt; Flux.Bilinear(rand(4,8,16), false, tanh)  # first dim of weight is the output
-Bilinear((8, 16) =&gt; 4, tanh; bias=false)  # 512 parameters</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L370-L418">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Scale" href="#Flux.Scale"><code>Flux.Scale</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Scale(size::Integer..., σ=identity; bias=true, init=ones32)
+Bilinear((8, 16) =&gt; 4, tanh; bias=false)  # 512 parameters</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L370-L418">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Scale" href="#Flux.Scale"><code>Flux.Scale</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Scale(size::Integer..., σ=identity; bias=true, init=ones32)
 Scale(scale::AbstractArray, [bias, σ])</code></pre><p>Create an element-wise layer, whose forward pass is given by:</p><pre><code class="nohighlight hljs">y = σ.(scale .* x .+ bias)</code></pre><p>This uses <code>.*</code> instead of matrix multiplication <code>*</code> of <a href="#Flux.Dense"><code>Dense</code></a>.</p><p>The learnable scale &amp; bias are initialised <code>init(size...)</code> and <code>zeros32(size...)</code>, with <code>init=ones32</code> by default. You may specify the function <code>init</code>,  turn off trainable bias with <code>bias=false</code>, or provide the array(s) explicitly.</p><p>Used by <a href="#Flux.LayerNorm"><code>LayerNorm</code></a> with <code>affine=true</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; a = Flux.Scale(2)
 Scale(2)            # 4 parameters
 
@@ -68,7 +68,7 @@
 
 julia&gt; Flux.trainables(b)
 1-element Vector{AbstractArray}:
- Float32[1.0 2.0 3.0 4.0]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L201-L244">source</a></section></article><p>Perhaps <code>Scale</code> isn&#39;t quite fully connected, but it may be thought of as <code>Dense(Diagonal(s.weights), s.bias)</code>, and LinearAlgebra&#39;s <code>Diagonal</code> is a matrix which just happens to contain many zeros.</p><h2 id="Convolution-Models"><a class="docs-heading-anchor" href="#Convolution-Models">Convolution Models</a><a id="Convolution-Models-1"></a><a class="docs-heading-anchor-permalink" href="#Convolution-Models" title="Permalink"></a></h2><p>These layers are used to build convolutional neural networks (CNNs).</p><p>They all expect images in what is called WHCN order: a batch of 32 colour images, each 50 x 50 pixels, will have <code>size(x) == (50, 50, 3, 32)</code>. A single grayscale image might instead have <code>size(x) == (28, 28, 1, 1)</code>.</p><p>Besides images, 2D data, they also work with 1D data, where for instance stereo sound recording with 1000 samples might have <code>size(x) == (1000, 2, 1)</code>. They will also work with 3D data, <code>ndims(x) == 5</code>, where again the last two dimensions are channel and batch.</p><p>To understand how strides and padding work, the article by <a href="https://arxiv.org/abs/1603.07285">Dumoulin &amp; Visin</a> has great illustrations.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Conv(filter, in =&gt; out, σ = identity;
+ Float32[1.0 2.0 3.0 4.0]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L201-L244">source</a></section></article><p>Perhaps <code>Scale</code> isn&#39;t quite fully connected, but it may be thought of as <code>Dense(Diagonal(s.weights), s.bias)</code>, and LinearAlgebra&#39;s <code>Diagonal</code> is a matrix which just happens to contain many zeros.</p><h2 id="Convolution-Models"><a class="docs-heading-anchor" href="#Convolution-Models">Convolution Models</a><a id="Convolution-Models-1"></a><a class="docs-heading-anchor-permalink" href="#Convolution-Models" title="Permalink"></a></h2><p>These layers are used to build convolutional neural networks (CNNs).</p><p>They all expect images in what is called WHCN order: a batch of 32 colour images, each 50 x 50 pixels, will have <code>size(x) == (50, 50, 3, 32)</code>. A single grayscale image might instead have <code>size(x) == (28, 28, 1, 1)</code>.</p><p>Besides images, 2D data, they also work with 1D data, where for instance stereo sound recording with 1000 samples might have <code>size(x) == (1000, 2, 1)</code>. They will also work with 3D data, <code>ndims(x) == 5</code>, where again the last two dimensions are channel and batch.</p><p>To understand how strides and padding work, the article by <a href="https://arxiv.org/abs/1603.07285">Dumoulin &amp; Visin</a> has great illustrations.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Conv(filter, in =&gt; out, σ = identity;
      stride = 1, pad = 0, dilation = 1, groups = 1, [bias, init])</code></pre><p>Standard convolutional layer. <code>filter</code> is a tuple of integers specifying the size of the convolutional kernel; <code>in</code> and <code>out</code> specify the number of input and output channels.</p><p>Image data should be stored in WHCN order (width, height, channels, batch). In other words, a 100×100 RGB image would be a <code>100×100×3×1</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array. This has <code>N = 2</code> spatial dimensions, and needs a kernel size like <code>(5,5)</code>, a 2-tuple of integers.</p><p>To take convolutions along <code>N</code> feature dimensions, this layer expects as input an array with <code>ndims(x) == N+2</code>, where <code>size(x, N+1) == in</code> is the number of input channels, and <code>size(x, ndims(x))</code> is (as always) the number of observations in a batch. Then:</p><ul><li><code>filter</code> should be a tuple of <code>N</code> integers.</li><li>Keywords <code>stride</code> and <code>dilation</code> should each be either a single integer, or a tuple with <code>N</code> integers.</li><li>Keyword <code>pad</code> specifies the number of elements added to the borders of the data array. It can be<ul><li>a single integer for equal padding all around,</li><li>a tuple of <code>N</code> integers, to apply the same padding at the beginning and end of each spatial dimension,</li><li>a tuple of <code>2*N</code> integers, for asymmetric padding, or</li><li>the singleton <code>SamePad()</code>, to calculate padding such that <code>size(output,d) == size(x,d) / stride</code> (possibly rounded) for each spatial dimension.</li></ul></li><li>Keyword <code>groups</code> is expected to be an <code>Int</code>. It specifies the number of groups to divide a convolution into.</li></ul><p>Keywords to control initialization of the layer:</p><ul><li><code>init</code> - Function used to generate initial weights. 
Defaults to <code>glorot_uniform</code>.</li><li><code>bias</code> - The initial bias vector is all zero by default. Trainable bias can be disabled entirely by setting this to <code>false</code>, or another vector can be provided such as <code>bias = randn(Float32, out)</code>.</li></ul><p>See also <a href="#Flux.ConvTranspose"><code>ConvTranspose</code></a>, <a href="#Flux.DepthwiseConv"><code>DepthwiseConv</code></a>, <a href="#Flux.CrossCor"><code>CrossCor</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand32(100, 100, 3, 50); # a batch of 50 RGB images
 
 julia&gt; layer = Conv((5,5), 3 =&gt; 7, relu; bias = false)
@@ -87,7 +87,7 @@
 (130, 100, 7, 50)
 
 julia&gt; Conv((5,5), 3 =&gt; 7; stride = 2, dilation = 4)(xs) |&gt; size
-(42, 42, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L60-L119">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Conv-Tuple{AbstractArray}" href="#Flux.Conv-Tuple{AbstractArray}"><code>Flux.Conv</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">Conv(weight::AbstractArray, [bias, activation; stride, pad, dilation])</code></pre><p>Constructs a convolutional layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.Conv"><code>Conv(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
+(42, 42, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L60-L119">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Conv-Tuple{AbstractArray}" href="#Flux.Conv-Tuple{AbstractArray}"><code>Flux.Conv</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">Conv(weight::AbstractArray, [bias, activation; stride, pad, dilation])</code></pre><p>Constructs a convolutional layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.Conv"><code>Conv(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
 
 julia&gt; bias = zeros(5);
 
@@ -98,7 +98,7 @@
 (98, 5, 64)
 
 julia&gt; Flux.params(layer) |&gt; length
-2</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L130-L151">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ConvTranspose" href="#Flux.ConvTranspose"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ConvTranspose(filter, in =&gt; out, σ=identity; stride=1, pad=0, outpad=0, dilation=1, [bias, init])</code></pre><p>Standard convolutional transpose layer. <code>filter</code> is a tuple of integers specifying the size of the convolutional kernel, while <code>in</code> and <code>out</code> specify the number of input and output channels.</p><p>Note that <code>pad=SamePad()</code> here tries to ensure <code>size(output,d) == size(x,d) * stride</code>.</p><p>To preserve <a href="#Flux.Conv"><code>Conv</code></a> invertibility when <code>stride &gt; 1</code>, <code>outpad</code> can be used to increase the size of the output in the desired dimensions. Whereas <code>pad</code> is used to zero-pad the input, <code>outpad</code> only affects the output shape.</p><p>Parameters are controlled by additional keywords, with defaults <code>init=glorot_uniform</code> and <code>bias=true</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a> for a more detailed description of the keywords.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand32(100, 100, 3, 50);  # a batch of 50 RGB images
+2</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L130-L151">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ConvTranspose" href="#Flux.ConvTranspose"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ConvTranspose(filter, in =&gt; out, σ=identity; stride=1, pad=0, outpad=0, dilation=1, [bias, init])</code></pre><p>Standard convolutional transpose layer. <code>filter</code> is a tuple of integers specifying the size of the convolutional kernel, while <code>in</code> and <code>out</code> specify the number of input and output channels.</p><p>Note that <code>pad=SamePad()</code> here tries to ensure <code>size(output,d) == size(x,d) * stride</code>.</p><p>To preserve <a href="#Flux.Conv"><code>Conv</code></a> invertibility when <code>stride &gt; 1</code>, <code>outpad</code> can be used to increase the size of the output in the desired dimensions. Whereas <code>pad</code> is used to zero-pad the input, <code>outpad</code> only affects the output shape.</p><p>Parameters are controlled by additional keywords, with defaults <code>init=glorot_uniform</code> and <code>bias=true</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a> for a more detailed description of the keywords.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand32(100, 100, 3, 50);  # a batch of 50 RGB images
 
 julia&gt; layer = ConvTranspose((5,5), 3 =&gt; 7, relu)
 ConvTranspose((5, 5), 3 =&gt; 7, relu)  # 532 parameters
@@ -113,7 +113,7 @@
 (204, 204, 7, 50)
 
 julia&gt; ConvTranspose((5,5), 3 =&gt; 7, stride=3, pad=SamePad())(xs) |&gt; size
-(300, 300, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L226-L263">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ConvTranspose-Tuple{AbstractArray}" href="#Flux.ConvTranspose-Tuple{AbstractArray}"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">ConvTranspose(weight::AbstractArray, [bias, activation; stride, pad, outpad, dilation, groups])</code></pre><p>Constructs a ConvTranspose layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.ConvTranspose"><code>ConvTranspose(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
+(300, 300, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L226-L263">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ConvTranspose-Tuple{AbstractArray}" href="#Flux.ConvTranspose-Tuple{AbstractArray}"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">ConvTranspose(weight::AbstractArray, [bias, activation; stride, pad, outpad, dilation, groups])</code></pre><p>Constructs a ConvTranspose layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.ConvTranspose"><code>ConvTranspose(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
 
 julia&gt; bias = zeros(4);
 
@@ -124,7 +124,7 @@
 (102, 4, 64)
 
 julia&gt; Flux.params(layer) |&gt; length
-2</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L278-L300">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.CrossCor" href="#Flux.CrossCor"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">CrossCor(filter, in =&gt; out, σ=identity; stride=1, pad=0, dilation=1, [bias, init])</code></pre><p>Standard cross correlation layer. <code>filter</code> is a tuple of integers specifying the size of the convolutional kernel; <code>in</code> and <code>out</code> specify the number of input and output channels.</p><p>Parameters are controlled by additional keywords, with defaults <code>init=glorot_uniform</code> and <code>bias=true</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a> for more detailed description of keywords.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # a batch of 50 RGB images
+2</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L278-L300">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.CrossCor" href="#Flux.CrossCor"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">CrossCor(filter, in =&gt; out, σ=identity; stride=1, pad=0, dilation=1, [bias, init])</code></pre><p>Standard cross correlation layer. <code>filter</code> is a tuple of integers specifying the size of the convolutional kernel; <code>in</code> and <code>out</code> specify the number of input and output channels.</p><p>Parameters are controlled by additional keywords, with defaults <code>init=glorot_uniform</code> and <code>bias=true</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a> for more detailed description of keywords.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # a batch of 50 RGB images
 
 julia&gt; layer = CrossCor((5,5), 3 =&gt; 6, relu; bias=false)
 CrossCor((5, 5), 3 =&gt; 6, relu, bias=false)  # 450 parameters
@@ -133,7 +133,7 @@
 (96, 96, 6, 50)
 
 julia&gt; CrossCor((5,5), 3 =&gt; 7, stride=3, pad=(2,0))(xs) |&gt; size
-(34, 32, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L407-L433">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.CrossCor-Tuple{AbstractArray}" href="#Flux.CrossCor-Tuple{AbstractArray}"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">CrossCor(weight::AbstractArray, [bias, activation; stride, pad, dilation])</code></pre><p>Constructs a CrossCor layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.CrossCor"><code>CrossCor(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
+(34, 32, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L407-L433">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.CrossCor-Tuple{AbstractArray}" href="#Flux.CrossCor-Tuple{AbstractArray}"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">CrossCor(weight::AbstractArray, [bias, activation; stride, pad, dilation])</code></pre><p>Constructs a CrossCor layer with the given weight and bias. Accepts the same keywords and has the same defaults as <a href="#Flux.CrossCor"><code>CrossCor(k::NTuple{N,Integer}, ch::Pair{&lt;:Integer,&lt;:Integer}, σ; ...)</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; weight = rand(3, 4, 5);
 
 julia&gt; bias = zeros(5);
 
@@ -141,7 +141,7 @@
 CrossCor((3,), 4 =&gt; 5, relu)  # 65 parameters
 
 julia&gt; layer(randn(100, 4, 64)) |&gt; size
-(98, 5, 64)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L445-L464">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">DepthwiseConv(filter, in =&gt; out, σ=identity; stride=1, pad=0, dilation=1, [bias, init])
+(98, 5, 64)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L445-L464">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">DepthwiseConv(filter, in =&gt; out, σ=identity; stride=1, pad=0, dilation=1, [bias, init])
 DepthwiseConv(weight::AbstractArray, [bias, activation; stride, pad, dilation])</code></pre><p>Return a depthwise convolutional layer, that is a <a href="#Flux.Conv"><code>Conv</code></a> layer with number of groups equal to the number of input channels.</p><p>See <a href="#Flux.Conv"><code>Conv</code></a> for a description of the arguments.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # a batch of 50 RGB images
 
 julia&gt; layer = DepthwiseConv((5,5), 3 =&gt; 6, relu; bias=false)
@@ -151,7 +151,7 @@
 (96, 96, 6, 50)
 
 julia&gt; DepthwiseConv((5, 5), 3 =&gt; 9, stride=2, pad=2)(xs) |&gt; size
-(50, 50, 9, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L371-L394">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.SamePad" href="#Flux.SamePad"><code>Flux.SamePad</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SamePad()</code></pre><p>Passed as an option to convolutional layers (and friends), this causes the padding to be chosen such that the input and output sizes agree (on the first <code>N</code> dimensions, the kernel or window) when <code>stride==1</code>. When <code>stride≠1</code>, the output size equals <code>ceil(input_size/stride)</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MaxPool"><code>MaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand32(100, 100, 3, 50);  # a batch of images
+(50, 50, 9, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L371-L394">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.SamePad" href="#Flux.SamePad"><code>Flux.SamePad</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SamePad()</code></pre><p>Passed as an option to convolutional layers (and friends), this causes the padding to be chosen such that the input and output sizes agree (on the first <code>N</code> dimensions, the kernel or window) when <code>stride==1</code>. When <code>stride≠1</code>, the output size equals <code>ceil(input_size/stride)</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MaxPool"><code>MaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand32(100, 100, 3, 50);  # a batch of images
 
 julia&gt; layer = Conv((2,2), 3 =&gt; 7, pad=SamePad())
 Conv((2, 2), 3 =&gt; 7, pad=(1, 0, 1, 0))  # 91 parameters
@@ -169,7 +169,7 @@
 Conv((5, 5), 3 =&gt; 7, pad=2, stride=2)  # 532 parameters
 
 julia&gt; layer3(xs) |&gt; size  # output size = `ceil(input_size/stride)` = 50
-(50, 50, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L13-L45">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.flatten" href="#Flux.flatten"><code>Flux.flatten</code></a> — <span class="docstring-category">Function</span></header><section><div><p>flatten(x)</p><p>Same as <a href="../../data/mlutils/#MLUtils.flatten"><code>MLUtils.flatten</code></a>, which should be preferred; this method exists only for backward compatibility.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/stateless.jl#L98-L104">source</a></section></article><h2 id="MultiHeadAttention"><a class="docs-heading-anchor" href="#MultiHeadAttention">MultiHeadAttention</a><a id="MultiHeadAttention-1"></a><a class="docs-heading-anchor-permalink" href="#MultiHeadAttention" title="Permalink"></a></h2><p>The basic blocks needed to implement <a href="https://arxiv.org/abs/1706.03762">Transformer</a> architectures. 
See also the functional counterparts documented in NNlib&#39;s <a href="../nnlib/#Attention">Attention</a> section.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MultiHeadAttention" href="#Flux.MultiHeadAttention"><code>Flux.MultiHeadAttention</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MultiHeadAttention(dims; [nheads, bias, init, dropout_prob])</code></pre><p>The multi-head dot-product attention layer used in Transformer architectures [1].</p><p>Returns the transformed input sequence and the attention scores.</p><p>[1] Vaswani et al. &quot;Attention is all you need.&quot; Advances in Neural Information Processing Systems. 2017.</p><p><strong>Arguments</strong></p><ul><li><code>dims</code>: The embedding dimensions of inputs, intermediate tensors and outputs.         In the most general case, it is given as          a) <code>(q_in_dim, k_in_dim, v_in_dim) =&gt; (qk_dim, v_dim) =&gt; out_dim</code>.         Can take also simpler forms as         b) <code>dims::Int</code>;         c) <code>in_dim::Int =&gt; (qk_dim, v_dim) =&gt; out_dim</code>;         d) <code>in_dim::Int =&gt; qkv_dim =&gt; out_dim</code>.</li><li><code>nheads</code>: number of heads. Default <code>8</code>.</li><li><code>init</code>: weight initializer for the Dense layers. Default <code>glorot_uniform</code>.</li><li><code>bias</code> : whether pointwise QKVO dense transforms use bias. Default <code>false</code>.</li><li><code>dropout_prob</code>: dropout probability for the attention scores. 
Default <code>0.0</code>.</li></ul><p><strong>Forward</strong></p><pre><code class="nohighlight hljs">(mha::MultiHeadAttention)(q_in, k_in, v_in, [bias]; [mask])</code></pre><p>The arguments of the forward pass are:</p><ul><li><code>q_in</code>: Input query array of size <code>(q_in_dim, q_len, batch_size)</code>.</li><li><code>k_in</code>: Input key array of size <code>(k_in_dim, kv_len, batch_size)</code>.</li><li><code>v_in</code>: Input value array of size <code>(v_in_dim, kv_len, batch_size)</code>.</li><li><code>bias</code>: Bias array broadcastable to size <code>(kv_len, q_len, nheads, batch_size)</code>.          It will be added to the attention scores before the softmax.         Default <code>nothing</code>.</li><li><code>mask</code>: Input array broadcastable to size          <code>(kv_len, q_len, nheads, batch_size)</code>.          The mask is applied to the attention scores just before the softmax.          See <a href="../nnlib/#NNlib.make_causal_mask"><code>NNlib.make_causal_mask</code></a> for creating causal masks.          Default <code>nothing</code>.</li></ul><p>Alternative calling signatures are <code>mha(q_in)</code>, equivalent to <code>mha(q_in, q_in, q_in)</code> (self-attention), and <code>mha(q_in, k_in)</code>, equivalent to <code>mha(q_in, k_in, k_in)</code> (key and value are the same).</p><p>See also <a href="../nnlib/#NNlib.dot_product_attention"><code>NNlib.dot_product_attention</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">mha = MultiHeadAttention(64, nheads = 8)
+(50, 50, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L13-L45">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.flatten" href="#Flux.flatten"><code>Flux.flatten</code></a> — <span class="docstring-category">Function</span></header><section><div><p>flatten(x)</p><p>Same as <a href="../../data/mlutils/#MLUtils.flatten"><code>MLUtils.flatten</code></a>, which should be preferred; this method exists only for backward compatibility.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/stateless.jl#L98-L104">source</a></section></article><h2 id="MultiHeadAttention"><a class="docs-heading-anchor" href="#MultiHeadAttention">MultiHeadAttention</a><a id="MultiHeadAttention-1"></a><a class="docs-heading-anchor-permalink" href="#MultiHeadAttention" title="Permalink"></a></h2><p>The basic blocks needed to implement <a href="https://arxiv.org/abs/1706.03762">Transformer</a> architectures. 
See also the functional counterparts documented in NNlib&#39;s <a href="../nnlib/#Attention">Attention</a> section.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MultiHeadAttention" href="#Flux.MultiHeadAttention"><code>Flux.MultiHeadAttention</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MultiHeadAttention(dims; [nheads, bias, init, dropout_prob])</code></pre><p>The multi-head dot-product attention layer used in Transformer architectures [1].</p><p>Returns the transformed input sequence and the attention scores.</p><p>[1] Vaswani et al. &quot;Attention is all you need.&quot; Advances in Neural Information Processing Systems. 2017.</p><p><strong>Arguments</strong></p><ul><li><code>dims</code>: The embedding dimensions of inputs, intermediate tensors and outputs.         In the most general case, it is given as          a) <code>(q_in_dim, k_in_dim, v_in_dim) =&gt; (qk_dim, v_dim) =&gt; out_dim</code>.         It can also take the simpler forms         b) <code>dims::Int</code>;         c) <code>in_dim::Int =&gt; (qk_dim, v_dim) =&gt; out_dim</code>;         d) <code>in_dim::Int =&gt; qkv_dim =&gt; out_dim</code>.</li><li><code>nheads</code>: number of heads. Default <code>8</code>.</li><li><code>init</code>: weight initializer for the Dense layers. Default <code>glorot_uniform</code>.</li><li><code>bias</code>: whether pointwise QKVO dense transforms use bias. Default <code>false</code>.</li><li><code>dropout_prob</code>: dropout probability for the attention scores. 
Default <code>0.0</code>.</li></ul><p><strong>Forward</strong></p><pre><code class="nohighlight hljs">(mha::MultiHeadAttention)(q_in, k_in, v_in, [bias]; [mask])</code></pre><p>The arguments of the forward pass are:</p><ul><li><code>q_in</code>: Input query array of size <code>(q_in_dim, q_len, batch_size)</code>.</li><li><code>k_in</code>: Input key array of size <code>(k_in_dim, kv_len, batch_size)</code>.</li><li><code>v_in</code>: Input value array of size <code>(v_in_dim, kv_len, batch_size)</code>.</li><li><code>bias</code>: Bias array broadcastable to size <code>(kv_len, q_len, nheads, batch_size)</code>.          It will be added to the attention scores before the softmax.         Default <code>nothing</code>.</li><li><code>mask</code>: Input array broadcastable to size          <code>(kv_len, q_len, nheads, batch_size)</code>.          The mask is applied to the attention scores just before the softmax.          See <a href="../nnlib/#NNlib.make_causal_mask"><code>NNlib.make_causal_mask</code></a> for creating causal masks.          Default <code>nothing</code>.</li></ul><p>Alternative calling signatures are <code>mha(q_in)</code>, equivalent to <code>mha(q_in, q_in, q_in)</code> (self-attention), and <code>mha(q_in, k_in)</code>, equivalent to <code>mha(q_in, k_in, k_in)</code> (key and value are the same).</p><p>See also <a href="../nnlib/#NNlib.dot_product_attention"><code>NNlib.dot_product_attention</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">mha = MultiHeadAttention(64, nheads = 8)
 q = rand(Float32, (64, 10, 32))
 k = rand(Float32, (64, 20, 32))
 v = rand(Float32, (64, 20, 32))
@@ -180,13 +180,13 @@
 mha = MultiHeadAttention(64 =&gt; 1024 =&gt; 1024, nheads = 8)
 y, α = mha(q) # self-attention
 # [y] = [1024, 10, 32]
-# [α] = [10, 10, 8, 32]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/attention.jl#L5-L67">source</a></section></article><h3 id="Pooling"><a class="docs-heading-anchor" href="#Pooling">Pooling</a><a id="Pooling-1"></a><a class="docs-heading-anchor-permalink" href="#Pooling" title="Permalink"></a></h3><p>These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AdaptiveMaxPool" href="#Flux.AdaptiveMaxPool"><code>Flux.AdaptiveMaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AdaptiveMaxPool(out::NTuple)</code></pre><p>Adaptive max pooling layer. Calculates the necessary window size such that its output has <code>size(y)[1:N] == out</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(out)</code>.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
+# [α] = [10, 10, 8, 32]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/attention.jl#L5-L67">source</a></section></article><h3 id="Pooling"><a class="docs-heading-anchor" href="#Pooling">Pooling</a><a id="Pooling-1"></a><a class="docs-heading-anchor-permalink" href="#Pooling" title="Permalink"></a></h3><p>These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AdaptiveMaxPool" href="#Flux.AdaptiveMaxPool"><code>Flux.AdaptiveMaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AdaptiveMaxPool(out::NTuple)</code></pre><p>Adaptive max pooling layer. Calculates the necessary window size such that its output has <code>size(y)[1:N] == out</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(out)</code>.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
 
 julia&gt; AdaptiveMaxPool((25, 25))(xs) |&gt; size
 (25, 25, 3, 50)
 
 julia&gt; MaxPool((4,4))(xs) ≈ AdaptiveMaxPool((25, 25))(xs)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L518-L539">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MaxPool(window::NTuple; pad=0, stride=window)</code></pre><p>Max pooling layer, which replaces all pixels in a block of size <code>window</code> with one.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(window)</code>.</p><p>By default the window size is also the stride in each dimension. The keyword <code>pad</code> accepts the same options as for the <code>Conv</code> layer, including <code>SamePad()</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MeanPool"><code>MeanPool</code></a>, <a href="#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>, <a href="#Flux.GlobalMaxPool"><code>GlobalMaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L518-L539">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MaxPool(window::NTuple; pad=0, stride=window)</code></pre><p>Max pooling layer, which replaces all pixels in a block of size <code>window</code> with one.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(window)</code>.</p><p>By default the window size is also the stride in each dimension. The keyword <code>pad</code> accepts the same options as for the <code>Conv</code> layer, including <code>SamePad()</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MeanPool"><code>MeanPool</code></a>, <a href="#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>, <a href="#Flux.GlobalMaxPool"><code>GlobalMaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
 
 julia&gt; m = Chain(Conv((5, 5), 3 =&gt; 7, pad=SamePad()), MaxPool((5, 5), pad=SamePad()))
 Chain(
@@ -204,7 +204,7 @@
 MaxPool((5,), pad=2, stride=3)
 
 julia&gt; layer(rand(Float32, 100, 7, 50)) |&gt; size
-(34, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L675-L713">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GlobalMaxPool" href="#Flux.GlobalMaxPool"><code>Flux.GlobalMaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GlobalMaxPool()</code></pre><p>Global max pooling layer.</p><p>Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.GlobalMeanPool"><code>GlobalMeanPool</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
+(34, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L675-L713">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GlobalMaxPool" href="#Flux.GlobalMaxPool"><code>Flux.GlobalMaxPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GlobalMaxPool()</code></pre><p>Global max pooling layer.</p><p>Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.GlobalMeanPool"><code>GlobalMeanPool</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
 
 julia&gt; m = Chain(Conv((3,3), 3 =&gt; 7), GlobalMaxPool());
 
@@ -212,13 +212,13 @@
 (1, 1, 7, 50)
 
 julia&gt; GlobalMaxPool()(rand(3,5,7)) |&gt; size  # preserves 2 dimensions
-(1, 5, 7)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L602-L623">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AdaptiveMeanPool" href="#Flux.AdaptiveMeanPool"><code>Flux.AdaptiveMeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AdaptiveMeanPool(out::NTuple)</code></pre><p>Adaptive mean pooling layer. Calculates the necessary window size such that its output has <code>size(y)[1:N] == out</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(out)</code>.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
+(1, 5, 7)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L602-L623">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AdaptiveMeanPool" href="#Flux.AdaptiveMeanPool"><code>Flux.AdaptiveMeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AdaptiveMeanPool(out::NTuple)</code></pre><p>Adaptive mean pooling layer. Calculates the necessary window size such that its output has <code>size(y)[1:N] == out</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(out)</code>.</p><p>See also <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);  # batch of 50 RGB images
 
 julia&gt; AdaptiveMeanPool((25, 25))(xs) |&gt; size
 (25, 25, 3, 50)
 
 julia&gt; MeanPool((4,4))(xs) ≈ AdaptiveMeanPool((25, 25))(xs)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L560-L581">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MeanPool(window::NTuple; pad=0, stride=window)</code></pre><p>Mean pooling layer, averaging all pixels in a block of size <code>window</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(window)</code>.</p><p>By default the window size is also the stride in each dimension. The keyword <code>pad</code> accepts the same options as for the <code>Conv</code> layer, including <code>SamePad()</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L560-L581">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">MeanPool(window::NTuple; pad=0, stride=window)</code></pre><p>Mean pooling layer, averaging all pixels in a block of size <code>window</code>.</p><p>Expects as input an array with <code>ndims(x) == N+2</code>, i.e. channel and batch dimensions, after the <code>N</code> feature dimensions, where <code>N = length(window)</code>.</p><p>By default the window size is also the stride in each dimension. The keyword <code>pad</code> accepts the same options as for the <code>Conv</code> layer, including <code>SamePad()</code>.</p><p>See also <a href="#Flux.Conv"><code>Conv</code></a>, <a href="#Flux.MaxPool"><code>MaxPool</code></a>, <a href="#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
 
 julia&gt; m = Chain(Conv((5,5), 3 =&gt; 7), MeanPool((5,5), pad=SamePad()))
 Chain(
@@ -230,12 +230,12 @@
 (96, 96, 7, 50)
 
 julia&gt; m(xs) |&gt; size
-(20, 20, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L742-L773">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GlobalMeanPool" href="#Flux.GlobalMeanPool"><code>Flux.GlobalMeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GlobalMeanPool()</code></pre><p>Global mean pooling layer.</p><p>Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.</p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
+(20, 20, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L742-L773">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GlobalMeanPool" href="#Flux.GlobalMeanPool"><code>Flux.GlobalMeanPool</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GlobalMeanPool()</code></pre><p>Global mean pooling layer.</p><p>Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.</p><pre><code class="language-julia-repl hljs">julia&gt; xs = rand(Float32, 100, 100, 3, 50);
 
 julia&gt; m = Chain(Conv((3,3), 3 =&gt; 7), GlobalMeanPool());
 
 julia&gt; m(xs) |&gt; size
-(1, 1, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/conv.jl#L641-L657">source</a></section></article><h2 id="Upsampling"><a class="docs-heading-anchor" href="#Upsampling">Upsampling</a><a id="Upsampling-1"></a><a class="docs-heading-anchor-permalink" href="#Upsampling" title="Permalink"></a></h2><p>The opposite of pooling, these layers increase the size of an array. They have no trainable parameters. </p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Upsample" href="#Flux.Upsample"><code>Flux.Upsample</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Upsample(mode = :nearest; [scale, size]) 
+(1, 1, 7, 50)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/conv.jl#L641-L657">source</a></section></article><h2 id="Upsampling"><a class="docs-heading-anchor" href="#Upsampling">Upsampling</a><a id="Upsampling-1"></a><a class="docs-heading-anchor-permalink" href="#Upsampling" title="Permalink"></a></h2><p>The opposite of pooling, these layers increase the size of an array. They have no trainable parameters. </p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Upsample" href="#Flux.Upsample"><code>Flux.Upsample</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Upsample(mode = :nearest; [scale, size]) 
 Upsample(scale, mode = :nearest)</code></pre><p>An upsampling layer. One of two keywords must be given:</p><p>If <code>scale</code> is a number, this applies to all but the last two dimensions (channel and batch) of the input.  It may also be a tuple, to control dimensions individually. Alternatively, keyword  <code>size</code> accepts a tuple, to directly specify the leading dimensions of the output.</p><p>Currently supported upsampling <code>mode</code>s  and corresponding NNlib&#39;s methods are:</p><ul><li><code>:nearest</code> -&gt; <a href="../nnlib/#NNlib.upsample_nearest"><code>NNlib.upsample_nearest</code></a> </li><li><code>:bilinear</code> -&gt; <a href="../nnlib/#NNlib.upsample_bilinear"><code>NNlib.upsample_bilinear</code></a></li><li><code>:trilinear</code> -&gt; <a href="../nnlib/#NNlib.upsample_trilinear"><code>NNlib.upsample_trilinear</code></a></li></ul><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Upsample(scale = (2, 3))
 Upsample(:nearest, scale = (2, 3))
 
@@ -246,7 +246,7 @@
 Upsample(:bilinear, size = (4, 5))
 
 julia&gt; m(ones(2, 2, 1, 1)) |&gt; size
-(4, 5, 1, 1)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/upsample.jl#L1-L32">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.PixelShuffle" href="#Flux.PixelShuffle"><code>Flux.PixelShuffle</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PixelShuffle(r::Int)</code></pre><p>Pixel shuffling layer with upscale factor <code>r</code>. Usually used for generating higher resolution images while upscaling them.</p><p>See <a href="../nnlib/#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; p = PixelShuffle(2);
+(4, 5, 1, 1)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/upsample.jl#L1-L32">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.PixelShuffle" href="#Flux.PixelShuffle"><code>Flux.PixelShuffle</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PixelShuffle(r::Int)</code></pre><p>Pixel shuffling layer with upscale factor <code>r</code>. Usually used for generating higher resolution images while upscaling them.</p><p>See <a href="../nnlib/#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; p = PixelShuffle(2);
 
 julia&gt; xs = [2row + col + channel/10 for row in 1:2, col in 1:2, channel in 1:4, n in 1:1]
 2×2×4×1 Array{Float64, 4}:
@@ -298,7 +298,7 @@
  4.1  4.3  5.1  5.3  6.1  6.3
  4.2  4.4  5.2  5.4  6.2  6.4
  7.1  7.3  8.1  8.3  9.1  9.3
- 7.2  7.4  8.2  8.4  9.2  9.4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/upsample.jl#L75-L139">source</a></section></article><h2 id="Embedding-Vectors"><a class="docs-heading-anchor" href="#Embedding-Vectors">Embedding Vectors</a><a id="Embedding-Vectors-1"></a><a class="docs-heading-anchor-permalink" href="#Embedding-Vectors" title="Permalink"></a></h2><p>These layers accept an index, and return a vector (or several indices, and several vectors). The possible embedding vectors are learned parameters.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Embedding" href="#Flux.Embedding"><code>Flux.Embedding</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Embedding(in =&gt; out; init=randn32)</code></pre><p>A lookup table that stores embeddings of dimension <code>out</code>  for a vocabulary of size <code>in</code>, as a trainable matrix.</p><p>This layer is often used to store word embeddings and retrieve them using indices.  The input to the layer can be a vocabulary index in <code>1:in</code>, an array of indices, or the corresponding <a href="../../data/onehot/#OneHotArrays.onehotbatch"><code>onehot encoding</code></a>.</p><p>For indices <code>x</code>, the result is of size <code>(out, size(x)...)</code>, allowing several batch dimensions. For one-hot <code>ohx</code>, the result is of size <code>(out, size(ohx)[2:end]...)</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; emb = Embedding(26 =&gt; 4, init=Flux.identity_init(gain=22))
+ 7.2  7.4  8.2  8.4  9.2  9.4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/upsample.jl#L75-L139">source</a></section></article><h2 id="Embedding-Vectors"><a class="docs-heading-anchor" href="#Embedding-Vectors">Embedding Vectors</a><a id="Embedding-Vectors-1"></a><a class="docs-heading-anchor-permalink" href="#Embedding-Vectors" title="Permalink"></a></h2><p>These layers accept an index, and return a vector (or several indices, and several vectors). The possible embedding vectors are learned parameters.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Embedding" href="#Flux.Embedding"><code>Flux.Embedding</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Embedding(in =&gt; out; init=randn32)</code></pre><p>A lookup table that stores embeddings of dimension <code>out</code>  for a vocabulary of size <code>in</code>, as a trainable matrix.</p><p>This layer is often used to store word embeddings and retrieve them using indices.  The input to the layer can be a vocabulary index in <code>1:in</code>, an array of indices, or the corresponding <a href="../../data/onehot/#OneHotArrays.onehotbatch"><code>onehot encoding</code></a>.</p><p>For indices <code>x</code>, the result is of size <code>(out, size(x)...)</code>, allowing several batch dimensions. For one-hot <code>ohx</code>, the result is of size <code>(out, size(ohx)[2:end]...)</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; emb = Embedding(26 =&gt; 4, init=Flux.identity_init(gain=22))
 Embedding(26 =&gt; 4)  # 104 parameters
 
 julia&gt; emb(2)  # one column of e.weight (here not random!)
@@ -319,7 +319,7 @@
 true
 
 julia&gt; emb(rand(1:26, (10, 1, 12))) |&gt; size  # three batch dimensions
-(4, 10, 1, 12)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L665-L703">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.EmbeddingBag" href="#Flux.EmbeddingBag"><code>Flux.EmbeddingBag</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">EmbeddingBag(in =&gt; out, reduction=mean; init=Flux.randn32)</code></pre><p>A lookup table that stores embeddings of dimension <code>out</code> for a vocabulary of size <code>in</code>. Differs from <a href="#Flux.Embedding"><code>Embedding</code></a> in that, instead of acting on a single vocabulary index, it always acts a vector of indices which it calls a &quot;bag&quot;. Their individual embedding vectors are reduced to one, using <code>mean</code> or some other function.</p><p>Instead of acting on one &quot;bag&quot;, such as <code>x::Vector{Int}</code>, the layer can also act on several:</p><ul><li><p>Acting on a vector of &quot;bags&quot;, it produces a matrix whose columns are the reduced vectors. More generally on <code>x::Array{Vector{Int}}</code>, its output is of size <code>(out, size(x)...)</code>.</p></li><li><p>Any higher-rank array of integers is interpreted as a collection of &quot;bags&quot; each along the first dimension. Thus the output is <code>mapslices(e, x; dims=1)</code> when <code>e::EmbeddingBag</code> and <code>x::Array{Int,N}</code>. This method is more efficient, but requires that all &quot;bags&quot; have the same length.</p></li><li><p>A vector of &quot;bags&quot; may also be produced by splitting a vector of indices at specified points. For this case the layer takes two inputs, both vectors of integers. 
See details below.</p></li></ul><p>The &quot;bag&quot; may equivalently be represented as a <code>OneHotMatrix</code>. A collection of these, or one higher-rank <code>OneHotArray</code>, again produce a stack of embeddings. See details below.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; vocab_size = 26;  # embed into 3 dimensions, with non-random vectors:
+(4, 10, 1, 12)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L665-L703">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.EmbeddingBag" href="#Flux.EmbeddingBag"><code>Flux.EmbeddingBag</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">EmbeddingBag(in =&gt; out, reduction=mean; init=Flux.randn32)</code></pre><p>A lookup table that stores embeddings of dimension <code>out</code> for a vocabulary of size <code>in</code>. Differs from <a href="#Flux.Embedding"><code>Embedding</code></a> in that, instead of acting on a single vocabulary index, it always acts on a vector of indices which it calls a &quot;bag&quot;. Their individual embedding vectors are reduced to one, using <code>mean</code> or some other function.</p><p>Instead of acting on one &quot;bag&quot;, such as <code>x::Vector{Int}</code>, the layer can also act on several:</p><ul><li><p>Acting on a vector of &quot;bags&quot;, it produces a matrix whose columns are the reduced vectors. More generally on <code>x::Array{Vector{Int}}</code>, its output is of size <code>(out, size(x)...)</code>.</p></li><li><p>Any higher-rank array of integers is interpreted as a collection of &quot;bags&quot; each along the first dimension. Thus the output is <code>mapslices(e, x; dims=1)</code> when <code>e::EmbeddingBag</code> and <code>x::Array{Int,N}</code>. This method is more efficient, but requires that all &quot;bags&quot; have the same length.</p></li><li><p>A vector of &quot;bags&quot; may also be produced by splitting a vector of indices at specified points. For this case the layer takes two inputs, both vectors of integers. 
See details below.</p></li></ul><p>The &quot;bag&quot; may equivalently be represented as a <code>OneHotMatrix</code>. A collection of these, or one higher-rank <code>OneHotArray</code>, again produce a stack of embeddings. See details below.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; vocab_size = 26;  # embed into 3 dimensions, with non-random vectors:
 
 julia&gt; eb = EmbeddingBag(vocab_size =&gt; 3, init=Flux.identity_init(gain=100))
 EmbeddingBag(26 =&gt; 3)  # 78 parameters
@@ -368,7 +368,7 @@
 3×2 Matrix{Float32}:
  33.3333    0.0
  66.6667    0.0
-  0.0     100.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L753-L844">source</a></section></article><h2 id="man-dataflow-layers"><a class="docs-heading-anchor" href="#man-dataflow-layers">Dataflow Layers, or Containers</a><a id="man-dataflow-layers-1"></a><a class="docs-heading-anchor-permalink" href="#man-dataflow-layers" title="Permalink"></a></h2><p>The basic <code>Chain(F, G, H)</code> applies the layers it contains in sequence, equivalent to <code>H ∘ G ∘ F</code>. Flux has some other layers which contain layers, but connect them up in a more complicated way: <code>SkipConnection</code> allows ResNet&#39;s residual connection.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Chain" href="#Flux.Chain"><code>Flux.Chain</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Chain(layers...)
+  0.0     100.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L753-L844">source</a></section></article><h2 id="man-dataflow-layers"><a class="docs-heading-anchor" href="#man-dataflow-layers">Dataflow Layers, or Containers</a><a id="man-dataflow-layers-1"></a><a class="docs-heading-anchor-permalink" href="#man-dataflow-layers" title="Permalink"></a></h2><p>The basic <code>Chain(F, G, H)</code> applies the layers it contains in sequence, equivalent to <code>H ∘ G ∘ F</code>. Flux has some other layers which contain layers, but connect them up in a more complicated way: <code>SkipConnection</code> allows ResNet&#39;s residual connection.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Chain" href="#Flux.Chain"><code>Flux.Chain</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Chain(layers...)
 Chain(name = layer, ...)</code></pre><p>Collects multiple layers / functions to be called in sequence on a given input. Supports indexing and slicing, <code>m[2]</code> or <code>m[1:end-1]</code>, and if names are given, <code>m[:name] == m[1]</code> etc.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Chain(x -&gt; x^2, x -&gt; x+1);
 
 julia&gt; m(5) == 26
@@ -385,12 +385,12 @@
                   dec = Dense(5 =&gt; 2));
 
 julia&gt; m2(x) == (m2[:dec] ∘ m2[:enc])(x)
-true</code></pre><p>For large models, there is a special type-unstable path which can reduce compilation times. This can be used by supplying a vector of layers <code>Chain([layer1, layer2, ...])</code>. This feature is somewhat experimental, beware!</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L1-L34">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.activations" href="#Flux.activations"><code>Flux.activations</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">activations(c::Chain, input)</code></pre><p>Like calling a <code>Chain</code>, but saves the result of each layer as an output.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux: activations
+true</code></pre><p>For large models, there is a special type-unstable path which can reduce compilation times. This can be used by supplying a vector of layers <code>Chain([layer1, layer2, ...])</code>. This feature is somewhat experimental, beware!</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L1-L34">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.activations" href="#Flux.activations"><code>Flux.activations</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">activations(c::Chain, input)</code></pre><p>Like calling a <code>Chain</code>, but saves the result of each layer as an output.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Flux: activations
 
 julia&gt; c = Chain(x -&gt; x + 1, x -&gt; x * 2, x -&gt; x ^ 3);
 
 julia&gt; activations(c, 1)
-(2, 4, 64)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L86-L101">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Maxout" href="#Flux.Maxout"><code>Flux.Maxout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Maxout(layers...)
+(2, 4, 64)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L86-L101">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Maxout" href="#Flux.Maxout"><code>Flux.Maxout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Maxout(layers...)
 Maxout(f, n_alts)</code></pre><p>This contains a number of internal layers, each of which receives the same input. Its output is the elementwise maximum of the internal layers&#39; outputs.</p><p>Instead of defining layers individually, you can provide a zero-argument function which constructs them, and the number to construct.</p><p>Maxout over linear dense layers satisfies the universal approximation theorem. See Goodfellow, Warde-Farley, Mirza, Courville &amp; Bengio &quot;Maxout Networks&quot;  <a href="https://arxiv.org/abs/1302.4389">https://arxiv.org/abs/1302.4389</a>.</p><p>See also <a href="#Flux.Parallel"><code>Parallel</code></a> to reduce with other operators.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Maxout(x -&gt; abs2.(x), x -&gt; x .* 3);
 
 julia&gt; m([-2 -1 0 1 2])
@@ -405,7 +405,7 @@
 )                   # Total: 6 arrays, 126 parameters, 888 bytes.
 
 julia&gt; Flux.outputsize(m3, (5, 11))
-(7, 11)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L272-L306">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.SkipConnection" href="#Flux.SkipConnection"><code>Flux.SkipConnection</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SkipConnection(layer, connection)</code></pre><p>Create a skip connection which consists of a layer or <code>Chain</code> of consecutive layers and a shortcut connection linking the block&#39;s input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given <code>layer</code> while the second is the unchanged, &quot;skipped&quot; input.</p><p>The simplest &quot;ResNet&quot;-type connection is just <code>SkipConnection(layer, +)</code>. Here is a more complicated example:</p><pre><code class="language-julia-repl hljs">julia&gt; m = Conv((3,3), 4 =&gt; 7, pad=(1,1));
+(7, 11)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L272-L306">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.SkipConnection" href="#Flux.SkipConnection"><code>Flux.SkipConnection</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SkipConnection(layer, connection)</code></pre><p>Create a skip connection which consists of a layer or <code>Chain</code> of consecutive layers and a shortcut connection linking the block&#39;s input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given <code>layer</code> while the second is the unchanged, &quot;skipped&quot; input.</p><p>The simplest &quot;ResNet&quot;-type connection is just <code>SkipConnection(layer, +)</code>. Here is a more complicated example:</p><pre><code class="language-julia-repl hljs">julia&gt; m = Conv((3,3), 4 =&gt; 7, pad=(1,1));
 
 julia&gt; x = ones(Float32, 5, 5, 4, 10);
 
@@ -415,7 +415,7 @@
 julia&gt; sm = SkipConnection(m, (mx, x) -&gt; cat(mx, x, dims=3));
 
 julia&gt; size(sm(x)) == (5, 5, 11, 10)
-true</code></pre><p>See also <a href="#Flux.Parallel"><code>Parallel</code></a>, <a href="#Flux.Maxout"><code>Maxout</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L328-L354">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Parallel" href="#Flux.Parallel"><code>Flux.Parallel</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Parallel(connection, layers...)
+true</code></pre><p>See also <a href="#Flux.Parallel"><code>Parallel</code></a>, <a href="#Flux.Maxout"><code>Maxout</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L328-L354">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Parallel" href="#Flux.Parallel"><code>Flux.Parallel</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Parallel(connection, layers...)
 Parallel(connection; name = layer, ...)</code></pre><p>Create a layer which passes an input array to each path in <code>layers</code>, before reducing the output with <code>connection</code>.</p><p>Called with one input <code>x</code>, this is equivalent to <code>connection([l(x) for l in layers]...)</code>. If called with multiple inputs, one is passed to each layer, thus <code>Parallel(+, f, g)(x, y) = f(x) + g(y)</code>.</p><p>Like <a href="#Flux.Chain"><code>Chain</code></a>, its sub-layers may be given names using the keyword constructor. These can be accessed by indexing: <code>m[1] == m[:name]</code> is the first layer.</p><p>See also <a href="#Flux.SkipConnection"><code>SkipConnection</code></a> which is <code>Parallel</code> with one <code>identity</code>, and <a href="#Flux.Maxout"><code>Maxout</code></a> which reduces by broadcasting <code>max</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; model = Chain(Dense(3 =&gt; 5),
                      Parallel(vcat, Dense(5 =&gt; 4), Chain(Dense(5 =&gt; 7), Dense(7 =&gt; 4))),
                      Dense(8 =&gt; 17));
@@ -437,7 +437,7 @@
 (2,)
 
 julia&gt; model2[:β] == model2[2]
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L471-L513">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.PairwiseFusion" href="#Flux.PairwiseFusion"><code>Flux.PairwiseFusion</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PairwiseFusion(connection, layers...)</code></pre><p><strong>Arguments</strong></p><ul><li><code>connection</code>: A function taking 2 inputs and combining them into a single output </li><li><code>layers</code>: The layers whose outputs are combined</li></ul><p><strong>Inputs</strong></p><p>This layer behaves differently based on input type:</p><ol><li>If input <code>x</code> is a tuple of length N (or the input is <code>xs</code> with N <code>x</code>&#39;s), matching the number of <code>layers</code>, </li></ol><p>then each layer receives a new input <code>x[i]</code> combined with the previous output <code>y[i-1]</code> using <code>connection</code>.   Thus <code>(y1, y2, y3) = PairwiseFusion(connection, layer1, layer2, layer3)((x1, x2, x3))</code>   may be drawn as:</p><pre><code class="nohighlight hljs">x1 → layer1 → y1 ↘
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L471-L513">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.PairwiseFusion" href="#Flux.PairwiseFusion"><code>Flux.PairwiseFusion</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PairwiseFusion(connection, layers...)</code></pre><p><strong>Arguments</strong></p><ul><li><code>connection</code>: A function taking 2 inputs and combining them into a single output </li><li><code>layers</code>: The layers whose outputs are combined</li></ul><p><strong>Inputs</strong></p><p>This layer behaves differently based on input type:</p><ol><li>If input <code>x</code> is a tuple of length N (or the input is <code>xs</code> with N <code>x</code>&#39;s), matching the number of <code>layers</code>, </li></ol><p>then each layer receives a new input <code>x[i]</code> combined with the previous output <code>y[i-1]</code> using <code>connection</code>.   Thus <code>(y1, y2, y3) = PairwiseFusion(connection, layer1, layer2, layer3)((x1, x2, x3))</code>   may be drawn as:</p><pre><code class="nohighlight hljs">x1 → layer1 → y1 ↘
                   connection → layer2 → y2 ↘
               x2 ↗                          connection → layer3 → y3
                                         x3 ↗</code></pre><p>... or written as:</p><pre><code class="language-julia hljs">y1 = layer1(x1)
@@ -445,7 +445,7 @@
 y3 = layer3(connection(y2, x3))</code></pre><ol><li>With just one input, each layer receives the same <code>x</code> combined with the previous output. Thus <code>y = PairwiseFusion(connection, layers...)(x)</code> obeys:</li></ol><pre><code class="language-julia hljs">y[1] == layers[1](x)
 for i in 2:length(layers)
     y[i] == layers[i](connection(y[i-1], x))
-end</code></pre><p><strong>Returns</strong></p><p>A tuple of length N with the output of each fusion ((<code>y1</code>, <code>y2</code>, ..., <code>yN</code>) in the example above).</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/basic.jl#L561-L603">source</a></section></article><h2 id="Recurrent-Models"><a class="docs-heading-anchor" href="#Recurrent-Models">Recurrent Models</a><a id="Recurrent-Models-1"></a><a class="docs-heading-anchor-permalink" href="#Recurrent-Models" title="Permalink"></a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">RNN(in =&gt; out, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(RNNCell(a...))</code>, and so RNNs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. 
See the examples below.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; r = RNN(3 =&gt; 5)
+end</code></pre><p><strong>Returns</strong></p><p>A tuple of length N with the output of each fusion ((<code>y1</code>, <code>y2</code>, ..., <code>yN</code>) in the example above).</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/basic.jl#L561-L603">source</a></section></article><h2 id="Recurrent-Models"><a class="docs-heading-anchor" href="#Recurrent-Models">Recurrent Models</a><a id="Recurrent-Models-1"></a><a class="docs-heading-anchor-permalink" href="#Recurrent-Models" title="Permalink"></a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">RNN(in =&gt; out, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(RNNCell(a...))</code>, and so RNNs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. 
See the examples below.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; r = RNN(3 =&gt; 5)
 Recur(
   RNNCell(3 =&gt; 5, tanh),                # 50 parameters
 )         # Total: 4 trainable arrays, 50 parameters,
@@ -484,7 +484,7 @@
 julia&gt; r = Flux.Recur(Flux.RNNCell(tanh, rand(5, 4), Tridiagonal(rand(5, 5)), rand(5), rand(5, 1)))
 
 julia&gt; r(rand(4, 10)) |&gt; size # batch size of 10
-(5, 10)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L219-L287">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">LSTM(in =&gt; out)</code></pre><p><a href="https://www.researchgate.net/publication/13853244_Long_Short-term_Memory">Long Short Term Memory</a> recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(LSTMCell(a...))</code>, and so LSTMs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; l = LSTM(3 =&gt; 5)
+(5, 10)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L219-L287">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">LSTM(in =&gt; out)</code></pre><p><a href="https://www.researchgate.net/publication/13853244_Long_Short-term_Memory">Long Short Term Memory</a> recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(LSTMCell(a...))</code>, and so LSTMs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; l = LSTM(3 =&gt; 5)
 Recur(
   LSTMCell(3 =&gt; 5),                     # 190 parameters
 )         # Total: 5 trainable arrays, 190 parameters,
@@ -496,7 +496,7 @@
 julia&gt; Flux.reset!(l);
 
 julia&gt; l(rand(Float32, 3, 10)) |&gt; size # batch size of 10
-(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>LSTMCell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code> and <code>Wh</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code> and <code>Wh</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L325-L360">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">GRU(in =&gt; out)</code></pre><p><a href="https://arxiv.org/abs/1406.1078v1">Gated Recurrent Unit</a> layer. Behaves like an RNN but generally exhibits a longer memory span over sequences. This implements the variant proposed in v1 of the referenced paper.</p><p>The integer arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. 
That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(GRUCell(a...))</code>, and so GRUs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; g = GRU(3 =&gt; 5)
+(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>LSTMCell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code> and <code>Wh</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code> and <code>Wh</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L325-L360">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">GRU(in =&gt; out)</code></pre><p><a href="https://arxiv.org/abs/1406.1078v1">Gated Recurrent Unit</a> layer. Behaves like an RNN but generally exhibits a longer memory span over sequences. This implements the variant proposed in v1 of the referenced paper.</p><p>The integer arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. 
That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(GRUCell(a...))</code>, and so GRUs are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; g = GRU(3 =&gt; 5)
 Recur(
   GRUCell(3 =&gt; 5),                      # 140 parameters
 )         # Total: 4 trainable arrays, 140 parameters,
@@ -508,7 +508,7 @@
 julia&gt; Flux.reset!(g);
 
 julia&gt; g(rand(Float32, 3, 10)) |&gt; size # batch size of 10
-(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>GRUCell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code> and <code>Wh</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code> and <code>Wh</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L398-L434">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GRUv3" href="#Flux.GRUv3"><code>Flux.GRUv3</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">GRUv3(in =&gt; out)</code></pre><p><a href="https://arxiv.org/abs/1406.1078v3">Gated Recurrent Unit</a> layer. Behaves like an RNN but generally exhibits a longer memory span over sequences. This implements the variant proposed in v3 of the referenced paper.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. 
That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(GRUv3Cell(a...))</code>, and so GRUv3s are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; g = GRUv3(3 =&gt; 5)
+(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>GRUCell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code> and <code>Wh</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code> and <code>Wh</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L398-L434">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GRUv3" href="#Flux.GRUv3"><code>Flux.GRUv3</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">GRUv3(in =&gt; out)</code></pre><p><a href="https://arxiv.org/abs/1406.1078v3">Gated Recurrent Unit</a> layer. Behaves like an RNN but generally exhibits a longer memory span over sequences. This implements the variant proposed in v3 of the referenced paper.</p><p>The arguments <code>in</code> and <code>out</code> describe the size of the feature vectors passed as input and as output. 
That is, it accepts a vector of length <code>in</code> or a batch of vectors represented as a <code>in x B</code> matrix and outputs a vector of length <code>out</code> or a batch of vectors of size <code>out x B</code>.</p><p>This constructor is syntactic sugar for <code>Recur(GRUv3Cell(a...))</code>, and so GRUv3s are stateful. Note that the state shape can change depending on the inputs, and so it is good to <code>reset!</code> the model between inference calls if the batch size changes. See the examples below.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; g = GRUv3(3 =&gt; 5)
 Recur(
   GRUv3Cell(3 =&gt; 5),                    # 140 parameters
 )         # Total: 5 trainable arrays, 140 parameters,
@@ -520,7 +520,7 @@
 julia&gt; Flux.reset!(g);
 
 julia&gt; g(rand(Float32, 3, 10)) |&gt; size # batch size of 10
-(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>GRUv3Cell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code>, <code>Wh</code>, and <code>Wh_h</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code>, <code>Wh</code>, and <code>Wh_h</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L468-L504">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="nohighlight hljs">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs:</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; accum(h, x) = (h + x, x)
+(5, 10)</code></pre><div class="admonition is-warning"><header class="admonition-header">Batch size changes</header><div class="admonition-body"><p>Failing to call <code>reset!</code> when the input batch size changes can lead to unexpected behavior. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div></div><p><strong>Note:</strong></p><p><code>GRUv3Cell</code>s can be constructed directly by specifying the non-linear function, the <code>Wi</code>, <code>Wh</code>, and <code>Wh_h</code> internal matrices, a bias vector <code>b</code>, and a learnable initial state <code>state0</code>. The  <code>Wi</code>, <code>Wh</code>, and <code>Wh_h</code> matrices do not need to be the same type. See the example in <a href="#Flux.RNN"><code>RNN</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L468-L504">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="nohighlight hljs">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs:</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; accum(h, x) = (h + x, x)
 accum (generic function with 1 method)
 
 julia&gt; rnn = Flux.Recur(accum, 0)
@@ -571,7 +571,7 @@
 
 julia&gt; rnn.state
 1×1 Matrix{Int64}:
- 60</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L56-L127">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.reset!" href="#Flux.reset!"><code>Flux.reset!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">reset!(rnn)</code></pre><p>Reset the hidden state of a recurrent layer back to its original value.</p><p>Assuming you have a <code>Recur</code> layer <code>rnn</code>, this is roughly equivalent to:</p><pre><code class="nohighlight hljs">rnn.state = hidden(rnn.cell)</code></pre><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; r = Flux.RNNCell(relu, ones(1,1), zeros(1,1), ones(1,1), zeros(1,1));  # users should use the RNN wrapper struct instead
+ 60</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L56-L127">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.reset!" href="#Flux.reset!"><code>Flux.reset!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">reset!(rnn)</code></pre><p>Reset the hidden state of a recurrent layer back to its original value.</p><p>Assuming you have a <code>Recur</code> layer <code>rnn</code>, this is roughly equivalent to:</p><pre><code class="nohighlight hljs">rnn.state = hidden(rnn.cell)</code></pre><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; r = Flux.RNNCell(relu, ones(1,1), zeros(1,1), ones(1,1), zeros(1,1));  # users should use the RNN wrapper struct instead
 
 julia&gt; y = Flux.Recur(r, ones(1,1));
 
@@ -593,7 +593,7 @@
 
 julia&gt; y.state
 1×1 Matrix{Float64}:
- 0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/recurrent.jl#L142-L177">source</a></section></article><h2 id="Normalisation-and-Regularisation"><a class="docs-heading-anchor" href="#Normalisation-and-Regularisation">Normalisation &amp; Regularisation</a><a id="Normalisation-and-Regularisation-1"></a><a class="docs-heading-anchor-permalink" href="#Normalisation-and-Regularisation" title="Permalink"></a></h2><p>These layers don&#39;t affect the structure of the network but may improve training times or reduce overfitting. Some of them contain trainable parameters, while others do not.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.BatchNorm" href="#Flux.BatchNorm"><code>Flux.BatchNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">BatchNorm(channels::Integer, λ=identity;
+ 0.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/recurrent.jl#L142-L177">source</a></section></article><h2 id="Normalisation-and-Regularisation"><a class="docs-heading-anchor" href="#Normalisation-and-Regularisation">Normalisation &amp; Regularisation</a><a id="Normalisation-and-Regularisation-1"></a><a class="docs-heading-anchor-permalink" href="#Normalisation-and-Regularisation" title="Permalink"></a></h2><p>These layers don&#39;t affect the structure of the network but may improve training times or reduce overfitting. Some of them contain trainable parameters, while others do not.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.BatchNorm" href="#Flux.BatchNorm"><code>Flux.BatchNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">BatchNorm(channels::Integer, λ=identity;
           initβ=zeros32, initγ=ones32,
           affine=true, track_stats=true, active=nothing,
            eps=1f-5, momentum= 0.1f0)</code></pre><p><a href="https://arxiv.org/abs/1502.03167">Batch Normalization</a> layer. <code>channels</code> should be the size of the channel dimension in your data (see below).</p><p>Given an array with <code>N</code> dimensions, call the <code>N-1</code>th the channel dimension. For a batch of feature vectors this is just the data dimension, for <code>WHCN</code> images it&#39;s the usual channel dimension.</p><p><code>BatchNorm</code> computes the mean and variance for each <code>D_1×...×D_{N-2}×1×D_N</code> input slice and normalises the input accordingly.</p><p>If <code>affine=true</code>, it also applies a shift and a rescale to the input through learnable per-channel bias β and scale γ parameters.</p><p>After normalisation, elementwise activation <code>λ</code> is applied.</p><p>If <code>track_stats=true</code>, it accumulates mean and variance statistics during the training phase, which are then used to renormalise the input during the test phase.</p><p>Use <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a> during inference.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">julia&gt; using Statistics
@@ -605,7 +605,7 @@
 julia&gt; Flux.trainmode!(m);
 
 julia&gt; isapprox(std(m(xs)), 1, atol=0.1) &amp;&amp; std(xs) != std(m(xs))
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L274-L313">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Dropout(p; [dims, rng, active])</code></pre><p>Layer implementing <a href="https://arxiv.org/abs/1207.0580">dropout</a> with the given probability. This is used as a regularisation, i.e. to reduce overfitting.</p><p>While training, it sets each input to <code>0</code> (with probability <code>p</code>) or else scales it by <code>1 / (1 - p)</code>, using the <a href="../nnlib/#NNlib.dropout"><code>NNlib.dropout</code></a> function. While testing, it has no effect.</p><p>By default the mode will switch automatically, but it can also be controlled manually via <a href="#Flux.testmode!-Tuple{Any}"><code>Flux.testmode!</code></a>, or by passing keyword <code>active=true</code> for training mode.</p><p>By default every input is treated independently. With the <code>dims</code> keyword, instead it takes a random choice only along that dimension. For example <code>Dropout(p; dims = 3)</code> will randomly zero out entire channels on WHCN input (also called 2D dropout).</p><p>Keyword <code>rng</code> lets you specify a custom random number generator. (Only supported on the CPU.)</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">julia&gt; m = Chain(Dense(ones(3,2)), Dropout(0.4))
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L274-L313">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Dropout(p; [dims, rng, active])</code></pre><p>Layer implementing <a href="https://arxiv.org/abs/1207.0580">dropout</a> with the given probability. This is used as a regularisation, i.e. to reduce overfitting.</p><p>While training, it sets each input to <code>0</code> (with probability <code>p</code>) or else scales it by <code>1 / (1 - p)</code>, using the <a href="../nnlib/#NNlib.dropout"><code>NNlib.dropout</code></a> function. While testing, it has no effect.</p><p>By default the mode will switch automatically, but it can also be controlled manually via <a href="#Flux.testmode!-Tuple{Any}"><code>Flux.testmode!</code></a>, or by passing keyword <code>active=true</code> for training mode.</p><p>By default every input is treated independently. With the <code>dims</code> keyword, instead it takes a random choice only along that dimension. For example <code>Dropout(p; dims = 3)</code> will randomly zero out entire channels on WHCN input (also called 2D dropout).</p><p>Keyword <code>rng</code> lets you specify a custom random number generator. (Only supported on the CPU.)</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">julia&gt; m = Chain(Dense(ones(3,2)), Dropout(0.4))
 Chain(
   Dense(2 =&gt; 3),                        # 9 parameters
   Dropout(0.4),
@@ -637,7 +637,7 @@
 1.9989999999999961
 
 julia&gt; mean(iszero, y)  # is about 0.4
-0.4003</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L9-L67">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AlphaDropout" href="#Flux.AlphaDropout"><code>Flux.AlphaDropout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AlphaDropout(p; [rng, active])</code></pre><p>A dropout layer. Used in <a href="https://arxiv.org/abs/1706.02515">Self-Normalizing Neural Networks</a>. The AlphaDropout layer ensures that mean and variance of activations remain the same as before.</p><p>Does nothing to the input once <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a> is true.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
+0.4003</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L9-L67">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.AlphaDropout" href="#Flux.AlphaDropout"><code>Flux.AlphaDropout</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">AlphaDropout(p; [rng, active])</code></pre><p>A dropout layer. Used in <a href="https://arxiv.org/abs/1706.02515">Self-Normalizing Neural Networks</a>. The AlphaDropout layer ensures that mean and variance of activations remain the same as before.</p><p>Does nothing to the input once <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a> is true.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; x = randn32(1000,1);
 
@@ -648,7 +648,7 @@
 julia&gt; y = m(x);
 
 julia&gt; isapprox(std(x), std(y), atol=0.2)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L95-L120">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">LayerNorm(size..., λ=identity; affine=true, eps=1f-5)</code></pre><p>A <a href="https://arxiv.org/abs/1607.06450">normalisation layer</a> designed to be used with recurrent hidden states. The argument <code>size</code> should be an integer or a tuple of integers.</p><p>In the forward pass, the layer normalises the mean and standard deviation of the input, then applies the elementwise activation <code>λ</code>. The input is normalised along the first <code>length(size)</code> dimensions for tuple <code>size</code>, and along the first dimension for integer <code>size</code>. The input is expected to have first dimensions&#39; size equal to <code>size</code>.</p><p>If <code>affine=true</code>, it also applies a learnable shift and rescaling using the <a href="#Flux.Scale"><code>Scale</code></a> layer.</p><p>See also <a href="#Flux.BatchNorm"><code>BatchNorm</code></a>, <a href="#Flux.InstanceNorm"><code>InstanceNorm</code></a>, <a href="#Flux.GroupNorm"><code>GroupNorm</code></a>, and <a href="#Flux.normalise"><code>normalise</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L95-L120">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">LayerNorm(size..., λ=identity; affine=true, eps=1f-5)</code></pre><p>A <a href="https://arxiv.org/abs/1607.06450">normalisation layer</a> designed to be used with recurrent hidden states. The argument <code>size</code> should be an integer or a tuple of integers.</p><p>In the forward pass, the layer normalises the mean and standard deviation of the input, then applies the elementwise activation <code>λ</code>. The input is normalised along the first <code>length(size)</code> dimensions for tuple <code>size</code>, and along the first dimension for integer <code>size</code>. The input is expected to have first dimensions&#39; size equal to <code>size</code>.</p><p>If <code>affine=true</code>, it also applies a learnable shift and rescaling using the <a href="#Flux.Scale"><code>Scale</code></a> layer.</p><p>See also <a href="#Flux.BatchNorm"><code>BatchNorm</code></a>, <a href="#Flux.InstanceNorm"><code>InstanceNorm</code></a>, <a href="#Flux.GroupNorm"><code>GroupNorm</code></a>, and <a href="#Flux.normalise"><code>normalise</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; xs = rand(3, 3, 3, 2);  # a batch of 2 images, each having 3 channels
 
@@ -657,7 +657,7 @@
 julia&gt; y = m(xs);
 
 julia&gt; isapprox(std(y, dims=1:3), ones(1, 1, 1, 2), atol=0.1) &amp;&amp; std(y, dims=1:3) != std(xs, dims=1:3)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L154-L185">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.InstanceNorm" href="#Flux.InstanceNorm"><code>Flux.InstanceNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">InstanceNorm(channels::Integer, λ=identity;
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L154-L185">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.InstanceNorm" href="#Flux.InstanceNorm"><code>Flux.InstanceNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">InstanceNorm(channels::Integer, λ=identity;
              initβ=zeros32, initγ=ones32,
              affine=false, track_stats=false,
               eps=1f-5, momentum=0.1f0)</code></pre><p><a href="https://arxiv.org/abs/1607.08022">Instance Normalization</a> layer. <code>channels</code> should be the size of the channel dimension in your data (see below).</p><p>Given an array with <code>N &gt; 2</code> dimensions, call the <code>N-1</code>th the channel dimension. For <code>WHCN</code> images it&#39;s the usual channel dimension.</p><p><code>InstanceNorm</code> computes the mean and variance for each <code>D_1×...×D_{N-2}×1×1</code> input slice and normalises the input accordingly.</p><p>If <code>affine=true</code>, it also applies a shift and a rescale to the input through learnable per-channel bias <code>β</code> and scale <code>γ</code> parameters.</p><p>If <code>track_stats=true</code>, it accumulates mean and variance statistics during the training phase, which are then used to renormalise the input during the test phase.</p><p><strong>Warning</strong>: the defaults for <code>affine</code> and <code>track_stats</code> used to be <code>true</code> in previous Flux versions (&lt; v0.12).</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
@@ -669,7 +669,7 @@
 julia&gt; y = m(xs);
 
 julia&gt; isapprox(std(y, dims=1:2), ones(1, 1, 3, 2), atol=0.2) &amp;&amp; std(y, dims=1:2) != std(xs, dims=1:2)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L369-L406">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GroupNorm" href="#Flux.GroupNorm"><code>Flux.GroupNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GroupNorm(channels::Int, G::Int, λ = identity;
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L369-L406">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.GroupNorm" href="#Flux.GroupNorm"><code>Flux.GroupNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">GroupNorm(channels::Int, G::Int, λ = identity;
           initβ = zeros32,
           initγ = ones32,
           affine = true,
@@ -686,7 +686,7 @@
 true
 
 julia&gt; isapprox(std(y[:, :, 3:4, 2]), 1, atol=0.1) &amp;&amp; std(xs[:, :, 3:4, 2]) != std(y[:, :, 3:4, 2])
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/normalise.jl#L461-L502">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.normalise" href="#Flux.normalise"><code>Flux.normalise</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">normalise(x; dims=ndims(x), eps=1e-5)</code></pre><p>Normalise <code>x</code> to mean 0 and standard deviation 1 across the dimension(s) given by <code>dims</code>. Per default, <code>dims</code> is the last dimension.  <code>eps</code> is a small term added to the denominator for numerical stability.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/normalise.jl#L461-L502">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.normalise" href="#Flux.normalise"><code>Flux.normalise</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">normalise(x; dims=ndims(x), eps=1e-5)</code></pre><p>Normalise <code>x</code> to mean 0 and standard deviation 1 across the dimension(s) given by <code>dims</code>. Per default, <code>dims</code> is the last dimension.  <code>eps</code> is a small term added to the denominator for numerical stability.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; x = [90, 100, 110, 130, 70];
 
@@ -709,7 +709,7 @@
 julia&gt; y = Flux.normalise(x, dims=1);
 
 julia&gt; isapprox(std(y; dims=1, corrected=false), ones(1, 10), atol=1e-5)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/layers/stateless.jl#L2-L36">source</a></section></article><h3 id="Test-vs.-Train"><a class="docs-heading-anchor" href="#Test-vs.-Train">Test vs. Train</a><a id="Test-vs.-Train-1"></a><a class="docs-heading-anchor-permalink" href="#Test-vs.-Train" title="Permalink"></a></h3><p>Several normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. </p><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This automatic train/test detection works best with Zygote, the default automatic differentiation package. It may not work with other packages such as Tracker, Yota, or ForwardDiff.</p></div></div><p>The functions <code>Flux.trainmode!</code> and <code>Flux.testmode!</code> let you manually specify which behaviour you want. When called on a model, they will place all layers within the model into the specified mode.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.testmode!-Tuple{Any}" href="#Flux.testmode!-Tuple{Any}"><code>Flux.testmode!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">testmode!(model, [mode]) -&gt; model</code></pre><p>Set a layer, or all layers in a model, to test mode. 
This disables the effect of <a href="../nnlib/#Dropout"><code>Dropout</code></a> and some other regularisation layers.</p><p>If you manually set a model into test mode, you need to manually place it back into train mode during training phase, using <a href="#Flux.trainmode!"><code>trainmode!</code></a>.</p><p>There is an optional second argument, which takes a symbol <code>:auto</code> to reset all layers back to the default automatic mode.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; d = Dropout(0.3)
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/layers/stateless.jl#L2-L36">source</a></section></article><h3 id="Test-vs.-Train"><a class="docs-heading-anchor" href="#Test-vs.-Train">Test vs. Train</a><a id="Test-vs.-Train-1"></a><a class="docs-heading-anchor-permalink" href="#Test-vs.-Train" title="Permalink"></a></h3><p>Several normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. </p><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This automatic train/test detection works best with Zygote, the default automatic differentiation package. It may not work with other packages such as Tracker, Yota, or ForwardDiff.</p></div></div><p>The functions <code>Flux.trainmode!</code> and <code>Flux.testmode!</code> let you manually specify which behaviour you want. When called on a model, they will place all layers within the model into the specified mode.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.testmode!-Tuple{Any}" href="#Flux.testmode!-Tuple{Any}"><code>Flux.testmode!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">testmode!(model, [mode]) -&gt; model</code></pre><p>Set a layer, or all layers in a model, to test mode. 
This disables the effect of <a href="../nnlib/#Dropout"><code>Dropout</code></a> and some other regularisation layers.</p><p>If you manually set a model into test mode, you need to manually place it back into train mode during training phase, using <a href="#Flux.trainmode!"><code>trainmode!</code></a>.</p><p>There is an optional second argument, which takes a symbol <code>:auto</code> to reset all layers back to the default automatic mode.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; d = Dropout(0.3)
 Dropout(0.3)
 
 julia&gt; testmode!(d)   # dropout is now always disabled
@@ -719,4 +719,4 @@
 Dropout(0.3, active=true)
 
 julia&gt; testmode!(d, :auto)  # back to default
-Dropout(0.3)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L7-L35">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.testmode!-Tuple{Any, Any}" href="#Flux.testmode!-Tuple{Any, Any}"><code>Flux.testmode!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">testmode!(model, inactive)</code></pre><p>This two-argument method is largely internal. It recurses into the <code>model</code> until a method like <code>testmode!(d::Dropout, inactive)</code> alters the activity of a layer. Custom layers can support manual <code>testmode!</code> / <code>trainmode!</code> switching by defining such a method.</p><p>Possible values of <code>inactive</code> are:</p><ul><li><code>true</code> for testing, i.e. <code>active=false</code></li><li><code>false</code> for training, same as <a href="#Flux.trainmode!"><code>trainmode!</code></a><code>(m)</code></li><li><code>:auto</code> or <code>nothing</code> for Flux to detect training automatically.</li></ul><div class="admonition is-compat"><header class="admonition-header">Compat</header><div class="admonition-body"><p>This method may be removed in a future breaking change, to separate the user-facing <code>testmode!</code> from the internal recursion.</p></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L48-L64">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.trainmode!" 
href="#Flux.trainmode!"><code>Flux.trainmode!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">trainmode!(model) -&gt; model</code></pre><p>Set a layer, or all layers in a model, to training mode. Opposite to <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a>, see further details there.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L38-L43">source</a></section><section><div><pre><code class="language-julia hljs">trainmode!(m, active)</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This two-argument method is deprecated.</p></div></div><p>Possible values of  <code>active</code> are:</p><ul><li><code>true</code> for training, or </li><li><code>false</code> for testing, same as <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a><code>(m)</code></li><li><code>:auto</code> or <code>nothing</code> for Flux to detect training automatically.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/deprecations.jl#L182-L193">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../../ecosystem/">« Ecosystem</a><a class="docs-footer-nextpage" href="../activation/">Activation Functions »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section 
class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+Dropout(0.3)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L7-L35">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.testmode!-Tuple{Any, Any}" href="#Flux.testmode!-Tuple{Any, Any}"><code>Flux.testmode!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">testmode!(model, inactive)</code></pre><p>This two-argument method is largely internal. It recurses into the <code>model</code> until a method like <code>testmode!(d::Dropout, inactive)</code> alters the activity of a layer. Custom layers can support manual <code>testmode!</code> / <code>trainmode!</code> switching by defining such a method.</p><p>Possible values of <code>inactive</code> are:</p><ul><li><code>true</code> for testing, i.e. <code>active=false</code></li><li><code>false</code> for training, same as <a href="#Flux.trainmode!"><code>trainmode!</code></a><code>(m)</code></li><li><code>:auto</code> or <code>nothing</code> for Flux to detect training automatically.</li></ul><div class="admonition is-compat"><header class="admonition-header">Compat</header><div class="admonition-body"><p>This method may be removed in a future breaking change, to separate the user-facing <code>testmode!</code> from the internal recursion.</p></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L48-L64">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.trainmode!" 
href="#Flux.trainmode!"><code>Flux.trainmode!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">trainmode!(model) -&gt; model</code></pre><p>Set a layer, or all layers in a model, to training mode. Opposite to <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a>, see further details there.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L38-L43">source</a></section><section><div><pre><code class="language-julia hljs">trainmode!(m, active)</code></pre><div class="admonition is-warning"><header class="admonition-header">Warning</header><div class="admonition-body"><p>This two-argument method is deprecated.</p></div></div><p>Possible values of  <code>active</code> are:</p><ul><li><code>true</code> for training, or </li><li><code>false</code> for testing, same as <a href="#Flux.testmode!-Tuple{Any}"><code>testmode!</code></a><code>(m)</code></li><li><code>:auto</code> or <code>nothing</code> for Flux to detect training automatically.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/deprecations.jl#L182-L193">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../../ecosystem/">« Ecosystem</a><a class="docs-footer-nextpage" href="../activation/">Activation Functions »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section 
class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/models/losses/index.html b/dev/reference/models/losses/index.html
index 9527e9c9cd..cdfe68e2eb 100644
--- a/dev/reference/models/losses/index.html
+++ b/dev/reference/models/losses/index.html
@@ -10,16 +10,16 @@
 loss(ŷ, y, agg=identity)           # no aggregation.</code></pre><h3 id="Function-listing"><a class="docs-heading-anchor" href="#Function-listing">Function listing</a><a id="Function-listing-1"></a><a class="docs-heading-anchor-permalink" href="#Function-listing" title="Permalink"></a></h3><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.mae" href="#Flux.Losses.mae"><code>Flux.Losses.mae</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">mae(ŷ, y; agg = mean)</code></pre><p>Return the loss corresponding to mean absolute error:</p><pre><code class="nohighlight hljs">agg(abs.(ŷ .- y))</code></pre><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_model = [1.1, 1.9, 3.1];
 
 julia&gt; Flux.mae(y_model, 1:3)
-0.10000000000000009</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L6-L20">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.mse" href="#Flux.Losses.mse"><code>Flux.Losses.mse</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">mse(ŷ, y; agg = mean)</code></pre><p>Return the loss corresponding to mean square error:</p><pre><code class="nohighlight hljs">agg((ŷ .- y) .^ 2)</code></pre><p>See also: <a href="#Flux.Losses.mae"><code>mae</code></a>, <a href="#Flux.Losses.msle"><code>msle</code></a>, <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_model = [1.1, 1.9, 3.1];
+0.10000000000000009</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L6-L20">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.mse" href="#Flux.Losses.mse"><code>Flux.Losses.mse</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">mse(ŷ, y; agg = mean)</code></pre><p>Return the loss corresponding to mean square error:</p><pre><code class="nohighlight hljs">agg((ŷ .- y) .^ 2)</code></pre><p>See also: <a href="#Flux.Losses.mae"><code>mae</code></a>, <a href="#Flux.Losses.msle"><code>msle</code></a>, <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_model = [1.1, 1.9, 3.1];
 
 julia&gt; y_true = 1:3;
 
 julia&gt; Flux.mse(y_model, y_true)
-0.010000000000000018</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L26-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.msle" href="#Flux.Losses.msle"><code>Flux.Losses.msle</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">msle(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>The loss corresponding to mean squared logarithmic errors, calculated as</p><pre><code class="nohighlight hljs">agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)) .^ 2)</code></pre><p>The <code>ϵ == eps</code> term provides numerical stability. This loss penalizes an under-estimation more than an over-estimation.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; Flux.msle(Float32[1.1, 2.2, 3.3], 1:3)
+0.010000000000000018</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L26-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.msle" href="#Flux.Losses.msle"><code>Flux.Losses.msle</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">msle(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>The loss corresponding to mean squared logarithmic errors, calculated as</p><pre><code class="nohighlight hljs">agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)) .^ 2)</code></pre><p>The <code>ϵ == eps</code> term provides numerical stability. This loss penalizes an under-estimation more than an over-estimation.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; Flux.msle(Float32[1.1, 2.2, 3.3], 1:3)
 0.009084041f0
 
 julia&gt; Flux.msle(Float32[0.9, 1.8, 2.7], 1:3)
-0.011100831f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L50-L68">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.huber_loss" href="#Flux.Losses.huber_loss"><code>Flux.Losses.huber_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">huber_loss(ŷ, y; delta = 1, agg = mean)</code></pre><p>Return the mean of the <a href="https://en.wikipedia.org/wiki/Huber_loss">Huber loss</a> given the prediction <code>ŷ</code> and true values <code>y</code>.</p><pre><code class="nohighlight hljs">             | 0.5 * |ŷ - y|^2,            for |ŷ - y| &lt;= δ
+0.011100831f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L50-L68">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.huber_loss" href="#Flux.Losses.huber_loss"><code>Flux.Losses.huber_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">huber_loss(ŷ, y; delta = 1, agg = mean)</code></pre><p>Return the mean of the <a href="https://en.wikipedia.org/wiki/Huber_loss">Huber loss</a> given the prediction <code>ŷ</code> and true values <code>y</code>.</p><pre><code class="nohighlight hljs">             | 0.5 * |ŷ - y|^2,            for |ŷ - y| &lt;= δ
 Huber loss = |
              |  δ * (|ŷ - y| - 0.5 * δ), otherwise</code></pre><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; ŷ = [1.1, 2.1, 3.1];
 
@@ -27,7 +27,7 @@
 0.005000000000000009
 
 julia&gt; Flux.huber_loss(ŷ, 1:3, delta=0.05)  # changes behaviour as |ŷ - y| &gt; δ
-0.003750000000000005</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L82-L103">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.label_smoothing" href="#Flux.Losses.label_smoothing"><code>Flux.Losses.label_smoothing</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">label_smoothing(y::Union{Number, AbstractArray}, α; dims::Int=1)</code></pre><p>Returns smoothed labels, meaning the confidence in the label values is relaxed.</p><p>When <code>y</code> is given as a one-hot vector or a batch of one-hot vectors, it is calculated as</p><pre><code class="nohighlight hljs">y .* (1 - α) .+ α / size(y, dims)</code></pre><p>When <code>y</code> is given as a number or batch of numbers for binary classification, it is calculated as</p><pre><code class="nohighlight hljs">y .* (1 - α) .+ α / 2</code></pre><p>in which case the labels are squeezed towards <code>0.5</code>.</p><p>α is a number in the interval (0, 1) called the smoothing factor. The higher the value of α, the larger the smoothing of <code>y</code>.</p><p><code>dims</code> denotes the one-hot dimension, unless <code>dims=0</code> which denotes the application of label smoothing to binary distributions encoded in a single number.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = Flux.onehotbatch([1, 1, 1, 0, 1, 0], 0:1)
+0.003750000000000005</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L82-L103">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.label_smoothing" href="#Flux.Losses.label_smoothing"><code>Flux.Losses.label_smoothing</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">label_smoothing(y::Union{Number, AbstractArray}, α; dims::Int=1)</code></pre><p>Returns smoothed labels, meaning the confidence in the label values is relaxed.</p><p>When <code>y</code> is given as a one-hot vector or a batch of one-hot vectors, it is calculated as</p><pre><code class="nohighlight hljs">y .* (1 - α) .+ α / size(y, dims)</code></pre><p>When <code>y</code> is given as a number or batch of numbers for binary classification, it is calculated as</p><pre><code class="nohighlight hljs">y .* (1 - α) .+ α / 2</code></pre><p>in which case the labels are squeezed towards <code>0.5</code>.</p><p>α is a number in the interval (0, 1) called the smoothing factor. The higher the value of α, the larger the smoothing of <code>y</code>.</p><p><code>dims</code> denotes the one-hot dimension, unless <code>dims=0</code> which denotes the application of label smoothing to binary distributions encoded in a single number.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = Flux.onehotbatch([1, 1, 1, 0, 1, 0], 0:1)
 2×6 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
  ⋅  ⋅  ⋅  1  ⋅  1
  1  1  1  ⋅  1  ⋅
@@ -51,7 +51,7 @@
 true
 
 julia&gt; Flux.crossentropy(y_dis, y) &gt; Flux.crossentropy(y_dis, y_smoothed)
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L112-L162">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.crossentropy" href="#Flux.Losses.crossentropy"><code>Flux.Losses.crossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">crossentropy(ŷ, y; dims = 1, eps = eps(eltype(ŷ)), agg = mean)</code></pre><p>Return the cross entropy between the given probability distributions; calculated as</p><pre><code class="nohighlight hljs">agg(-sum(y .* log.(ŷ .+ ϵ); dims))</code></pre><p>Cross entropy is typically used as a loss in multi-class classification, in which case the labels <code>y</code> are given in a one-hot format. <code>dims</code> specifies the dimension (or the dimensions) containing the class probabilities. 
The prediction <code>ŷ</code> is supposed to sum to one across <code>dims</code>, as would be the case with the output of a <a href="../nnlib/#Softmax">softmax</a> operation.</p><p>For numerical stability, it is recommended to use <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a> rather than <code>softmax</code> followed by <code>crossentropy</code> .</p><p>Use <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a> to smooth the true labels as preprocessing before computing the loss.</p><p>See also: <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>, <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy</code></a>, <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_label = Flux.onehotbatch([0, 1, 2, 1, 0], 0:2)
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L112-L162">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.crossentropy" href="#Flux.Losses.crossentropy"><code>Flux.Losses.crossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">crossentropy(ŷ, y; dims = 1, eps = eps(eltype(ŷ)), agg = mean)</code></pre><p>Return the cross entropy between the given probability distributions; calculated as</p><pre><code class="nohighlight hljs">agg(-sum(y .* log.(ŷ .+ ϵ); dims))</code></pre><p>Cross entropy is typically used as a loss in multi-class classification, in which case the labels <code>y</code> are given in a one-hot format. <code>dims</code> specifies the dimension (or the dimensions) containing the class probabilities. 
The prediction <code>ŷ</code> is supposed to sum to one across <code>dims</code>, as would be the case with the output of a <a href="../nnlib/#Softmax">softmax</a> operation.</p><p>For numerical stability, it is recommended to use <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a> rather than <code>softmax</code> followed by <code>crossentropy</code> .</p><p>Use <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a> to smooth the true labels as preprocessing before computing the loss.</p><p>See also: <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>, <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy</code></a>, <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_label = Flux.onehotbatch([0, 1, 2, 1, 0], 0:2)
 3×5 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
  1  ⋅  ⋅  ⋅  1
  ⋅  1  ⋅  1  ⋅
@@ -80,7 +80,7 @@
  0.05  0.05  0.9   0.05  0.05
 
 julia&gt; Flux.crossentropy(y_model, y_smooth)
-1.5776052f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L177-L232">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.logitcrossentropy" href="#Flux.Losses.logitcrossentropy"><code>Flux.Losses.logitcrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logitcrossentropy(ŷ, y; dims = 1, agg = mean)</code></pre><p>Return the cross entropy calculated by</p><pre><code class="nohighlight hljs">agg(-sum(y .* logsoftmax(ŷ; dims); dims))</code></pre><p>This is mathematically equivalent to <code>crossentropy(softmax(ŷ), y)</code>, but is more numerically stable than using functions <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a> and <a href="../nnlib/#Softmax">softmax</a> separately.</p><p>See also: <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy</code></a>, <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a>, <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_label = Flux.onehotbatch(collect(&quot;abcabaa&quot;), &#39;a&#39;:&#39;c&#39;)
+1.5776052f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L177-L232">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.logitcrossentropy" href="#Flux.Losses.logitcrossentropy"><code>Flux.Losses.logitcrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logitcrossentropy(ŷ, y; dims = 1, agg = mean)</code></pre><p>Return the cross entropy calculated by</p><pre><code class="nohighlight hljs">agg(-sum(y .* logsoftmax(ŷ; dims); dims))</code></pre><p>This is mathematically equivalent to <code>crossentropy(softmax(ŷ), y)</code>, but is more numerically stable than using functions <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a> and <a href="../nnlib/#Softmax">softmax</a> separately.</p><p>See also: <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy</code></a>, <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a>, <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_label = Flux.onehotbatch(collect(&quot;abcabaa&quot;), &#39;a&#39;:&#39;c&#39;)
 3×7 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
  1  ⋅  ⋅  1  ⋅  1  1
  ⋅  1  ⋅  ⋅  1  ⋅  ⋅
@@ -96,7 +96,7 @@
 1.5791205f0
 
 julia&gt; Flux.crossentropy(softmax(y_model), y_label)
-1.5791197f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L239-L272">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.binarycrossentropy" href="#Flux.Losses.binarycrossentropy"><code>Flux.Losses.binarycrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">binarycrossentropy(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>Return the binary cross-entropy loss, computed as</p><pre><code class="nohighlight hljs">agg(@.(-y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ)))</code></pre><p>Typically, the prediction <code>ŷ</code> is given by the output of a <a href="../activation/#man-activations">sigmoid</a> activation. The <code>ϵ == eps</code> term is included to avoid infinity. Using <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a> is recommended over <code>binarycrossentropy</code> for numerical stability.</p><p>Use <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a> to smooth the <code>y</code> value as preprocessing before computing the loss.</p><p>See also: <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>, <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_bin = Bool[1,0,1]
+1.5791197f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L239-L272">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.binarycrossentropy" href="#Flux.Losses.binarycrossentropy"><code>Flux.Losses.binarycrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">binarycrossentropy(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>Return the binary cross-entropy loss, computed as</p><pre><code class="nohighlight hljs">agg(@.(-y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ)))</code></pre><p>Typically, the prediction <code>ŷ</code> is given by the output of a <a href="../activation/#man-activations">sigmoid</a> activation. The <code>ϵ == eps</code> term is included to avoid infinity. Using <a href="#Flux.Losses.logitbinarycrossentropy"><code>logitbinarycrossentropy</code></a> is recommended over <code>binarycrossentropy</code> for numerical stability.</p><p>Use <a href="#Flux.Losses.label_smoothing"><code>label_smoothing</code></a> to smooth the <code>y</code> value as preprocessing before computing the loss.</p><p>See also: <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>, <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_bin = Bool[1,0,1]
 3-element Vector{Bool}:
  1
  0
@@ -119,7 +119,7 @@
  1  ⋅  1
 
 julia&gt; Flux.crossentropy(y_prob, y_hot)
-0.43989f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L278-L321">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.logitbinarycrossentropy" href="#Flux.Losses.logitbinarycrossentropy"><code>Flux.Losses.logitbinarycrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logitbinarycrossentropy(ŷ, y; agg = mean)</code></pre><p>Mathematically equivalent to <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy(σ(ŷ), y)</code></a> but is more numerically stable.</p><p>See also: <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>, <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_bin = Bool[1,0,1];
+0.43989f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L278-L321">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.logitbinarycrossentropy" href="#Flux.Losses.logitbinarycrossentropy"><code>Flux.Losses.logitbinarycrossentropy</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logitbinarycrossentropy(ŷ, y; agg = mean)</code></pre><p>Mathematically equivalent to <a href="#Flux.Losses.binarycrossentropy"><code>binarycrossentropy(σ(ŷ), y)</code></a> but is more numerically stable.</p><p>See also: <a href="#Flux.Losses.crossentropy"><code>crossentropy</code></a>, <a href="#Flux.Losses.logitcrossentropy"><code>logitcrossentropy</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_bin = Bool[1,0,1];
 
 julia&gt; y_model = Float32[2, -1, pi]
 3-element Vector{Float32}:
@@ -131,7 +131,7 @@
 0.160832f0
 
 julia&gt; Flux.binarycrossentropy(sigmoid.(y_model), y_bin)
-0.16083185f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L328-L352">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.kldivergence" href="#Flux.Losses.kldivergence"><code>Flux.Losses.kldivergence</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kldivergence(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> between the given probability distributions.</p><p>The KL divergence is a measure of how much one probability distribution is different from the other. It is always non-negative, and zero only when both the distributions are equal.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; p1 = [1 0; 0 1]
+0.16083185f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L328-L352">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.kldivergence" href="#Flux.Losses.kldivergence"><code>Flux.Losses.kldivergence</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kldivergence(ŷ, y; agg = mean, eps = eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> between the given probability distributions.</p><p>The KL divergence is a measure of how much one probability distribution is different from the other. It is always non-negative, and zero only when both the distributions are equal.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; p1 = [1 0; 0 1]
 2×2 Matrix{Int64}:
  1  0
  0  1
@@ -151,10 +151,10 @@
 0.0
 
 julia&gt; Flux.kldivergence(p1, p2; eps = 0)  # about 17.3 with the regulator
-Inf</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L358-L392">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.poisson_loss" href="#Flux.Losses.poisson_loss"><code>Flux.Losses.poisson_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">poisson_loss(ŷ, y; agg = mean)</code></pre><p>Return how much the predicted distribution <code>ŷ</code> diverges from the expected Poisson distribution <code>y</code>; calculated as</p><pre><code class="nohighlight hljs">sum(ŷ .- y .* log.(ŷ)) / size(y, 2)</code></pre><p><a href="https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/poisson">More information</a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_model = [1, 3, 3];  # data should only take integral values
+Inf</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L358-L392">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.poisson_loss" href="#Flux.Losses.poisson_loss"><code>Flux.Losses.poisson_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">poisson_loss(ŷ, y; agg = mean)</code></pre><p>Return how much the predicted distribution <code>ŷ</code> diverges from the expected Poisson distribution <code>y</code>; calculated as</p><pre><code class="nohighlight hljs">sum(ŷ .- y .* log.(ŷ)) / size(y, 2)</code></pre><p><a href="https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/poisson">More information</a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_model = [1, 3, 3];  # data should only take integral values
 
 julia&gt; Flux.poisson_loss(y_model, 1:3)
-0.5023128522198171</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L401-L418">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.hinge_loss" href="#Flux.Losses.hinge_loss"><code>Flux.Losses.hinge_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hinge_loss(ŷ, y; agg = mean)</code></pre><p>Return the <a href="https://en.wikipedia.org/wiki/Hinge_loss">hinge_loss</a> given the prediction <code>ŷ</code> and true labels <code>y</code> (containing 1 or -1); calculated as</p><pre><code class="nohighlight hljs">sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)</code></pre><p>Usually used with classifiers like Support Vector Machines. See also: <a href="#Flux.Losses.squared_hinge_loss"><code>squared_hinge_loss</code></a></p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_true = [1, -1, 1, 1];
+0.5023128522198171</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L401-L418">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.hinge_loss" href="#Flux.Losses.hinge_loss"><code>Flux.Losses.hinge_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">hinge_loss(ŷ, y; agg = mean)</code></pre><p>Return the <a href="https://en.wikipedia.org/wiki/Hinge_loss">hinge_loss</a> given the prediction <code>ŷ</code> and true labels <code>y</code> (containing 1 or -1); calculated as</p><pre><code class="nohighlight hljs">sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)</code></pre><p>Usually used with classifiers like Support Vector Machines. See also: <a href="#Flux.Losses.squared_hinge_loss"><code>squared_hinge_loss</code></a></p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_true = [1, -1, 1, 1];
 
 julia&gt; y_pred = [0.1, 0.3, 1, 1.5];
 
@@ -168,7 +168,7 @@
 true
 
 julia&gt; Flux.hinge_loss(y_pred[2], y_true[2]) != 0 # opposite signs
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L424-L453">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.squared_hinge_loss" href="#Flux.Losses.squared_hinge_loss"><code>Flux.Losses.squared_hinge_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">squared_hinge_loss(ŷ, y)</code></pre><p>Return the squared hinge loss given the prediction <code>ŷ</code> and true labels <code>y</code> (containing 1 or -1); calculated as</p><pre><code class="nohighlight hljs">sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2)</code></pre><p>Usually used with classifiers like Support Vector Machines. See also: <a href="#Flux.Losses.hinge_loss"><code>hinge_loss</code></a></p><p><strong>Example</strong></p><pre><code class="language-jldoctes hljs">julia&gt; y_true = [1, -1, 1, 1];
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L424-L453">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.squared_hinge_loss" href="#Flux.Losses.squared_hinge_loss"><code>Flux.Losses.squared_hinge_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">squared_hinge_loss(ŷ, y)</code></pre><p>Return the squared hinge loss given the prediction <code>ŷ</code> and true labels <code>y</code> (containing 1 or -1); calculated as</p><pre><code class="nohighlight hljs">sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2)</code></pre><p>Usually used with classifiers like Support Vector Machines. See also: <a href="#Flux.Losses.hinge_loss"><code>hinge_loss</code></a></p><p><strong>Example</strong></p><pre><code class="language-jldoctes hljs">julia&gt; y_true = [1, -1, 1, 1];
 
 julia&gt; y_pred = [0.1, 0.3, 1, 1.5];
 
@@ -182,13 +182,13 @@
 true
 
 julia&gt; Flux.squared_hinge_loss(y_pred[2], y_true[2]) != 0
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L459-L488">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.dice_coeff_loss" href="#Flux.Losses.dice_coeff_loss"><code>Flux.Losses.dice_coeff_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dice_coeff_loss(ŷ, y; smooth = 1)</code></pre><p>Return a loss based on the dice coefficient. Used in the <a href="https://arxiv.org/abs/1606.04797">V-Net</a> image segmentation architecture. The dice coefficient is similar to the F1_score. Loss calculated as:</p><pre><code class="nohighlight hljs">1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)</code></pre><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_pred = [1.1, 2.1, 3.1];
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L459-L488">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.dice_coeff_loss" href="#Flux.Losses.dice_coeff_loss"><code>Flux.Losses.dice_coeff_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dice_coeff_loss(ŷ, y; smooth = 1)</code></pre><p>Return a loss based on the dice coefficient. Used in the <a href="https://arxiv.org/abs/1606.04797">V-Net</a> image segmentation architecture. The dice coefficient is similar to the F1_score. Loss calculated as:</p><pre><code class="nohighlight hljs">1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)</code></pre><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y_pred = [1.1, 2.1, 3.1];
 
 julia&gt; Flux.dice_coeff_loss(y_pred, 1:3)
 0.000992391663909964
 
 julia&gt; 1 - Flux.dice_coeff_loss(y_pred, 1:3)  # ~ F1 score for image segmentation
-0.99900760833609</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L494-L514">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.tversky_loss" href="#Flux.Losses.tversky_loss"><code>Flux.Losses.tversky_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tversky_loss(ŷ, y; beta = 0.7)</code></pre><p>Return the <a href="https://arxiv.org/abs/1706.05721">Tversky loss</a>. Used with imbalanced data to give more weight to false negatives. A larger <code>β == beta</code> weighs recall more heavily than precision (by placing more emphasis on false negatives). Calculated as:</p><pre><code class="nohighlight hljs">1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + (1 - β)*(1 .- y) .* ŷ + β*y .* (1 .- ŷ)) + 1)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L522-L532">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.binary_focal_loss" href="#Flux.Losses.binary_focal_loss"><code>Flux.Losses.binary_focal_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">binary_focal_loss(ŷ, y; agg=mean, gamma=2, eps=eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://arxiv.org/pdf/1708.02002.pdf">binary_focal_loss</a>. The input, &#39;ŷ&#39;, is expected to be normalized (i.e. 
<a href="../nnlib/#Softmax">softmax</a> output).</p><p>For <code>gamma = 0</code>, the loss is mathematically equivalent to <a href="#Flux.Losses.binarycrossentropy"><code>Losses.binarycrossentropy</code></a>.</p><p>See also: <a href="#Flux.Losses.focal_loss"><code>Losses.focal_loss</code></a> for multi-class setting</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = [0  1  0
+0.99900760833609</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L494-L514">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.tversky_loss" href="#Flux.Losses.tversky_loss"><code>Flux.Losses.tversky_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">tversky_loss(ŷ, y; beta = 0.7)</code></pre><p>Return the <a href="https://arxiv.org/abs/1706.05721">Tversky loss</a>. Used with imbalanced data to give more weight to false negatives. A larger <code>β == beta</code> weighs recall more heavily than precision (by placing more emphasis on false negatives). Calculated as:</p><pre><code class="nohighlight hljs">1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + (1 - β)*(1 .- y) .* ŷ + β*y .* (1 .- ŷ)) + 1)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L522-L532">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.binary_focal_loss" href="#Flux.Losses.binary_focal_loss"><code>Flux.Losses.binary_focal_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">binary_focal_loss(ŷ, y; agg=mean, gamma=2, eps=eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://arxiv.org/pdf/1708.02002.pdf">binary_focal_loss</a>. The input, &#39;ŷ&#39;, is expected to be normalized (i.e. 
<a href="../nnlib/#Softmax">softmax</a> output).</p><p>For <code>gamma = 0</code>, the loss is mathematically equivalent to <a href="#Flux.Losses.binarycrossentropy"><code>Losses.binarycrossentropy</code></a>.</p><p>See also: <a href="#Flux.Losses.focal_loss"><code>Losses.focal_loss</code></a> for multi-class setting</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = [0  1  0
             1  0  1]
 2×3 Matrix{Int64}:
  0  1  0
@@ -201,7 +201,7 @@
  0.731059  0.5  0.731059
 
 julia&gt; Flux.binary_focal_loss(ŷ, y) ≈ 0.0728675615927385
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L543-L570">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.focal_loss" href="#Flux.Losses.focal_loss"><code>Flux.Losses.focal_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">focal_loss(ŷ, y; dims=1, agg=mean, gamma=2, eps=eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://arxiv.org/pdf/1708.02002.pdf">focal_loss</a> which can be used in classification tasks with highly imbalanced classes. It down-weights well-classified examples and focuses on hard examples. The input, &#39;ŷ&#39;, is expected to be normalized (i.e. <a href="../nnlib/#Softmax">softmax</a> output).</p><p>The modulating factor, <code>γ == gamma</code>, controls the down-weighting strength. For <code>γ == 0</code>, the loss is mathematically equivalent to <a href="#Flux.Losses.crossentropy"><code>Losses.crossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = [1  0  0  0  1
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L543-L570">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.focal_loss" href="#Flux.Losses.focal_loss"><code>Flux.Losses.focal_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">focal_loss(ŷ, y; dims=1, agg=mean, gamma=2, eps=eps(eltype(ŷ)))</code></pre><p>Return the <a href="https://arxiv.org/pdf/1708.02002.pdf">focal_loss</a> which can be used in classification tasks with highly imbalanced classes. It down-weights well-classified examples and focuses on hard examples. The input, &#39;ŷ&#39;, is expected to be normalized (i.e. <a href="../nnlib/#Softmax">softmax</a> output).</p><p>The modulating factor, <code>γ == gamma</code>, controls the down-weighting strength. For <code>γ == 0</code>, the loss is mathematically equivalent to <a href="#Flux.Losses.crossentropy"><code>Losses.crossentropy</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; y = [1  0  0  0  1
             0  1  0  1  0
             0  0  1  0  0]
 3×5 Matrix{Int64}:
@@ -216,10 +216,10 @@
  0.665241   0.665241   0.665241   0.665241   0.665241
 
 julia&gt; Flux.focal_loss(ŷ, y) ≈ 1.1277571935622628
-true</code></pre><p>See also: <a href="#Flux.Losses.binary_focal_loss"><code>Losses.binary_focal_loss</code></a> for binary (not one-hot) labels</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L584-L617">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.siamese_contrastive_loss" href="#Flux.Losses.siamese_contrastive_loss"><code>Flux.Losses.siamese_contrastive_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">siamese_contrastive_loss(ŷ, y; margin = 1, agg = mean)</code></pre><p>Return the <a href="http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf">contrastive loss</a> which can be useful for training Siamese Networks. It is given by</p><pre><code class="nohighlight hljs">agg(@. (1 - y) * ŷ^2 + y * max(0, margin - ŷ)^2)</code></pre><p>Specify <code>margin</code> to set the baseline for distance at which pairs are dissimilar.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; ŷ = [0.5, 1.5, 2.5];
+true</code></pre><p>See also: <a href="#Flux.Losses.binary_focal_loss"><code>Losses.binary_focal_loss</code></a> for binary (not one-hot) labels</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L584-L617">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Losses.siamese_contrastive_loss" href="#Flux.Losses.siamese_contrastive_loss"><code>Flux.Losses.siamese_contrastive_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">siamese_contrastive_loss(ŷ, y; margin = 1, agg = mean)</code></pre><p>Return the <a href="http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf">contrastive loss</a> which can be useful for training Siamese Networks. It is given by</p><pre><code class="nohighlight hljs">agg(@. (1 - y) * ŷ^2 + y * max(0, margin - ŷ)^2)</code></pre><p>Specify <code>margin</code> to set the baseline for distance at which pairs are dissimilar.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; ŷ = [0.5, 1.5, 2.5];
 
 julia&gt; Flux.siamese_contrastive_loss(ŷ, 1:3)
 -4.833333333333333
 
 julia&gt; Flux.siamese_contrastive_loss(ŷ, 1:3, margin = 2)
--4.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/losses/functions.jl#L627-L647">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../utilities/">« Weight Initialisation</a><a class="docs-footer-nextpage" href="../../training/reference/">Training API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+-4.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/losses/functions.jl#L627-L647">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../utilities/">« Weight Initialisation</a><a class="docs-footer-nextpage" href="../../training/reference/">Training API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/models/nnlib/index.html b/dev/reference/models/nnlib/index.html
index 750dc83275..695b9daf0a 100644
--- a/dev/reference/models/nnlib/index.html
+++ b/dev/reference/models/nnlib/index.html
@@ -4,7 +4,7 @@
   gtag('js', new Date());
   gtag('config', 'UA-36890222-9', {'page_path': location.pathname + location.search + location.hash});
 </script><script data-outdated-warner src="../../../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../../../assets/documenter.js"></script><script src="../../../search_index.js"></script><script src="../../../siteinfo.js"></script><script src="../../../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../../assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../../assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../../assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../../assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" 
href="../../../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../../../assets/themeswap.js"></script><link href="../../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../../../"><img class="docs-light-only" src="../../../assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="../../../assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li><a class="tocitem" href="../../../">Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="../../../guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="../../../guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="../../../guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="../../../guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="../../../guide/training/training/">Training</a></li><li><a class="tocitem" href="../../../guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="../../../guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="../../../guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="../../../guide/performance/">Performance Tips</a></li></ul></li><li><a class="tocitem" href="../../../ecosystem/">Ecosystem</a></li><li><span class="tocitem">Reference</span><ul><li><a class="tocitem" href="../layers/">Built-in Layers</a></li><li><a class="tocitem" href="../activation/">Activation Functions</a></li><li><a class="tocitem" href="../../utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="../losses/">Loss Functions</a></li><li><a class="tocitem" href="../../training/reference/">Training API</a></li><li><a 
class="tocitem" href="../../training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="../../outputsize/">Shape Inference</a></li><li><a class="tocitem" href="../../destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="../../training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="../../training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="../../data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="../../data/onehot/">OneHotArrays.jl</a></li><li class="is-active"><a class="tocitem" href>Low-level Operations – NNlib.jl</a><ul class="internal"><li><a class="tocitem" href="#Attention"><span>Attention</span></a></li><li><a class="tocitem" href="#Softmax"><span>Softmax</span></a></li><li><a class="tocitem" href="#Pooling"><span>Pooling</span></a></li><li><a class="tocitem" href="#Padding"><span>Padding</span></a></li><li><a class="tocitem" href="#Convolution"><span>Convolution</span></a></li><li><a class="tocitem" href="#Dropout"><span>Dropout</span></a></li><li><a class="tocitem" href="#Upsampling"><span>Upsampling</span></a></li><li><a class="tocitem" href="#Batched-Operations"><span>Batched Operations</span></a></li><li><a class="tocitem" href="#Gather-and-Scatter"><span>Gather and Scatter</span></a></li><li><a class="tocitem" href="#Sampling"><span>Sampling</span></a></li><li><a class="tocitem" href="#Losses"><span>Losses</span></a></li><li><a class="tocitem" href="#Miscellaneous"><span>Miscellaneous</span></a></li></ul></li><li><a class="tocitem" href="../functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../../../tutorials/linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="../../../tutorials/logistic_regression/">Logistic Regression</a></li><li><a class="tocitem" href="../../../tutorials/model_zoo/">Model Zoo</a></li></ul></li></ul><div 
class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li><a class="is-disabled">Reference</a></li><li class="is-active"><a href>Low-level Operations – NNlib.jl</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Low-level Operations – NNlib.jl</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/reference/models/nnlib.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="Neural-Network-primitives-from-NNlib.jl"><a class="docs-heading-anchor" href="#Neural-Network-primitives-from-NNlib.jl">Neural Network primitives from NNlib.jl</a><a id="Neural-Network-primitives-from-NNlib.jl-1"></a><a class="docs-heading-anchor-permalink" href="#Neural-Network-primitives-from-NNlib.jl" title="Permalink"></a></h1><p>Flux re-exports all of the functions exported by the <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> package. 
This includes activation functions, described on <a href="../activation/#man-activations">their own page</a>. Many of the functions on this page exist primarily as the internal implementation of Flux layers, but can also be used independently.</p><h2 id="Attention"><a class="docs-heading-anchor" href="#Attention">Attention</a><a id="Attention-1"></a><a class="docs-heading-anchor-permalink" href="#Attention" title="Permalink"></a></h2><p>Primitives for the <a href="../layers/#MultiHeadAttention"><code>MultiHeadAttention</code></a> layer.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dot_product_attention" href="#NNlib.dot_product_attention"><code>NNlib.dot_product_attention</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dot_product_attention(query, key, value, [bias]; [fdrop, mask, nheads])</code></pre><p>Multihead dot product attention used in transformer architectures.</p><p>The input arrays must have the first two dimensions given by the number of features and the sequence length, then an arbitrary number of batch dimensions or none.</p><p>Returns the attention output array of size <code>(v_dim, q_len, batch_size...)</code> and the attention scores of size <code>(kv_len, q_len, nheads, batch_size...)</code>.</p><p>See also <a href="#NNlib.dot_product_attention_scores"><code>dot_product_attention_scores</code></a> if you only need the attention scores.</p><p><strong>Arguments</strong></p><ul><li><code>query</code>: Query array of size <code>(qk_dim, q_len, batch_size...)</code>.</li><li><code>key</code>: Key array of size <code>(qk_dim, kv_len, batch_size...)</code>.</li><li><code>value</code>: Value array of size <code>(v_dim, kv_len, batch_size...)</code>.</li><li><code>bias</code>: Either <code>nothing</code> or an array broadcastable to 
size <code>(kv_len, q_len, nheads, batch_size)</code>. It will be added to the attention scores before applying the softmax. Default <code>nothing</code>.</li><li><code>fdrop</code>: A dropout function or layer to be applied on the attention scores right after the softmax. Default <code>identity</code> (no dropout).</li><li><code>mask</code>: Either <code>nothing</code> or a boolean array broadcastable to size <code>(kv_len, q_len, nheads, batch_size)</code>. The mask is applied to the attention scores just before the softmax. See <a href="#NNlib.make_causal_mask"><code>make_causal_mask</code></a> for creating causal masks. Default <code>nothing</code>.</li><li><code>nheads</code>: Number of heads to split the input arrays into. Default <code>1</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia hljs">q, k, v = rand(10, 20, 2), rand(10, 30, 2), rand(20, 30, 2)
-y, α = dot_product_attention(q, k, v)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/attention.jl#L5-L39">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dot_product_attention_scores" href="#NNlib.dot_product_attention_scores"><code>NNlib.dot_product_attention_scores</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dot_product_attention_scores(query, key, [bias]; [fdrop, mask])</code></pre><p>Return the attention scores for the <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a>. Input arrays must have dimensions <code>(num_features ÷ nheads, nheads, sequence_length, batch_size)</code>.</p><p>See <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a> for more details.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/attention.jl#L103-L111">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.make_causal_mask" href="#NNlib.make_causal_mask"><code>NNlib.make_causal_mask</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">make_causal_mask(x, dims=2)</code></pre><p>Return a boolean square matrix <code>m</code> of the same type as <code>x</code> and of side <code>size(x, dims)</code>. 
Its elements are set such that <code>m[i, j] == i ≤ j</code>.</p><p>Can be used to mask the attention scores in <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/attention.jl#L141-L148">source</a></section></article><h2 id="Softmax"><a class="docs-heading-anchor" href="#Softmax">Softmax</a><a id="Softmax-1"></a><a class="docs-heading-anchor-permalink" href="#Softmax" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../losses/#Flux.Losses.logitcrossentropy"><code>Flux.logitcrossentropy</code></a> uses <a href="#NNlib.logsoftmax"><code>NNlib.logsoftmax</code></a> internally.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softmax" href="#NNlib.softmax"><code>NNlib.softmax</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softmax(x; dims = 1)</code></pre><p><a href="https://en.wikipedia.org/wiki/Softmax_function">Softmax</a> turns input array <code>x</code> into probability distributions that sum to 1 along the dimensions specified by <code>dims</code>. It is semantically equivalent to the following:</p><pre><code class="nohighlight hljs">softmax(x; dims = 1) = exp.(x) ./ sum(exp.(x), dims = dims)</code></pre><p>with additional manipulations enhancing numerical stability.</p><p>For a matrix input <code>x</code> it will by default (<code>dims = 1</code>) treat it as a batch of vectors, with each column independent. Keyword <code>dims = 2</code> will instead treat rows independently, and so on.</p><p>See also <a href="#NNlib.logsoftmax"><code>logsoftmax</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; softmax([1, 2, 3])
+y, α = dot_product_attention(q, k, v)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/attention.jl#L5-L39">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dot_product_attention_scores" href="#NNlib.dot_product_attention_scores"><code>NNlib.dot_product_attention_scores</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dot_product_attention_scores(query, key, [bias]; [fdrop, mask])</code></pre><p>Return the attention scores for the <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a>. Input arrays must have dimensions <code>(num_features ÷ nheads, nheads, sequence_length, batch_size)</code>.</p><p>See <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a> for more details.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/attention.jl#L103-L111">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.make_causal_mask" href="#NNlib.make_causal_mask"><code>NNlib.make_causal_mask</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">make_causal_mask(x, dims=2)</code></pre><p>Return a boolean square matrix <code>m</code> of the same type as <code>x</code> and of side <code>size(x, dims)</code>. 
Its elements are set such that <code>m[i, j] == i ≤ j</code>.</p><p>Can be used to mask the attention scores in <a href="#NNlib.dot_product_attention"><code>dot_product_attention</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/attention.jl#L141-L148">source</a></section></article><h2 id="Softmax"><a class="docs-heading-anchor" href="#Softmax">Softmax</a><a id="Softmax-1"></a><a class="docs-heading-anchor-permalink" href="#Softmax" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../losses/#Flux.Losses.logitcrossentropy"><code>Flux.logitcrossentropy</code></a> uses <a href="#NNlib.logsoftmax"><code>NNlib.logsoftmax</code></a> internally.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.softmax" href="#NNlib.softmax"><code>NNlib.softmax</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">softmax(x; dims = 1)</code></pre><p><a href="https://en.wikipedia.org/wiki/Softmax_function">Softmax</a> turns input array <code>x</code> into probability distributions that sum to 1 along the dimensions specified by <code>dims</code>. It is semantically equivalent to the following:</p><pre><code class="nohighlight hljs">softmax(x; dims = 1) = exp.(x) ./ sum(exp.(x), dims = dims)</code></pre><p>with additional manipulations enhancing numerical stability.</p><p>For a matrix input <code>x</code> it will by default (<code>dims = 1</code>) treat it as a batch of vectors, with each column independent. Keyword <code>dims = 2</code> will instead treat rows independently, and so on.</p><p>See also <a href="#NNlib.logsoftmax"><code>logsoftmax</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; softmax([1, 2, 3])
 3-element Vector{Float64}:
  0.09003057317038046
  0.24472847105479764
@@ -28,8 +28,8 @@
 (7, 13)
 
 julia&gt; Dense(4 =&gt; 7, softmax)(x)
-ERROR: `softmax(x)` called with a number, but it expects an array. </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/softmax.jl#L2-L55">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsoftmax" href="#NNlib.logsoftmax"><code>NNlib.logsoftmax</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logsoftmax(x; dims = 1)</code></pre><p>Computes the log of softmax in a more numerically stable way than directly taking <code>log.(softmax(xs))</code>. Commonly used in computing cross entropy loss.</p><p>It is semantically equivalent to the following:</p><pre><code class="nohighlight hljs">logsoftmax(x; dims = 1) = x .- log.(sum(exp.(x), dims = dims))</code></pre><p>See also <a href="#NNlib.softmax"><code>softmax</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/softmax.jl#L94-L106">source</a></section></article><h2 id="Pooling"><a class="docs-heading-anchor" href="#Pooling">Pooling</a><a id="Pooling-1"></a><a class="docs-heading-anchor-permalink" href="#Pooling" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>, <a href="../layers/#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>, <a href="../layers/#Flux.GlobalMaxPool"><code>GlobalMaxPool</code></a>, <a href="../layers/#Flux.GlobalMeanPool"><code>GlobalMeanPool</code></a>,  <a href="../layers/#Flux.MaxPool"><code>MaxPool</code></a>, and <a href="../layers/#Flux.MeanPool"><code>MeanPool</code></a> use <a href="#NNlib.PoolDims"><code>NNlib.PoolDims</code></a>, <a href="#NNlib.maxpool"><code>NNlib.maxpool</code></a>, and <a 
href="#NNlib.meanpool"><code>NNlib.meanpool</code></a> as their backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.PoolDims" href="#NNlib.PoolDims"><code>NNlib.PoolDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PoolDims(x_size::NTuple{M}, k::Union{NTuple{L, Int}, Int};
-        stride=k, padding=0, dilation=1)  where {M, L}</code></pre><p>Dimensions for a &quot;pooling&quot; operation that can have an arbitrary input size, kernel size, stride, dilation, and channel count.  Used to dispatch onto efficient implementations at compile-time.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dim_helpers/PoolDims.jl#L1-L8">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.lpnormpool" href="#NNlib.lpnormpool"><code>NNlib.lpnormpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">lpnormpool(x, p::Real, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform Lp pool operation with value of the Lp norm <code>p</code> and window size <code>k</code> on input tensor <code>x</code>, also known as LPPool in PyTorch. This pooling operator is from <a href="https://arxiv.org/abs/1311.1780">Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks</a>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>p</code> is restricted to <code>0 &lt; p &lt; Inf</code>.</li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. 
Default is <code>k</code>.</li></ul><p>For all elements <code>x</code> in a size <code>k</code> window, lpnormpool computes <code>(∑ᵢ xᵢ^p)^(1 / p)</code> as an element of the output.</p><p>Thus <code>lpnormpool(x, 1, k) ./ prod(k) ≈ meanpool(x, k)</code> and <code>lpnormpool(x, 2, k).^2 ./ prod(k) ≈ meanpool(x.^2, k)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/pooling.jl#L187-L203">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.maxpool" href="#NNlib.maxpool"><code>NNlib.maxpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">maxpool(x, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform max pool operation with window size <code>k</code> on input tensor <code>x</code>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. 
Default is <code>k</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/pooling.jl#L149-L159">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.meanpool" href="#NNlib.meanpool"><code>NNlib.meanpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">meanpool(x, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform mean pool operation with window size <code>k</code> on input tensor <code>x</code>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. Default is <code>k</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/pooling.jl#L168-L178">source</a></section></article><h2 id="Padding"><a class="docs-heading-anchor" href="#Padding">Padding</a><a id="Padding-1"></a><a class="docs-heading-anchor-permalink" href="#Padding" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_circular" href="#NNlib.pad_circular"><code>NNlib.pad_circular</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_circular(x, pad::Tuple; [dims])
+ERROR: `softmax(x)` called with a number, but it expects an array. </code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/softmax.jl#L2-L55">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsoftmax" href="#NNlib.logsoftmax"><code>NNlib.logsoftmax</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logsoftmax(x; dims = 1)</code></pre><p>Computes the log of softmax in a more numerically stable way than directly taking <code>log.(softmax(xs))</code>. Commonly used in computing cross entropy loss.</p><p>It is semantically equivalent to the following:</p><pre><code class="nohighlight hljs">logsoftmax(x; dims = 1) = x .- log.(sum(exp.(x), dims = dims))</code></pre><p>See also <a href="#NNlib.softmax"><code>softmax</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/softmax.jl#L94-L106">source</a></section></article><h2 id="Pooling"><a class="docs-heading-anchor" href="#Pooling">Pooling</a><a id="Pooling-1"></a><a class="docs-heading-anchor-permalink" href="#Pooling" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.AdaptiveMaxPool"><code>AdaptiveMaxPool</code></a>, <a href="../layers/#Flux.AdaptiveMeanPool"><code>AdaptiveMeanPool</code></a>, <a href="../layers/#Flux.GlobalMaxPool"><code>GlobalMaxPool</code></a>, <a href="../layers/#Flux.GlobalMeanPool"><code>GlobalMeanPool</code></a>,  <a href="../layers/#Flux.MaxPool"><code>MaxPool</code></a>, and <a href="../layers/#Flux.MeanPool"><code>MeanPool</code></a> use <a href="#NNlib.PoolDims"><code>NNlib.PoolDims</code></a>, <a href="#NNlib.maxpool"><code>NNlib.maxpool</code></a>, and <a 
href="#NNlib.meanpool"><code>NNlib.meanpool</code></a> as their backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.PoolDims" href="#NNlib.PoolDims"><code>NNlib.PoolDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">PoolDims(x_size::NTuple{M}, k::Union{NTuple{L, Int}, Int};
+        stride=k, padding=0, dilation=1)  where {M, L}</code></pre><p>Dimensions for a &quot;pooling&quot; operation that can have an arbitrary input size, kernel size, stride, dilation, and channel count.  Used to dispatch onto efficient implementations at compile-time.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dim_helpers/PoolDims.jl#L1-L8">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.lpnormpool" href="#NNlib.lpnormpool"><code>NNlib.lpnormpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">lpnormpool(x, p::Real, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform Lp pool operation with value of the Lp norm <code>p</code> and window size <code>k</code> on input tensor <code>x</code>, also known as LPPool in PyTorch. This pooling operator is from <a href="https://arxiv.org/abs/1311.1780">Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks</a>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>p</code> is restricted to <code>0 &lt; p &lt; Inf</code>.</li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. 
Default is <code>k</code>.</li></ul><p>For all elements <code>x</code> in a size <code>k</code> window, lpnormpool computes <code>(∑ᵢ xᵢ^p)^(1 / p)</code> as an element of the output.</p><p>Thus <code>lpnormpool(x, 1, k) ./ prod(k) ≈ meanpool(x, k)</code> and <code>lpnormpool(x, 2, k).^2 ./ prod(k) ≈ meanpool(x.^2, k)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/pooling.jl#L187-L203">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.maxpool" href="#NNlib.maxpool"><code>NNlib.maxpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">maxpool(x, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform max pool operation with window size <code>k</code> on input tensor <code>x</code>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. 
Default is <code>k</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/pooling.jl#L149-L159">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.meanpool" href="#NNlib.meanpool"><code>NNlib.meanpool</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">meanpool(x, k::NTuple{N, Integer}; pad=0, stride=k)</code></pre><p>Perform mean pool operation with window size <code>k</code> on input tensor <code>x</code>.</p><p>Arguments:</p><ul><li><code>x</code> and <code>k</code>: Expects <code>ndim(x) ∈ 3:5</code>, and always <code>length(k) == ndim(x) - 2</code></li><li><code>pad</code>: See <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a> for details.</li><li><code>stride</code>: Either a tuple with the same length as <code>k</code>, or one integer for all directions. Default is <code>k</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/pooling.jl#L168-L178">source</a></section></article><h2 id="Padding"><a class="docs-heading-anchor" href="#Padding">Padding</a><a id="Padding-1"></a><a class="docs-heading-anchor-permalink" href="#Padding" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_circular" href="#NNlib.pad_circular"><code>NNlib.pad_circular</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_circular(x, pad::Tuple; [dims])
 pad_circular(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> &quot;circularly&quot; across the border by wrapping around values from the opposite side of <code>x</code>. </p><p><code>pad</code> can be a tuple of integers <code>(l1, r1, ..., ln, rn)</code> of some length <code>2n</code> that specifies the left and right padding size for each of the dimensions in <code>dims</code>. If <code>dims</code> is not given,  it defaults to the first <code>n</code> dimensions.</p><p>If <code>pad</code> is an integer, it is applied on both sides on every dimension in <code>dims</code>. In this case, <code>dims</code>  defaults to the first <code>ndims(x)-2</code> dimensions  (i.e. excludes the channel and batch dimension). </p><p>The pad length on either side in any dimension must not exceed the size of <code>x</code> in that dimension, i.e. <code>pad_circular</code> is not able to create arbitrarily sized tilings of <code>x</code>.</p><p>See also <a href="#NNlib.pad_repeat"><code>pad_repeat</code></a>, <a href="#NNlib.pad_reflect"><code>pad_reflect</code></a>, <a href="#NNlib.pad_symmetric"><code>pad_symmetric</code></a>, and <a href="#NNlib.pad_constant"><code>pad_constant</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; r = reshape(1:9, 3, 3)
 3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
  1  4  7
@@ -43,7 +43,7 @@
  8  2  5  8  2  5
  9  3  6  9  3  6
  7  1  4  7  1  4
- 8  2  5  8  2  5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L342-L379">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_constant" href="#NNlib.pad_constant"><code>NNlib.pad_constant</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_constant(x, pad::Tuple, val = 0; [dims = :])
+ 8  2  5  8  2  5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L342-L379">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_constant" href="#NNlib.pad_constant"><code>NNlib.pad_constant</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_constant(x, pad::Tuple, val = 0; [dims = :])
 pad_constant(x, pad::Int, val = 0; [dims = :])</code></pre><p>Pad the array <code>x</code> with the constant value <code>val</code>.</p><p><code>pad</code> can be a tuple of integers. If it has length <code>2 * length(dims)</code>, it specifies the left and right padding size for each of the dimensions in <code>dims</code> as <code>(l1, r1, ..., ln, rn)</code>. If supplied with a tuple of length <code>length(dims)</code> instead, it applies symmetric padding. If <code>dims</code> is not given, it defaults to all dimensions.</p><p>For integer <code>pad</code> input, it is applied on both sides on every dimension in <code>dims</code>.</p><p>See also <a href="#NNlib.pad_zeros"><code>pad_zeros</code></a>, <a href="#NNlib.pad_repeat"><code>pad_repeat</code></a>, <a href="#NNlib.pad_reflect"><code>pad_reflect</code></a>, <a href="#NNlib.pad_symmetric"><code>pad_symmetric</code></a>, and <a href="#NNlib.pad_circular"><code>pad_circular</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; r = reshape(1:4, 2, 2)
 2×2 reshape(::UnitRange{Int64}, 2, 2) with eltype Int64:
  1  3
@@ -110,7 +110,7 @@
 julia&gt; pad_constant(r, (2,1, 3), dims = (1,2)) # padding must always be either the same length as dims, or double it
 ERROR: ArgumentError: Could not parse padding (2, 1, 3) and dims (1, 2)
 Stacktrace:
-[...]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L11-L97">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_reflect" href="#NNlib.pad_reflect"><code>NNlib.pad_reflect</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_reflect(x, pad::Tuple; [dims])
+[...]</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L11-L97">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_reflect" href="#NNlib.pad_reflect"><code>NNlib.pad_reflect</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_reflect(x, pad::Tuple; [dims])
 pad_reflect(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> reflecting its values across the border.</p><p><code>pad</code> can be a tuple of integers <code>(l1, r1, ..., ln, rn)</code> of some length <code>2n</code> that specifies the left and right padding size for each of the dimensions in <code>dims</code>. If <code>dims</code> is not given,  it defaults to the first <code>n</code> dimensions.</p><p>If <code>pad</code> is an integer, it is applied on both sides of every dimension in <code>dims</code>. In this case, <code>dims</code>  defaults to the first <code>ndims(x)-2</code> dimensions  (i.e. excludes the channel and batch dimensions). </p><p>See also <a href="#NNlib.pad_repeat"><code>pad_repeat</code></a>, <a href="#NNlib.pad_symmetric"><code>pad_symmetric</code></a>, <a href="#NNlib.pad_circular"><code>pad_circular</code></a>, and <a href="#NNlib.pad_constant"><code>pad_constant</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; r = reshape(1:9, 3, 3)
 3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
  1  4  7
@@ -124,7 +124,7 @@
  5  2  5  8  5  2
  6  3  6  9  6  3
  5  2  5  8  5  2
- 4  1  4  7  4  1</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L223-L257">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_repeat" href="#NNlib.pad_repeat"><code>NNlib.pad_repeat</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_repeat(x, pad::Tuple; [dims])
+ 4  1  4  7  4  1</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L223-L257">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_repeat" href="#NNlib.pad_repeat"><code>NNlib.pad_repeat</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_repeat(x, pad::Tuple; [dims])
 pad_repeat(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> repeating the values on the border.</p><p><code>pad</code> can be a tuple of integers <code>(l1, r1, ..., ln, rn)</code> of some length <code>2n</code> that specifies the left and right padding size for each of the dimensions in <code>dims</code>. If <code>dims</code> is not given,  it defaults to the first <code>n</code> dimensions.</p><p>If <code>pad</code> is an integer, it is applied on both sides of every dimension in <code>dims</code>. In this case, <code>dims</code>  defaults to the first <code>ndims(x)-2</code> dimensions  (i.e. excludes the channel and batch dimensions). </p><p>See also <a href="#NNlib.pad_reflect"><code>pad_reflect</code></a>, <a href="#NNlib.pad_symmetric"><code>pad_symmetric</code></a>, <a href="#NNlib.pad_circular"><code>pad_circular</code></a>, and <a href="#NNlib.pad_constant"><code>pad_constant</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; r = reshape(1:9, 3, 3)
 3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
  1  4  7
@@ -138,7 +138,7 @@
  2  2  2  2  5  8  8  8  8  8
  3  3  3  3  6  9  9  9  9  9
  3  3  3  3  6  9  9  9  9  9
- 3  3  3  3  6  9  9  9  9  9</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L162-L196">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_symmetric" href="#NNlib.pad_symmetric"><code>NNlib.pad_symmetric</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_symmetric(x, pad::Tuple; [dims])
+ 3  3  3  3  6  9  9  9  9  9</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L162-L196">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_symmetric" href="#NNlib.pad_symmetric"><code>NNlib.pad_symmetric</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_symmetric(x, pad::Tuple; [dims])
 pad_symmetric(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> reflecting its values symmetrically across the border, i.e. the border values of <code>x</code> are present in the padding values, in contrast to <a href="#NNlib.pad_reflect"><code>pad_reflect</code></a>.</p><p><code>pad</code> can be a tuple of integers <code>(l1, r1, ..., ln, rn)</code> of some length <code>2n</code> that specifies the left and right padding size for each of the dimensions in <code>dims</code>. If <code>dims</code> is not given,  it defaults to the first <code>n</code> dimensions.</p><p>If <code>pad</code> is an integer, it is applied on both sides of every dimension in <code>dims</code>. In this case, <code>dims</code>  defaults to the first <code>ndims(x)-2</code> dimensions  (i.e. excludes the channel and batch dimensions). </p><p>See also <a href="#NNlib.pad_repeat"><code>pad_repeat</code></a>, <a href="#NNlib.pad_reflect"><code>pad_reflect</code></a>, <a href="#NNlib.pad_circular"><code>pad_circular</code></a>, and <a href="#NNlib.pad_constant"><code>pad_constant</code></a>.</p><pre><code class="language-julia-repl hljs">julia&gt; r = reshape(1:9, 3, 3)
 3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
  1  4  7
@@ -152,8 +152,8 @@
  2  2  5  8  8  5
  3  3  6  9  9  6
  3  3  6  9  9  6
- 2  2  5  8  8  5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L282-L316">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_zeros" href="#NNlib.pad_zeros"><code>NNlib.pad_zeros</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_zeros(x, pad::Tuple; [dims])
-pad_zeros(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> with zeros. Equivalent to <a href="#NNlib.pad_constant"><code>pad_constant</code></a> with the constant equal to 0. </p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/padding.jl#L1-L7">source</a></section></article><h2 id="Convolution"><a class="docs-heading-anchor" href="#Convolution">Convolution</a><a id="Convolution-1"></a><a class="docs-heading-anchor-permalink" href="#Convolution" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Conv"><code>Conv</code></a> and <a href="../layers/#Flux.CrossCor"><code>CrossCor</code></a> layers use <a href="#NNlib.DenseConvDims"><code>NNlib.DenseConvDims</code></a> and <a href="#NNlib.conv"><code>NNlib.conv</code></a> internally. </p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.conv" href="#NNlib.conv"><code>NNlib.conv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">conv(x, w; stride = 1, pad = 0, dilation = 1, flipped = false, groups = 1)</code></pre><p>Apply convolution filter <code>w</code> to input <code>x</code>. <code>x</code> and <code>w</code> are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively. 
<code>x</code> and <code>w</code> may have real or complex element types.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/conv.jl#L44-L49">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.ConvDims" href="#NNlib.ConvDims"><code>NNlib.ConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ConvDims</code></pre><p>Type system-level information about convolution dimensions. Critical for things like <code>im2col!()</code> to generate efficient code, and helpful to reduce the number of kwargs getting passed around.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dim_helpers/ConvDims.jl#L1-L7">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.depthwiseconv" href="#NNlib.depthwiseconv"><code>NNlib.depthwiseconv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">depthwiseconv(x, w; stride=1, pad=0, dilation=1, flipped=false)</code></pre><p>Depthwise convolution operation with filter <code>w</code> on input <code>x</code>. 
<code>x</code> and <code>w</code> are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/conv.jl#L59-L64">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.DepthwiseConvDims" href="#NNlib.DepthwiseConvDims"><code>NNlib.DepthwiseConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">DepthwiseConvDims</code></pre><p>Concrete subclass of <code>ConvDims</code> for a depthwise convolution.  Differs primarily due to characterization by C<em>in, C</em>mult, rather than C<em>in, C</em>out.  Useful to be separate from DenseConvDims primarily for channel calculation differences.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dim_helpers/DepthwiseConvDims.jl#L1-L7">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.DenseConvDims" href="#NNlib.DenseConvDims"><code>NNlib.DenseConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">DenseConvDims</code></pre><p>Concrete subclass of <code>ConvDims</code> for a normal, dense, conv2d/conv3d.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dim_helpers/DenseConvDims.jl#L1-L5">source</a></section></article><h2 id="Dropout"><a class="docs-heading-anchor" href="#Dropout">Dropout</a><a id="Dropout-1"></a><a class="docs-heading-anchor-permalink" href="#Dropout" title="Permalink"></a></h2><article 
class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dropout" href="#NNlib.dropout"><code>NNlib.dropout</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dropout([rng], A, p; [dims])</code></pre><p>Returns an array in which each element of <code>A</code> is either replaced with zero, with probability <code>p</code>, or else multiplied by <code>1/(1-p)</code>.</p><p>By default every element is treated independently. With keyword <code>dims=1</code>, a choice is made for every value of the 1st index i.e. each row of a matrix is either zero or not.</p><p>Optional first argument is the random number generator used.</p><p><strong>Examples</strong></p><pre><code class="nohighlight hljs">julia&gt; dropout(ones(2, 10), 0.2)
+ 2  2  5  8  8  5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L282-L316">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pad_zeros" href="#NNlib.pad_zeros"><code>NNlib.pad_zeros</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pad_zeros(x, pad::Tuple; [dims])
+pad_zeros(x, pad::Int; [dims])</code></pre><p>Pad the array <code>x</code> with zeros. Equivalent to <a href="#NNlib.pad_constant"><code>pad_constant</code></a> with the constant equal to 0. </p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/padding.jl#L1-L7">source</a></section></article><h2 id="Convolution"><a class="docs-heading-anchor" href="#Convolution">Convolution</a><a id="Convolution-1"></a><a class="docs-heading-anchor-permalink" href="#Convolution" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Conv"><code>Conv</code></a> and <a href="../layers/#Flux.CrossCor"><code>CrossCor</code></a> layers use <a href="#NNlib.DenseConvDims"><code>NNlib.DenseConvDims</code></a> and <a href="#NNlib.conv"><code>NNlib.conv</code></a> internally. </p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.conv" href="#NNlib.conv"><code>NNlib.conv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">conv(x, w; stride = 1, pad = 0, dilation = 1, flipped = false, groups = 1)</code></pre><p>Apply convolution filter <code>w</code> to input <code>x</code>. <code>x</code> and <code>w</code> are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively. 
<code>x</code> and <code>w</code> may have real or complex element types.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/conv.jl#L44-L49">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.ConvDims" href="#NNlib.ConvDims"><code>NNlib.ConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ConvDims</code></pre><p>Type system-level information about convolution dimensions. Critical for things like <code>im2col!()</code> to generate efficient code, and helpful to reduce the number of kwargs getting passed around.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dim_helpers/ConvDims.jl#L1-L7">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.depthwiseconv" href="#NNlib.depthwiseconv"><code>NNlib.depthwiseconv</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">depthwiseconv(x, w; stride=1, pad=0, dilation=1, flipped=false)</code></pre><p>Depthwise convolution operation with filter <code>w</code> on input <code>x</code>. 
<code>x</code> and <code>w</code> are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/conv.jl#L59-L64">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.DepthwiseConvDims" href="#NNlib.DepthwiseConvDims"><code>NNlib.DepthwiseConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">DepthwiseConvDims</code></pre><p>Concrete subclass of <code>ConvDims</code> for a depthwise convolution.  Differs primarily due to characterization by C<em>in, C</em>mult, rather than C<em>in, C</em>out.  Useful to be separate from DenseConvDims primarily for channel calculation differences.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dim_helpers/DepthwiseConvDims.jl#L1-L7">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.DenseConvDims" href="#NNlib.DenseConvDims"><code>NNlib.DenseConvDims</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">DenseConvDims</code></pre><p>Concrete subclass of <code>ConvDims</code> for a normal, dense, conv2d/conv3d.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dim_helpers/DenseConvDims.jl#L1-L5">source</a></section></article><h2 id="Dropout"><a class="docs-heading-anchor" href="#Dropout">Dropout</a><a id="Dropout-1"></a><a class="docs-heading-anchor-permalink" href="#Dropout" title="Permalink"></a></h2><article 
class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dropout" href="#NNlib.dropout"><code>NNlib.dropout</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dropout([rng], A, p; [dims])</code></pre><p>Returns an array in which each element of <code>A</code> is either replaced with zero, with probability <code>p</code>, or else multiplied by <code>1/(1-p)</code>.</p><p>By default every element is treated independently. With keyword <code>dims=1</code>, a choice is made for every value of the 1st index i.e. each row of a matrix is either zero or not.</p><p>Optional first argument is the random number generator used.</p><p><strong>Examples</strong></p><pre><code class="nohighlight hljs">julia&gt; dropout(ones(2, 10), 0.2)
 2×10 Matrix{Float64}:
  1.25  1.25  0.0   1.25  1.25  1.25  1.25  1.25  1.25  1.25
  1.25  1.25  1.25  0.0   1.25  1.25  0.0   1.25  1.25  1.25
@@ -172,7 +172,7 @@
 
 julia&gt; mean(dropout(ones(10^4, 5), 0.3, dims=1), dims=1)
 1×5 Matrix{Float64}:
- 1.00571  1.00571  1.00571  1.00571  1.00571</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dropout.jl#L2-L37">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dropout!" href="#NNlib.dropout!"><code>NNlib.dropout!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dropout!(B, A, p; [dims])</code></pre><p>This does exactly <code>B .= dropout(A, p; dims)</code>, or rather, it&#39;s the implementation of out-of-place <a href="#NNlib.dropout"><code>dropout</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/dropout.jl#L55-L60">source</a></section></article><h2 id="Upsampling"><a class="docs-heading-anchor" href="#Upsampling">Upsampling</a><a id="Upsampling-1"></a><a class="docs-heading-anchor-permalink" href="#Upsampling" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Upsample"><code>Upsample</code></a> layer uses <a href="#NNlib.upsample_nearest"><code>NNlib.upsample_nearest</code></a>, <a href="#NNlib.upsample_bilinear"><code>NNlib.upsample_bilinear</code></a>, and <a href="#NNlib.upsample_trilinear"><code>NNlib.upsample_trilinear</code></a> as its backend. 
Additionally, <code>Flux</code>&#39;s <a href="../layers/#Flux.PixelShuffle"><code>PixelShuffle</code></a> layer uses <a href="#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a> as its backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_nearest" href="#NNlib.upsample_nearest"><code>NNlib.upsample_nearest</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_nearest(x, scale::NTuple{S,Int})
+ 1.00571  1.00571  1.00571  1.00571  1.00571</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dropout.jl#L2-L37">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.dropout!" href="#NNlib.dropout!"><code>NNlib.dropout!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">dropout!(B, A, p; [dims])</code></pre><p>This does exactly <code>B .= dropout(A, p; dims)</code>, or rather, it&#39;s the implementation of out-of-place <a href="#NNlib.dropout"><code>dropout</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/dropout.jl#L55-L60">source</a></section></article><h2 id="Upsampling"><a class="docs-heading-anchor" href="#Upsampling">Upsampling</a><a id="Upsampling-1"></a><a class="docs-heading-anchor-permalink" href="#Upsampling" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Upsample"><code>Upsample</code></a> layer uses <a href="#NNlib.upsample_nearest"><code>NNlib.upsample_nearest</code></a>, <a href="#NNlib.upsample_bilinear"><code>NNlib.upsample_bilinear</code></a>, and <a href="#NNlib.upsample_trilinear"><code>NNlib.upsample_trilinear</code></a> as its backend. 
Additionally, <code>Flux</code>&#39;s <a href="../layers/#Flux.PixelShuffle"><code>PixelShuffle</code></a> layer uses <a href="#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a> as its backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_nearest" href="#NNlib.upsample_nearest"><code>NNlib.upsample_nearest</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_nearest(x, scale::NTuple{S,Int})
 upsample_nearest(x; size::NTuple{S,Int})</code></pre><p>Upsamples the array <code>x</code> by integer multiples along the first <code>S</code> dimensions. Subsequent dimensions of <code>x</code> are not altered.</p><p>Either the <code>scale</code> factors or the final output <code>size</code> can be specified.</p><p>See also <a href="#NNlib.upsample_bilinear"><code>upsample_bilinear</code></a>, for two dimensions of an <code>N=4</code> array.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; upsample_nearest([1 2 3; 4 5 6], (2, 3))
 4×9 Matrix{Int64}:
  1  1  1  2  2  2  3  3  3
@@ -191,8 +191,8 @@
  4  5  6
 
 julia&gt; ans == upsample_nearest([1 2 3; 4 5 6], size=(4,))
-true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L120-L153">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_linear" href="#NNlib.upsample_linear"><code>NNlib.upsample_linear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_linear(x::AbstractArray{T,3}, scale::Real; align_corners::Bool = true)
-upsample_linear(x::AbstractArray{T,3}; size::Integer, align_corners::Bool = true)</code></pre><p>Upsamples the first dimension of the array <code>x</code> by the upsample provided <code>scale</code>, using linear interpolation. As an alternative to using <code>scale</code>, the resulting array <code>size</code> can be directly specified with a keyword argument.</p><p>The size of the output is equal to <code>(scale*S1, S2, S3)</code>, where <code>S1, S2, S3 = size(x)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L207-L217">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_linear" href="#NNlib.∇upsample_linear"><code>NNlib.∇upsample_linear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_linear(Δ::AbstractArray{T,3}; size::Integer, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Size of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L247-L256">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_bilinear" href="#NNlib.upsample_bilinear"><code>NNlib.upsample_bilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia 
hljs">upsample_bilinear(x::AbstractArray{T,4}, scale::NTuple{2,Real}; align_corners::Bool = true)
+true</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L120-L153">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_linear" href="#NNlib.upsample_linear"><code>NNlib.upsample_linear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_linear(x::AbstractArray{T,3}, scale::Real; align_corners::Bool = true)
+upsample_linear(x::AbstractArray{T,3}; size::Integer, align_corners::Bool = true)</code></pre><p>Upsamples the first dimension of the array <code>x</code> by the upsample provided <code>scale</code>, using linear interpolation. As an alternative to using <code>scale</code>, the resulting array <code>size</code> can be directly specified with a keyword argument.</p><p>The size of the output is equal to <code>(scale*S1, S2, S3)</code>, where <code>S1, S2, S3 = size(x)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L207-L217">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_linear" href="#NNlib.∇upsample_linear"><code>NNlib.∇upsample_linear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_linear(Δ::AbstractArray{T,3}; size::Integer, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Size of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L247-L256">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_bilinear" href="#NNlib.upsample_bilinear"><code>NNlib.upsample_bilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia 
hljs">upsample_bilinear(x::AbstractArray{T,4}, scale::NTuple{2,Real}; align_corners::Bool = true)
 upsample_bilinear(x::AbstractArray{T,4}; size::NTuple{2,Integer}, align_corners::Bool = true)</code></pre><p>Upsamples the first 2 dimensions of the array <code>x</code> by the upsample factors stored in <code>scale</code>, using bilinear interpolation. As an alternative to using <code>scale</code>, the resulting image <code>size</code> can be directly specified with a keyword argument.</p><p>The size of the output is equal to <code>(scale[1]*S1, scale[2]*S2, S3, S4)</code>, where <code>S1, S2, S3, S4 = size(x)</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x = reshape(Float32[1 2 3; 4 5 6], (2,3,1,1))
 2×3×1×1 Array{Float32, 4}:
 [:, :, 1, 1] =
@@ -217,10 +217,10 @@
  1.75  1.97222  2.19444  2.41667  2.63889     3.08333  3.30556  3.52778  3.75
  2.5   2.72222  2.94444  3.16667  3.38889     3.83333  4.05556  4.27778  4.5
  3.25  3.47222  3.69444  3.91667  4.13889     4.58333  4.80556  5.02778  5.25
- 4.0   4.22222  4.44444  4.66667  4.88889     5.33333  5.55556  5.77778  6.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L274-L314">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_bilinear" href="#NNlib.∇upsample_bilinear"><code>NNlib.∇upsample_bilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_bilinear(Δ::AbstractArray{T,4}; size::NTuple{2,Integer}, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Lateral (W,H) size of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L319-L328">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_trilinear" href="#NNlib.upsample_trilinear"><code>NNlib.upsample_trilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_trilinear(x::AbstractArray{T,5}, scale::NTuple{3,Real}; align_corners::Bool = true)
+ 4.0   4.22222  4.44444  4.66667  4.88889     5.33333  5.55556  5.77778  6.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L274-L314">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_bilinear" href="#NNlib.∇upsample_bilinear"><code>NNlib.∇upsample_bilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_bilinear(Δ::AbstractArray{T,4}; size::NTuple{2,Integer}, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Lateral (W,H) size of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L319-L328">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.upsample_trilinear" href="#NNlib.upsample_trilinear"><code>NNlib.upsample_trilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">upsample_trilinear(x::AbstractArray{T,5}, scale::NTuple{3,Real}; align_corners::Bool = true)
 upsample_trilinear(x::AbstractArray{T,5}; size::NTuple{3,Integer}, align_corners::Bool = true)</code></pre><p>Upsamples the first 3 dimensions of the array <code>x</code> by the upsample factors stored in <code>scale</code>, using trilinear interpolation. As an alternative to using <code>scale</code>, the resulting image <code>size</code> can be directly specified with a keyword argument.</p><p>The size of the output is equal to <code>(scale[1]*S1, scale[2]*S2, scale[3]*S3, S4, S5)</code>, where <code>S1, S2, S3, S4, S5 = size(x)</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia hljs">upsample_trilinear(x, (2, 3, 4))
 upsample_trilinear(x; size=(4, 9, 11))  # specify output size instead
-upsample_trilinear(x, (2.5, 3.5, pi))  # non-integer scaling factors are allowed</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L331-L349">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_trilinear" href="#NNlib.∇upsample_trilinear"><code>NNlib.∇upsample_trilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_trilinear(Δ::AbstractArray{T,5}; size::NTuple{3,Integer}, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Lateral size &amp; depth (W,H,D) of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L353-L362">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pixel_shuffle" href="#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pixel_shuffle(x, r::Integer)</code></pre><p>Pixel shuffling operation, upscaling by a factor <code>r</code>.</p><p>For 4-arrays representing <code>N</code> images, the operation converts input <code>size(x) == (W, H, r^2*C, N)</code> to output of size <code>(r*W, r*H, C, N)</code>. 
For <code>D</code>-dimensional data, it expects <code>ndims(x) == D+2</code> with channel and batch dimensions, and divides the number of channels by <code>r^D</code>.</p><p>Used in super-resolution networks to upsample towards high resolution features. Reference: Shi et. al., &quot;Real-Time Single Image and Video Super-Resolution ...&quot;, CVPR 2016, https://arxiv.org/abs/1609.05158</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x = [10i + j + channel/10 for i in 1:2, j in 1:3, channel in 1:4, batch in 1:1]
+upsample_trilinear(x, (2.5, 3.5, pi))  # non-integer scaling factors are allowed</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L331-L349">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇upsample_trilinear" href="#NNlib.∇upsample_trilinear"><code>NNlib.∇upsample_trilinear</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇upsample_trilinear(Δ::AbstractArray{T,5}; size::NTuple{3,Integer}, align_corners::Bool = true) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Incoming gradient array, backpropagated from downstream layers</li><li><code>size</code>: Lateral size &amp; depth (W,H,D) of the image upsampled in the first place</li></ul><p><strong>Outputs</strong></p><ul><li><code>dx</code>: Downsampled version of <code>Δ</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L353-L362">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.pixel_shuffle" href="#NNlib.pixel_shuffle"><code>NNlib.pixel_shuffle</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">pixel_shuffle(x, r::Integer)</code></pre><p>Pixel shuffling operation, upscaling by a factor <code>r</code>.</p><p>For 4-arrays representing <code>N</code> images, the operation converts input <code>size(x) == (W, H, r^2*C, N)</code> to output of size <code>(r*W, r*H, C, N)</code>. 
For <code>D</code>-dimensional data, it expects <code>ndims(x) == D+2</code> with channel and batch dimensions, and divides the number of channels by <code>r^D</code>.</p><p>Used in super-resolution networks to upsample towards high resolution features. Reference: Shi et. al., &quot;Real-Time Single Image and Video Super-Resolution ...&quot;, CVPR 2016, https://arxiv.org/abs/1609.05158</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x = [10i + j + channel/10 for i in 1:2, j in 1:3, channel in 1:4, batch in 1:1]
 2×3×4×1 Array{Float64, 4}:
 [:, :, 1, 1] =
  11.1  12.1  13.1
@@ -261,7 +261,7 @@
  2.1  2.3  2.5
  2.2  2.4  2.6
  3.1  3.3  3.5
- 3.2  3.4  3.6</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/upsample.jl#L1-L59">source</a></section></article><h2 id="Batched-Operations"><a class="docs-heading-anchor" href="#Batched-Operations">Batched Operations</a><a id="Batched-Operations-1"></a><a class="docs-heading-anchor-permalink" href="#Batched-Operations" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Bilinear"><code>Flux.Bilinear</code></a> layer uses <a href="#NNlib.batched_mul"><code>NNlib.batched_mul</code></a> internally.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_mul" href="#NNlib.batched_mul"><code>NNlib.batched_mul</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_mul(A, B) -&gt; C
+ 3.2  3.4  3.6</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/upsample.jl#L1-L59">source</a></section></article><h2 id="Batched-Operations"><a class="docs-heading-anchor" href="#Batched-Operations">Batched Operations</a><a id="Batched-Operations-1"></a><a class="docs-heading-anchor-permalink" href="#Batched-Operations" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Bilinear"><code>Flux.Bilinear</code></a> layer uses <a href="#NNlib.batched_mul"><code>NNlib.batched_mul</code></a> internally.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_mul" href="#NNlib.batched_mul"><code>NNlib.batched_mul</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_mul(A, B) -&gt; C
 A ⊠ B  # \boxtimes</code></pre><p>Batched matrix multiplication. Result has <code>C[:,:,k...] == A[:,:,k...] * B[:,:,k...]</code> where <code>k...</code> represent  any indices in the last dimensions.</p><p>If <code>ndims(A) == ndims(B) == 3</code> and <code>size(B,3) == 1</code> then instead <code>C[:,:,k] == A[:,:,k] * B[:,:,1]</code>, and similarly for <code>A</code>.</p><p>To transpose each matrix, apply <code>batched_transpose</code> to the array, or <code>batched_adjoint</code> for conjugate-transpose:</p><pre><code class="language-julia-repl hljs">julia&gt; A, B = randn(2,5,17), randn(5,9,17);
 
 julia&gt; A ⊠ B |&gt; size
@@ -277,7 +277,7 @@
 (2, 9, 17)
 
 julia&gt; batched_transpose(A) == PermutedDimsArray(A, (2,1,3))
-true</code></pre><p>The equivalent <code>PermutedDimsArray</code> may be used in place of <code>batched_transpose</code>. Other permutations are also handled by BLAS, provided that the batch index <code>k</code> is not the first dimension of the underlying array. Thus <code>PermutedDimsArray(::Array, (1,3,2))</code> and <code>PermutedDimsArray(::Array, (3,1,2))</code> are fine.</p><p>However, <code>A = PermutedDimsArray(::Array, (3,2,1))</code> is not acceptable to BLAS, since the batch dimension is the contiguous one: <code>stride(A,3) == 1</code>. This will be copied, as doing so is faster than <code>batched_mul_generic!</code>.</p><p>Both this <code>copy</code> and <code>batched_mul_generic!</code> produce <code>@debug</code> messages, and setting for instance <code>ENV[&quot;JULIA_DEBUG&quot;] = NNlib</code> will display them.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedmul.jl#L4-L46">source</a></section><section><div><pre><code class="language-julia hljs">batched_mul(A::Array{T,3}, B::Matrix)
+true</code></pre><p>The equivalent <code>PermutedDimsArray</code> may be used in place of <code>batched_transpose</code>. Other permutations are also handled by BLAS, provided that the batch index <code>k</code> is not the first dimension of the underlying array. Thus <code>PermutedDimsArray(::Array, (1,3,2))</code> and <code>PermutedDimsArray(::Array, (3,1,2))</code> are fine.</p><p>However, <code>A = PermutedDimsArray(::Array, (3,2,1))</code> is not acceptable to BLAS, since the batch dimension is the contiguous one: <code>stride(A,3) == 1</code>. This will be copied, as doing so is faster than <code>batched_mul_generic!</code>.</p><p>Both this <code>copy</code> and <code>batched_mul_generic!</code> produce <code>@debug</code> messages, and setting for instance <code>ENV[&quot;JULIA_DEBUG&quot;] = NNlib</code> will display them.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedmul.jl#L4-L46">source</a></section><section><div><pre><code class="language-julia hljs">batched_mul(A::Array{T,3}, B::Matrix)
 batched_mul(A::Matrix, B::Array{T,3})
 A ⊠ B</code></pre><p>This is always matrix-matrix multiplication, but either <code>A</code> or <code>B</code> may lack a batch index.</p><ul><li><p>When <code>B</code> is a matrix, result has <code>C[:,:,k] == A[:,:,k] * B[:,:]</code> for all <code>k</code>.</p></li><li><p>When <code>A</code> is a matrix, then <code>C[:,:,k] == A[:,:] * B[:,:,k]</code>. This can also be done by reshaping and calling <code>*</code>, for instance <code>A ⊡ B</code> using TensorCore.jl, but is implemented here using <code>batched_gemm</code> instead of <code>gemm</code>.</p></li></ul><pre><code class="language-julia-repl hljs">julia&gt; randn(16,8,32) ⊠ randn(8,4) |&gt; size
 (16, 4, 32)
@@ -286,19 +286,19 @@
 (16, 4, 32)
 
 julia&gt; randn(16,8) ⊠ randn(8,4,32) |&gt; size
-(16, 4, 32)</code></pre><p>See also <code>batched_vec</code> to regard <code>B</code> as a batch of vectors, <code>A[:,:,k] * B[:,k]</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedmul.jl#L112-L139">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_mul!" href="#NNlib.batched_mul!"><code>NNlib.batched_mul!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_mul!(C, A, B) -&gt; C
-batched_mul!(C, A, B, α=1, β=0)</code></pre><p>In-place batched matrix multiplication, equivalent to <code>mul!(C[:,:,k], A[:,:,k], B[:,:,k], α, β)</code> for all <code>k</code>. If <code>size(B,3) == 1</code> then every batch uses <code>B[:,:,1]</code> instead.</p><p>This will call <code>batched_gemm!</code> whenever possible. For real arrays this means that, for <code>X ∈ [A,B,C]</code>, either <code>stride(X,1)==1</code> or <code>stride(X,2)==1</code>, the latter may be caused by <code>batched_transpose</code> or by for instance <code>PermutedDimsArray(::Array, (3,1,2))</code>. Unlike <code>batched_mul</code> this will never make a copy.</p><p>For complex arrays, the wrapper made by <code>batched_adjoint</code> must be outermost to be seen. In this case the strided accepted by BLAS are more restricted, if <code>stride(C,1)==1</code> then only <code>stride(AorB::BatchedAdjoint,2) == 1</code> is accepted.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedmul.jl#L197-L213">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_adjoint" href="#NNlib.batched_adjoint"><code>NNlib.batched_adjoint</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_transpose(A::AbstractArray{T,3})
+(16, 4, 32)</code></pre><p>See also <code>batched_vec</code> to regard <code>B</code> as a batch of vectors, <code>A[:,:,k] * B[:,k]</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedmul.jl#L112-L139">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_mul!" href="#NNlib.batched_mul!"><code>NNlib.batched_mul!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_mul!(C, A, B) -&gt; C
+batched_mul!(C, A, B, α=1, β=0)</code></pre><p>In-place batched matrix multiplication, equivalent to <code>mul!(C[:,:,k], A[:,:,k], B[:,:,k], α, β)</code> for all <code>k</code>. If <code>size(B,3) == 1</code> then every batch uses <code>B[:,:,1]</code> instead.</p><p>This will call <code>batched_gemm!</code> whenever possible. For real arrays this means that, for <code>X ∈ [A,B,C]</code>, either <code>stride(X,1)==1</code> or <code>stride(X,2)==1</code>, the latter may be caused by <code>batched_transpose</code> or by for instance <code>PermutedDimsArray(::Array, (3,1,2))</code>. Unlike <code>batched_mul</code> this will never make a copy.</p><p>For complex arrays, the wrapper made by <code>batched_adjoint</code> must be outermost to be seen. In this case the strided accepted by BLAS are more restricted, if <code>stride(C,1)==1</code> then only <code>stride(AorB::BatchedAdjoint,2) == 1</code> is accepted.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedmul.jl#L197-L213">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_adjoint" href="#NNlib.batched_adjoint"><code>NNlib.batched_adjoint</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_transpose(A::AbstractArray{T,3})
 batched_adjoint(A)</code></pre><p>Equivalent to applying <code>transpose</code> or <code>adjoint</code> to each matrix <code>A[:,:,k]</code>.</p><p>These exist to control how <code>batched_mul</code> behaves, as it operates on such matrix slices of an array with <code>ndims(A)==3</code>.</p><p><code>PermutedDimsArray(A, (2,1,3))</code> is equivalent to <code>batched_transpose(A)</code>, and is also understood by <code>batched_mul</code> (and more widely supported elsewhere).</p><pre><code class="nohighlight hljs">BatchedTranspose{T, S} &lt;: AbstractBatchedMatrix{T, 3}
-BatchedAdjoint{T, S}</code></pre><p>Lazy wrappers analogous to <code>Transpose</code> and <code>Adjoint</code>, returned by <code>batched_transpose</code> etc.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedadjtrans.jl#L38-L54">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_transpose" href="#NNlib.batched_transpose"><code>NNlib.batched_transpose</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_transpose(A::AbstractArray{T,3})
+BatchedAdjoint{T, S}</code></pre><p>Lazy wrappers analogous to <code>Transpose</code> and <code>Adjoint</code>, returned by <code>batched_transpose</code> etc.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedadjtrans.jl#L38-L54">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_transpose" href="#NNlib.batched_transpose"><code>NNlib.batched_transpose</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_transpose(A::AbstractArray{T,3})
 batched_adjoint(A)</code></pre><p>Equivalent to applying <code>transpose</code> or <code>adjoint</code> to each matrix <code>A[:,:,k]</code>.</p><p>These exist to control how <code>batched_mul</code> behaves, as it operates on such matrix slices of an array with <code>ndims(A)==3</code>.</p><p><code>PermutedDimsArray(A, (2,1,3))</code> is equivalent to <code>batched_transpose(A)</code>, and is also understood by <code>batched_mul</code> (and more widely supported elsewhere).</p><pre><code class="nohighlight hljs">BatchedTranspose{T, S} &lt;: AbstractBatchedMatrix{T, 3}
-BatchedAdjoint{T, S}</code></pre><p>Lazy wrappers analogous to <code>Transpose</code> and <code>Adjoint</code>, returned by <code>batched_transpose</code> etc.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedadjtrans.jl#L28-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_vec" href="#NNlib.batched_vec"><code>NNlib.batched_vec</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_vec(A::Array{T,3}, B::Matrix)
+BatchedAdjoint{T, S}</code></pre><p>Lazy wrappers analogous to <code>Transpose</code> and <code>Adjoint</code>, returned by <code>batched_transpose</code> etc.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedadjtrans.jl#L28-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.batched_vec" href="#NNlib.batched_vec"><code>NNlib.batched_vec</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">batched_vec(A::Array{T,3}, B::Matrix)
 batched_vec(A::Array{T,3}, b::Vector)</code></pre><p>Batched matrix-vector multiplication: the result has <code>C[:,:,k] == A[:,:,k] * B[:,k]</code> for all <code>k</code>, or else <code>C[:,:,k] == A[:,:,k] * b</code> for <code>b::Vector</code>.</p><p>With the same argument types, <code>batched_mul(A, B)</code> would regard <code>B</code> as a fixed matrix, not a batch of vectors. Both reshape and then call <code>batched_mul(::Array{T,3}, ::Array{T,3})</code>.</p><pre><code class="language-julia-repl hljs">julia&gt; A, B, b = randn(16,8,32), randn(8,32), randn(8);
 
 julia&gt; batched_vec(A,B) |&gt; size
 (16, 32)
 
 julia&gt; batched_vec(A,b) |&gt; size
-(16, 32)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/batched/batchedmul.jl#L164-L185">source</a></section></article><h2 id="Gather-and-Scatter"><a class="docs-heading-anchor" href="#Gather-and-Scatter">Gather and Scatter</a><a id="Gather-and-Scatter-1"></a><a class="docs-heading-anchor-permalink" href="#Gather-and-Scatter" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Embedding"><code>Embedding</code></a> layer uses <a href="#NNlib.gather"><code>NNlib.gather</code></a> as its backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gather" href="#NNlib.gather"><code>NNlib.gather</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.gather(src, idx) -&gt; dst</code></pre><p>Reverse operation of <a href="#NNlib.scatter"><code>scatter</code></a>. Gathers data from source <code>src</code> and writes it in a destination <code>dst</code> according to the index array <code>idx</code>. For each <code>k</code> in <code>CartesianIndices(idx)</code>, assign values to <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ... , k] .= src[:, ... , idx[k]...]</code></pre><p>Notice that if <code>idx</code> is a vector containing integers and <code>src</code> is a matrix, previous expression simplifies to</p><pre><code class="nohighlight hljs">dst[:, k] .= src[:, idx[k]]</code></pre><p>and <code>k</code> will run over <code>1:length(idx)</code>.</p><p>The elements of <code>idx</code> can be integers or integer tuples and may be repeated. 
A single <code>src</code> column can end up being copied into zero, one, or multiple <code>dst</code> columns.</p><p>See <a href="#NNlib.gather!"><code>gather!</code></a> for an in-place version.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.gather([1,20,300,4000], [2,4,2])
+(16, 32)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/batched/batchedmul.jl#L164-L185">source</a></section></article><h2 id="Gather-and-Scatter"><a class="docs-heading-anchor" href="#Gather-and-Scatter">Gather and Scatter</a><a id="Gather-and-Scatter-1"></a><a class="docs-heading-anchor-permalink" href="#Gather-and-Scatter" title="Permalink"></a></h2><p><code>Flux</code>&#39;s <a href="../layers/#Flux.Embedding"><code>Embedding</code></a> layer uses <a href="#NNlib.gather"><code>NNlib.gather</code></a> as its backend.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gather" href="#NNlib.gather"><code>NNlib.gather</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.gather(src, idx) -&gt; dst</code></pre><p>Reverse operation of <a href="#NNlib.scatter"><code>scatter</code></a>. Gathers data from source <code>src</code> and writes it in a destination <code>dst</code> according to the index array <code>idx</code>. For each <code>k</code> in <code>CartesianIndices(idx)</code>, assign values to <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ... , k] .= src[:, ... , idx[k]...]</code></pre><p>Notice that if <code>idx</code> is a vector containing integers and <code>src</code> is a matrix, previous expression simplifies to</p><pre><code class="nohighlight hljs">dst[:, k] .= src[:, idx[k]]</code></pre><p>and <code>k</code> will run over <code>1:length(idx)</code>.</p><p>The elements of <code>idx</code> can be integers or integer tuples and may be repeated. 
A single <code>src</code> column can end up being copied into zero, one, or multiple <code>dst</code> columns.</p><p>See <a href="#NNlib.gather!"><code>gather!</code></a> for an in-place version.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.gather([1,20,300,4000], [2,4,2])
 3-element Vector{Int64}:
    20
  4000
@@ -307,7 +307,7 @@
 julia&gt; NNlib.gather([1 2 3; 4 5 6], [1,3,1,3,1])
 2×5 Matrix{Int64}:
  1  3  1  3  1
- 4  6  4  6  4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/gather.jl#L1-L39">source</a></section><section><div><pre><code class="language-julia hljs">gather(src, IJK...)</code></pre><p>Convert the tuple of integer vectors <code>IJK</code> to a tuple of <code>CartesianIndex</code> and call <code>gather</code> on it: <code>gather(src, CartesianIndex.(IJK...))</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; src = reshape([1:15;], 3, 5)
+ 4  6  4  6  4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/gather.jl#L1-L39">source</a></section><section><div><pre><code class="language-julia hljs">gather(src, IJK...)</code></pre><p>Convert the tuple of integer vectors <code>IJK</code> to a tuple of <code>CartesianIndex</code> and call <code>gather</code> on it: <code>gather(src, CartesianIndex.(IJK...))</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; src = reshape([1:15;], 3, 5)
 3×5 Matrix{Int64}:
  1  4  7  10  13
  2  5  8  11  14
@@ -316,7 +316,7 @@
 julia&gt; NNlib.gather(src, [1, 2], [2, 4])
 2-element Vector{Int64}:
   4
- 11</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/gather.jl#L49-L69">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gather!" href="#NNlib.gather!"><code>NNlib.gather!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.gather!(dst, src, idx)</code></pre><p>Reverse operation of <a href="#NNlib.scatter!"><code>scatter!</code></a>. Gathers data from source <code>src</code> and writes it in destination <code>dst</code> according to the index array <code>idx</code>. For each <code>k</code> in <code>CartesianIndices(idx)</code>, assign values to <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ... , k] .= src[:, ... , idx[k]...]</code></pre><p>Notice that if <code>idx</code> is a vector containing integers, and both <code>dst</code> and <code>src</code> are matrices, previous expression simplifies to</p><pre><code class="nohighlight hljs">dst[:, k] .= src[:, idx[k]]</code></pre><p>and <code>k</code> will run over <code>1:length(idx)</code>.</p><p>The elements of <code>idx</code> can be integers or integer tuples and may be repeated. 
A single <code>src</code> column can end up being copied into zero, one, or multiple <code>dst</code> columns.</p><p>See <a href="#NNlib.gather"><code>gather</code></a> for an allocating version.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/gather.jl#L81-L102">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.scatter" href="#NNlib.scatter"><code>NNlib.scatter</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.scatter(op, src, idx; [init, dstsize])</code></pre><p>Scatter operation allocating a destination array <code>dst</code> and calling <code>scatter!(op, dst, src, idx)</code> on it.</p><ul><li><p>If keyword <code>init</code> is provided, it is used to initialize the content of <code>dst</code>. Otherwise, the init values is inferred from the reduction operator <code>op</code> for some common operators (e.g. <code>init = 0</code> for <code>op = +</code>).</p></li><li><p>If <code>dstsize</code> is provided, it will be used to define the size of destination array, otherwise it will be inferred by <code>src</code> and <code>idx</code>.</p></li></ul><p>See <a href="#NNlib.scatter!"><code>scatter!</code></a> for full details on how <code>idx</code> works.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.scatter(+, [10,100,1000], [3,1,2])
+ 11</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/gather.jl#L49-L69">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.gather!" href="#NNlib.gather!"><code>NNlib.gather!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.gather!(dst, src, idx)</code></pre><p>Reverse operation of <a href="#NNlib.scatter!"><code>scatter!</code></a>. Gathers data from source <code>src</code> and writes it in destination <code>dst</code> according to the index array <code>idx</code>. For each <code>k</code> in <code>CartesianIndices(idx)</code>, assign values to <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ... , k] .= src[:, ... , idx[k]...]</code></pre><p>Notice that if <code>idx</code> is a vector containing integers, and both <code>dst</code> and <code>src</code> are matrices, previous expression simplifies to</p><pre><code class="nohighlight hljs">dst[:, k] .= src[:, idx[k]]</code></pre><p>and <code>k</code> will run over <code>1:length(idx)</code>.</p><p>The elements of <code>idx</code> can be integers or integer tuples and may be repeated. 
A single <code>src</code> column can end up being copied into zero, one, or multiple <code>dst</code> columns.</p><p>See <a href="#NNlib.gather"><code>gather</code></a> for an allocating version.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/gather.jl#L81-L102">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.scatter" href="#NNlib.scatter"><code>NNlib.scatter</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.scatter(op, src, idx; [init, dstsize])</code></pre><p>Scatter operation allocating a destination array <code>dst</code> and calling <code>scatter!(op, dst, src, idx)</code> on it.</p><ul><li><p>If keyword <code>init</code> is provided, it is used to initialize the content of <code>dst</code>. Otherwise, the init values is inferred from the reduction operator <code>op</code> for some common operators (e.g. <code>init = 0</code> for <code>op = +</code>).</p></li><li><p>If <code>dstsize</code> is provided, it will be used to define the size of destination array, otherwise it will be inferred by <code>src</code> and <code>idx</code>.</p></li></ul><p>See <a href="#NNlib.scatter!"><code>scatter!</code></a> for full details on how <code>idx</code> works.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.scatter(+, [10,100,1000], [3,1,2])
 3-element Vector{Int64}:
   100
  1000
@@ -334,7 +334,7 @@
     10
   2000
     10
-    10</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/scatter.jl#L136-L173">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.scatter!" href="#NNlib.scatter!"><code>NNlib.scatter!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.scatter!(op, dst, src, idx)</code></pre><p>Scatter operation, which writes data in <code>src</code> into <code>dst</code> at locations <code>idx</code>. A binary reduction operator <code>op</code> is applied during the scatter. For each index <code>k</code> in <code>idx</code>, accumulates values in <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ..., idx[k]...] = (op).(dst[:, ..., idx[k]...], src[:, ..., k...])</code></pre><p>See also <a href="#NNlib.scatter"><code>scatter</code></a>, <a href="#NNlib.gather"><code>gather</code></a>.</p><p><strong>Arguments</strong></p><ul><li><code>op</code>: Operations to be applied on <code>dst</code> and <code>src</code>, e.g. <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>max</code>, <code>min</code> and <code>mean</code>.</li><li><code>dst</code>: The destination for <code>src</code> to aggregate to. This argument will be mutated.</li><li><code>src</code>: The source data for aggregating.</li><li><code>idx</code>: The mapping for aggregation from source (index) to destination (value).        The <code>idx</code> array can contain either integers or tuples.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.scatter!(+, ones(3), [10,100], [1,3])
+    10</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/scatter.jl#L136-L173">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.scatter!" href="#NNlib.scatter!"><code>NNlib.scatter!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">NNlib.scatter!(op, dst, src, idx)</code></pre><p>Scatter operation, which writes data in <code>src</code> into <code>dst</code> at locations <code>idx</code>. A binary reduction operator <code>op</code> is applied during the scatter. For each index <code>k</code> in <code>idx</code>, accumulates values in <code>dst</code> according to</p><pre><code class="nohighlight hljs">dst[:, ..., idx[k]...] = (op).(dst[:, ..., idx[k]...], src[:, ..., k...])</code></pre><p>See also <a href="#NNlib.scatter"><code>scatter</code></a>, <a href="#NNlib.gather"><code>gather</code></a>.</p><p><strong>Arguments</strong></p><ul><li><code>op</code>: Operations to be applied on <code>dst</code> and <code>src</code>, e.g. <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>max</code>, <code>min</code> and <code>mean</code>.</li><li><code>dst</code>: The destination for <code>src</code> to aggregate to. This argument will be mutated.</li><li><code>src</code>: The source data for aggregating.</li><li><code>idx</code>: The mapping for aggregation from source (index) to destination (value).        The <code>idx</code> array can contain either integers or tuples.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; NNlib.scatter!(+, ones(3), [10,100], [1,3])
 3-element Vector{Float64}:
   11.0
    1.0
@@ -343,7 +343,7 @@
 julia&gt; NNlib.scatter!(*, fill(0.5, 2, 4), [1 10; 100 1000], [3,2])
 2×4 Matrix{Float64}:
  0.5    5.0   0.5  0.5
- 0.5  500.0  50.0  0.5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/scatter.jl#L40-L72">source</a></section></article><h2 id="Sampling"><a class="docs-heading-anchor" href="#Sampling">Sampling</a><a id="Sampling-1"></a><a class="docs-heading-anchor-permalink" href="#Sampling" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.grid_sample" href="#NNlib.grid_sample"><code>NNlib.grid_sample</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">grid_sample(input::AbstractArray{T, 4}, grid::AbstractArray{T, 4}; padding_mode = :zeros)</code></pre><p>Given <code>input</code>, compute output by sampling <code>input</code> values at pixel locations from <code>grid</code>. Uses bilinear interpolation to calculate output values.</p><p>This implementation assumes the extrema (<code>-1</code> and <code>1</code>) are considered as referring to the center points of the input’s corner pixels (i.e. align corners is <code>true</code>).</p><p><strong>Arguments</strong></p><ul><li><p><code>input</code>: Input array in <code>(W_in, H_in, C, N)</code> shape.</p></li><li><p><code>grid</code>: Input grid in <code>(2, W_out, H_out, N)</code> shape.   Where for each <code>(W_out, H_out, N)</code> grid contains <code>(x, y)</code>   coordinates that specify sampling locations normalized by the <code>input</code> shape.</p><p>Therefore, <code>x</code> and <code>y</code> should have values in <code>[-1, 1]</code> range.   
For example, <code>(x = -1, y = -1)</code> is the left-top pixel of <code>input</code>,   and <code>(x = 1, y = 1)</code> is the right-bottom pixel of <code>input</code>.</p><p>Out-of-bound values are handled according to the <code>padding_mode</code>.</p></li><li><p><code>padding_mode</code>: Out-of-bound padding.   <code>:zeros</code> to use <code>0</code> for out-of-bound grid locations.   <code>:border</code> to use border values for out-of-bound grid locations.   Default is <code>:zeros</code>.</p></li></ul><p><strong>Returns</strong></p><p><code>(W_out, H_out, C, N)</code> sampled grid from <code>input</code>.</p><p><strong>Examples</strong></p><p>In the example below, grid contains two out-of-bound sampling locations, which are handled differently, depending on the <code>padding_mode</code>.</p><pre><code class="language-julia-repl hljs">julia&gt; x = reshape(collect(1.0:4.0), (2, 2, 1, 1))
+ 0.5  500.0  50.0  0.5</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/scatter.jl#L40-L72">source</a></section></article><h2 id="Sampling"><a class="docs-heading-anchor" href="#Sampling">Sampling</a><a id="Sampling-1"></a><a class="docs-heading-anchor-permalink" href="#Sampling" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.grid_sample" href="#NNlib.grid_sample"><code>NNlib.grid_sample</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">grid_sample(input::AbstractArray{T, 4}, grid::AbstractArray{T, 4}; padding_mode = :zeros)</code></pre><p>Given <code>input</code>, compute output by sampling <code>input</code> values at pixel locations from <code>grid</code>. Uses bilinear interpolation to calculate output values.</p><p>This implementation assumes the extrema (<code>-1</code> and <code>1</code>) are considered as referring to the center points of the input’s corner pixels (i.e. align corners is <code>true</code>).</p><p><strong>Arguments</strong></p><ul><li><p><code>input</code>: Input array in <code>(W_in, H_in, C, N)</code> shape.</p></li><li><p><code>grid</code>: Input grid in <code>(2, W_out, H_out, N)</code> shape.   Where for each <code>(W_out, H_out, N)</code> grid contains <code>(x, y)</code>   coordinates that specify sampling locations normalized by the <code>input</code> shape.</p><p>Therefore, <code>x</code> and <code>y</code> should have values in <code>[-1, 1]</code> range.   
For example, <code>(x = -1, y = -1)</code> is the left-top pixel of <code>input</code>,   and <code>(x = 1, y = 1)</code> is the right-bottom pixel of <code>input</code>.</p><p>Out-of-bound values are handled according to the <code>padding_mode</code>.</p></li><li><p><code>padding_mode</code>: Out-of-bound padding.   <code>:zeros</code> to use <code>0</code> for out-of-bound grid locations.   <code>:border</code> to use border values for out-of-bound grid locations.   Default is <code>:zeros</code>.</p></li></ul><p><strong>Returns</strong></p><p><code>(W_out, H_out, C, N)</code> sampled grid from <code>input</code>.</p><p><strong>Examples</strong></p><p>In the example below, grid contains two out-of-bound sampling locations, which are handled differently, depending on the <code>padding_mode</code>.</p><pre><code class="language-julia-repl hljs">julia&gt; x = reshape(collect(1.0:4.0), (2, 2, 1, 1))
 2×2×1×1 Array{Float64, 4}:
 [:, :, 1, 1] =
  1.0  3.0
@@ -375,4 +375,4 @@
 [:, :, 1, 1] =
  1.0  3.0
  1.5  3.5
- 2.0  4.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/sampling.jl#L26-L97">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇grid_sample" href="#NNlib.∇grid_sample"><code>NNlib.∇grid_sample</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇grid_sample(Δ::AbstractArray{T, 4}, input::AbstractArray{T, 4}, grid::AbstractArray{T, 4}; padding_mode = :zeros) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Input gradient in <code>(W_out, H_out, C, N)</code> shape   (same as output of the primal computation).</li><li><code>input</code>: Input from primal computation in <code>(W_in, H_in, C, N)</code> shape.</li><li><code>grid</code>: Grid from primal computation in <code>(2, W_out, H_out, N)</code> shape.</li><li><code>padding_mode</code>: Out-of-bound padding.   <code>:zeros</code> to use <code>0</code> for out-of-bound grid locations.   <code>:border</code> to use border values for out-of-bound grid locations.   Should be the same as in primal computation.   
Default is <code>:zeros</code>.</li></ul><p><strong>Returns</strong></p><p><code>dinput</code> (same shape as <code>input</code>) and <code>dgrid</code> (same shape as <code>grid</code>) gradients.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/sampling.jl#L152-L170">source</a></section></article><h2 id="Losses"><a class="docs-heading-anchor" href="#Losses">Losses</a><a id="Losses-1"></a><a class="docs-heading-anchor-permalink" href="#Losses" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.ctc_loss" href="#NNlib.ctc_loss"><code>NNlib.ctc_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">ctc_loss(ŷ, y)</code></pre><p>Computes the connectionist temporal classification loss between <code>ŷ</code> and <code>y</code>. <code>ŷ</code> must be a classes-by-time matrices, i.e., each row represents a class and each column represents a time step. Additionally, the <code>logsoftmax</code> function will be applied to <code>ŷ</code>, so <code>ŷ</code> must be the raw activation values from the neural network and not, for example, the activations after being passed through a <code>softmax</code> activation function. <code>y</code> must be a 1D array of the labels associated with <code>ŷ</code>. The blank label is assumed to be the last label category in <code>ŷ</code>, so it is equivalent to <code>size(ŷ, 1)</code>. Used for sequence-to-sequence classification problems such as speech recognition and handwriting recognition where the exact time-alignment of the output (e.g., letters) is not needed to solve the problem. See <a href="https://www.cs.toronto.edu/~graves/icml_2006.pdf">Graves et al. 
(2006)</a> or <a href="https://www.cs.toronto.edu/~graves/preprint.pdf#chapter.7">Graves (2012)</a> for mathematical details.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/ctc.jl#L108-L127">source</a></section></article><h2 id="Miscellaneous"><a class="docs-heading-anchor" href="#Miscellaneous">Miscellaneous</a><a id="Miscellaneous-1"></a><a class="docs-heading-anchor-permalink" href="#Miscellaneous" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsumexp" href="#NNlib.logsumexp"><code>NNlib.logsumexp</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logsumexp(x; dims = :)</code></pre><p>Computes <code>log.(sum(exp.(x); dims))</code> in a numerically stable way. Without <code>dims</code> keyword this returns a scalar.</p><p>See also <a href="#NNlib.logsoftmax"><code>logsoftmax</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/softmax.jl#L134-L141">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.glu" href="#NNlib.glu"><code>NNlib.glu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">glu(x, dim = 1)</code></pre><p>The gated linear unit from the <a href="https://arxiv.org/abs/1612.08083">&quot;Language Modeling with Gated Convolutional Networks&quot;</a> paper.</p><p>Calculates <code>a .* sigmoid(b)</code>, where <code>x</code> is split in half along given dimension <code>dim</code> to form <code>a</code> and <code>b</code>.</p></div><a 
class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.21/src/functions.jl#L1-L7">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../data/onehot/">« OneHotArrays.jl</a><a class="docs-footer-nextpage" href="../functors/">Nested Structures – Functors.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+ 2.0  4.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/sampling.jl#L26-L97">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.∇grid_sample" href="#NNlib.∇grid_sample"><code>NNlib.∇grid_sample</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">∇grid_sample(Δ::AbstractArray{T, 4}, input::AbstractArray{T, 4}, grid::AbstractArray{T, 4}; padding_mode = :zeros) where T</code></pre><p><strong>Arguments</strong></p><ul><li><code>Δ</code>: Input gradient in <code>(W_out, H_out, C, N)</code> shape   (same as output of the primal computation).</li><li><code>input</code>: Input from primal computation in <code>(W_in, H_in, C, N)</code> shape.</li><li><code>grid</code>: Grid from primal computation in <code>(2, W_out, H_out, N)</code> shape.</li><li><code>padding_mode</code>: Out-of-bound padding.   <code>:zeros</code> to use <code>0</code> for out-of-bound grid locations.   <code>:border</code> to use border values for out-of-bound grid locations.   Should be the same as in primal computation.   
Default is <code>:zeros</code>.</li></ul><p><strong>Returns</strong></p><p><code>dinput</code> (same shape as <code>input</code>) and <code>dgrid</code> (same shape as <code>grid</code>) gradients.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/sampling.jl#L152-L170">source</a></section></article><h2 id="Losses"><a class="docs-heading-anchor" href="#Losses">Losses</a><a id="Losses-1"></a><a class="docs-heading-anchor-permalink" href="#Losses" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.ctc_loss" href="#NNlib.ctc_loss"><code>NNlib.ctc_loss</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">ctc_loss(ŷ, y)</code></pre><p>Computes the connectionist temporal classification loss between <code>ŷ</code> and <code>y</code>. <code>ŷ</code> must be a classes-by-time matrices, i.e., each row represents a class and each column represents a time step. Additionally, the <code>logsoftmax</code> function will be applied to <code>ŷ</code>, so <code>ŷ</code> must be the raw activation values from the neural network and not, for example, the activations after being passed through a <code>softmax</code> activation function. <code>y</code> must be a 1D array of the labels associated with <code>ŷ</code>. The blank label is assumed to be the last label category in <code>ŷ</code>, so it is equivalent to <code>size(ŷ, 1)</code>. Used for sequence-to-sequence classification problems such as speech recognition and handwriting recognition where the exact time-alignment of the output (e.g., letters) is not needed to solve the problem. See <a href="https://www.cs.toronto.edu/~graves/icml_2006.pdf">Graves et al. 
(2006)</a> or <a href="https://www.cs.toronto.edu/~graves/preprint.pdf#chapter.7">Graves (2012)</a> for mathematical details.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/ctc.jl#L108-L127">source</a></section></article><h2 id="Miscellaneous"><a class="docs-heading-anchor" href="#Miscellaneous">Miscellaneous</a><a id="Miscellaneous-1"></a><a class="docs-heading-anchor-permalink" href="#Miscellaneous" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.logsumexp" href="#NNlib.logsumexp"><code>NNlib.logsumexp</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">logsumexp(x; dims = :)</code></pre><p>Computes <code>log.(sum(exp.(x); dims))</code> in a numerically stable way. Without <code>dims</code> keyword this returns a scalar.</p><p>See also <a href="#NNlib.logsoftmax"><code>logsoftmax</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/softmax.jl#L134-L141">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="NNlib.glu" href="#NNlib.glu"><code>NNlib.glu</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">glu(x, dim = 1)</code></pre><p>The gated linear unit from the <a href="https://arxiv.org/abs/1612.08083">&quot;Language Modeling with Gated Convolutional Networks&quot;</a> paper.</p><p>Calculates <code>a .* sigmoid(b)</code>, where <code>x</code> is split in half along given dimension <code>dim</code> to form <code>a</code> and <code>b</code>.</p></div><a 
class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/v0.9.22/src/functions.jl#L1-L7">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../data/onehot/">« OneHotArrays.jl</a><a class="docs-footer-nextpage" href="../functors/">Nested Structures – Functors.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/outputsize/index.html b/dev/reference/outputsize/index.html
index 7c59b18372..a7479f5bb0 100644
--- a/dev/reference/outputsize/index.html
+++ b/dev/reference/outputsize/index.html
@@ -68,7 +68,7 @@
           # plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.
 
 julia&gt; outputsize(ans, (28, 28, 1, 32))
-(10, 32)</code></pre><p>Limitations:</p><ul><li>While <code>@autosize (5, 32) Flux.Bilinear(_ =&gt; 7)</code> is OK, something like <code>Bilinear((_, _) =&gt; 7)</code> will fail.</li><li>While <code>Scale(_)</code> and <code>LayerNorm(_)</code> are fine (and use the first dimension), <code>Scale(_,_)</code> and <code>LayerNorm(_,_)</code> will fail if <code>size(x,1) != size(x,2)</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/outputsize.jl#L162-L217">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.outputsize" href="#Flux.outputsize"><code>Flux.outputsize</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">outputsize(m, x_size, y_size, ...; padbatch=false)</code></pre><p>For model or layer <code>m</code> accepting multiple arrays as input, this returns <code>size(m((x, y, ...)))</code> given <code>size_x = size(x)</code>, etc.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);
+(10, 32)</code></pre><p>Limitations:</p><ul><li>While <code>@autosize (5, 32) Flux.Bilinear(_ =&gt; 7)</code> is OK, something like <code>Bilinear((_, _) =&gt; 7)</code> will fail.</li><li>While <code>Scale(_)</code> and <code>LayerNorm(_)</code> are fine (and use the first dimension), <code>Scale(_,_)</code> and <code>LayerNorm(_,_)</code> will fail if <code>size(x,1) != size(x,2)</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/outputsize.jl#L162-L217">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.outputsize" href="#Flux.outputsize"><code>Flux.outputsize</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">outputsize(m, x_size, y_size, ...; padbatch=false)</code></pre><p>For model or layer <code>m</code> accepting multiple arrays as input, this returns <code>size(m((x, y, ...)))</code> given <code>size_x = size(x)</code>, etc.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);
 
 julia&gt; par = Parallel(vcat, Dense(5 =&gt; 9), Dense(7 =&gt; 11));
 
@@ -81,4 +81,4 @@
 (13, 1)
 
 julia&gt; par(x, y) == par((x, y)) == Chain(par, identity)((x, y))
-true</code></pre><p>Notice that <code>Chain</code> only accepts multiple arrays as a tuple, while <code>Parallel</code> also accepts them as multiple arguments; <code>outputsize</code> always supplies the tuple.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/outputsize.jl#L101-L127">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../training/optimisers/">« Optimisation Rules</a><a class="docs-footer-nextpage" href="../destructure/">Flat vs. Nested »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+true</code></pre><p>Notice that <code>Chain</code> only accepts multiple arrays as a tuple, while <code>Parallel</code> also accepts them as multiple arguments; <code>outputsize</code> always supplies the tuple.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/outputsize.jl#L101-L127">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../training/optimisers/">« Optimisation Rules</a><a class="docs-footer-nextpage" href="../destructure/">Flat vs. Nested »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/training/callbacks/index.html b/dev/reference/training/callbacks/index.html
index 0513a6e2bb..a4c1609869 100644
--- a/dev/reference/training/callbacks/index.html
+++ b/dev/reference/training/callbacks/index.html
@@ -10,7 +10,7 @@
            sleep(1)
        end
 Flux
-Flux</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L513-L535">source</a></section></article><h2 id="Patience-Helpers"><a class="docs-heading-anchor" href="#Patience-Helpers">Patience Helpers</a><a id="Patience-Helpers-1"></a><a class="docs-heading-anchor-permalink" href="#Patience-Helpers" title="Permalink"></a></h2><p>Flux provides utilities for controlling your training procedure according to some monitored condition and a maximum <code>patience</code>. For example, you can use <code>early_stopping</code> to stop training when the model is converging or deteriorating, or you can use <code>plateau</code> to check if the model is stagnating.</p><p>For example, below we create a pseudo-loss function that decreases, bottoms out, and then increases. The early stopping trigger will break the loop before the loss increases too much.</p><pre><code class="language-julia hljs"># create a pseudo-loss that decreases for 4 calls, then starts increasing
+Flux</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L513-L535">source</a></section></article><h2 id="Patience-Helpers"><a class="docs-heading-anchor" href="#Patience-Helpers">Patience Helpers</a><a id="Patience-Helpers-1"></a><a class="docs-heading-anchor-permalink" href="#Patience-Helpers" title="Permalink"></a></h2><p>Flux provides utilities for controlling your training procedure according to some monitored condition and a maximum <code>patience</code>. For example, you can use <code>early_stopping</code> to stop training when the model is converging or deteriorating, or you can use <code>plateau</code> to check if the model is stagnating.</p><p>For example, below we create a pseudo-loss function that decreases, bottoms out, and then increases. The early stopping trigger will break the loop before the loss increases too much.</p><pre><code class="language-julia hljs"># create a pseudo-loss that decreases for 4 calls, then starts increasing
 # we call this like loss()
 loss = let t = 0
   () -&gt; begin
@@ -61,7 +61,7 @@
        end
 [ Info: Epoch 1
 [ Info: Epoch 2
-[ Info: Epoch 3</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L626-L649">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.early_stopping" href="#Flux.early_stopping"><code>Flux.early_stopping</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">early_stopping(f, delay; distance = -, init_score = 0, min_dist = 0)</code></pre><p>Return a function that internally counts by one when <code>distance(best_score, f(...)) &lt;= min_dist</code>, where <code>best_score</code> is the last seen best value of <code>f(...)</code>. If the count is greater than or equal to <code>delay</code>, the function returns <code>true</code>, otherwise it returns <code>false</code>. The count is reset when <code>distance(best_score, f(...)) &gt; min_dist</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; loss = let l = 0
+[ Info: Epoch 3</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L626-L649">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.early_stopping" href="#Flux.early_stopping"><code>Flux.early_stopping</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">early_stopping(f, delay; distance = -, init_score = 0, min_dist = 0)</code></pre><p>Return a function that internally counts by one when <code>distance(best_score, f(...)) &lt;= min_dist</code>, where <code>best_score</code> is the last seen best value of <code>f(...)</code>. If the count is greater than or equal to <code>delay</code>, the function returns <code>true</code>, otherwise it returns <code>false</code>. The count is reset when <code>distance(best_score, f(...)) &gt; min_dist</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; loss = let l = 0
          () -&gt; l += 1
        end; # pseudo loss function that returns increasing values
 
@@ -74,7 +74,7 @@
        end
 [ Info: Epoch 1
 [ Info: Epoch 2
-[ Info: Epoch 3</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L660-L687">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.plateau" href="#Flux.plateau"><code>Flux.plateau</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">plateau(f, width; distance = -, init_score = 0, min_dist = 1f-6)</code></pre><p>Return a function that internally counts by one when <code>abs(distance(last_score, f(...))) &lt;= min_dist</code>, where <code>last_score</code> holds the last value of <code>f(...)</code>. If the count is greater than or equal to <code>width</code>, the function returns <code>true</code>, otherwise it returns <code>false</code>. The count is reset when <code>abs(distance(last_score, f(...))) &gt; min_dist</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; f = let v = 10
+[ Info: Epoch 3</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L660-L687">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.plateau" href="#Flux.plateau"><code>Flux.plateau</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">plateau(f, width; distance = -, init_score = 0, min_dist = 1f-6)</code></pre><p>Return a function that internally counts by one when <code>abs(distance(last_score, f(...))) &lt;= min_dist</code>, where <code>last_score</code> holds the last value of <code>f(...)</code>. If the count is greater than or equal to <code>width</code>, the function returns <code>true</code>, otherwise it returns <code>false</code>. The count is reset when <code>abs(distance(last_score, f(...))) &gt; min_dist</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; f = let v = 10
          () -&gt; v = v / abs(v) - v
        end; # -9, 8, -7, 6, ...
 
@@ -88,4 +88,4 @@
 [ Info: Epoch 1
 [ Info: Epoch 2
 [ Info: Epoch 3
-[ Info: Epoch 4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L702-L730">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../destructure/">« Flat vs. Nested</a><a class="docs-footer-nextpage" href="../zygote/">Gradients – Zygote.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+[ Info: Epoch 4</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L702-L730">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../destructure/">« Flat vs. Nested</a><a class="docs-footer-nextpage" href="../zygote/">Gradients – Zygote.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/training/optimisers/index.html b/dev/reference/training/optimisers/index.html
index 1e137510e3..0d08f99e80 100644
--- a/dev/reference/training/optimisers/index.html
+++ b/dev/reference/training/optimisers/index.html
@@ -40,4 +40,4 @@
 for epoch in 1:100
   opt.eta = next!(schedule)
   # your training code here
-end</code></pre><p>ParameterSchedulers.jl allows for many more scheduling policies including arbitrary functions, looping any function with a given period, or sequences of many schedules. See the ParameterSchedulers.jl documentation for more info.</p><h2 id="Decays"><a class="docs-heading-anchor" href="#Decays">Decays</a><a id="Decays-1"></a><a class="docs-heading-anchor-permalink" href="#Decays" title="Permalink"></a></h2><p>Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.SignDecay" href="#Optimisers.SignDecay"><code>Optimisers.SignDecay</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SignDecay(λ = 1e-3)</code></pre><p>Implements <span>$L_1$</span> regularisation, also known as LASSO regression, when composed  with other rules as the first transformation in an <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>It does this by adding <code>λ .* sign(x)</code> to the gradient. This is equivalent to adding  <code>λ * sum(abs, x) == λ * norm(x, 1)</code> to the loss.</p><p>See also [<code>WeightDecay</code>] for <span>$L_2$</span> normalisation. 
They can be used together: <code>OptimiserChain(SignDecay(0.012), WeightDecay(0.034), Adam())</code> is equivalent to adding <code>0.012 * norm(x, 1) + 0.017 * norm(x, 2)^2</code> to the loss function.</p><p><strong>Parameters</strong></p><ul><li>Penalty (<code>λ ≥ 0</code>): Controls the strength of the regularisation.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L586-L601">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.WeightDecay" href="#Optimisers.WeightDecay"><code>Optimisers.WeightDecay</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">WeightDecay(λ = 5e-4)</code></pre><p>Implements <span>$L_2$</span> regularisation, also known as ridge regression,  when composed  with other rules as the first transformation in an <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>It does this by adding <code>λ .* x</code> to the gradient. 
This is equivalent to adding  <code>λ/2 * sum(abs2, x) == λ/2 * norm(x)^2</code> to the loss.</p><p>See also [<code>SignDecay</code>] for <span>$L_1$</span> normalisation.</p><p><strong>Parameters</strong></p><ul><li>Penalty (<code>λ ≥ 0</code>): Controls the strength of the regularisation.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L549-L562">source</a></section></article><h2 id="Gradient-Clipping"><a class="docs-heading-anchor" href="#Gradient-Clipping">Gradient Clipping</a><a id="Gradient-Clipping-1"></a><a class="docs-heading-anchor-permalink" href="#Gradient-Clipping" title="Permalink"></a></h2><p>Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is</p><pre><code class="language-julia hljs">opt = OptimiserChain(ClipValue(1e-3), Adam(1e-3))</code></pre><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.ClipGrad" href="#Optimisers.ClipGrad"><code>Optimisers.ClipGrad</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ClipGrad(δ = 10)</code></pre><p>Restricts every gradient component to obey <code>-δ ≤ dx[i] ≤ δ</code>.</p><p>Typically composed with other rules using <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>See also <a href="#Optimisers.ClipNorm"><code>ClipNorm</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L616-L624">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" 
id="Optimisers.ClipNorm" href="#Optimisers.ClipNorm"><code>Optimisers.ClipNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ClipNorm(ω = 10, p = 2; throw = true)</code></pre><p>Scales any gradient array for which <code>norm(dx, p) &gt; ω</code> to stay at this threshold (unless <code>p==0</code>).</p><p>Throws an error if the norm is infinite or <code>NaN</code>, which you can turn off with <code>throw = false</code>.</p><p>Typically composed with other rules using <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>See also <a href="#Optimisers.ClipGrad"><code>ClipGrad</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L638-L650">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../reference/">« Training API</a><a class="docs-footer-nextpage" href="../../outputsize/">Shape Inference »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option 
value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
+end</code></pre><p>ParameterSchedulers.jl allows for many more scheduling policies including arbitrary functions, looping any function with a given period, or sequences of many schedules. See the ParameterSchedulers.jl documentation for more info.</p><h2 id="Decays"><a class="docs-heading-anchor" href="#Decays">Decays</a><a id="Decays-1"></a><a class="docs-heading-anchor-permalink" href="#Decays" title="Permalink"></a></h2><p>Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.SignDecay" href="#Optimisers.SignDecay"><code>Optimisers.SignDecay</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">SignDecay(λ = 1e-3)</code></pre><p>Implements <span>$L_1$</span> regularisation, also known as LASSO regression, when composed  with other rules as the first transformation in an <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>It does this by adding <code>λ .* sign(x)</code> to the gradient. This is equivalent to adding  <code>λ * sum(abs, x) == λ * norm(x, 1)</code> to the loss.</p><p>See also [<code>WeightDecay</code>] for <span>$L_2$</span> normalisation. 
They can be used together: <code>OptimiserChain(SignDecay(0.012), WeightDecay(0.034), Adam())</code> is equivalent to adding <code>0.012 * norm(x, 1) + 0.017 * norm(x, 2)^2</code> to the loss function.</p><p><strong>Parameters</strong></p><ul><li>Penalty (<code>λ ≥ 0</code>): Controls the strength of the regularisation.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L586-L601">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.WeightDecay" href="#Optimisers.WeightDecay"><code>Optimisers.WeightDecay</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">WeightDecay(λ = 5e-4)</code></pre><p>Implements <span>$L_2$</span> regularisation, also known as ridge regression,  when composed  with other rules as the first transformation in an <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>It does this by adding <code>λ .* x</code> to the gradient. 
This is equivalent to adding  <code>λ/2 * sum(abs2, x) == λ/2 * norm(x)^2</code> to the loss.</p><p>See also [<code>SignDecay</code>] for <span>$L_1$</span> normalisation.</p><p><strong>Parameters</strong></p><ul><li>Penalty (<code>λ ≥ 0</code>): Controls the strength of the regularisation.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L549-L562">source</a></section></article><h2 id="Gradient-Clipping"><a class="docs-heading-anchor" href="#Gradient-Clipping">Gradient Clipping</a><a id="Gradient-Clipping-1"></a><a class="docs-heading-anchor-permalink" href="#Gradient-Clipping" title="Permalink"></a></h2><p>Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is</p><pre><code class="language-julia hljs">opt = OptimiserChain(ClipValue(1e-3), Adam(1e-3))</code></pre><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.ClipGrad" href="#Optimisers.ClipGrad"><code>Optimisers.ClipGrad</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ClipGrad(δ = 10)</code></pre><p>Restricts every gradient component to obey <code>-δ ≤ dx[i] ≤ δ</code>.</p><p>Typically composed with other rules using <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>See also <a href="#Optimisers.ClipNorm"><code>ClipNorm</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L616-L624">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" 
id="Optimisers.ClipNorm" href="#Optimisers.ClipNorm"><code>Optimisers.ClipNorm</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">ClipNorm(ω = 10, p = 2; throw = true)</code></pre><p>Scales any gradient array for which <code>norm(dx, p) &gt; ω</code> to stay at this threshold (unless <code>p==0</code>).</p><p>Throws an error if the norm is infinite or <code>NaN</code>, which you can turn off with <code>throw = false</code>.</p><p>Typically composed with other rules using <a href="#Optimisers.OptimiserChain"><code>OptimiserChain</code></a>.</p><p>See also <a href="#Optimisers.ClipGrad"><code>ClipGrad</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/rules.jl#L638-L650">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../reference/">« Training API</a><a class="docs-footer-nextpage" href="../../outputsize/">Shape Inference »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option 
value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
diff --git a/dev/reference/training/reference/index.html b/dev/reference/training/reference/index.html
index addb3b3e11..a2e141291d 100644
--- a/dev/reference/training/reference/index.html
+++ b/dev/reference/training/reference/index.html
@@ -19,14 +19,14 @@
  10.19
 
 julia&gt; opt_state  # mutated by Flux.train!
-(weight = Leaf(Momentum(0.1, 0.9), [-2.018 3.027]), bias = Leaf(Momentum(0.1, 0.9), [-10.09]), σ = ())</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/train.jl#L14-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Optimise.train!-NTuple{4, Any}" href="#Flux.Optimise.train!-NTuple{4, Any}"><code>Flux.Optimise.train!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">train!(loss, model, data, opt_state)</code></pre><p>Uses a <code>loss</code> function and training <code>data</code> to improve the <code>model</code>&#39;s parameters according to a particular optimisation rule encoded in <code>opt_state</code>.  Iterates through <code>data</code> once, evaluating for each <code>d in data</code> either <code>loss(model, d...)</code> if <code>d isa Tuple</code>, or else <code>loss(model, d)</code> for other <code>d</code>.</p><p>If <code>model</code> is an Enzyme.Duplicated and <code>Enzyme.jl</code> is loaded, gradients will be computed with Enzyme, otherwise they will be computed with Zygote.</p><p>For example, with these definitions...</p><pre><code class="nohighlight hljs">data = [(x1, y1), (x2, y2), (x3, y3)]
+(weight = Leaf(Momentum(0.1, 0.9), [-2.018 3.027]), bias = Leaf(Momentum(0.1, 0.9), [-10.09]), σ = ())</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/train.jl#L14-L44">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.Optimise.train!-NTuple{4, Any}" href="#Flux.Optimise.train!-NTuple{4, Any}"><code>Flux.Optimise.train!</code></a> — <span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">train!(loss, model, data, opt_state)</code></pre><p>Uses a <code>loss</code> function and training <code>data</code> to improve the <code>model</code>&#39;s parameters according to a particular optimisation rule encoded in <code>opt_state</code>.  Iterates through <code>data</code> once, evaluating for each <code>d in data</code> either <code>loss(model, d...)</code> if <code>d isa Tuple</code>, or else <code>loss(model, d)</code> for other <code>d</code>.</p><p>If <code>model</code> is an Enzyme.Duplicated and <code>Enzyme.jl</code> is loaded, gradients will be computed with Enzyme, otherwise they will be computed with Zygote.</p><p>For example, with these definitions...</p><pre><code class="nohighlight hljs">data = [(x1, y1), (x2, y2), (x3, y3)]
 
 loss3(m, x, y) = norm(m(x) .- y)        # the model is the first argument
 
 opt_state = Flux.setup(Adam(), model)   # explicit setup of optimiser momenta</code></pre><p>...calling <code>Flux.train!(loss3, model, data, opt_state)</code> runs a loop much like this:</p><pre><code class="nohighlight hljs">for d in data
     ∂L∂m = gradient(loss3, model, d...)[1]
     update!(opt_state, model, ∂L∂m)
-end</code></pre><p>You can also write this loop yourself, if you need more flexibility. For this reason <code>train!</code> is not highly extensible. It adds only a few features to the loop above:</p><ul><li><p>Stop with a <code>DomainError</code> if the loss is infinite or <code>NaN</code> at any point.</p></li><li><p>Show a progress bar using <a href="https://github.com/JuliaLogging/ProgressLogging.jl"><code>@withprogress</code></a>.</p></li></ul><div class="admonition is-compat"><header class="admonition-header">New</header><div class="admonition-body"><p>This method was added in Flux 0.13.9. It has significant changes from the one used by Flux ≤ 0.13:</p><ul><li>It now takes the <code>model</code> itself, not the result of <code>Flux.params</code>. (This is to move away from Zygote&#39;s &quot;implicit&quot; parameter handling, with <code>Grads</code>.)</li><li>Instead of <code>loss</code> being a function which accepts only the data, now it must also accept the <code>model</code> itself, as the first argument.</li><li><code>opt_state</code> should be the result of <a href="#Flux.Train.setup"><code>Flux.setup</code></a>. Using an optimiser such as <code>Adam()</code> without this step should give you a warning.</li><li>Callback functions are not supported. 
(But any code can be included in the above <code>for</code> loop.)</li></ul></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/train.jl#L55-L100">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.update" href="#Optimisers.update"><code>Optimisers.update</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Optimisers.update(tree, model, gradient) -&gt; (tree, model)</code></pre><p>Uses the optimiser and the gradient to change the trainable parameters in the model. Returns the improved model, and the optimiser states needed for the next update. The initial tree of states comes from <a href="#Optimisers.setup"><code>setup</code></a>.</p><p>See also <a href="#Optimisers.update!"><code>update!</code></a>, which will be faster for models of ordinary <code>Array</code>s or <code>CuArray</code>s.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = (x = Float32[1,2,3], y = tanh);
+end</code></pre><p>You can also write this loop yourself, if you need more flexibility. For this reason <code>train!</code> is not highly extensible. It adds only a few features to the loop above:</p><ul><li><p>Stop with a <code>DomainError</code> if the loss is infinite or <code>NaN</code> at any point.</p></li><li><p>Show a progress bar using <a href="https://github.com/JuliaLogging/ProgressLogging.jl"><code>@withprogress</code></a>.</p></li></ul><div class="admonition is-compat"><header class="admonition-header">New</header><div class="admonition-body"><p>This method was added in Flux 0.13.9. It has significant changes from the one used by Flux ≤ 0.13:</p><ul><li>It now takes the <code>model</code> itself, not the result of <code>Flux.params</code>. (This is to move away from Zygote&#39;s &quot;implicit&quot; parameter handling, with <code>Grads</code>.)</li><li>Instead of <code>loss</code> being a function which accepts only the data, now it must also accept the <code>model</code> itself, as the first argument.</li><li><code>opt_state</code> should be the result of <a href="#Flux.Train.setup"><code>Flux.setup</code></a>. Using an optimiser such as <code>Adam()</code> without this step should give you a warning.</li><li>Callback functions are not supported. 
(But any code can be included in the above <code>for</code> loop.)</li></ul></div></div></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/train.jl#L55-L100">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.update" href="#Optimisers.update"><code>Optimisers.update</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Optimisers.update(tree, model, gradient) -&gt; (tree, model)</code></pre><p>Uses the optimiser and the gradient to change the trainable parameters in the model. Returns the improved model, and the optimiser states needed for the next update. The initial tree of states comes from <a href="#Optimisers.setup"><code>setup</code></a>.</p><p>See also <a href="#Optimisers.update!"><code>update!</code></a>, which will be faster for models of ordinary <code>Array</code>s or <code>CuArray</code>s.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = (x = Float32[1,2,3], y = tanh);
 
 julia&gt; t = Optimisers.setup(Descent(0.1), m)
 (x = Leaf(Descent(0.1), nothing), y = ())
@@ -115,4 +115,4 @@
 julia&gt; Optimisers.thaw!(s)
 
 julia&gt; s.x
-(Leaf(Momentum(0.01, 0.9), [0.0]), ())</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/adjust.jl#L5-L36">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.thaw!" href="#Optimisers.thaw!"><code>Optimisers.thaw!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Optimisers.thaw!(tree)</code></pre><p>The reverse of <a href="#Optimisers.freeze!"><code>freeze!</code></a>. Applies to all parameters, mutating every <code>Leaf(rule, state, frozen = true)</code> to <code>Leaf(rule, state, frozen = false)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/adjust.jl#L40-L45">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../models/losses/">« Loss Functions</a><a class="docs-footer-nextpage" href="../optimisers/">Optimisation Rules »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option 
value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+(Leaf(Momentum(0.01, 0.9), [0.0]), ())</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/adjust.jl#L5-L36">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Optimisers.thaw!" href="#Optimisers.thaw!"><code>Optimisers.thaw!</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">Optimisers.thaw!(tree)</code></pre><p>The reverse of <a href="#Optimisers.freeze!"><code>freeze!</code></a>. Applies to all parameters, mutating every <code>Leaf(rule, state, frozen = true)</code> to <code>Leaf(rule, state, frozen = false)</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Optimisers.jl/blob/v0.3.3/src/adjust.jl#L40-L45">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../models/losses/">« Loss Functions</a><a class="docs-footer-nextpage" href="../optimisers/">Optimisation Rules »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option 
value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/reference/training/zygote/index.html b/dev/reference/training/zygote/index.html
index 55f2a9068d..2fded768f4 100644
--- a/dev/reference/training/zygote/index.html
+++ b/dev/reference/training/zygote/index.html
@@ -203,4 +203,4 @@
 
 # this definition of map is for any AD that only defines a reverse mode.
 # It is not as good as the rrule that can be used if the AD defines a forward-mode as well.
-rrule(conf::RuleConfig{&gt;:Union{NoForwardsMode, HasReverseMode}}, typeof(map), ::Vector) = ...</code></pre><p>For more details see <a href="@ref config">rule configurations and calling back into AD</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/config.jl#L1-L25">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="ChainRulesCore.Tangent" href="#ChainRulesCore.Tangent"><code>ChainRulesCore.Tangent</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Tangent{P, T} &lt;: StructuralTangent{P} &lt;: AbstractTangent</code></pre><p>This type represents the tangent for a <code>struct</code>/<code>NamedTuple</code>, or <code>Tuple</code>. <code>P</code> is the the corresponding primal type that this is a tangent for.</p><p><code>Tangent{P}</code> should have fields (technically properties), that match to a subset of the fields of the primal type; and each should be a tangent type matching to the primal type of that field. Fields of the P that are not present in the Tangent are treated as <code>Zero</code>.</p><p><code>T</code> is an implementation detail representing the backing data structure. For Tuple it will be a Tuple, and for everything else it will be a <code>NamedTuple</code>. It should not be passed in by user.</p><p>For <code>Tangent</code>s of <code>Tuple</code>s, <code>iterate</code> and <code>getindex</code> are overloaded to behave similarly to for a tuple. For <code>Tangent</code>s of <code>struct</code>s, <code>getproperty</code> is overloaded to allow for accessing values via <code>tangent.fieldname</code>. Any fields not explictly present in the <code>Tangent</code> are treated as being set to <code>ZeroTangent()</code>. 
To make a <code>Tangent</code> have all the fields of the primal the <a href="#ChainRulesCore.canonicalize"><code>canonicalize</code></a> function is provided.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/tangent_types/structural_tangent.jl#L16-L38">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="ChainRulesCore.canonicalize" href="#ChainRulesCore.canonicalize"><code>ChainRulesCore.canonicalize</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">canonicalize(tangent::Tangent{P}) -&gt; Tangent{P}</code></pre><p>Return the canonical <code>Tangent</code> for the primal type <code>P</code>. The property names of the returned <code>Tangent</code> match the field names of the primal, and all fields of <code>P</code> not present in the input <code>tangent</code> are explictly set to <code>ZeroTangent()</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/tangent_types/structural_tangent.jl#L427-L433">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../callbacks/">« Callback Helpers</a><a class="docs-footer-nextpage" href="../../data/mlutils/">Batching Data – MLUtils.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section 
class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
+rrule(conf::RuleConfig{&gt;:Union{NoForwardsMode, HasReverseMode}}, typeof(map), ::Vector) = ...</code></pre><p>For more details see <a href="@ref config">rule configurations and calling back into AD</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/config.jl#L1-L25">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="ChainRulesCore.Tangent" href="#ChainRulesCore.Tangent"><code>ChainRulesCore.Tangent</code></a> — <span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">Tangent{P, T} &lt;: StructuralTangent{P} &lt;: AbstractTangent</code></pre><p>This type represents the tangent for a <code>struct</code>/<code>NamedTuple</code>, or <code>Tuple</code>. <code>P</code> is the corresponding primal type that this is a tangent for.</p><p><code>Tangent{P}</code> should have fields (technically properties), that match to a subset of the fields of the primal type; and each should be a tangent type matching to the primal type of that field. Fields of the P that are not present in the Tangent are treated as <code>Zero</code>.</p><p><code>T</code> is an implementation detail representing the backing data structure. For Tuple it will be a Tuple, and for everything else it will be a <code>NamedTuple</code>. It should not be passed in by the user.</p><p>For <code>Tangent</code>s of <code>Tuple</code>s, <code>iterate</code> and <code>getindex</code> are overloaded to behave as they do for a tuple. For <code>Tangent</code>s of <code>struct</code>s, <code>getproperty</code> is overloaded to allow for accessing values via <code>tangent.fieldname</code>. Any fields not explicitly present in the <code>Tangent</code> are treated as being set to <code>ZeroTangent()</code>. 
To make a <code>Tangent</code> have all the fields of the primal, the <a href="#ChainRulesCore.canonicalize"><code>canonicalize</code></a> function is provided.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/tangent_types/structural_tangent.jl#L16-L38">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="ChainRulesCore.canonicalize" href="#ChainRulesCore.canonicalize"><code>ChainRulesCore.canonicalize</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">canonicalize(tangent::Tangent{P}) -&gt; Tangent{P}</code></pre><p>Return the canonical <code>Tangent</code> for the primal type <code>P</code>. The property names of the returned <code>Tangent</code> match the field names of the primal, and all fields of <code>P</code> not present in the input <code>tangent</code> are explicitly set to <code>ZeroTangent()</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.24.0/src/tangent_types/structural_tangent.jl#L427-L433">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../callbacks/">« Callback Helpers</a><a class="docs-footer-nextpage" href="../../data/mlutils/">Batching Data – MLUtils.jl »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section 
class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><div data-docstringscollapsed="true"></div></html>
diff --git a/dev/reference/utilities/index.html b/dev/reference/utilities/index.html
index 3cbca8caef..4163da594b 100644
--- a/dev/reference/utilities/index.html
+++ b/dev/reference/utilities/index.html
@@ -32,7 +32,7 @@
 julia&gt; ans.bias
 2-element Vector{Float32}:
  0.0
- 0.0</code></pre><p><strong>References</strong></p><p>[1] Glorot, Xavier, and Yoshua Bengio. &quot;Understanding the difficulty of training deep feedforward neural networks.&quot; <em>Proceedings of the thirteenth international conference on artificial intelligence and statistics</em>. 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L49-L84">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.glorot_normal" href="#Flux.glorot_normal"><code>Flux.glorot_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">glorot_normal([rng], size...; gain = 1) -&gt; Array
+ 0.0</code></pre><p><strong>References</strong></p><p>[1] Glorot, Xavier, and Yoshua Bengio. &quot;Understanding the difficulty of training deep feedforward neural networks.&quot; <em>Proceedings of the thirteenth international conference on artificial intelligence and statistics</em>. 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L49-L84">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.glorot_normal" href="#Flux.glorot_normal"><code>Flux.glorot_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">glorot_normal([rng], size...; gain = 1) -&gt; Array
 glorot_normal([rng]; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> containing random numbers drawn from a normal distribution with standard deviation <code>gain * sqrt(2 / (fan_in + fan_out))</code>, using <a href="#Flux.nfan"><code>nfan</code></a>.</p><p>This method is described in [1] and also known as Xavier initialization.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; round(std(Flux.glorot_normal(10, 1000)), digits=3)
@@ -48,7 +48,7 @@
 Dense(10 =&gt; 1000, tanh)  # 11_000 parameters
 
 julia&gt; round(std(ans.weight), sigdigits=3)
-4.45f0</code></pre><p><strong>References</strong></p><p>[1] Glorot, Xavier, and Yoshua Bengio. &quot;Understanding the difficulty of training deep feedforward neural networks.&quot; <em>Proceedings of the thirteenth international conference on artificial intelligence and statistics</em>. 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L94-L127">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.kaiming_uniform" href="#Flux.kaiming_uniform"><code>Flux.kaiming_uniform</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kaiming_uniform([rng], size...; gain = √2) -&gt; Array
+4.45f0</code></pre><p><strong>References</strong></p><p>[1] Glorot, Xavier, and Yoshua Bengio. &quot;Understanding the difficulty of training deep feedforward neural networks.&quot; <em>Proceedings of the thirteenth international conference on artificial intelligence and statistics</em>. 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L94-L127">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.kaiming_uniform" href="#Flux.kaiming_uniform"><code>Flux.kaiming_uniform</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kaiming_uniform([rng], size...; gain = √2) -&gt; Array
 kaiming_uniform([rng]; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> containing random numbers drawn from a uniform distribution on the interval <code>[-x, x]</code>, where <code>x = gain * sqrt(3/fan_in)</code> using <a href="#Flux.nfan"><code>nfan</code></a>.</p><p>This method is described in [1] and also known as He initialization.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; round.(extrema(Flux.kaiming_uniform(100, 10)), digits=3)
 (-0.774f0, 0.773f0)
 
@@ -56,7 +56,7 @@
 (-0.243f0, 0.245f0)
 
 julia&gt; round.(extrema(Flux.kaiming_uniform(100, 100)), digits=3)
-(-0.245f0, 0.245f0)</code></pre><p><strong>References</strong></p><p>[1] He, Kaiming, et al. &quot;Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.&quot; <em>Proceedings of the IEEE international conference on computer vision</em>. 2015.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L137-L161">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.kaiming_normal" href="#Flux.kaiming_normal"><code>Flux.kaiming_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kaiming_normal([rng], size...; gain = √2) -&gt; Array
+(-0.245f0, 0.245f0)</code></pre><p><strong>References</strong></p><p>[1] He, Kaiming, et al. &quot;Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.&quot; <em>Proceedings of the IEEE international conference on computer vision</em>. 2015.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L137-L161">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.kaiming_normal" href="#Flux.kaiming_normal"><code>Flux.kaiming_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">kaiming_normal([rng], size...; gain = √2) -&gt; Array
 kaiming_normal([rng]; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> containing random numbers taken from a normal distribution standard deviation <code>gain / sqrt(fan_in)</code>, using <a href="#Flux.nfan"><code>nfan</code></a>.</p><p>This method is described in [1] and also known as He initialization.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; round(std(Flux.kaiming_normal(10, 1000)), digits=3)
@@ -66,7 +66,7 @@
 0.449f0
 
 julia&gt; round(std(Flux.kaiming_normal(1000, 1000)), digits=3)
-0.045f0</code></pre><p><strong>References</strong></p><p>[1] He, Kaiming, et al. &quot;Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.&quot; <em>Proceedings of the IEEE international conference on computer vision</em>. 2015.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L172-L198">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.truncated_normal" href="#Flux.truncated_normal"><code>Flux.truncated_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">truncated_normal([rng], size...; mean = 0, std = 1, lo = -2, hi = 2) -&gt; Array
+0.045f0</code></pre><p><strong>References</strong></p><p>[1] He, Kaiming, et al. &quot;Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.&quot; <em>Proceedings of the IEEE international conference on computer vision</em>. 2015.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L172-L198">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.truncated_normal" href="#Flux.truncated_normal"><code>Flux.truncated_normal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">truncated_normal([rng], size...; mean = 0, std = 1, lo = -2, hi = 2) -&gt; Array
 truncated_normal([rng]; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> where each element is drawn from a truncated normal distribution. The numbers are distributed like <code>filter(x -&gt; lo&lt;=x&lt;=hi, mean .+ std .* randn(100))</code>.</p><p>The values are generated by sampling a Uniform(0, 1) (<code>rand()</code>) and then applying the inverse CDF of the truncated normal distribution. This method works best when <code>lo ≤ mean ≤ hi</code>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; using Statistics
 
 julia&gt; Flux.truncated_normal(3, 4) |&gt; summary
@@ -76,7 +76,7 @@
 (-2.0f0, 2.0f0)
 
 julia&gt; round(std(Flux.truncated_normal(10^6; lo = -100, hi = 100)))
-1.0f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L209-L233">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.orthogonal" href="#Flux.orthogonal"><code>Flux.orthogonal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">orthogonal([rng], size...; gain = 1) -&gt; Array
+1.0f0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L209-L233">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.orthogonal" href="#Flux.orthogonal"><code>Flux.orthogonal</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">orthogonal([rng], size...; gain = 1) -&gt; Array
 orthogonal([rng]; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> which is a (semi) orthogonal matrix, as described in [1].</p><p>Cannot construct a vector, i.e. <code>length(size) == 1</code> is forbidden. For <code>length(size) &gt; 2</code>, a <code>prod(size[1:(end - 1)])</code> by <code>size[end]</code> orthogonal matrix is computed before reshaping it to the original dimensions.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; W = Flux.orthogonal(5, 7);
 
 julia&gt; summary(W)
@@ -96,7 +96,7 @@
 julia&gt; W3 = Flux.orthogonal(3, 3, 2, 4);
 
 julia&gt; transpose(reshape(W3, :, 4)) * reshape(W3, :, 4) ≈ I(4)
-true</code></pre><p><strong>References</strong></p><p>[1] Saxe, McClelland, Ganguli. &quot;Exact solutions to the nonlinear dynamics of learning in deep linear neural networks&quot;, ICLR 2014, https://arxiv.org/abs/1312.6120</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L255-L293">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.sparse_init" href="#Flux.sparse_init"><code>Flux.sparse_init</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">sparse_init([rng], rows, cols; sparsity, std = 0.01) -&gt; Array
+true</code></pre><p><strong>References</strong></p><p>[1] Saxe, McClelland, Ganguli. &quot;Exact solutions to the nonlinear dynamics of learning in deep linear neural networks&quot;, ICLR 2014, https://arxiv.org/abs/1312.6120</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L255-L293">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.sparse_init" href="#Flux.sparse_init"><code>Flux.sparse_init</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">sparse_init([rng], rows, cols; sparsity, std = 0.01) -&gt; Array
 sparse_init([rng]; kw...) -&gt; Function</code></pre><p>Return a <code>Matrix{Float32}</code> of size <code>rows, cols</code> where each column contains a fixed fraction of zero elements given by <code>sparsity</code>. Non-zero elements are normally distributed with a mean of zero and standard deviation <code>std</code>.</p><p>This method is described in [1].</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; count(iszero, Flux.sparse_init(10, 10, sparsity=1/5))
 20
 
@@ -109,7 +109,7 @@
 
 julia&gt; count(iszero, ans.weight, dims=1)
 1×3 Matrix{Int64}:
- 5  5  5</code></pre><p><strong>References</strong></p><p>[1] Martens, J, &quot;Deep learning via Hessian-free optimization&quot; <em>Proceedings of the 27th International Conference on International Conference on Machine Learning</em>. 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L316-L346">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.identity_init" href="#Flux.identity_init"><code>Flux.identity_init</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">identity_init(size...; gain=1, shift=0) -&gt; Array
+ 5  5  5</code></pre><p><strong>References</strong></p><p>[1] Martens, J. &quot;Deep learning via Hessian-free optimization&quot;, <em>Proceedings of the 27th International Conference on Machine Learning</em>, 2010.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L316-L346">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.identity_init" href="#Flux.identity_init"><code>Flux.identity_init</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">identity_init(size...; gain=1, shift=0) -&gt; Array
 identity_init(; kw...) -&gt; Function</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> which yields an identity mapping when used as parameters in most Flux layers. Use <code>gain</code> to scale the identity by a constant.</p><p>Often useful in the context of transfer learning, i.e. when one wants to add more capacity to a model but start from the same mapping.</p><p>It behaves as follows:</p><ul><li>1D: A <code>Vector</code> of <code>zeros</code> (useful for an identity bias)</li><li>2D: An identity matrix (useful for an identity matrix multiplication)</li><li>More than 2D: A dense block array of center tap spatial filters (useful for an identity convolution)</li></ul><p>Some caveats: </p><ul><li><p>Not all layers yield an identity mapping when used with this init. Exceptions include recurrent layers and normalization layers.</p></li><li><p>Layers must have <code>input_size == output_size</code> for identity mapping to be possible. When this is not the case, extra dimensions of the array are padded with zeros.</p></li><li><p>For convolutional layers, in addition to the above, the kernel sizes must also be odd and padding must be applied so that output feature maps have the same size as input feature maps, e.g. by using <a href="../models/layers/#Flux.SamePad"><code>SamePad</code></a>.</p></li></ul><p>Use keyword <code>shift</code> (integer or tuple) to apply circular shift to the output, equivalent to <code>Base.circshift(identity_init(size...), shift)</code>.</p><p>For consistency with other initialisers, it accepts <code>rng::AbstractRNG</code> as an optional first argument. But this is ignored, since the result is not random.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; Flux.identity_init(3,5)
 3×5 Matrix{Float32}:
  1.0  0.0  0.0  0.0  0.0
@@ -141,7 +141,7 @@
 [:, :, 1, 1] =
  10.0  20.0  30.0
  40.0  50.0  60.0
- 70.0  80.0  90.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L364-L431">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ones32" href="#Flux.ones32"><code>Flux.ones32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">ones32(size...) = ones(Float32, size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> filled with 1s.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L454-L458">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.zeros32" href="#Flux.zeros32"><code>Flux.zeros32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">zeros32(size...) 
= zeros(Float32, size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> filled with 0s.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L461-L465">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.rand32" href="#Flux.rand32"><code>Flux.rand32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rand32([rng], size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code>, filled like <code>rand</code>. When the size is not provided, <code>rand32(rng::AbstractRNG)</code> returns a function.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L468-L473">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.randn32" href="#Flux.randn32"><code>Flux.randn32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">randn32([rng], size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code>, filled like <code>randn</code>. 
When the size is not provided, <code>randn32(rng::AbstractRNG)</code> returns a function.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L478-L483">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.create_bias" href="#Flux.create_bias"><code>Flux.create_bias</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">create_bias(weights, bias, size...)</code></pre><p>Return a bias parameter for a layer, based on the value given to the constructor&#39;s keyword <code>bias=bias</code>.</p><ul><li><code>bias == true</code> creates a trainable array of the given size, of the same type as <code>weights</code>, initialised to zero.</li><li><code>bias == false</code> returns <code>false</code>, which is understood by AD to be non-differentiable.</li><li><code>bias::AbstractArray</code> uses the array provided, provided it has the correct size. It will also correct the <code>eltype</code> to match that of <code>weights</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L488-L498">source</a></section></article><p>These functions call:</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.rng_from_array" href="#Flux.rng_from_array"><code>Flux.rng_from_array</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rng_from_array(x)</code></pre><p>Create an instance of the RNG most appropriate for <code>x</code>. 
The current defaults are:</p><ul><li><code>x isa CuArray</code>: <code>CUDA.default_rng()</code></li><li><code>x isa AbstractArray</code>: <code>Random.default_rng()</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L36-L43">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.nfan" href="#Flux.nfan"><code>Flux.nfan</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">nfan(n_out, n_in=1) -&gt; Tuple
+ 70.0  80.0  90.0</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L364-L431">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.ones32" href="#Flux.ones32"><code>Flux.ones32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">ones32(size...) = ones(Float32, size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> filled with 1s.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L454-L458">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.zeros32" href="#Flux.zeros32"><code>Flux.zeros32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">zeros32(size...) 
= zeros(Float32, size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code> filled with 0s.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L461-L465">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.rand32" href="#Flux.rand32"><code>Flux.rand32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rand32([rng], size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code>, filled like <code>rand</code>. When the size is not provided, <code>rand32(rng::AbstractRNG)</code> returns a function.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L468-L473">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.randn32" href="#Flux.randn32"><code>Flux.randn32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">randn32([rng], size...)</code></pre><p>Return an <code>Array{Float32}</code> of the given <code>size</code>, filled like <code>randn</code>. 
When the size is not provided, <code>randn32(rng::AbstractRNG)</code> returns a function.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L478-L483">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.create_bias" href="#Flux.create_bias"><code>Flux.create_bias</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">create_bias(weights, bias, size...)</code></pre><p>Return a bias parameter for a layer, based on the value given to the constructor&#39;s keyword <code>bias=bias</code>.</p><ul><li><code>bias == true</code> creates a trainable array of the given size, of the same type as <code>weights</code>, initialised to zero.</li><li><code>bias == false</code> returns <code>false</code>, which is understood by AD to be non-differentiable.</li><li><code>bias::AbstractArray</code> uses the array provided, provided it has the correct size. It will also correct the <code>eltype</code> to match that of <code>weights</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L488-L498">source</a></section></article><p>These functions call:</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.rng_from_array" href="#Flux.rng_from_array"><code>Flux.rng_from_array</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">rng_from_array(x)</code></pre><p>Create an instance of the RNG most appropriate for <code>x</code>. 
The current defaults are:</p><ul><li><code>x isa CuArray</code>: <code>CUDA.default_rng()</code></li><li><code>x isa AbstractArray</code>: <code>Random.default_rng()</code></li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L36-L43">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.nfan" href="#Flux.nfan"><code>Flux.nfan</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">nfan(n_out, n_in=1) -&gt; Tuple
 nfan(dims...)
 nfan(dims::Tuple)</code></pre><p>For a layer characterized by dimensions <code>dims</code>, return a tuple <code>(fan_in, fan_out)</code>, where <code>fan_in</code> is the number of input neurons connected to an output one, and <code>fan_out</code> is the number of output neurons connected to an input one.</p><p>This function is mainly used by weight initializers, e.g., <a href="#Flux.kaiming_normal"><code>kaiming_normal</code></a>.</p><p><strong>Examples</strong></p><pre><code class="language-julia-repl hljs">julia&gt; layer = Dense(10, 20);
 
@@ -151,7 +151,7 @@
 julia&gt; layer = Conv((3, 3), 2=&gt;10);
 
 julia&gt; Flux.nfan(size(layer.weight))
-(18, 90)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/utils.jl#L2-L26">source</a></section></article><h2 id="Changing-the-type-of-all-parameters"><a class="docs-heading-anchor" href="#Changing-the-type-of-all-parameters">Changing the type of all parameters</a><a id="Changing-the-type-of-all-parameters-1"></a><a class="docs-heading-anchor-permalink" href="#Changing-the-type-of-all-parameters" title="Permalink"></a></h2><p>The default <code>eltype</code> for models is <code>Float32</code> since models are often trained/run on GPUs. The <code>eltype</code> of model <code>m</code> can be changed to <code>Float64</code> by <code>f64(m)</code>:</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f64" href="#Flux.f64"><code>Flux.f64</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f64(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float64</code>. 
Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>See also <a href="#Flux.f32"><code>f32</code></a> and <a href="#Flux.f16"><code>f16</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L304-L311">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f32" href="#Flux.f32"><code>Flux.f32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f32(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float32</code> (which is Flux&#39;s default). Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>See also <a href="#Flux.f64"><code>f64</code></a> and <a href="#Flux.f16"><code>f16</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L294-L301">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f16" href="#Flux.f16"><code>Flux.f16</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f16(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float16</code>. Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>Support for <code>Float16</code> is limited on many CPUs. 
Julia may convert to <code>Float32</code> for each operation, which is slow.</p><p>See also <a href="#Flux.f32"><code>f32</code></a> and <a href="#Flux.f64"><code>f64</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Chain(Dense(784, 2048, relu), Dense(2048, 10))  # all Float32
+(18, 90)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/utils.jl#L2-L26">source</a></section></article><h2 id="Changing-the-type-of-all-parameters"><a class="docs-heading-anchor" href="#Changing-the-type-of-all-parameters">Changing the type of all parameters</a><a id="Changing-the-type-of-all-parameters-1"></a><a class="docs-heading-anchor-permalink" href="#Changing-the-type-of-all-parameters" title="Permalink"></a></h2><p>The default <code>eltype</code> for models is <code>Float32</code> since models are often trained/run on GPUs. The <code>eltype</code> of model <code>m</code> can be changed to <code>Float64</code> by <code>f64(m)</code>:</p><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f64" href="#Flux.f64"><code>Flux.f64</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f64(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float64</code>. 
Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>See also <a href="#Flux.f32"><code>f32</code></a> and <a href="#Flux.f16"><code>f16</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L304-L311">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f32" href="#Flux.f32"><code>Flux.f32</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f32(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float32</code> (which is Flux&#39;s default). Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>See also <a href="#Flux.f64"><code>f64</code></a> and <a href="#Flux.f16"><code>f16</code></a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L294-L301">source</a></section></article><article class="docstring"><header><a class="docstring-article-toggle-button fa-solid fa-chevron-down" href="javascript:;" title="Collapse docstring"></a><a class="docstring-binding" id="Flux.f16" href="#Flux.f16"><code>Flux.f16</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">f16(m)</code></pre><p>Converts the <code>eltype</code> of model&#39;s <em>floating point</em> parameters to <code>Float16</code>. Recurses into structs marked with <a href="../models/functors/#Flux.@layer"><code>@layer</code></a>.</p><p>Support for <code>Float16</code> is limited on many CPUs. 
Julia may convert to <code>Float32</code> for each operation, which is slow.</p><p>See also <a href="#Flux.f32"><code>f32</code></a> and <a href="#Flux.f64"><code>f64</code></a>.</p><p><strong>Example</strong></p><pre><code class="language-julia-repl hljs">julia&gt; m = Chain(Dense(784, 2048, relu), Dense(2048, 10))  # all Float32
 Chain(
   Dense(784 =&gt; 2048, relu),             # 1_607_680 parameters
   Dense(2048 =&gt; 10),                    # 20_490 parameters
@@ -161,4 +161,4 @@
 Chain(
   Dense(784 =&gt; 2048, relu),             # 1_607_680 parameters
   Dense(2048 =&gt; 10),                    # 20_490 parameters
-)                   # Total: 4 arrays, 1_628_170 parameters, 3.106 MiB.</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a5bbb5f20f1fa9582c059929220b5a4cd996dde5/src/functor.jl#L314-L339">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/activation/">« Activation Functions</a><a class="docs-footer-nextpage" href="../models/losses/">Loss Functions »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+)                   # Total: 4 arrays, 1_628_170 parameters, 3.106 MiB.</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/07862daee1fe321e106e84c6bcc9ef42323648a0/src/functor.jl#L314-L339">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/activation/">« Activation Functions</a><a class="docs-footer-nextpage" href="../models/losses/">Loss Functions »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/linear_regression/index.html b/dev/tutorials/linear_regression/index.html
index 83eb3fedd1..bb4e8586f9 100644
--- a/dev/tutorials/linear_regression/index.html
+++ b/dev/tutorials/linear_regression/index.html
@@ -106,4 +106,4 @@
 27.1272f0</code></pre><p>The loss went down significantly! It can be minimized further by choosing an even smaller <code>δ</code>.</p><h2 id="Testing-the-Flux-model"><a class="docs-heading-anchor" href="#Testing-the-Flux-model">Testing the Flux model</a><a id="Testing-the-Flux-model-1"></a><a class="docs-heading-anchor-permalink" href="#Testing-the-Flux-model" title="Permalink"></a></h2><p>The last step of this tutorial would be to test our model using the testing data. We will first normalise the testing data and then calculate the corresponding loss.</p><pre><code class="language-julia-repl hljs">julia&gt; x_test_n = Flux.normalise(x_test);
 
 julia&gt; loss(model, x_test_n, y_test)
-66.91015f0</code></pre><p>The loss is not as small as the loss of the training data, but it looks good! This also shows that our model is not overfitting!</p><hr/><p>Summarising this tutorial, we started by generating a random yet correlated dataset for our <code>custom model</code>. We then saw how a simple linear regression model could be built with and without <code>Flux</code>, and how they were almost identical. </p><p>Next, we trained the model by manually writing down the Gradient Descent algorithm and optimising the loss. We also saw how <code>Flux</code> provides various wrapper functionalities and keeps the API extremely intuitive and simple for the users. </p><p>After getting familiar with the basics of <code>Flux</code> and <code>Julia</code>, we moved ahead to build a machine learning model for a real dataset. We repeated the exact same steps, but this time with a lot more features and data points, and by harnessing <code>Flux</code>&#39;s full capabilities. In the end, we developed a training loop that was smarter than the hardcoded one and ran the model on our normalised dataset to conclude the tutorial.</p><div class="admonition is-info"><header class="admonition-header">Info</header><div class="admonition-body"><p>Originally published on 21 November 2022, by Saransh Chopra.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../reference/models/functors/">« Nested Structures – Functors.jl</a><a class="docs-footer-nextpage" href="../logistic_regression/">Logistic Regression »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button 
class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+66.91015f0</code></pre><p>The loss is not as small as the loss of the training data, but it looks good! This also shows that our model is not overfitting!</p><hr/><p>Summarising this tutorial, we started by generating a random yet correlated dataset for our <code>custom model</code>. We then saw how a simple linear regression model could be built with and without <code>Flux</code>, and how they were almost identical. </p><p>Next, we trained the model by manually writing down the Gradient Descent algorithm and optimising the loss. We also saw how <code>Flux</code> provides various wrapper functionalities and keeps the API extremely intuitive and simple for the users. </p><p>After getting familiar with the basics of <code>Flux</code> and <code>Julia</code>, we moved ahead to build a machine learning model for a real dataset. We repeated the exact same steps, but this time with a lot more features and data points, and by harnessing <code>Flux</code>&#39;s full capabilities. In the end, we developed a training loop that was smarter than the hardcoded one and ran the model on our normalised dataset to conclude the tutorial.</p><div class="admonition is-info"><header class="admonition-header">Info</header><div class="admonition-body"><p>Originally published on 21 November 2022, by Saransh Chopra.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../reference/models/functors/">« Nested Structures – Functors.jl</a><a class="docs-footer-nextpage" href="../logistic_regression/">Logistic Regression »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button 
class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/logistic_regression/index.html b/dev/tutorials/logistic_regression/index.html
index b9ea5cc819..79a55a5a6b 100644
--- a/dev/tutorials/logistic_regression/index.html
+++ b/dev/tutorials/logistic_regression/index.html
@@ -131,4 +131,4 @@
 flux_accuracy(x, y) = 0.98
 
 julia&gt; flux_loss(flux_model, x, flux_y_onehot)
-0.6952386604624324</code></pre><p>We see a very similar final loss and accuracy.</p><hr/><p>Summarising this tutorial, we saw how we can run a logistic regression algorithm in Julia with and without using Flux. We started by importing the classic <code>Iris</code> dataset, and one hot encoded the labels. Next, we defined our model, the loss function, and the accuracy, all by ourselves.</p><p>Finally, we trained the model by manually writing down the Gradient Descent algorithm and optimising the loss. Interestingly, we implemented most of the functions on our own, and then parallelly compared them with the functionalities provided by Flux!</p><div class="admonition is-info"><header class="admonition-header">Info</header><div class="admonition-body"><p>Originally published on 1st April 2023, by Saransh Chopra.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../linear_regression/">« Linear Regression</a><a class="docs-footer-nextpage" href="../model_zoo/">Model Zoo »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option 
value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+0.6952386604624324</code></pre><p>We see a very similar final loss and accuracy.</p><hr/><p>Summarising this tutorial, we saw how we can run a logistic regression algorithm in Julia with and without using Flux. We started by importing the classic <code>Iris</code> dataset, and one hot encoded the labels. Next, we defined our model, the loss function, and the accuracy, all by ourselves.</p><p>Finally, we trained the model by manually writing down the Gradient Descent algorithm and optimising the loss. Interestingly, we implemented most of the functions on our own, and then parallelly compared them with the functionalities provided by Flux!</p><div class="admonition is-info"><header class="admonition-header">Info</header><div class="admonition-body"><p>Originally published on 1st April 2023, by Saransh Chopra.</p></div></div></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../linear_regression/">« Linear Regression</a><a class="docs-footer-nextpage" href="../model_zoo/">Model Zoo »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option 
value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/model_zoo/index.html b/dev/tutorials/model_zoo/index.html
index 4588ecf825..6ecd4adef4 100644
--- a/dev/tutorials/model_zoo/index.html
+++ b/dev/tutorials/model_zoo/index.html
@@ -3,4 +3,4 @@
   function gtag(){dataLayer.push(arguments);}
   gtag('js', new Date());
   gtag('config', 'UA-36890222-9', {'page_path': location.pathname + location.search + location.hash});
-</script><script data-outdated-warner src="../../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../search_index.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-light.css" 
data-theme-name="documenter-light" data-theme-primary/><script src="../../assets/themeswap.js"></script><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../../"><img class="docs-light-only" src="../../assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="../../assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li><a class="tocitem" href="../../">Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="../../guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="../../guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="../../guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="../../guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="../../guide/training/training/">Training</a></li><li><a class="tocitem" href="../../guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="../../guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="../../guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="../../guide/performance/">Performance Tips</a></li></ul></li><li><a class="tocitem" href="../../ecosystem/">Ecosystem</a></li><li><span class="tocitem">Reference</span><ul><li><a class="tocitem" href="../../reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="../../reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="../../reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="../../reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="../../reference/training/reference/">Training API</a></li><li><a class="tocitem" 
href="../../reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="../../reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" href="../../reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="../../reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="../../reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="../../reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="../../reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="../../reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="../../reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="../logistic_regression/">Logistic Regression</a></li><li class="is-active"><a class="tocitem" href>Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li><a class="is-disabled">Tutorials</a></li><li class="is-active"><a href>Model Zoo</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Model Zoo</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon 
fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/tutorials/model_zoo.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="Model-Zoo"><a class="docs-heading-anchor" href="#Model-Zoo">Model Zoo</a><a id="Model-Zoo-1"></a><a class="docs-heading-anchor-permalink" href="#Model-Zoo" title="Permalink"></a></h1><p>The <a href="https://github.com/FluxML/model-zoo">model zoo</a> is a collection of examples that demonstrate how to build and train models using Flux. The examples are organised by domain and include vision, text, and audio. 
Each example includes a description of the model, the data used, and the training process.</p><p>Some of the examples are pedagogical, see for instance</p><ul><li><a href="https://github.com/FluxML/model-zoo/tree/master/vision/mlp_mnist">Multilayer Perceptron</a></li><li><a href="https://github.com/FluxML/model-zoo/tree/master/vision/conv_mnist">Simple Convolutional Neural Network</a></li></ul><p>Others are more advanced, see for instance</p><ul><li><a href="https://github.com/FluxML/model-zoo/blob/master/vision/vae_mnist">Variational Autoencoder</a></li></ul></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../logistic_regression/">« Logistic Regression</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Saturday 3 August 2024 08:23">Saturday 3 August 2024</span>. 
Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+</script><script data-outdated-warner src="../../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.050/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../search_index.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-mocha.css" data-theme-name="catppuccin-mocha"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-macchiato.css" data-theme-name="catppuccin-macchiato"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-frappe.css" data-theme-name="catppuccin-frappe"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/catppuccin-latte.css" data-theme-name="catppuccin-latte"/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-light.css" 
data-theme-name="documenter-light" data-theme-primary/><script src="../../assets/themeswap.js"></script><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../../"><img class="docs-light-only" src="../../assets/logo.png" alt="Flux logo"/><img class="docs-dark-only" src="../../assets/logo-dark.png" alt="Flux logo"/></a><button class="docs-search-query input is-rounded is-small is-clickable my-2 mx-auto py-1 px-2" id="documenter-search-query">Search docs (Ctrl + /)</button><ul class="docs-menu"><li><a class="tocitem" href="../../">Welcome</a></li><li><span class="tocitem">Guide</span><ul><li><a class="tocitem" href="../../guide/models/quickstart/">Quick Start</a></li><li><a class="tocitem" href="../../guide/models/overview/">Fitting a Line</a></li><li><a class="tocitem" href="../../guide/models/basics/">Gradients and Layers</a></li><li><a class="tocitem" href="../../guide/models/custom_layers/">Custom Layers</a></li><li><a class="tocitem" href="../../guide/training/training/">Training</a></li><li><a class="tocitem" href="../../guide/models/recurrence/">Recurrence</a></li><li><a class="tocitem" href="../../guide/gpu/">GPU Support</a></li><li><a class="tocitem" href="../../guide/saving/">Saving &amp; Loading</a></li><li><a class="tocitem" href="../../guide/performance/">Performance Tips</a></li></ul></li><li><a class="tocitem" href="../../ecosystem/">Ecosystem</a></li><li><span class="tocitem">Reference</span><ul><li><a class="tocitem" href="../../reference/models/layers/">Built-in Layers</a></li><li><a class="tocitem" href="../../reference/models/activation/">Activation Functions</a></li><li><a class="tocitem" href="../../reference/utilities/">Weight Initialisation</a></li><li><a class="tocitem" href="../../reference/models/losses/">Loss Functions</a></li><li><a class="tocitem" href="../../reference/training/reference/">Training API</a></li><li><a class="tocitem" 
href="../../reference/training/optimisers/">Optimisation Rules</a></li><li><a class="tocitem" href="../../reference/outputsize/">Shape Inference</a></li><li><a class="tocitem" href="../../reference/destructure/">Flat vs. Nested</a></li><li><a class="tocitem" href="../../reference/training/callbacks/">Callback Helpers</a></li><li><a class="tocitem" href="../../reference/training/zygote/">Gradients – Zygote.jl</a></li><li><a class="tocitem" href="../../reference/data/mlutils/">Batching Data – MLUtils.jl</a></li><li><a class="tocitem" href="../../reference/data/onehot/">OneHotArrays.jl</a></li><li><a class="tocitem" href="../../reference/models/nnlib/">Low-level Operations – NNlib.jl</a></li><li><a class="tocitem" href="../../reference/models/functors/">Nested Structures – Functors.jl</a></li></ul></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../linear_regression/">Linear Regression</a></li><li><a class="tocitem" href="../logistic_regression/">Logistic Regression</a></li><li class="is-active"><a class="tocitem" href>Model Zoo</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><a class="docs-sidebar-button docs-navbar-link fa-solid fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a><nav class="breadcrumb"><ul class="is-hidden-mobile"><li><a class="is-disabled">Tutorials</a></li><li class="is-active"><a href>Model Zoo</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Model Zoo</a></li></ul></nav><div class="docs-right"><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl" title="View the repository on GitHub"><span class="docs-icon 
fa-brands"></span><span class="docs-label is-hidden-touch">GitHub</span></a><a class="docs-navbar-link" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/tutorials/model_zoo.md" title="Edit source on GitHub"><span class="docs-icon fa-solid"></span></a><a class="docs-settings-button docs-navbar-link fa-solid fa-gear" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-article-toggle-button fa-solid fa-chevron-up" id="documenter-article-toggle-button" href="javascript:;" title="Collapse all docstrings"></a></div></header><article class="content" id="documenter-page"><h1 id="Model-Zoo"><a class="docs-heading-anchor" href="#Model-Zoo">Model Zoo</a><a id="Model-Zoo-1"></a><a class="docs-heading-anchor-permalink" href="#Model-Zoo" title="Permalink"></a></h1><p>The <a href="https://github.com/FluxML/model-zoo">model zoo</a> is a collection of examples that demonstrate how to build and train models using Flux. The examples are organised by domain and include vision, text, and audio. 
Each example includes a description of the model, the data used, and the training process.</p><p>Some of the examples are pedagogical, see for instance</p><ul><li><a href="https://github.com/FluxML/model-zoo/tree/master/vision/mlp_mnist">Multilayer Perceptron</a></li><li><a href="https://github.com/FluxML/model-zoo/tree/master/vision/conv_mnist">Simple Convolutional Neural Network</a></li></ul><p>Others are more advanced, see for instance</p><ul><li><a href="https://github.com/FluxML/model-zoo/blob/master/vision/vae_mnist">Variational Autoencoder</a></li></ul></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../logistic_regression/">« Logistic Regression</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="auto">Automatic (OS)</option><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="catppuccin-latte">catppuccin-latte</option><option value="catppuccin-frappe">catppuccin-frappe</option><option value="catppuccin-macchiato">catppuccin-macchiato</option><option value="catppuccin-mocha">catppuccin-mocha</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.5.0 on <span class="colophon-date" title="Friday 9 August 2024 18:43">Friday 9 August 2024</span>. 
Using Julia version 1.10.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>