From 3db4431d0514d095e4f27a5e38d0f0a210dc4ae6 Mon Sep 17 00:00:00 2001
From: Carlo Lucibello
Date: Wed, 22 Mar 2023 09:23:22 +0100
Subject: [PATCH 1/5] docs for MultiHeadAttention

---
 .DS_Store                                    | Bin 0 -> 6148 bytes
 docs/.DS_Store                               | Bin 0 -> 6148 bytes
 docs/src/models/layers.md                    | 11 ++++++++-
 docs/src/models/nnlib.md                     | 24 ++++++++++++++-----
 docs/src/tutorials/2021-10-08-dcgan-mnist.md |  2 +-
 5 files changed, 29 insertions(+), 8 deletions(-)
 create mode 100644 .DS_Store
 create mode 100644 docs/.DS_Store

diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000000000000000000000000000000000000..b00850a7bb12ed26254a7dff76e4d40f362e727c
GIT binary patch
literal 6148
zcmeHK%}T>S5Z-O0O({YSDjpZS7EDVo#Y>3#0!H+pQWH}&7_+5G&7l->)EDwmd>&_Z
zH)1ho5jz9B-~8@oKgj+t#<;(T2aGw4F&i2pN2Ni~-56?_WJHc*L}fmUQW=5$Zeo8O
z@Y^jGGs*5*{{8!-S(@a9>wfT7+uGXh*d4oP-v*De42rN=uS4kGj(wB=Y&Q)!o19sQ$PMrPaa^Md8!d>;nay)eVVmunH
zR$Y7V@aXt_@{~Sj@>SExfo&x_25Wc&4?FdQL

literal 0
HcmV?d00001

diff --git a/docs/.DS_Store b/docs/.DS_Store
new file mode 100644
index 0000000000000000000000000000000000000000..915d5752021f0d258460c94df0d22b0d7ef28ee6
GIT binary patch
literal 6148
zcmeHKPfNov6i@cYbqt{g6^{Y01G}-y@KWmh0#@{(GFv*dSevo7?l1;D>KF2(_<4LU
zNnyi!6>;xD@_TuIlI91^OBiF^E205oHe<|!hR9K=5j5Aj8YUQ#t2v@@na#sQhBedt
zO%r~5n}w`kF-zFy_kV=*B+hcj`Q(jetG(B;I#$oR_n+j_&x5?kykK#Qqbn(su+oF@
zI-V`3_QAPK^B_)V3zZN@GYGl6iPK0fJz1ntrgDAlu)0=vY9Fmu183M5&blvFESiw6e8~b_n7il8XN3d2IRU{!XKnxHA#K5jH
zU@C&u+f@Q+-^2hh@FN3wJ_u-tuEA2HIy#`k>ofXWh$x`rTLMuSbPbjo!2`l|Dxgl~
z=83^|I@pDYa}Aanbvol}WthjTTs>a6S{>{{g){DIq@EZc2DTY!>YJaA|EH&aR
SXjkcgbP-U5P)7{>0s~(u(@IkS

literal 0
HcmV?d00001

diff --git a/docs/src/models/layers.md b/docs/src/models/layers.md
index c0e1c57307..b4667e2ef3 100644
--- a/docs/src/models/layers.md
+++ b/docs/src/models/layers.md
@@ -10,7 +10,7 @@ The `Dense` exemplifies several features:
 
 * It take an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` should create an array of this size.
  Flux has [many such functions](@ref man-init-funcs) built-in. All make a CPU array, moved later with [`gpu`](@ref Flux.gpu) if desired.
-* The bias vector is always intialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
+* The bias vector is always initialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
 
 * It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see the contents, and [`gpu`](@ref Flux.gpu) will move their arrays to the GPU.
 
@@ -54,6 +54,15 @@ SamePad
 Flux.flatten
 ```
 
+## MultiHeadAttention
+
+The basic blocks needed to implement [Transformer](https://arxiv.org/abs/1706.03762) architectures. See also the functional counterparts
+documented in NNlib's [Attention](@ref) section.
+
+```@docs
+MultiHeadAttention
+```
+
 ### Pooling
 
 These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.

diff --git a/docs/src/models/nnlib.md b/docs/src/models/nnlib.md
index 72b8481f56..cf2618eb97 100644
--- a/docs/src/models/nnlib.md
+++ b/docs/src/models/nnlib.md
@@ -2,9 +2,20 @@
 
 Flux re-exports all of the functions exported by the [NNlib](https://github.com/FluxML/NNlib.jl) package. This includes activation functions, described on [their own page](@ref man-activations). Many of the functions on this page exist primarily as the internal implementation of Flux layer, but can also be used independently.
+
+## Attention
+
+Primitives for the [`MultiHeadAttention`](@ref) layer.
+
+```@docs
+NNlib.dot_product_attention
+NNlib.dot_product_attention_scores
+NNlib.make_causal_mask
+```
+
 ## Softmax
 
-`Flux`'s `logitcrossentropy` uses `NNlib.softmax` internally.
+`Flux`'s [`logitcrossentropy`](@ref) uses [`NNlib.logsoftmax`](@ref) internally.
 ```@docs
 softmax
@@ -13,7 +24,8 @@ logsoftmax
 
 ## Pooling
 
-`Flux`'s `AdaptiveMaxPool`, `AdaptiveMeanPool`, `GlobalMaxPool`, `GlobalMeanPool`, `MaxPool`, and `MeanPool` use `NNlib.PoolDims`, `NNlib.maxpool`, and `NNlib.meanpool` as their backend.
+`Flux`'s [`AdaptiveMaxPool`](@ref), [`AdaptiveMeanPool`](@ref), [`GlobalMaxPool`](@ref), [`GlobalMeanPool`](@ref),
+[`MaxPool`](@ref), and [`MeanPool`](@ref) use [`NNlib.PoolDims`](@ref), [`NNlib.maxpool`](@ref), and [`NNlib.meanpool`](@ref) as their backend.
 
 ```@docs
 PoolDims
@@ -32,7 +44,7 @@ pad_zeros
 
 ## Convolution
 
-`Flux`'s `Conv` and `CrossCor` layers use `NNlib.DenseConvDims` and `NNlib.conv` internally.
+`Flux`'s [`Conv`](@ref) and [`CrossCor`](@ref) layers use [`NNlib.DenseConvDims`](@ref) and [`NNlib.conv`](@ref) internally.
 
 ```@docs
 conv
@@ -44,7 +56,7 @@ DenseConvDims
 
 ## Upsampling
 
-`Flux`'s `Upsample` layer uses `NNlib.upsample_nearest`, `NNlib.upsample_bilinear`, and `NNlib.upsample_trilinear` as its backend. Additionally, `Flux`'s `PixelShuffle` layer uses `NNlib.pixel_shuffle` as its backend.
+`Flux`'s [`Upsample`](@ref) layer uses [`NNlib.upsample_nearest`](@ref), [`NNlib.upsample_bilinear`](@ref), and [`NNlib.upsample_trilinear`](@ref) as its backend. Additionally, `Flux`'s [`PixelShuffle`](@ref) layer uses [`NNlib.pixel_shuffle`](@ref) as its backend.
 
 ```@docs
 upsample_nearest
@@ -60,7 +72,7 @@ pixel_shuffle
 
 ## Batched Operations
 
-`Flux`'s `Bilinear` layer uses `NNlib.batched_mul` internally.
+`Flux`'s [`Bilinear`](@ref) layer uses [`NNlib.batched_mul`](@ref) internally.
 
 ```@docs
 batched_mul
@@ -72,7 +84,7 @@ batched_vec
 
 ## Gather and Scatter
 
-`Flux`'s `Embedding` layer uses `NNlib.gather` as its backend.
+`Flux`'s [`Embedding`](@ref) layer uses [`NNlib.gather`](@ref) as its backend.
 ```@docs
 NNlib.gather

diff --git a/docs/src/tutorials/2021-10-08-dcgan-mnist.md b/docs/src/tutorials/2021-10-08-dcgan-mnist.md
index f56d47d52f..4da32e5f2c 100644
--- a/docs/src/tutorials/2021-10-08-dcgan-mnist.md
+++ b/docs/src/tutorials/2021-10-08-dcgan-mnist.md
@@ -101,7 +101,7 @@ We will be using the [relu](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib
 We will also apply the weight initialization method mentioned in the original DCGAN paper.
 
 ```julia
-# Function for intializing the model weights with values
+# Function for initializing the model weights with values
 # sampled from a Gaussian distribution with μ=0 and σ=0.02
 dcgan_init(shape...) = randn(Float32, shape) * 0.02f0
 ```

From a43835ab0a52f9428238d47e9fec22b735c73593 Mon Sep 17 00:00:00 2001
From: Carlo Lucibello
Date: Wed, 22 Mar 2023 09:24:28 +0100
Subject: [PATCH 2/5] cleanup

---
 .DS_Store      | Bin 6148 -> 0 bytes
 .gitignore     |   1 +
 docs/.DS_Store | Bin 6148 -> 0 bytes
 3 files changed, 1 insertion(+)
 delete mode 100644 .DS_Store
 delete mode 100644 docs/.DS_Store

diff --git a/.DS_Store b/.DS_Store
deleted file mode 100644
index b00850a7bb12ed26254a7dff76e4d40f362e727c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 6148
zcmeHK%}T>S5Z-O0O({YSDjpZS7EDVo#Y>3#0!H+pQWH}&7_+5G&7l->)EDwmd>&_Z
zH)1ho5jz9B-~8@oKgj+t#<;(T2aGw4F&i2pN2Ni~-56?_WJHc*L}fmUQW=5$Zeo8O
z@Y^jGGs*5*{{8!-S(@a9>wfT7+uGXh*d4oP-v*De42rN=uS4kGj(wB=Y&Q)!o19sQ$PMrPaa^Md8!d>;nay)eVVmunH
zR$Y7V@aXt_@{~Sj@>SExfo&x_25Wc&4?FdQL

diff --git a/.gitignore b/.gitignore
index 45b845a41b..ccb9aaf97f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,3 +8,4 @@ deps
 .vscode
 Manifest.toml
 LocalPreferences.toml
+.DS_Store

diff --git a/docs/.DS_Store b/docs/.DS_Store
deleted file mode 100644
index 915d5752021f0d258460c94df0d22b0d7ef28ee6..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 6148
zcmeHKPfNov6i@cYbqt{g6^{Y01G}-y@KWmh0#@{(GFv*dSevo7?l1;D>KF2(_<4LU
zNnyi!6>;xD@_TuIlI91^OBiF^E205oHe<|!hR9K=5j5Aj8YUQ#t2v@@na#sQhBedt
zO%r~5n}w`kF-zFy_kV=*B+hcj`Q(jetG(B;I#$oR_n+j_&x5?kykK#Qqbn(su+oF@
zI-V`3_QAPK^B_)V3zZN@GYGl6iPK0fJz1ntrgDAlu)0=vY9Fmu183M5&blvFESiw6e8~b_n7il8XN3d2IRU{!XKnxHA#K5jH
zU@C&u+f@Q+-^2hh@FN3wJ_u-tuEA2HIy#`k>ofXWh$x`rTLMuSbPbjo!2`l|Dxgl~
z=83^|I@pDYa}Aanbvol}WthjTTs>a6S{>{{g){DIq@EZc2DTY!>YJaA|EH&aR
SXjkcgbP-U5P)7{>0s~(u(@IkS

From 5f9e05782a08151dc756e82b6e3ac1f777d89c98 Mon Sep 17 00:00:00 2001
From: Carlo Lucibello
Date: Wed, 22 Mar 2023 10:00:16 +0100
Subject: [PATCH 3/5] news

---
 NEWS.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 9db14d47d5..9b82dc5347 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,14 +1,16 @@
 # Flux Release Notes
+## v0.13.15
+* Added [MultiHeadAttention](https://github.com/FluxML/Flux.jl/pull/2146) layer.
 
 ## v0.13.14
 * Fixed various deprecation warnings, from `Zygone.@nograd` and `Vararg`.
+* Initial support for `AMDGPU` via extension mechanism.
+* Add `gpu_backend` preference to select GPU backend using `LocalPreference.toml`.
+* Add `Flux.gpu_backend!` method to switch between GPU backends.
 
 ## v0.13.13
 * Added `f16` which changes precision to `Float16`, recursively.
-* Initial support for AMDGPU via extension mechanism.
-* Add `gpu_backend` preference to select GPU backend using `LocalPreference.toml`.
-* Add `Flux.gpu_backend!` method to switch between GPU backends.
 
 ## v0.13.12
 * CUDA.jl 4.0 compatibility.
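The release notes above mention the `gpu_backend` preference and the `Flux.gpu_backend!` switch. A minimal sketch of how selection is expected to work — the `"AMD"` value and the need for a session restart are assumptions drawn from the notes, not verified behaviour:

```julia
using Flux

# Persist the preferred GPU backend to LocalPreferences.toml.
# "CUDA" is the default; "AMD" is assumed to become available once the
# AMDGPU extension mentioned in the notes is loaded.
Flux.gpu_backend!("CUDA")

# The preference is read at package load time, so it takes effect in the
# next Julia session; `gpu` then moves arrays via the selected backend:
x = rand(Float32, 3, 3) |> gpu
```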
From feec5c9b7eace98dbf3be394ba80cfc23a1f22ba Mon Sep 17 00:00:00 2001
From: Carlo Lucibello
Date: Wed, 22 Mar 2023 22:18:00 +0100
Subject: [PATCH 4/5] Update docs/src/models/nnlib.md

Co-authored-by: Saransh Chopra
---
 docs/src/models/nnlib.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/models/nnlib.md b/docs/src/models/nnlib.md
index cf2618eb97..2634990db5 100644
--- a/docs/src/models/nnlib.md
+++ b/docs/src/models/nnlib.md
@@ -72,7 +72,7 @@ pixel_shuffle
 
 ## Batched Operations
 
-`Flux`'s [`Bilinear`](@ref) layer uses [`NNlib.batched_mul`](@ref) internally.
+`Flux`'s [`Flux.Bilinear`](@ref) layer uses [`NNlib.batched_mul`](@ref) internally.
 
 ```@docs
 batched_mul

From 94d0a1c19cd6c153a9529f094b20ad733078231f Mon Sep 17 00:00:00 2001
From: Carlo Lucibello
Date: Wed, 22 Mar 2023 22:18:09 +0100
Subject: [PATCH 5/5] Update docs/src/models/nnlib.md

Co-authored-by: Saransh Chopra
---
 docs/src/models/nnlib.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/models/nnlib.md b/docs/src/models/nnlib.md
index 2634990db5..b308af4917 100644
--- a/docs/src/models/nnlib.md
+++ b/docs/src/models/nnlib.md
@@ -15,7 +15,7 @@ NNlib.make_causal_mask
 
 ## Softmax
 
-`Flux`'s [`logitcrossentropy`](@ref) uses [`NNlib.logsoftmax`](@ref) internally.
+`Flux`'s [`Flux.logitcrossentropy`](@ref) uses [`NNlib.logsoftmax`](@ref) internally.
 
 ```@docs
 softmax
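To illustrate the layer this patch series documents, here is a minimal usage sketch in the spirit of the new docs. The constructor keywords, call signature, and return values are assumed to match the `MultiHeadAttention` docstring added in patch 1, and the `nheads` keyword of the NNlib primitives is likewise an assumption:

```julia
using Flux, NNlib

# Embedding dimension 64 split across 8 heads; inputs are
# (embed_dim, sequence_length, batch_size) arrays.
mha = MultiHeadAttention(64; nheads=8)

q = rand(Float32, 64, 10, 32)
y, α = mha(q, q, q)   # self-attention; y should have size (64, 10, 32)

# Functional counterparts from NNlib's new Attention section:
yf, αf = NNlib.dot_product_attention(q, q, q; nheads=8)
mask = NNlib.make_causal_mask(q)   # boolean mask for autoregressive use
```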