Fix some typos in docs #2418

Merged (1 commit, Mar 31, 2024)
docs/src/models/recurrence.md: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ To introduce Flux's recurrence functionalities, we will consider the following v…

In the above, we have a sequence of length 3, where `x1` to `x3` represent the input at each step (could be a timestamp or a word in a sentence), and `y1` to `y3` are their respective outputs.

- An aspect to recognize is that in such a model, the recurrent cells `A` all refer to the same structure. What distinguishes it from a simple dense layer is that the cell `A` is fed, in addition to an input `x`, with information from the previous state of the model (hidden state denoted as `h1` & `h2` in the diagram).
+ An aspect to recognise is that in such a model, the recurrent cells `A` all refer to the same structure. What distinguishes it from a simple dense layer is that the cell `A` is fed, in addition to an input `x`, with information from the previous state of the model (hidden state denoted as `h1` & `h2` in the diagram).
Member:

Sorry, didn't get around to this in time. In case this comes up again, most of the Flux core team uses the American spelling. Not worth a whole change now, but I wanted to have that known if anyone sees this change and wants to make a similar one.

Contributor Author:

I had the impression that mostly British spelling is used (e.g. it is called Optimiser.jl and not Optimizer.jl).

Member:

The original author was from the UK. Unfortunately we can't change package names so easily, but everything else uses the non-British spelling.


In the most basic RNN case, cell A could be defined by the following:

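The code itself is collapsed in this diff; as a rough sketch (sizes and initialisation chosen arbitrarily for illustration, not Flux's actual implementation), such a cell might look like:

```julia
# Hidden state `h` is carried between steps; `x` is the input at the current step.
Wxh = randn(Float32, 5, 2)   # input-to-hidden weights (arbitrary sizes)
Whh = randn(Float32, 5, 5)   # hidden-to-hidden weights
b   = zeros(Float32, 5)

function rnn_cell(h, x)
    h = tanh.(Wxh * x .+ Whh * h .+ b)
    return h, h   # the new state is also the output at this step
end

h0 = zeros(Float32, 5)       # initial hidden state
x1 = rand(Float32, 2)        # input at the first step
h1, y1 = rnn_cell(h0, x1)
```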
docs/src/performance.md: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
- # [Performance Tips]((@id man-performance-tips))
+ # [Performance Tips](@id man-performance-tips)

All the usual [Julia performance tips apply](https://docs.julialang.org/en/v1/manual/performance-tips/).
As always, [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
@@ -44,7 +44,7 @@ While one could change the activation function (e.g. to use `0.01f0*x`), the idi…
```julia
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```
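A quick check (added here, not part of the original page) that this stays type-stable for `Float32` inputs:

```julia
julia> leaky_tanh(0.5f0) isa Float32   # `oftype(x/1, 0.01)` converts the constant to match x
true
```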

- ## Evaluate batches as Matrices of features
+ ## Evaluate batches as matrices of features

While it can sometimes be tempting to process your observations (feature vectors) one at a time, e.g.
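The examples themselves are collapsed in this diff; the contrast is roughly the following sketch (hypothetical `Dense` model, made-up sizes):

```julia
using Flux

model = Dense(10 => 2)                     # hypothetical model
xs = [rand(Float32, 10) for _ in 1:100]    # 100 feature vectors

# One at a time: 100 small matrix-vector products.
ys = [model(x) for x in xs]

# Batched: one 10×100 matrix, a single matrix-matrix product,
# which is far friendlier to BLAS and to GPUs.
X = reduce(hcat, xs)
Y = model(X)                               # 2×100 output
```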
docs/src/saving.md: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ jldsave("checkpoint_epoch=42.jld2"; model_state, opt_state)
Models are just normal Julia structs, so it's fine to use any Julia storage
format to save the struct as it is instead of saving the state returned by [`Flux.state`](@ref).
[BSON.jl](https://github.com/JuliaIO/BSON.jl) is particularly convenient for this,
- since it can also save anynomous functions, which are sometimes part of a model definition.
+ since it can also save anonymous functions, which are sometimes part of a model definition.

Save a model:

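The example itself is collapsed below; a minimal sketch (with a hypothetical two-layer model) would be:

```julia
using Flux, BSON

model = Chain(Dense(10 => 5, relu), Dense(5 => 2))   # hypothetical model
BSON.@save "mymodel.bson" model
```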
docs/src/training/training.md: 5 additions & 5 deletions
@@ -33,7 +33,7 @@ end
```

This loop can also be written using the function [`train!`](@ref Flux.Train.train!),
- but it's helpful to undersand the pieces first:
+ but it's helpful to understand the pieces first:

```julia
train!(model, train_set, opt_state) do m, x, y
  loss(m(x), y)   # assuming a `loss` function as defined earlier on the page
end
```

@@ -43,7 +43,7 @@

## Model Gradients

- First recall from the section on [taking gradients](@ref man-training) that
+ First recall from the section on [taking gradients](@ref man-taking-gradients) that
`Flux.gradient(f, a, b)` always calls `f(a, b)`, and returns a tuple `(∂f_∂a, ∂f_∂b)`.
In the code above, the function `f` passed to `gradient` is an anonymous function with
one argument, created by the `do` block, hence `grads` is a tuple with one element.
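As a concrete illustration (added here, not part of the page): with two arguments the tuple has two elements, and with one argument it has one:

```julia
julia> Flux.gradient((a, b) -> a*b + a^2, 3.0, 4.0)
(10.0, 3.0)

julia> Flux.gradient(a -> a*4.0 + a^2, 3.0)   # one argument, so a 1-tuple
(10.0,)
```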
@@ -275,10 +275,10 @@ end
The term *regularisation* covers a wide variety of techniques aiming to improve the
result of training. This is often done to avoid overfitting.

- Some of these are can be implemented by simply modifying the loss function.
+ Some of these can be implemented by simply modifying the loss function.
*L₂ regularisation* (sometimes called ridge regression) adds to the loss a penalty
proportional to `θ^2` for every scalar parameter.
- For a very simple model could be implemented as follows:
+ A very simple model could be implemented as follows:

```julia
grads = Flux.gradient(densemodel) do m
    result = m(input)                        # `input` and `label` are assumed from earlier
    penalty = sum(abs2, m.weight)/2 + sum(abs2, m.bias)/2
    my_loss(result, label) + 0.42 * penalty  # hypothetical loss function
end
```

@@ -318,7 +318,7 @@
decay_opt_state = Flux.setup(OptimiserChain(WeightDecay(0.42), Adam(0.1)), model)

Flux's optimisers are really modifications applied to the gradient before using it to update
the parameters, and `OptimiserChain` applies two such modifications.
- The first, [`WeightDecay`](@ref Flux.WeightDecay) adds `0.42` times original parameter to the gradient,
+ The first, [`WeightDecay`](@ref Flux.WeightDecay) adds `0.42` times the original parameter to the gradient,
matching the gradient of the penalty above (with the same, unrealistically large, constant).
After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.

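To make that concrete, here is a sketch (added here; it uses Optimisers.jl directly, with a made-up parameter array) of what `WeightDecay(0.42)` does to an incoming gradient:

```julia
using Optimisers

w = [1.0, 2.0]                         # made-up parameter array
g = [0.1, 0.1]                         # some incoming gradient
state = Optimisers.setup(WeightDecay(0.42), w)
state, w2 = Optimisers.update(state, w, g)
w2 ≈ w .- (g .+ 0.42 .* w)             # true: the decay term is added to the gradient first
```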
docs/src/tutorials/logistic_regression.md: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ julia> x |> summary

The `y` values here correspond to a type of iris plant, with a total of 150 data points. The `x` values depict the sepal length, sepal width, petal length, and petal width (all in `cm`) of 150 iris plants (hence the matrix size `4×150`). Different types of iris plants have different lengths and widths of sepals and petals associated with them, and there is a definitive pattern for this in nature. We can leverage this to train a simple classifier that outputs the type of iris plant using the length and width of sepals and petals as inputs.

- Our next step would be to convert this data into a form that can be fed to a machine learning model. The `x` values are arranged in a matrix and should ideally be converted to `Float32` type (see [Performance tips](@ref id-man-performance-tips)), but the labels must be one hot encoded. [Here](https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/64807) is a great discourse thread on different techniques that can be used to one hot encode data with or without using any external Julia package.
+ Our next step would be to convert this data into a form that can be fed to a machine learning model. The `x` values are arranged in a matrix and should ideally be converted to `Float32` type (see [Performance tips](@ref man-performance-tips)), but the labels must be one hot encoded. [Here](https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/64807) is a great discourse thread on different techniques that can be used to one hot encode data with or without using any external Julia package.

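As a quick illustration (a sketch added here, not part of the tutorial) of one approach using Flux's own utilities, assuming the label vector `y` from above:

```julia
using Flux: onehotbatch

y_onehot = onehotbatch(y, unique(y))   # one column per observation, one row per class
```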
```jldoctest logistic_regression
julia> x = Float32.(x);
```