
Remove train! from quickstart example #2110

Merged · 15 commits merged from quickstart2 into master on Nov 27, 2022
Conversation

mcabbott (Member)

As suggested in #2104, this replaces train! with an explicit loop in the quickstart example. Also steals the idea from there about showing logging code.

It does not completely scrap the example, which otherwise tries to make quite a few points carefully & concisely. Not sure it fits on one screen anymore, though.

github-actions bot (Contributor) commented Nov 11, 2022

Once the build has completed, you can preview any updated documentation at this URL: https://fluxml.ai/Flux.jl/previews/PR2110/ in ~20 minutes

Edit: especially https://fluxml.ai/Flux.jl/previews/PR2110/models/quickstart/

MilesCranmer

LGTM! Although I would suggest you just remove the README example as it is intimidating and unclear. If the goal is to get the user to the docs, I would just leave the API a surprise, so they see it in the best light.

MilesCranmer commented Nov 11, 2022

For the start example, I would suggest some syntax changes to make things more approachable for newcomers to Flux.jl/Julia who might have previous DL experience (probably in Python). In due time they will learn map, ..., eachcol, but I think they don't need it all at once. I also removed the BatchNorm; not sure why it was used there, but I found it a bit confusing. If people need it they can probably find it, right? Concretely, I'd change this:

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000)                                    # 2×1000 Matrix{Float32}
truth = map(col -> xor(col...), eachcol(noisy .> 0.5))            # 1000-element Vector{Bool}

# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax)

# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy)                                               # 2×1000 Matrix{Float32}

into this:

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000)                                    # 2×1000 Matrix{Float32}
truth = [xor(noisy[1, i] > 0.5, noisy[2, i] > 0.5) for i=1:1000]  # 1000-element Vector{Bool}

# Define our model, a multi-layer perceptron with one hidden layer of size 16:
model = Chain(
    Dense(2 => 16), tanh,
    Dense(16 => 16), tanh,
    Dense(16 => 2), softmax
)

# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy)                                               # 2×1000 Matrix{Float32}

(Keeping the activation outside the layer seems more idiomatic to me; showing it inside makes me confused about what I can put inside Chain. They'll learn later on that they can do this, though.)

ToucheSir (Member) commented Nov 11, 2022

I think it best not to put the activation functions outside of their associated layers, because Julia assigns special meanings to some of them on arrays, so fn(::Array) is not the same as fn.(::Array):

julia> x = rand(2, 2)
2×2 Matrix{Float64}:
 0.479567  0.200181
 0.719861  0.0244788

julia> tanh(x) == tanh.(x)
false

julia> tanh(x)
2×2 Matrix{Float64}:
 0.409203  0.178483
 0.641832  0.00344357

julia> tanh.(x)
2×2 Matrix{Float64}:
 0.445897  0.197549
 0.616823  0.0244739

softmax however should stay outside, because it only comes as a "vectorized" function (makes no sense to apply element-wise). This also clues users into the difference between activation functions and functional layers.
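A minimal sketch of that "vectorized" behaviour (assuming the softmax that Flux re-exports from NNlib, which normalises over dims=1 by default):

```julia
using Flux                 # softmax is re-exported from NNlib

x = rand(Float32, 2, 5)
y = softmax(x)             # acts on whole columns, not element-wise
sum(y; dims=1)             # each column of y sums to (approximately) 1
```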

MilesCranmer

> softmax however should stay outside, because it only comes as a "vectorized" function (makes no sense to apply element-wise). This also clues users into the difference between activation functions and functional layers.

Good point! Okay fine with me to keep them inside.

MilesCranmer

I would implement this change though:

- truth = map(col -> xor(col...), eachcol(noisy .> 0.5))            # 1000-element Vector{Bool}
+ truth = [xor(noisy[1, i] > 0.5, noisy[2, i] > 0.5) for i=1:1000]  # 1000-element Vector{Bool}

map, ..., eachcol, .> - just too many things to learn in a single line. (In all honesty, I didn't even know eachcol myself until now, despite developing packages in Julia for a couple of years...!)
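A quick sketch, purely for illustration, checking that the two formulations agree:

```julia
noisy = rand(Float32, 2, 1000)

truth_a = map(col -> xor(col...), eachcol(noisy .> 0.5))               # map/eachcol version
truth_b = [xor(noisy[1, i] > 0.5, noisy[2, i] > 0.5) for i in 1:1000]  # comprehension version

truth_a == truth_b   # true: both build the same Vector{Bool}
```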

mcabbott (Member Author) commented Nov 11, 2022

Worse, the model proposed doesn't run:

julia> # The model encapsulates parameters, randomly initialised. Its initial output is:
       out1 = model(noisy)                                               
ERROR: DimensionMismatch: matrix is not square: dimensions are (16, 1000)
Stacktrace:
 [1] checksquare
   @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/LinearAlgebra.jl:239 [inlined]

tanh acting on a matrix already means something. (This is why I was unhappy about making the others auto-broadcast.)

Some care has gone into this thing. The reason BatchNorm is there is partly that it really does improve performance on this small model. (The one you propose is much much larger.) It is arranged that training is good but not perfect, since with a perfect classifier the first and last images look identical.

And also because it's a common layer which has non-parameter state, which is likewise encapsulated in the layer. If you don't encapsulate it, handling that state separately is a bit of a pain.
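A rough sketch of the non-parameter state being referred to (illustrative only; field and function names as in Flux's BatchNorm at the time):

```julia
using Flux

bn = BatchNorm(3)           # the affine scale/shift are trainable parameters ...
bn.μ, bn.σ²                 # ... while the running mean/variance are plain state

Flux.trainmode!(bn)         # in train mode, each forward pass updates the statistics
bn(randn(Float32, 3, 10))
Flux.testmode!(bn)          # in test mode, the stored statistics are used as-is
```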

MilesCranmer

> tanh acting on a matrix already means something. (This is why I was unhappy about making the others auto-broadcast.)

Scary indeed! Maybe a warning should be printed if tanh or exp is passed nakedly to Chain? (since it would be really weird for a user to do this and actually mean it?) If I had set the batchsize equal to the number of features, it wouldn't have errored and I might never have realized this.

README.md Outdated
```julia
using Flux # should install everything for you, including CUDA

x = hcat(digits.(0:3, base=2, pad=2)...) |> gpu # let's solve the XOR problem!
- y = Flux.onehotbatch(xor.(eachrow(x)...), 0:1) |> gpu
+ y = Flux.onehotbatch(xor.(eachrow(cpu(x))...), 0:1) |> gpu
```


Why cpu(x) |> gpu? Are some of these functions not implemented on GPU?

mcabbott (Member Author)

This gives scalar indexing warnings if run on the GPU.


Maybe just move them both to the GPU when creating the dataloader?

mcabbott (Member Author)

I tried a few things, but they are all ugly, or don't work. Maybe simpler to give up on squeezing |> gpu into this model.

mcabbott (Member Author)

ba975db gives up on train! here too. Every attempt seemed pretty convoluted. I guess this is a sign I don't believe in the interface.

It gains a few lines, but at least it's pretty clear what each one does. Explicit parameters will remove the Flux.params line.
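For contrast, a rough sketch of the implicit-parameter style that the Flux.params line implies (older API; hyperparameters are illustrative):

```julia
pars = Flux.params(model)          # implicit collection of trainable arrays
opt = Adam(0.1)

for _ in 1:100
    grad = gradient(() -> Flux.logitcrossentropy(model(x), y), pars)
    Flux.update!(opt, pars, grad)  # mutates the parameters in place
end
```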

mcabbott (Member Author)

Something like this, no more params:

model = Chain(Dense(2 => 3, sigmoid), BatchNorm(3), Dense(3 => 2))
optim = Flux.setup(Adam(0.1, (0.7, 0.95)), model)

for _ in 1:100
    grad = gradient(m -> Flux.logitcrossentropy(m(x), y), model)[1]
    Flux.update!(optim, model, grad)  # this changes model & optim
end


Very clean! Multiple dispatch rules 🚀

mcabbott (Member Author)

I couldn't find a version of this model which was both compact and clear. So I gave up & replaced it with a curve-fitting problem, 918fc0b.

But IDK, maybe we should give up & go back to having no example in the readme.


I like the

    grad = gradient(m -> Flux.logitcrossentropy(m(x), y), model)[1]
    Flux.update!(optim, model, grad)  # this changes model & optim

example a lot. Why not just show this one?


(or the equivalent, but with the MSE loss)

codecov-commenter commented Nov 11, 2022

Codecov Report

Base: 86.86% // Head: 86.86% // No change to project coverage 👍

Coverage data is based on head (c9cde50) compared to base (065c191).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2110   +/-   ##
=======================================
  Coverage   86.86%   86.86%           
=======================================
  Files          19       19           
  Lines        1469     1469           
=======================================
  Hits         1276     1276           
  Misses        193      193           


README.md Outdated
```julia
- using Flux # should install everything for you, including CUDA
+ using Flux, Plots
+ data = [([x], x-cbrt(x)) for x in range(-2, 2, 100)]
```
MilesCranmer commented Nov 25, 2022

Suggested change:

- data = [([x], x-cbrt(x)) for x in range(-2, 2, 100)]
+ data = [([x], x - x^3) for x in range(-2, 2, 100)]

Something simpler (In all honesty I have never seen cbrt before... wasn't sure what it was)

mcabbott (Member Author)

I guess the point is to make a pretty curve which isn't obviously trivial to fit:

[screenshot of the resulting plot, 2022-11-24]

Does it matter if the function isn't immediately familiar?


Ah, sorry, I edited my comment to be x - x^3. I agree x - x^2 looks way too simple.


(When I read cbrt I wasn't sure if the output was scalar or not in the example - whereas ^ is more clear to me)

mcabbott (Member Author)

Now with 2x - x^3:

[screenshot of the updated plot, 2022-11-24]

mcabbott (Member Author)

> wasn't sure if the output was scalar or not in the example

This example is a little odd in that it takes a trivial vector input and returns a scalar, as that makes the loss nice and simple. Maybe that introduces confusion, though? E.g.

https://discourse.julialang.org/t/input-to-neural-network/90770
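To make the shapes concrete, a small illustrative sketch (not the final README code; the model here is arbitrary):

```julia
using Flux

model = Chain(Dense(1 => 8, tanh), Dense(8 => 1), only)

x = [0.5f0]                  # input: a 1-element vector (one feature)
y = 2 * 0.5f0 - 0.5f0^3      # target: a plain scalar
loss = abs2(model(x) - y)    # scalar output keeps the loss this simple
```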


I was going to suggest that it take two features as input instead, like f(x, y) = 2x - y^3 over $(-2:2) \times (-2:2)$. However, the downside is that it would be harder to visualize (I think a heatmap is good too, though).

mcabbott (Member Author)

Yes. I think if this readme thing exists at all, it needs to be super-simple. It must confirm that Flux is running on your system, and solve a vaguely NN-like problem (not linear regression).

I think the present XOR one invites too much time spent figuring it out, when people should rather go into the docs.

mcabbott (Member Author)

(The rule around here, BTW, is that PRs need one approval to merge, not necessarily from someone with commit rights.)


# The model encapsulates parameters, randomly initialised. Its initial output is:
- out1 = model(noisy) # 2×1000 Matrix{Float32}
+ out1 = model(noisy |> gpu) |> cpu # 2×1000 Matrix{Float32}
mcabbott (Member Author)

Recent commit runs things on the GPU, as that seems worth showing off (even though this is actually slower). One quirk is that model(noisy |> gpu) |> cpu is a bit noisy, but maybe not so confusing to figure out.


Yeah this looks fine to me.
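A rough sketch of the round trip being described (assuming CUDA.jl is installed; Flux's gpu simply returns its input unchanged when no GPU is available):

```julia
using Flux, CUDA

model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax) |> gpu
noisy = rand(Float32, 2, 1000)        # the data itself stays on the CPU

out1 = model(noisy |> gpu) |> cpu     # move the input up, bring the result back down
```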

MilesCranmer left a comment

LGTM!

Couple minor comments which I'll leave up to you:

  • I'd either (1) upload images to a separate docs-specific repo, or (2) upload the image file directly to GitHub rather than to the git repo (you can do this by drag-and-dropping an image into the markdown), so it doesn't weigh down the git history with binary files.
  • Use a slightly larger neural net in the README, so that the fit is better:
model = Chain(Dense(1 => 23, tanh), Dense(23 => 23, tanh), Dense(23 => 1, bias=false), only)

does the trick.
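For context, a rough sketch of how that suggested model might slot into the explicit loop discussed above (loss, optimiser settings and epoch count are illustrative, not what the README ended up with):

```julia
using Flux

data = [([x], 2x - x^3) for x in range(-2f0, 2f0, 100)]   # Float32 to match the layer defaults
model = Chain(Dense(1 => 23, tanh), Dense(23 => 23, tanh), Dense(23 => 1, bias=false), only)
optim = Flux.setup(Adam(0.01), model)

for epoch in 1:1000, (x, y) in data
    grad = gradient(m -> abs2(m(x) - y), model)[1]
    Flux.update!(optim, model, grad)
end
```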

mcabbott (Member Author)

> either (1) upload images to a separate docs-specific repo, or (2) upload the image file directly to GitHub rather than the git repo (you can do this by drag-and-dropping an image into the markdown), so it doesn't weigh down the git history

This is a good point. I have unfortunately committed sins to the tune of 1.5MB in #2125 , and if we want more tutorials in here we probably need a plan. This PR is 4% of that (the larger image is already here) so my inclination is to kick the can down the road for now.

mcabbott merged commit b015b7a into master on Nov 27, 2022
mcabbott deleted the quickstart2 branch on November 27, 2022, 05:22
MilesCranmer

Sounds good to me! (I'm a sinner as well; I recently realized some of my repositories were getting large because of it and have started slowly changing this habit)

mcabbott (Member Author)

Cool, many thanks for taking a look.

If you have a minute, your take on #2114 would also be welcomed.
