Remove train! from quickstart example #2110

Conversation
Once the build has completed, you can preview any updated documentation at this URL: https://fluxml.ai/Flux.jl/previews/PR2110/ in ~20 minutes. Edit: especially https://fluxml.ai/Flux.jl/previews/PR2110/models/quickstart/
LGTM! Although I would suggest you just remove the README example, as it is intimidating and unclear. If the goal is to get the user to the docs, I would just leave the API a surprise, so they see it in the best light.
For the start example, I would suggest some syntax changes to make things more idiomatic for newcomers to Flux.jl/Julia who might have previous DL experience (probably in Python). In due time they will learn. I would change this:

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = map(col -> xor(col...), eachcol(noisy .> 0.5)) # 1000-element Vector{Bool}
# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax)
# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy) # 2×1000 Matrix{Float32}

into this:

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = [xor(noisy[1, i] > 0.5, noisy[2, i] > 0.5) for i=1:1000] # 1000-element Vector{Bool}
# Define our model, a multi-layer perceptron with one hidden layer of size 16:
model = Chain(
Dense(2 => 16), tanh,
Dense(16 => 16), tanh,
Dense(16 => 2), softmax
)
# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy) # 2×1000 Matrix{Float32}

(Keeping the activation outside the layer is more idiomatic; showing it inside makes me confused about what I can put inside.)
I think it best not to put the activation functions outside of their associated layers, because Julia assigns special meanings to some of them beyond elementwise application:

julia> x = rand(2, 2)
2×2 Matrix{Float64}:
0.479567 0.200181
0.719861 0.0244788
julia> tanh(x) == tanh.(x)
false
julia> tanh(x)
2×2 Matrix{Float64}:
0.409203 0.178483
0.641832 0.00344357
julia> tanh.(x)
2×2 Matrix{Float64}:
0.445897 0.197549
0.616823 0.0244739
Good point! Okay, fine with me to keep them inside.
I would implement this change though:

- truth = map(col -> xor(col...), eachcol(noisy .> 0.5))           # 1000-element Vector{Bool}
+ truth = [xor(noisy[1, i] > 0.5, noisy[2, i] > 0.5) for i=1:1000]  # 1000-element Vector{Bool}
Worse, the model proposed doesn't run: with the activations outside the layers, tanh gets applied to a whole (non-square) matrix and errors.
Some care has gone into this thing. The reason BatchNorm is there is partly that it really does improve performance on this small model. (The one you propose is much, much larger.) It is arranged that training is good but not perfect, since with a perfect classifier the first and last images look identical. And also because it's a common layer which has non-parameter state, which is also encapsulated in the layer. If you don't encapsulate & handle this separately, it's a bit of a pain.
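To illustrate the non-parameter state point, a minimal sketch (mine, not from the PR): BatchNorm carries running statistics, updated in training mode and used in test mode, separately from its trainable parameters.

```julia
using Flux

bn = BatchNorm(3)
x  = randn(Float32, 3, 10)

Flux.trainmode!(bn)   # normalise with batch statistics, updating the running stats
y_train = bn(x)

Flux.testmode!(bn)    # normalise with the stored running statistics instead
y_test = bn(x)

y_train ≈ y_test      # false in general: the layer's non-parameter state matters
```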
Scary indeed! Maybe a warning should be printed if
README.md
Outdated
```julia
  using Flux  # should install everything for you, including CUDA

  x = hcat(digits.(0:3, base=2, pad=2)...) |> gpu  # let's solve the XOR problem!
- y = Flux.onehotbatch(xor.(eachrow(x)...), 0:1) |> gpu
+ y = Flux.onehotbatch(xor.(eachrow(cpu(x))...), 0:1) |> gpu
```
Why cpu(x) |> gpu? Are some of these functions not implemented on GPU?
This gives scalar indexing warnings if run on the GPU.
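For anyone unfamiliar with that warning, a minimal illustration of scalar indexing (my sketch, assuming CUDA.jl is available; not code from this PR):

```julia
using CUDA
CUDA.allowscalar(false)      # turn the scalar-indexing warning into an error

x = cu(rand(Float32, 2, 4))  # a small array on the GPU
x[1, 1]                      # ERROR: scalar indexing is disallowed
```

Element-by-element access like this is what the warning complains about, which is why the indexing step is done on the CPU before moving the result back to the GPU.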
Maybe just move them both to the GPU when creating the dataloader?
I tried a few things, but they are all ugly or don't work. Maybe simpler to give up on squeezing |> gpu into this model.
ba975db gives up on train! here too. Every attempt seemed pretty convoluted. I guess this is a sign I don't believe in the interface.

It gains a few lines, but at least it's pretty clear what each one does. Explicit parameters will remove the Flux.params line.
Something like this, no more params:

model = Chain(Dense(2 => 3, sigmoid), BatchNorm(3), Dense(3 => 2))
optim = Flux.setup(Adam(0.1, (0.7, 0.95)), model)

for _ in 1:100
    grad = gradient(m -> Flux.logitcrossentropy(m(x), y), model)[1]
    Flux.update!(optim, model, grad)  # this changes model & optim
end
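Not from the comment itself: a hedged guess at the x and y this loop assumes (the 2×4 XOR inputs from the README diff above), to be defined before running it, plus a quick check afterwards.

```julia
using Flux

# Assumed data, mirroring the XOR setup in the diff above:
xs = hcat(digits.(0:3, base=2, pad=2)...)         # 2×4 Matrix{Int}: all four XOR inputs
x  = Float32.(xs)
y  = Flux.onehotbatch(xor.(eachrow(xs)...), 0:1)  # 2×4 one-hot targets

# After the training loop above has run, the logits should recover XOR:
softmax(model(x))   # each column should put most of its probability on xor of that input
```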
Very clean! Multiple dispatch rules 🚀
I couldn't find a version of this model which was both compact and clear. So I gave up & replaced it with a curve-fitting problem, 918fc0b.
But IDK, maybe we should give up & go back to having no example in the readme.
I like the

grad = gradient(m -> Flux.logitcrossentropy(m(x), y), model)[1]
Flux.update!(optim, model, grad)  # this changes model & optim

example a lot. Why not just show this one?
(or the equivalent, but with the MSE loss)
Codecov Report: Base: 86.86% // Head: 86.86% // No change to project coverage 👍

Additional details and impacted files:

@@ Coverage Diff @@
##            master    #2110   +/-   ##
========================================
  Coverage    86.86%   86.86%
========================================
  Files           19       19
  Lines         1469     1469
========================================
  Hits          1276     1276
  Misses         193      193

View full report at Codecov.
README.md
Outdated
```julia
- using Flux  # should install everything for you, including CUDA
+ using Flux, Plots
  data = [([x], x-cbrt(x)) for x in range(-2, 2, 100)]
```
- data = [([x], x-cbrt(x)) for x in range(-2, 2, 100)]
+ data = [([x], x - x^3) for x in range(-2, 2, 100)]

Something simpler. (In all honesty I have never seen cbrt before... wasn't sure what it was.)
Ah, sorry, I edited my comment to be x - x^3. I agree x - x^2 looks way too simple.
(When I read cbrt I wasn't sure if the output was scalar or not in the example, whereas ^ is more clear to me.)
> wasn't sure if the output was scalar or not in the example

This example is a little odd in that it takes a trivial vector input and returns a scalar, as that makes the loss nice and simple. Maybe that introduces confusion though? E.g. https://discourse.julialang.org/t/input-to-neural-network/90770
I was going to suggest that it take two features as input instead, like f(x, y) = 2x - y^3 over
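As a rough sketch of what that two-feature variant could look like (my illustration, not the commenter's; the grid and its size are arbitrary):

```julia
f(x, y) = 2x - y^3
data = [([x, y], f(x, y)) for x in range(-2, 2, 20) for y in range(-2, 2, 20)]  # 400 pairs
```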
Yes. I think if this readme thing exists at all, it needs to be super-simple. It must confirm that Flux is running on your system, and solve a vaguely NN-like problem (not linear regression).
I think the present XOR one invites too much time spent figuring it out, when people should rather go into the docs.
(The rule BTW around here is that PRs need one approval to merge, not necc. from anyone with rights.)
  # The model encapsulates parameters, randomly initialised. Its initial output is:
- out1 = model(noisy)  # 2×1000 Matrix{Float32}
+ out1 = model(noisy |> gpu) |> cpu  # 2×1000 Matrix{Float32}
Recent commit runs things on the GPU, as that seems worth showing off (even though this is actually slower). One quirk is that model(noisy |> gpu) |> cpu is a bit noisy, but maybe not so confusing to figure out.
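Spelled out, the pattern under discussion (a sketch assuming the model and noisy data from the quickstart above, with a working CUDA setup):

```julia
using Flux

model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax) |> gpu
noisy = rand(Float32, 2, 1000)      # data generated on the CPU

out1 = model(noisy |> gpu) |> cpu   # move the input to the GPU, bring the output back
```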
Yeah this looks fine to me.
LGTM!
A couple of minor comments which I'll leave up to you:
- I'd either (1) upload images to a separate docs-specific repo, or (2) upload the image file directly to GitHub rather than the git repo (you can do this by drag-and-dropping an image into the markdown), so it doesn't weigh down the git history with binary files.
- Use a slightly larger neural net in the README, so that the fit is better: model = Chain(Dense(1 => 23, tanh), Dense(23 => 23, tanh), Dense(23 => 1, bias=false), only) does the trick (see the sketch below).
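A hedged sketch of how that model fits with the x - x^3 data line discussed above (the per-sample squared-error loss, optimiser settings, and epoch count are my guesses, not necessarily what the README ended up with):

```julia
using Flux

data  = [([x], x - x^3) for x in range(-2f0, 2f0, 100)]   # 100 ([input], target) pairs
model = Chain(Dense(1 => 23, tanh), Dense(23 => 23, tanh), Dense(23 => 1, bias=false), only)
optim = Flux.setup(Adam(), model)

for epoch in 1:1000
    for (xi, yi) in data
        grad = gradient(m -> abs2(m(xi) - yi), model)[1]  # squared error on one sample
        Flux.update!(optim, model, grad)
    end
end

model([0.5f0])   # should be close to 0.5 - 0.5^3 = 0.375 after training
```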
This is a good point. I have unfortunately committed sins to the tune of 1.5MB in #2125, and if we want more tutorials in here we probably need a plan. This PR is 4% of that (the larger image is already here), so my inclination is to kick the can down the road for now.
Sounds good to me! (I'm a sinner as well; I recently realized some of my repositories were getting large because of it and have started slowly changing this habit.)
Cool, many thanks for taking a look. If you have a minute, your take on #2114 would also be welcomed.
As suggested in #2104, this replaces train! with an explicit loop in the quickstart example. Also steals the idea from there about showing logging code.

It does not completely scrap the example, which otherwise tries to make quite a few points carefully & concisely. Not sure it fits on one screen anymore, though.