Update index.qmd #548
Conversation
Preview the changes: https://turinglang.org/docs/pr-previews/548
I think there might still be a misunderstanding here about what the `compile` keyword argument actually does.
Ah, I see. Maybe @torfjelde is the right person to ask about this? I've not dug into exactly where to sort out the ReverseDiff gradient stuff in Turing.jl yet.
Typically this occurs in the initial step of the sampler, i.e. once (there are exceptions, but in those cases we can't really do much better).
Unfortunately not for ReverseDiff.jl in compiled mode (they can, however, just not use compiled mode).
Added some comments 👍
@torfjelde I think that's the key confusion between us. My understanding is that, regardless of whether it is compiled or not, the tape only captures the execution for one version of the control flow. Thus, the tape always becomes invalid if the control flow changes (e.g. a different branch of an `if` statement is taken). Source: https://juliadiff.org/ReverseDiff.jl/dev/api/#The-AbstractTape-API
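To make the caveat concrete, here is a minimal sketch (not from this PR; the function `f` is a made-up example) of how a compiled tape silently replays the branch it was recorded with:

```julia
using ReverseDiff

# The gradient of f depends on which branch is taken.
f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

# Record and compile a tape at a point where x[1] > 0,
# so the quadratic branch is baked into the tape.
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0, 2.0]))

g = zeros(2)
ReverseDiff.gradient!(g, tape, [1.0, 2.0])   # correct: [2.0, 4.0]
ReverseDiff.gradient!(g, tape, [-1.0, 2.0])  # silently wrong: replays the x[1] > 0 branch
```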
The tape is purely internal and neither stored nor reused if `compile=false`.
This is wrong. The tape is only reused if `compile=true`.
Okay, then IIUC it's a matter of inconsistency between what `compile` means in Turing and what it means in ReverseDiff's documentation.
Note that in ADTypes, the `compile` argument to `AutoReverseDiff` is defined ambiguously too. So we should perhaps add more detail to the struct, something like `AutoReverseDiff(tape=true, compile=false)`?
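For reference, a minimal sketch of the current ADTypes API (note that the `tape` keyword above is only a proposal and does not exist):

```julia
using ADTypes

# `compile=true`: a tape is recorded once and compiled up front; fast,
# but unsafe for models with value-dependent control flow (see above).
ad_compiled = AutoReverseDiff(compile=true)

# Default (`compile=false`): no compiled tape is prepared up front.
ad_plain = AutoReverseDiff()
```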
I'd love for you to chime in here @devmotion @torfjelde: SciML/ADTypes.jl#91
I'm not too opinionated about this, as compilation without caching seems somewhat useless? Are there scenarios where you'd like to do that?
Indeed, you can't compile a tape you never recorded in the first place. In any case, I think the ambiguous terminology was fixed by SciML/ADTypes.jl#91. It's just a shame that the word "compile" was chosen instead of "record", given how both are used in ReverseDiff's documentation. But it's a sunk cost now.
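For anyone skimming, a quick illustration of the two distinct steps in ReverseDiff's own terminology (plain ReverseDiff API, independent of Turing):

```julia
using ReverseDiff

f(x) = sum(abs2, x)

tape  = ReverseDiff.GradientTape(f, rand(3))  # "record": trace one execution of f
ctape = ReverseDiff.compile(tape)             # "compile": turn the recorded tape into faster code
```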
Co-authored-by: Guillaume Dalle <[email protected]>
Co-authored-by: Tor Erlend Fjelde <[email protected]>
Thank you all for raising this problem and helping to fix it -- it's good that we've gotten to the bottom of this! I'm basically happy with this. @gdalle and @torfjelde, could you take a final look and let me know whether you are happy?
I agree with that 😕
LGTM! Added one final comment that would be nice to shoehorn in here since we're already making alterations, but looks good :)
If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is. Currently, this defaults to `ForwardDiff`.
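As a sketch of what that paragraph describes (assuming a recent Turing version where Hamiltonian samplers accept an `adtype` keyword; the model is made up):

```julia
using Turing, ADTypes

@model function demo(y)
    m ~ Normal(0, 1)
    y ~ Normal(m, 1)
end

# Explicitly specify the AD backend for this sampler...
chain = sample(demo(1.0), NUTS(; adtype=AutoReverseDiff(compile=true)), 1000)

# ...or omit `adtype` to fall back to the global default (currently ForwardDiff).
chain_default = sample(demo(1.0), NUTS(), 1000)
```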
Sorry about this, but can we maybe also add the following paragraph at the very end (see the comment below)?
The simplest way to determine which backend to use is to benchmark the model using TuringBenchmarking.jl and run the following:

```julia
using TuringBenchmarking
benchmark_model(model, adbackends=[AutoForwardDiff(), AutoReverseDiff(), ...])
```

where the `adbackends` kwarg is used to specify which backends you want to benchmark.
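A self-contained version of that snippet, with a toy model supplied for illustration (the model itself is not from the PR):

```julia
using Turing, TuringBenchmarking, ADTypes

@model function gdemo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
end

model = gdemo(1.5)

# Compare forward mode against plain and compiled reverse mode.
results = benchmark_model(
    model;
    adbackends=[AutoForwardDiff(), AutoReverseDiff(), AutoReverseDiff(compile=true)],
)
```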
When you fully switch to DI you can also use `DifferentiationInterfaceTest.benchmark_differentiation` for this ;)
This is a good point, @torfjelde. However, would it make more sense to just link to TuringBenchmarking.jl, rather than copy-pasting a code snippet which may become outdated? Alternatively, is there a simple way to put example benchmarking code in here which runs whenever the docs are built?
Addresses part of #547.
@gdalle does this read more correctly?