static computation #10

Open
CarloLucibello opened this issue Mar 2, 2017 · 7 comments
@CarloLucibello
Collaborator

Hi,
thanks for this nice package (and for Knet as well).
How difficult would it be to support static computation, at least for a limited set of operations? Here is a comparison with ReverseDiff.jl, where AutoGrad lags two orders of magnitude behind:

julia> f(x) = sum(x->x^2,x)
f (generic function with 1 method)

julia> v=rand(100);

julia> @benchmark grad(f)(v)
BenchmarkTools.Trial: 
  memory estimate:  411.38 KiB
  allocs estimate:  9398
  --------------
  minimum time:     1.068 ms (0.00% GC)
  median time:      1.088 ms (0.00% GC)
  mean time:        1.182 ms (6.49% GC)
  maximum time:     5.658 ms (78.79% GC)
  --------------
  samples:          4204
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> df! = ReverseDiff.compile_gradient(f,v)
(::#301) (generic function with 1 method)

julia> y=ones(v);

julia> @benchmark df!(y,v)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.353 μs (0.00% GC)
  median time:      11.426 μs (0.00% GC)
  mean time:        11.636 μs (0.00% GC)
  maximum time:     35.284 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

I encounter the same 100x slowdown if I increase the size to v = rand(1000).

Cheers,
Carlo

@denizyuret
Owner

denizyuret commented Mar 3, 2017 via email

@jrevels

jrevels commented Mar 3, 2017

> 2. One major feature of Knet is that it supports dynamic computational graphs, i.e. the ability to construct the CG at runtime so one can use arbitrary Julia code and change the operations of the model every iteration.

Note that ReverseDiff also supports this. Tape reuse/compilation is simply an additional feature for when you do, in fact, have a static CG (common in many of the non-ML applications I'm targeting with ReverseDiff).
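For illustration, a minimal sketch of the two usage modes, assuming ReverseDiff's tape API (GradientTape, compile, gradient!); the exact entry points have shifted across ReverseDiff versions, so treat this as a sketch rather than the definitive API:

    using ReverseDiff

    f(x) = sum(x->x^2, x)
    v = rand(100)

    # Dynamic mode: the tape is re-recorded on every call, much like AutoGrad.
    g = ReverseDiff.gradient(f, v)

    # Static mode: record the tape once, compile it, and reuse it.
    tape  = ReverseDiff.GradientTape(f, v)
    ctape = ReverseDiff.compile(tape)
    out   = similar(v)
    ReverseDiff.gradient!(out, ctape, v)   # no re-taping, no allocation in the hot loop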

@denizyuret
Owner

@jrevels did you try running any Knet examples with ReverseDiff?

@jrevels

jrevels commented Mar 8, 2017

> @jrevels did you try running any Knet examples with ReverseDiff?

Nope, that could be fun. Looking at the examples, it seems like (in most cases) it'd be as easy as swapping out the lossgradient for a ReverseDiff-generated gradient rather than an AutoGrad-generated one?
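A minimal sketch of that kind of swap, assuming a Knet-style loss(w, x, y) whose first argument is the parameter array (loss and lossgradient follow the naming in the Knet examples; the ReverseDiff wrapper below is only an illustration and re-records the tape on each call unless compiled):

    using AutoGrad, ReverseDiff

    # Knet-style loss: parameters first, then data.
    loss(w, x, y) = sum((w*x .- y).^2) / size(y, 2)

    # AutoGrad version, as in the Knet examples:
    lossgradient = grad(loss)

    # ReverseDiff version: close over the data and differentiate w.r.t. w only.
    lossgradient_rd(w, x, y) = ReverseDiff.gradient(w -> loss(w, x, y), w)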

@denizyuret
Owner

denizyuret commented Mar 8, 2017 via email

@jrevels

jrevels commented Mar 8, 2017

Yup - on the surface, ReverseDiff is standard operator-overloading reverse-mode AD, and supports all the things dynamically re-taping AD libraries generally support. Under the hood, there are a lot of Julia-specific optimizations thrown in, including per-instruction memory caching, mixed-mode AD and indexing elision. It's more in the ADOL-C tradition than the autograd tradition, where tape reuse is encouraged for code with static computation graphs.

I'm curious to see how code with dictionaries will fare. Theoretically, it should be fine, but it's not something I test for (I'm more in the traditional optimization world than the ML world). For example, ReverseDiff's API is currently only written to differentiate functions whose arguments are scalars or arrays (though dictionaries/arbitrary data structures are totally fair game within the function itself).
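As a sketch of that restriction (the function below is made up for illustration): the differentiated argument has to be an array, but a Dict built from it inside the function is fine.

    using ReverseDiff

    # Argument is a plain array; a Dict constructed inside is fair game.
    function model_loss(θ)
        params = Dict(:w => θ[1:2], :b => θ[3])
        sum(params[:w] .^ 2) + params[:b]^2
    end

    ReverseDiff.gradient(model_loss, [1.0, 2.0, 3.0])  # supported: array argument
    # ReverseDiff.gradient(f, Dict(...))               # not supported: Dict argument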

@denizyuret
Owner

See JuliaDiff/ReverseDiff.jl#77 for relevant discussion.
