included complex step finite method for the gradient computation #74
base: master
Conversation
This is okay, but are you aware of https://github.com/JuliaDiff/ForwardDiff.jl which evaluates gradients exactly? The complex step method is approximate, albeit with a very small error term. The complex step method is also not safe for arbitrary user functions:
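For example, a minimal sketch of the failure mode mentioned below: Julia's `dot` conjugates its first argument, so a purely imaginary perturbation never reaches the imaginary part of the result.

```julia
# Minimal sketch (not code from this PR): dot conjugates its first argument,
# so f(x) = dot(x, x) = sum(|x_i|^2) and the complex-step estimate collapses to zero.
using LinearAlgebra          # Julia 1.x; on the 0.x versions in this thread, dot is in Base

f(x) = dot(x, x)
x = [1.0, 2.0, 3.0]          # true gradient of f at x is 2x

h  = 1e-20                   # hypothetical step size
xc = complex.(x); xc[1] += im*h
imag(f(xc)) / h              # gives 0.0 instead of 2x[1] = 2.0
```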
CC @jrevels |
The methods used in ForwardDiff might also be quite a bit faster than the complex step method depending on the function. |
Hi! I agree with you: the complex step method must be used with care. The user needs to know how each function deals with complex variables (as in your example, the standard library defines the dot product using the conjugate of the first vector, which breaks the complex step method). But I still believe that this method is a good option over finite differences in many cases if proper care is taken, since it leads to "nearly" exact differentiation (actually, unlike finite differences, complex step can get an exact result up to numerical precision). However, I wasn't aware of ForwardDiff! It seems really interesting!! I thought that those "algorithmic differentiation" methods were slower than complex step. Seems I was wrong. I look forward to checking this package out in depth. Thank you for the tip!! |
Hi @mlubin. I've tested ForwardDiff with a few simple examples. Weirdly, complex step leads to an "exact" computation and ForwardDiff doesn't (though its results are still very accurate). The test function and its exact gradient are:
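A minimal sketch of such a test, assuming the same function that appears later in this thread:

```julia
# Sketch, assuming the test function used later in this thread:
f(x)  = sin(x[1]) + cos(x[2])
fd(x) = [cos(x[1]), -sin(x[2])]   # exact gradient
```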
With ForwardDiff:
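One plausible way this was invoked, using the `ForwardDiff.gradient` API shown further down:

```julia
# Sketch using the ForwardDiff.gradient API shown later in this thread:
using ForwardDiff
g = ForwardDiff.gradient(f)       # returns a gradient function (old ForwardDiff API)
g([0.5, 0.5]) - fd([0.5, 0.5])    # deviation from the exact gradient
```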
which leads to the following error:
With complex step:
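A sketch of the complex-step computation: perturb each coordinate along the imaginary axis and keep the first-order imaginary part.

```julia
# Sketch of the complex-step estimate; h is a hypothetical, very small step size.
h  = 1e-100
gc = [imag(f([0.5 + im*h, 0.5])) / h,
      imag(f([0.5, 0.5 + im*h])) / h]
gc - fd([0.5, 0.5])               # typically zero to machine precision
```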
Similar results for other simple functions. I think I am using ForwardDiff wrong: it seems to end up calling exactly the same function as Calculus.gradient (which uses central finite differences). I've just started working with Julia, so maybe I am doing something wrong. |
On my machine:

julia> using ForwardDiff
julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)
julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)
julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)
julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0

...so somehow there seems to be a namespacing issue? I even tried it with

julia> using ForwardDiff
julia> using Calculus
julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)
julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)
julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)
julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0 |
@flavioluiz I just pushed a commit to ForwardDiff.jl that should totally prevent method switch-ups on ForwardDiff's end, if you'd like to try things out again to see whether that fixes it. If it still ends up picking the wrong method, would you mind opening an issue in the ForwardDiff.jl repo with the example? Thanks. |
@flavioluiz, which version of |
ouch. I've realized that I was supposed to use Julia v0.4. I was using Julia v0.3.11 (and the last version of ForwardDiff). I've just installed v0.4 and added ForwardDiff from Pkg. Same problem (for some reason, the version from Pkg uses the gradient from Calculus). Then I cloned the ForwardDiff repo, and with the latest version it works fine! Exactly as shown by @jrevels. :-) |
Yeah, we haven't yet officially tagged the new version of ForwardDiff, so the version in METADATA.jl hasn't received the update (that's the version you get from Pkg.add). |
Hi again. I've been playing with ForwardDiff. I compared the gradient methods using complex step vs ForwardDiff for several functions. Both methods compute the derivative "exactly". In most cases complex step was about two times faster!

I don't know much about these "automatic differentiation" methods (I quickly read the wiki about the subject and am still trying to understand). However, I have a feeling that the philosophy behind complex step is not so far from this forward mode method. While forward mode extends real algebra using dual numbers, complex step profits from the standard implementation of complex arithmetic (whose basic operations are pretty much the same as the dual algebra!).

In complex step, if epsilon is "small enough", epsilon^2 and higher order terms easily become 0.0 in double precision (while the first order term still keeps full numerical precision), so the derivative is "almost always exact". If I understood well, forward mode dual numbers use epsilon^2 = 0 as a definition, and all the basic operations are done taking this definition into account, so there is no need to choose a small epsilon and there is no danger: the derivative is always exact. Is that right or am I talking nonsense? ;-) |
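A small illustration of that point (a sketch in one variable, not the benchmark code):

```julia
# f(x + im*h) = f(x) + im*h*f'(x) - (h^2/2)*f''(x) - ...
# so imag(f(x + im*h))/h = f'(x) - (h^2/6)*f'''(x) + ...; for tiny h the h^2
# correction falls below double precision while the first-order term survives intact.
h = 1e-200
x = 0.7
imag(sin(x + im*h)) / h    # ≈ cos(0.7) to machine precision
cos(x)                     # exact derivative, for comparison
```

Dual numbers, by contrast, define epsilon^2 = 0 outright, so no step size needs to be chosen at all.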
@flavioluiz, yes, you're basically correct. My understanding is that the complex step rule is a "poor man's AD" which came from the Matlab world, where the only number types that are fast to compute with are real and complex floating-point values. Do you have any examples you could share where the complex step is faster than ForwardDiff? |
@mlubin, right: I was using matrix/vector multiplications:

Q = randn(N,N)
f(x) = dot(conj(x),Q*x)

and similar functions with additional nonlinearities (taking the sine of the result, exponentials, multiplying by the sum of x, etc.). After your comment, I tried to be sure of not using any LAPACK operation:

function f(x)
    res = 0.
    for i = 1:length(x)
        res += x[i]*x[i]
    end
    return res
end

Obtaining the gradient using complex step is still faster than ForwardDiff. |
@flavioluiz This function isn't type-stable, and that is probably causing ForwardDiff to run a bit slower than it normally would. To make the function type-stable, try changing this line:

res = 0.

to this:

res = zero(eltype(x))

That way the type of res matches the element type of x instead of being hard-coded to Float64. I'd be interested to see if it's still problematic after this change. |
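Applied to the benchmark function above, the suggested change would look roughly like this:

```julia
# Type-stable variant of the benchmark function: the accumulator takes the
# element type of x, so it stays a dual (or complex) number when x carries one.
function f(x)
    res = zero(eltype(x))
    for i = 1:length(x)
        res += x[i]*x[i]
    end
    return res
end
```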
Hi,
I've just included the complex step method for the multivariable finite_difference, so it can be used to compute the gradient too.
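For reference, a minimal sketch of a multivariable complex-step gradient is shown below; the actual finite_difference code in this PR may look different, and complex_step_gradient is a hypothetical name.

```julia
# Minimal sketch of the idea; not necessarily the implementation in this PR.
function complex_step_gradient(f, x; h = 1e-100)
    g = similar(x, Float64)
    for j in eachindex(x)
        xc = complex.(float.(x))
        xc[j] += im * h              # purely imaginary perturbation of coordinate j
        g[j] = imag(f(xc)) / h       # first-order imaginary part ≈ ∂f/∂x_j
    end
    return g
end

# Hypothetical usage:
complex_step_gradient(x -> sin(x[1]) + x[1]*x[2], [1.0, 2.0])   # ≈ [cos(1.0) + 2.0, 1.0]
```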