included complex step finite method for the gradient computation #74
base: master
Conversation
This is okay, but are you aware of https://github.com/JuliaDiff/ForwardDiff.jl which evaluates gradients exactly? The complex step method is approximate, albeit with a very small error term. The complex step method is also not safe for arbitrary user functions:
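For example, a minimal sketch of the failure mode mentioned below: Julia's `dot` conjugates its first argument, so a purely imaginary perturbation never reaches the imaginary part of the result.

```julia
# Minimal sketch (not code from this PR): dot conjugates its first argument,
# so f(x) = dot(x, x) = sum(|x_i|^2) and the complex-step estimate collapses to zero.
using LinearAlgebra          # Julia 1.x; on the 0.x versions in this thread, dot is in Base

f(x) = dot(x, x)
x = [1.0, 2.0, 3.0]          # true gradient of f at x is 2x

h  = 1e-20                   # hypothetical step size
xc = complex.(x); xc[1] += im*h
imag(f(xc)) / h              # gives 0.0 instead of 2x[1] = 2.0
```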
CC @jrevels |
The methods used in ForwardDiff might also be quite a bit faster than the complex step method depending on the function. |
Hi! I agree with you: the complex step method must be used with care. The user needs to know how each function deals with complex variables (as in your example, the standard library defines the dot product using the conjugate of the first vector, which breaks the complex step method). But I still believe that this method is a good option over finite differences in many cases if proper care is taken, since it leads to "nearly" exact differentiation (actually, unlike finite differences, complex step can get an exact result up to numerical precision). However, I wasn't aware of ForwardDiff! It seems really interesting!! I thought that those "algorithmic differentiation" methods were slower than complex step. Seems I was wrong. I look forward to checking this package out in depth. Thank you for the tip!! |
Hi @mlubin. I've tested ForwardDiff with a few simple examples. Weirdly, complex step leads to an "exact" computation and ForwardDiff doesn't (though its results are still very accurate). The test function and its exact gradient are:
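A minimal sketch of such a test, assuming the same function that appears later in this thread:

```julia
# Sketch, assuming the test function used later in this thread:
f(x)  = sin(x[1]) + cos(x[2])
fd(x) = [cos(x[1]), -sin(x[2])]   # exact gradient
```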
With ForwardDiff:
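One plausible way this was invoked, using the `ForwardDiff.gradient` API shown further down:

```julia
# Sketch using the ForwardDiff.gradient API shown later in this thread:
using ForwardDiff
g = ForwardDiff.gradient(f)       # returns a gradient function (old ForwardDiff API)
g([0.5, 0.5]) - fd([0.5, 0.5])    # deviation from the exact gradient
```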
which leads to the following error:
With complex step:
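A sketch of the complex-step computation: perturb each coordinate along the imaginary axis and keep the first-order imaginary part.

```julia
# Sketch of the complex-step estimate; h is a hypothetical, very small step size.
h  = 1e-100
gc = [imag(f([0.5 + im*h, 0.5])) / h,
      imag(f([0.5, 0.5 + im*h])) / h]
gc - fd([0.5, 0.5])               # typically zero to machine precision
```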
Similar results for other simple functions. I think I am using ForwardDiff wrong: it seems to end up calling exactly the same function as Calculus.gradient (which uses central finite differences). I've just started working with Julia, so maybe I am doing something wrong. |
On my machine:

julia> using ForwardDiff
julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)
julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)
julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)
julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0

...so somehow there seems to be a namespacing issue? I even tried it with

julia> using ForwardDiff
julia> using Calculus
julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)
julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)
julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)
julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0 |
@flavioluiz I just pushed a commit to ForwardDiff.jl that should totally prevent method switch-ups on ForwardDiff's end, if you'd like to try things out again to see whether that fixes it. If it still ends up picking the wrong method, would you mind opening an issue in the ForwardDiff.jl repo with the example? Thanks. |
@flavioluiz, which version of |
ouch. I've realized that I was supposed to use Julia v0.4. I was using Julia v0.3.11 (and the last version of ForwardDiff). I've just installed v0.4 and added ForwardDiff from Pkg. Same problem (for some reason, the version from Pkg uses the gradient from Calculus). Then I cloned the ForwardDiff repo, and with the latest version it works fine! Exactly as shown by @jrevels. :-) |
Yeah, we haven't yet officially tagged the new version of ForwardDiff, so the version in METADATA.jl hasn't received the update (that's the version you get from Pkg.add). |
Hi again. I've been playing with ForwardDiff. I compared the gradient methods using complex step vs ForwardDiff for several functions. Both methods compute the derivative "exactly". In most cases complex step was about two times faster!

I don't know much about these "automatic differentiation" methods (I quickly read the wiki about the subject and am still trying to understand). However, I have a feeling that the philosophy behind complex step is not so far from this forward mode method. While forward mode extends real algebra using dual numbers, complex step profits from the standard implementation of complex arithmetic (whose basic operations are pretty much the same as the dual algebra!).

In complex step, if epsilon is "small enough", epsilon^2 and higher order terms easily become 0.0 in double precision (while the first order term still keeps full numerical precision), so the derivative is "almost always exact". If I understood well, forward mode dual numbers use epsilon^2 = 0 as a definition, and all the basic operations are done taking this definition into account, so there is no need to choose a small epsilon and there is no danger: the derivative is always exact. Is that right or am I talking nonsense? ;-) |
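A small illustration of that point (a sketch in one variable, not the benchmark code):

```julia
# f(x + im*h) = f(x) + im*h*f'(x) - (h^2/2)*f''(x) - ...
# so imag(f(x + im*h))/h = f'(x) - (h^2/6)*f'''(x) + ...; for tiny h the h^2
# correction falls below double precision while the first-order term survives intact.
h = 1e-200
x = 0.7
imag(sin(x + im*h)) / h    # ≈ cos(0.7) to machine precision
cos(x)                     # exact derivative, for comparison
```

Dual numbers, by contrast, define epsilon^2 = 0 outright, so no step size needs to be chosen at all.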
@flavioluiz, yes, you're basically correct. My understanding is that the complex step rule is a "poor man's AD" which came from the Matlab world, where the only number types that are fast to compute with are real and complex floating-point values. Do you have any examples you could share where the complex step is faster than ForwardDiff? |
@mlubin, right: I was using matrix/vector multiplications:

Q = randn(N,N)
f(x) = dot(conj(x),Q*x)

and similar functions with additional nonlinearities (taking the sine of the result, exponentials, multiplying by the sum of x, etc.). After your comment, I tried to be sure of not using any LAPACK operation:

function f(x)
    res = 0.
    for i = 1:length(x)
        res += x[i]*x[i]
    end
    return res
end

Obtaining the gradient using complex step is still faster than ForwardDiff. |
@flavioluiz This function isn't type-stable, and that is probably causing ForwardDiff to run a bit slower than it normally would. To make the function type-stable, try changing this line:

res = 0.

to this:

res = zero(eltype(x))

That way the type of res matches the element type of x instead of being hard-coded to Float64. I'd be interested to see if it's still problematic after this change. |
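Applied to the benchmark function above, the suggested change would look roughly like this:

```julia
# Type-stable variant of the benchmark function: the accumulator takes the
# element type of x, so it stays a dual (or complex) number when x carries one.
function f(x)
    res = zero(eltype(x))
    for i = 1:length(x)
        res += x[i]*x[i]
    end
    return res
end
```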
Hi,
I've just included the complex step method for the multivariable finite_difference, so it can be used to compute the gradient too.
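For reference, a minimal sketch of a multivariable complex-step gradient is shown below; the actual finite_difference code in this PR may look different, and complex_step_gradient is a hypothetical name.

```julia
# Minimal sketch of the idea; not necessarily the implementation in this PR.
function complex_step_gradient(f, x; h = 1e-100)
    g = similar(x, Float64)
    for j in eachindex(x)
        xc = complex.(float.(x))
        xc[j] += im * h              # purely imaginary perturbation of coordinate j
        g[j] = imag(f(xc)) / h       # first-order imaginary part ≈ ∂f/∂x_j
    end
    return g
end

# Hypothetical usage:
complex_step_gradient(x -> sin(x[1]) + x[1]*x[2], [1.0, 2.0])   # ≈ [cos(1.0) + 2.0, 1.0]
```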