I have a few questions on the section "Approximate Architecture Gradient" in the paper.

Why does evaluating the finite difference require only two forward passes for the weights and two backward passes for α, and why is the complexity reduced from O(|α||w|) to O(|α|+|w|)?
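For reference, my reading of the finite-difference approximation (reconstructed from the paper, so the notation may differ slightly) is:

$$
\nabla^{2}_{\alpha, w}\,\mathcal{L}_{\mathrm{train}}(w,\alpha)\,\nabla_{w'}\mathcal{L}_{\mathrm{val}}(w',\alpha)
\;\approx\;
\frac{\nabla_{\alpha}\mathcal{L}_{\mathrm{train}}(w^{+},\alpha)-\nabla_{\alpha}\mathcal{L}_{\mathrm{train}}(w^{-},\alpha)}{2\epsilon},
\qquad
w^{\pm}=w\pm\epsilon\,\nabla_{w'}\mathcal{L}_{\mathrm{val}}(w',\alpha)
$$

If I read this correctly, evaluating L_train at w⁺ and w⁻ gives the two forward passes for the weights, and differentiating each of those two losses with respect to α gives the two backward passes, so nothing of size |α|×|w| is ever formed, which is presumably where O(|α|+|w|) comes from. Please correct me if that reconstruction is wrong.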
Looking at equation 7, we have a second-order partial derivative that is computationally expensive to compute, and to avoid this the finite difference method is used. <-- How is the second-order partial derivative related to the finite difference method?
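To make my question concrete, here is a minimal PyTorch-style sketch of how I imagine the finite difference being evaluated (this is my own illustration, not the authors' code; `train_loss_fn`, `w`, `alpha`, `vector` and `eps_scale` are placeholder names I made up):

```python
import torch

def hessian_vector_product(train_loss_fn, w, alpha, vector, eps_scale=0.01):
    """Approximate (d^2 L_train / d alpha d w) @ vector by central finite differences.

    `w` and `alpha` are lists of parameter tensors; `vector` is dL_val/dw'
    (same shapes as `w`); `train_loss_fn()` recomputes L_train with the
    current weights. Only two gradient evaluations w.r.t. alpha are needed.
    """
    # Step size epsilon, scaled by the norm of the perturbation vector.
    eps = eps_scale / torch.cat([v.reshape(-1) for v in vector]).norm()

    # w+ = w + eps * vector, then one forward pass and one backward pass for alpha.
    with torch.no_grad():
        for p, v in zip(w, vector):
            p.add_(eps * v)
    grads_pos = torch.autograd.grad(train_loss_fn(), alpha)

    # w- = w - eps * vector, then the second forward pass and backward pass for alpha.
    with torch.no_grad():
        for p, v in zip(w, vector):
            p.sub_(2 * eps * v)
    grads_neg = torch.autograd.grad(train_loss_fn(), alpha)

    # Restore the original weights.
    with torch.no_grad():
        for p, v in zip(w, vector):
            p.add_(eps * v)

    # Central difference: (g+ - g-) / (2 * eps), one tensor per alpha parameter.
    return [(gp - gn) / (2 * eps) for gp, gn in zip(grads_pos, grads_neg)]
```

If that sketch is roughly right, the second-order term never has to be computed explicitly; it is replaced by the difference of two first-order gradients of L_train with respect to α, taken at the two perturbed weight settings.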
"We also note that when momentum is enabled for weight optimisation, the one-step unrolled learning objective in equation 6 is modified accordingly and all of our analysis still applies." <-- How is momentum directly related to the need to apply the chain rule to equation 6?
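For what it is worth, my guess (assuming plain SGD with momentum, with velocity v and momentum coefficient μ, symbols that are mine rather than the paper's) is that the one-step unroll becomes:

$$
w' = w - \xi \left( \mu\, v + \nabla_{w}\,\mathcal{L}_{\mathrm{train}}(w,\alpha) \right)
$$

Since ∇_w L_train(w, α) still depends on α, applying the chain rule to L_val(w', α) still produces the same second-order term as in equation 7, with only the evaluation point shifted by the momentum term. Is that the intended reading?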