Q1) SophiaH, AdaHessian optimizers give `RuntimeError: ~ tensors does not require grad and does not have a grad_fn` in `compute_hutchinson_hessian()`.
`create_graph` must be set to `True` when calling `backward()`. Here's an example.
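Below is a minimal sketch of a training loop with SophiaH. The model, loss, and `data_loader` are hypothetical placeholders; the key point is passing `create_graph=True` to `backward()` so the Hessian-vector products needed by `compute_hutchinson_hessian()` can be computed.

```python
import torch
from pytorch_optimizer import SophiaH

model = torch.nn.Linear(10, 1)            # placeholder model
optimizer = SophiaH(model.parameters())

for x, y in data_loader:                  # data_loader is assumed to exist
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward(create_graph=True)      # create_graph must be True for Hessian estimation
    optimizer.step()
```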
`torch.autograd.grad` with complex gradient flows sometimes leads to memory leaks, and you might encounter an OOM issue. See the related issue.