
tensorflow v2, parameters contains 'Tensor' so cannot be numpy()'d #131

Open
cyan-at opened this issue Sep 10, 2022 · 3 comments

Comments

@cyan-at

cyan-at commented Sep 10, 2022

Hi all,

We are trying to compute the Wasserstein distance (minimize c^T x s.t. Ax = b) where b is the neural network output. We aim to get d(Wasserstein dist) / d(network parameters theta) so we can train the network.

Since TensorFlow is the backend (we are using deepxde), we are working with TensorFlow v2's EagerTensor vs. Tensor distinction: EagerTensors have .numpy(), graph-mode Tensors do not.

I've used your example code to achieve what I want in the small example below. Here y_pred (a stand-in for the network output) is an EagerTensor, so concatenating it with another EagerTensor yields an EagerTensor, which is numpy()-able:

rho_0_tf = tf.constant(rho_0, shape=(100,))
y_pred = tf.Variable(rho_I, shape=(100,))

with tf.GradientTape() as tape:
  # solve the LP inside the tape, feeding the stacked densities [rho_0; y_pred] as the parameter
  param = tf.concat([rho_0_tf, y_pred], 0)
  print(type(param))
  x_sol, = cvxpylayer(param)
  wass_dist1 = tf.tensordot(cvector.T, x_sol, 1)
#   wass_dist2 = tf.matmul(cvector2, x_sol)

wass_dist3 = cvector.T @ x_sol.numpy()

print("wass_dist1=", wass_dist1)
# print("wass_dist2=", wass_dist2)
print("wass_dist3=", wass_dist3)

# compute the gradient of the Wasserstein distance with respect to y_pred
grad_ypred = tape.gradient(wass_dist1, [y_pred])
print("grad_ypred", grad_ypred[0].numpy())

Here is the intent implemented as a loss function:

def rho0_WASS(y_true, y_pred):
    param = tf.concat([rho_0_tf, y_pred], 0)
    print(type(param))
    x_sol, = cvxpylayer(param)
    # TODO(handle infeasible)
    wass_dist = tf.tensordot(cvector.T, x_sol, 1)
    return wass_dist

But this throws the following exception:

site-packages/cvxpylayers/tensorflow/cvxpylayer.py", line 154, in _compute  *
        params = [p.numpy() for p in params]

    AttributeError: 'Tensor' object has no attribute 'numpy'

I know there is a way to evaluate a TF v2 Tensor into something that can be numpy()'d by defining a py_func, and that is what we did before. But that approach gave us no gradient of the Wasserstein distance w.r.t. the network parameters, so the loss never converged, which is why we came to your library.

This is related to #121, in that f(network params) => Parameters. But I think the core of our problem is that cvxpylayers, as I understand it, can only take Parameters that are 'eager' / numpy()-able. Is there some way around this?
Maybe we should use a different backend?
Please share any insights / advice; thank you in advance.
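
One workaround I am wondering about (untested on our side, apologies if it is off-base): forcing tf.function-traced code to run eagerly, so that the loss receives EagerTensors rather than symbolic Tensors. Something like:

import tensorflow as tf
# untested idea: run all tf.function-compiled code eagerly, so the tensors
# reaching cvxpylayers are EagerTensors with a .numpy() method (likely slower)
tf.config.run_functions_eagerly(True)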

@cyan-at

cyan-at commented Sep 11, 2022

I switched to PyTorch (the build against CUDA 11.3) and to a PyTorch loss function:

rho_0_tensor = torch.from_numpy(
    rho_0
).requires_grad_(False)

cvector_tensor = torch.from_numpy(
    cvector.reshape(-1)
).requires_grad_(False)

rho_0_tensor = rho_0_tensor.to(cuda0)
cvector_tensor = cvector_tensor.to(cuda0)

print(type(rho_0_tensor))

def rho0_WASS(y_true, y_pred):
    param = torch.cat((rho_0_tensor, y_pred), 0)
    print(type(param))
    x_sol, = cvxpylayer(param)
    # TODO(handle infeasible)
    wass_dist = torch.matmul(cvector_tensor, x_sol)
    return wass_dist

I get this error while training, similar to above:

ges/cvxpylayers/tensorflow/cvxpylayer.py:154, in <listcomp>(.0)
    152 def _compute(self, params, solver_args={}):
    153     tf_params = params
--> 154     params = [p.numpy() for p in params]
    156     # infer whether params are batched
    157     batch_sizes = []

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

@cyan-at

cyan-at commented Sep 11, 2022

And if I move y_pred to host memory:

y_pred = y_pred.to(cpu)

Then I see this:

--> 154     params = [p.numpy() for p in params]
    156     # infer whether params are batched
    157     batch_sizes = []

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

This is roughly the same problem as in TF v2: you cannot evaluate a Parameter fed by the network output without taking it out of the autograd graph and losing the gradient.
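
A tiny self-contained illustration of what I mean (plain PyTorch, no cvxpylayers):

import torch

y = torch.tensor([1.0, 2.0], requires_grad=True)

# detaching gives something numpy()-able, but it leaves the autograd graph,
# so nothing computed from it can send a gradient back to y
z = (y.detach().numpy() ** 2).sum()

# staying in-graph keeps the gradient
loss = (y ** 2).sum()
loss.backward()
print(y.grad)   # tensor([2., 4.])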

@cyan-at

cyan-at commented Sep 11, 2022

When I try to find an autodiff gradient with PyTorch on a simple case (based on your examples), it does seem possible to get a gradient of a cvxpy solution w.r.t. a cvxpy Parameter. Note the device='cpu' in the printed output, suggesting this operation is happening on the CPU.

wass_dist1 = torch.matmul(cvector_tensor, x_sol)
print("wass_dist1=", wass_dist1)
wass_dist1.backward()
print(y_pred.grad)

wass_dist1= tensor(0.6330, device='cpu', dtype=torch.float64, grad_fn=<DotBackward0>)
tensor([-70.2294, -59.9991, -47.2306, -31.3524, -11.6804,  -1.8298,  -1.9436,
         -2.0575,  -2.1714,  -2.2852,  -2.3991,  -2.5129,  -2.6267,  -2.7405,
         -2.8542,  -2.9680,  -3.0817,  -3.1954,  -3.3090,  -3.4227,  -3.5364,
         -3.6500,  -3.7637,  -3.8773,  -3.9909,  -4.1045,  -4.2180,  -4.3315,
         -4.4451,  -4.5586,  -4.6721,  -4.7856,  -4.8991,  -5.0125,  -5.1260,
         -5.2395,  -5.3529,  -5.4663,  -5.5796,  -5.6929,  -5.8062,  -5.9195,
         -6.0327,  -6.1460,  -6.2593,  -6.3725,  -6.4858,  -6.5991,  -6.7123,
         -6.8259,  -6.9322,  -7.0382,  -7.1443,  -7.2503,  -7.3564,  -7.4625,
         -7.5685,  -7.6746,  -7.7808,  -7.8869,  -7.9930,  -8.0991,  -8.2053,
         -8.3115,  -8.4178,  -8.5169,  -8.6159,  -8.7149,  -8.8138,  -8.9128,
         -9.0118,  -9.1108,  -9.2026,  -9.2943,  -9.3860,  -9.4777,  -9.5694,
         -9.6539,  -9.7383,  -9.8226,  -9.9070,  -9.9841, -10.0612, -10.1383,
        -10.2081, -10.2778, -10.3403, -10.4027, -10.4578, -10.5129, -10.5607,
        -10.6085, -10.6489, -10.6893, -10.7223, -10.7480, -10.7737, -10.7921,
        -10.8031, -10.8068], device='cpu', dtype=torch.float64)
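
For reference, a minimal self-contained version of the kind of simple case I mean (placeholder data, not our actual transport problem; the layer is built with cvxpylayers.torch):

import numpy as np
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

m, n = 20, 100
A_np = np.abs(np.random.randn(m, n))   # placeholder constraint matrix
c_np = np.random.rand(n)               # placeholder cost vector

x = cp.Variable(n, nonneg=True)
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(c_np @ x), [A_np @ x == b])
cvxpylayer = CvxpyLayer(problem, parameters=[b], variables=[x])

# right-hand side that is feasible by construction
y_pred = torch.from_numpy(A_np @ np.random.rand(n)).requires_grad_(True)
cvector_tensor = torch.from_numpy(c_np)

x_sol, = cvxpylayer(y_pred)
wass_dist1 = torch.matmul(cvector_tensor, x_sol)
wass_dist1.backward()
print(y_pred.grad)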
