
tensorflow v2, parameters contains 'Tensor' so cannot be numpy()'d #131

Open
cyan-at opened this issue Sep 10, 2022 · 3 comments

Comments

@cyan-at

cyan-at commented Sep 10, 2022

Hi all,

We are trying to compute the Wasserstein distance (minimize c^T x s.t. Ax = b) where b is the neural network output. We aim to get d(Wasserstein dist) / d(network parameters theta) so we can train the network.

Since TensorFlow is the backend (we are using deepxde), we are working with TensorFlow v2's EagerTensor vs. Tensor distinction: EagerTensors have .numpy(), graph-mode Tensors do not.

I've used your example code to achieve what I want in the small example below. Here y_pred (a stand-in for the network output) is an EagerTensor, so concatenating it with another EagerTensor yields an EagerTensor, which is numpy()-able:

rho_0_tf = tf.constant(rho_0, shape=(100,))
y_pred = tf.Variable(rho_I, shape=(100,))

with tf.GradientTape() as tape:
  # solve the LP inside the tape, feeding the stacked densities [rho_0; y_pred] as the parameter
  param = tf.concat([rho_0_tf, y_pred], 0)
  print(type(param))
  x_sol, = cvxpylayer(param)
  wass_dist1 = tf.tensordot(cvector.T, x_sol, 1)
#   wass_dist2 = tf.matmul(cvector2, x_sol)

wass_dist3 = cvector.T @ x_sol.numpy()

print("wass_dist1=", wass_dist1)
# print("wass_dist2=", wass_dist2)
print("wass_dist3=", wass_dist3)

# compute the gradient of the Wasserstein distance with respect to y_pred
grad_ypred = tape.gradient(wass_dist1, [y_pred])
print("grad_ypred", grad_ypred[0].numpy())

Here is the intent implemented as a loss function:

def rho0_WASS(y_true, y_pred):
    param = tf.concat([rho_0_tf, y_pred], 0)
    print(type(param))
    x_sol, = cvxpylayer(param)
    # TODO(handle infeasible)
    wass_dist = tf.tensordot(cvector.T, x_sol, 1)
    return wass_dist

But this throws the following exception:

site-packages/cvxpylayers/tensorflow/cvxpylayer.py", line 154, in _compute  *
        params = [p.numpy() for p in params]

    AttributeError: 'Tensor' object has no attribute 'numpy'

I know there is a way to evaluate a TF v2 Tensor into something that can be numpy()'d by defining a py_func, and that is what we did before. But that approach gave us no gradient of the Wasserstein distance w.r.t. the network parameters, so the loss never converged, which is why we came to your library.

This is related to #121, in that f(network params) => Parameters. But I think the core of our problem is that cvxpylayers, as I understand it, can only take Parameters that are 'eager' / numpy()-able. Is there some way around this?
Maybe we should use a different backend?
Please share any insights / advice; thank you in advance.
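
One workaround I am wondering about (untested on our side, apologies if it is off-base): forcing tf.function-traced code to run eagerly, so that the loss receives EagerTensors rather than symbolic Tensors. Something like:

import tensorflow as tf
# untested idea: run all tf.function-compiled code eagerly, so the tensors
# reaching cvxpylayers are EagerTensors with a .numpy() method (likely slower)
tf.config.run_functions_eagerly(True)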

@cyan-at

cyan-at commented Sep 11, 2022

I switched to PyTorch (the build against CUDA 11.3) and to a PyTorch loss function:

rho_0_tensor = torch.from_numpy(
    rho_0
).requires_grad_(False)

cvector_tensor = torch.from_numpy(
    cvector.reshape(-1)
).requires_grad_(False)

rho_0_tensor = rho_0_tensor.to(cuda0)
cvector_tensor = cvector_tensor.to(cuda0)

print(type(rho_0_tensor))

def rho0_WASS(y_true, y_pred):
    param = torch.cat((rho_0_tensor, y_pred), 0)
    print(type(param))
    x_sol, = cvxpylayer(param)
    # TODO(handle infeasible)
    wass_dist = torch.matmul(cvector_tensor, x_sol)
    return wass_dist

I get this error while training, similar to above:

ges/cvxpylayers/tensorflow/cvxpylayer.py:154, in <listcomp>(.0)
    152 def _compute(self, params, solver_args={}):
    153     tf_params = params
--> 154     params = [p.numpy() for p in params]
    156     # infer whether params are batched
    157     batch_sizes = []

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

@cyan-at

cyan-at commented Sep 11, 2022

And if I move y_pred to host memory:

y_pred = y_pred.to(cpu)

Then I see this:

--> 154     params = [p.numpy() for p in params]
    156     # infer whether params are batched
    157     batch_sizes = []

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

This is roughly the same problem as in TF v2: you cannot evaluate a Parameter fed by the network output without taking it out of the autograd graph and losing the gradient.
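
A tiny self-contained illustration of what I mean (plain PyTorch, no cvxpylayers):

import torch

y = torch.tensor([1.0, 2.0], requires_grad=True)

# detaching gives something numpy()-able, but it leaves the autograd graph,
# so nothing computed from it can send a gradient back to y
z = (y.detach().numpy() ** 2).sum()

# staying in-graph keeps the gradient
loss = (y ** 2).sum()
loss.backward()
print(y.grad)   # tensor([2., 4.])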

@cyan-at

cyan-at commented Sep 11, 2022

When I try to find an autodiff gradient with PyTorch on a simple case (based on your examples), it does seem possible to get a gradient of a cvxpy solution w.r.t. a cvxpy Parameter. Note the device='cpu' in the printed output, suggesting this operation is happening on the CPU.

wass_dist1 = torch.matmul(cvector_tensor, x_sol)
print("wass_dist1=", wass_dist1)
wass_dist1.backward()
print(y_pred.grad)

wass_dist1= tensor(0.6330, device='cpu', dtype=torch.float64, grad_fn=<DotBackward0>)
tensor([-70.2294, -59.9991, -47.2306, -31.3524, -11.6804,  -1.8298,  -1.9436,
         -2.0575,  -2.1714,  -2.2852,  -2.3991,  -2.5129,  -2.6267,  -2.7405,
         -2.8542,  -2.9680,  -3.0817,  -3.1954,  -3.3090,  -3.4227,  -3.5364,
         -3.6500,  -3.7637,  -3.8773,  -3.9909,  -4.1045,  -4.2180,  -4.3315,
         -4.4451,  -4.5586,  -4.6721,  -4.7856,  -4.8991,  -5.0125,  -5.1260,
         -5.2395,  -5.3529,  -5.4663,  -5.5796,  -5.6929,  -5.8062,  -5.9195,
         -6.0327,  -6.1460,  -6.2593,  -6.3725,  -6.4858,  -6.5991,  -6.7123,
         -6.8259,  -6.9322,  -7.0382,  -7.1443,  -7.2503,  -7.3564,  -7.4625,
         -7.5685,  -7.6746,  -7.7808,  -7.8869,  -7.9930,  -8.0991,  -8.2053,
         -8.3115,  -8.4178,  -8.5169,  -8.6159,  -8.7149,  -8.8138,  -8.9128,
         -9.0118,  -9.1108,  -9.2026,  -9.2943,  -9.3860,  -9.4777,  -9.5694,
         -9.6539,  -9.7383,  -9.8226,  -9.9070,  -9.9841, -10.0612, -10.1383,
        -10.2081, -10.2778, -10.3403, -10.4027, -10.4578, -10.5129, -10.5607,
        -10.6085, -10.6489, -10.6893, -10.7223, -10.7480, -10.7737, -10.7921,
        -10.8031, -10.8068], device='cpu', dtype=torch.float64)
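
For reference, a minimal self-contained version of the kind of simple case I mean (placeholder data, not our actual transport problem; the layer is built with cvxpylayers.torch):

import numpy as np
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

m, n = 20, 100
A_np = np.abs(np.random.randn(m, n))   # placeholder constraint matrix
c_np = np.random.rand(n)               # placeholder cost vector

x = cp.Variable(n, nonneg=True)
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(c_np @ x), [A_np @ x == b])
cvxpylayer = CvxpyLayer(problem, parameters=[b], variables=[x])

# right-hand side that is feasible by construction
y_pred = torch.from_numpy(A_np @ np.random.rand(n)).requires_grad_(True)
cvector_tensor = torch.from_numpy(c_np)

x_sol, = cvxpylayer(y_pred)
wass_dist1 = torch.matmul(cvector_tensor, x_sol)
wass_dist1.backward()
print(y_pred.grad)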
