
Test "python3 train.py --model='WaveNet' ",get exception "Conv2DCustomBackpropInputOp only supports NHWC." #140

Open
DayanJ opened this issue Aug 10, 2018 · 10 comments


@DayanJ

DayanJ commented Aug 10, 2018

I used the LJSpeech-1.1 dataset for testing.
1. After running "python3 wavenet_preprocess.py", I get these files.
[screenshot: generated preprocessing output files]
2. I modified 'hparams.py', setting "train_with_GTA" to False.
3. After running "python3 train.py --model='WaveNet'", I got these errors.
[screenshot: error traceback]

My TensorFlow version is 1.7.1 and I can't fix this error.

@DanRuta

DanRuta commented Aug 10, 2018

What are your:

OS version,
CUDA version,
Python version,
GPU,
and do you have the GPU or CPU version of tensorflow installed (or both)?

You can also track this issue in:
#73
#87

@gloriouskilka

gloriouskilka commented Aug 11, 2018

I have the same issue when I try to run this on CPU.

@DayanJ According to your log, the device placement is CPU, and I guess the CPU version of the op only supports NHWC ordering.

If you're going to use your GPU, you should fix whatever prevents TensorFlow from placing the op on the GPU, probably by uninstalling the CPU version of TensorFlow.

If you're going to use the CPU, I guess you can quick-fix this by reordering the tensor to NHWC before this op with tf.transpose, as sketched below and described here: https://stackoverflow.com/questions/37689423/convert-between-nhwc-and-nchw-in-tensorflow

Update: I tried the quick fix and it didn't work; I'm a TensorFlow novice, so don't take my word for it.
Update 2: something like this, probably: gloriouskilka@a56b300
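
For reference, a minimal sketch of the NCHW ↔ NHWC reordering the linked answer describes; the placeholder shape is only illustrative, and the real fix would have to be applied around the offending op inside the WaveNet code:

import tensorflow as tf

# x is assumed to be in NCHW order: [batch, channels, height, width]
x_nchw = tf.placeholder(tf.float32, [None, 80, 1, 100])

# NCHW -> NHWC: move the channel axis to the end before a CPU-only op
x_nhwc = tf.transpose(x_nchw, [0, 2, 3, 1])

# ... run the NHWC-only op here (e.g. a conv built with data_format='channels_last') ...

# NHWC -> NCHW: restore the original layout afterwards
x_back = tf.transpose(x_nhwc, [0, 3, 1, 2])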

@Rayhane-mamah
Owner

Hello @DayanJ, as suggested by @gloriouskilka, please make sure you only have the GPU version of TensorFlow installed. This is most likely a bug that occurs when trying to run WaveNet on CPU.

@Hayes515

Hi @gloriouskilka, I am DayanJ; this is my new account. I didn't have tensorflow-gpu installed before.
I tried this advice (gloriouskilka/Tacotron-2-fork@a56b300) and it works when running on CPU only.
@Rayhane-mamah @DanRuta @gloriouskilka Thanks for your help.

@gloriouskilka

@Hayes515 Hi! My quick fix is a bad idea, just a proof of concept. You should switch to an Nvidia GPU if you have one, because otherwise I think you will be training your network on CPU until the end of days.

Usually people install both tensorflow and tensorflow-gpu, and sometimes the CPU version of tensorflow prevents the GPU from being used, so the main advice is: uninstall tensorflow and install only tensorflow-gpu. A quick sanity check after reinstalling is sketched below.
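
A minimal sketch, assuming the TF 1.x API used by this repo, for confirming that TensorFlow can actually see the GPU once only tensorflow-gpu is installed:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if TensorFlow was built with CUDA support and a GPU is visible.
print(tf.test.is_gpu_available())

# Lists every device ops can be placed on; look for a "/device:GPU:0" entry.
print([d.name for d in device_lib.list_local_devices()])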

@Hayes515

@gloriouskilka Hi! You are right. I have switched to an Nvidia GPU, but it took me two days to get it working. I installed the packages with Anaconda3 in a new environment named T3; this way is convenient.
[screenshots: Anaconda environment setup]

The state of my GPU is shown below.
[screenshot: GPU status]

Thank you!

@gloriouskilka

@Hayes515 Yay! You're welcome!

I guess we can close this issue, because it already contains all possible solutions with nice screenshots.

@Rayhane-mamah
Owner

One last thing before closing this: @Hayes515, you may want to keep your 2nd GPU free, as it is holding the model graph for no particular reason. To do this, please add os.environ["CUDA_VISIBLE_DEVICES"] = "0" at the following location:

Tacotron-2/train.py

Lines 36 to 37 in e244457

os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(args.tf_log_level)
run_name = args.name or args.model
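
For example, a minimal sketch of the line to add right above the lines shown (the only requirement is that it runs before TensorFlow initializes its devices):

import os

# Expose only the first GPU to this process so the run does not grab the
# second (display) GPU; this must run before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"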

That will prevent the run from seeing your 2nd GPU; it seems your graphics display is handled by that one, so there you go :) Naturally, if you want to make multiple runs in parallel, you can follow my comment here.

Feel free to close the issue if no other problems are related to it. Thanks for using our work ;)

@yvt

yvt commented Aug 24, 2018

I needed to run the model on CPU for testing purposes (a machine with a GPU is currently occupied by another variation of this model), so I would be glad if it could run on CPU.

It looks like the "channel" part of the transposed convolution input is temporarily inserted here:

#[batch_size, 1, cin_channels, time_length]
c = tf.expand_dims(c, axis=1)
for transposed_conv in self.upsample_conv:
    c = transposed_conv(c)
#[batch_size, cin_channels, time_length]
c = tf.squeeze(c, [1])
with tf.control_dependencies([tf.assert_equal(tf.shape(c)[-1], tf.shape(x)[-1])]):
    c = tf.identity(c, name='control_c_and_x_shape')

And here:

#[batch_size, 1, channels, time_length]
c = tf.expand_dims(c, axis=1)
for upsample_conv in self.upsample_conv:
    c = upsample_conv(c)
#[batch_size, channels, time_length]
c = tf.squeeze(c, [1])

I guess that this issue, the restriction of the CPU implementation of Conv2DTranspose, can be worked around by inserting the new dimension as the last dimension (axis=3, NHWC) instead of as the second dimension (axis=1, NCHW). (Also, don't forget to change data_format to channels_last.) A sketch of that change follows.
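
For concreteness, this is roughly what the first snippet would look like under that idea; it assumes the self.upsample_conv layers are rebuilt with data_format='channels_last', and it is untested:

# c starts as [batch_size, cin_channels, time_length]
# Insert the dummy channel dimension last instead of second:
# [batch_size, cin_channels, time_length, 1]  (NHWC)
c = tf.expand_dims(c, axis=3)
for transposed_conv in self.upsample_conv:
    # each layer must be constructed with data_format='channels_last'
    c = transposed_conv(c)
# back to [batch_size, cin_channels, time_length]
c = tf.squeeze(c, [3])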

I'm not entirely sure, because this is based on the assumption that these are the only places where Conv2DTranspose is used, and I haven't gotten used to this code base yet. Also, I'm not sure how this is fundamentally different from the "fastfix" mentioned by @gloriouskilka. I would really appreciate it if someone could confirm whether this is the right way to go.

@KarolinaPondel

Hi guys,

I'm trying to get WaveNet training working and I keep getting this problem. I can't find its location and don't know how to fix it. Is there an update on this issue? I only have tensorflow-gpu installed, and Tacotron training went through without any problems.

Exiting due to exception: Conv2DCustomBackpropInputOp only supports NHWC.
[[node WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput (defined at /notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py:557) = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/ShapeN, WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/ExpandDims_1, WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Squeeze_grad/Reshape)]]

Caused by op 'WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 130, in main
    wavenet_train(args, log_dir, hparams, args.wavenet_input)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 346, in wavenet_train
    return train(log_dir, args, hparams, input_path)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 230, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 178, in model_train_mode
    model.add_optimizer(global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 557, in add_optimizer
    gradients = optimizer.compute_gradients(self.tower_loss[i])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_grad.py", line 517, in _Conv2DGrad
    data_format=data_format),
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1229, in conv2d_backprop_input
    dilations=dilations, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D', defined at:
  File "train.py", line 138, in <module>
    main()
[elided 2 identical lines from previous traceback]
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 230, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 176, in model_train_mode
    feeder.input_lengths, x=feeder.inputs)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 277, in initialize
    y_hat_train = self.step(tower_x[i], tower_c[i], tower_g[i], softmax=False)  #softmax is automatically computed inside softmax_cross_entropy if needed
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 719, in step
    x = conv(x)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/modules.py", line 382, in call
    return super(Conv1D1x1, self).call(inputs, incremental=incremental, convolution_queue=convolution_queue)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/modules.py", line 319, in call
    outputs = self.layer.call(inputs_)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 384, in call
    return super(Conv1D, self).call(inputs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
    name=self.name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 193, in _conv1d
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
