
Test "python3 train.py --model='WaveNet' ",get exception "Conv2DCustomBackpropInputOp only supports NHWC." #140

Open
DayanJ opened this issue Aug 10, 2018 · 10 comments


@DayanJ

DayanJ commented Aug 10, 2018

I used the LJSpeech-1.1 dataset for testing.
1. After running "python3 wavenet_preprocess.py", I get these files.
[screenshot: generated preprocessing output files]
2. I modified 'hparams.py', setting "train_with_GTA" to False.
3. After running "python3 train.py --model='WaveNet'", I got these errors.
[screenshot: error traceback]

My TensorFlow version is 1.7.1 and I can't fix this error.

@DanRuta

DanRuta commented Aug 10, 2018

What are your:

OS version,
CUDA version,
Python version,
GPU,
and do you have the GPU or CPU version of tensorflow installed (or both)?

You can also track this issue in:
#73
#87

@gloriouskilka

gloriouskilka commented Aug 11, 2018

I have the same issue when I try to run this on CPU.

@DayanJ According to your log, the device placement is CPU, and I guess the CPU version of the op only supports NHWC ordering.

If you're going to use your GPU, you should fix whatever prevents TensorFlow from placing the op on the GPU, probably by uninstalling the CPU version of TensorFlow.

If you're going to use the CPU, I guess you can quick-fix this by reordering the tensor to NHWC before this op with tf.transpose, as sketched below and described here: https://stackoverflow.com/questions/37689423/convert-between-nhwc-and-nchw-in-tensorflow

Update: I tried the quick fix and it didn't work; I'm a TensorFlow novice, so don't take my word for it.
Update 2: something like this, probably: gloriouskilka@a56b300
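
For reference, a minimal sketch of the NCHW ↔ NHWC reordering the linked answer describes; the placeholder shape is only illustrative, and the real fix would have to be applied around the offending op inside the WaveNet code:

import tensorflow as tf

# x is assumed to be in NCHW order: [batch, channels, height, width]
x_nchw = tf.placeholder(tf.float32, [None, 80, 1, 100])

# NCHW -> NHWC: move the channel axis to the end before a CPU-only op
x_nhwc = tf.transpose(x_nchw, [0, 2, 3, 1])

# ... run the NHWC-only op here (e.g. a conv built with data_format='channels_last') ...

# NHWC -> NCHW: restore the original layout afterwards
x_back = tf.transpose(x_nhwc, [0, 3, 1, 2])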

@Rayhane-mamah
Owner

Hello @DayanJ, as suggested by @gloriouskilka, please make sure you only have the GPU version of TensorFlow installed. This is most likely a bug that occurs when trying to run WaveNet on CPU.

@Hayes515

Hi @gloriouskilka, I am DayanJ; this is my new account. I didn't have tensorflow-gpu installed before.
I tried this advice (gloriouskilka/Tacotron-2-fork@a56b300) and it works when running on CPU only.
@Rayhane-mamah @DanRuta @gloriouskilka Thanks for your help.

@gloriouskilka

@Hayes515 Hi! My quick fix is a bad idea, just a proof of concept. You should switch to an Nvidia GPU if you have one, because otherwise I think you will be training your network on CPU until the end of days.

Usually people install both tensorflow and tensorflow-gpu, and sometimes the CPU version of tensorflow prevents the GPU from being used, so the main advice is: uninstall tensorflow and install only tensorflow-gpu. A quick sanity check after reinstalling is sketched below.
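
A minimal sketch, assuming the TF 1.x API used by this repo, for confirming that TensorFlow can actually see the GPU once only tensorflow-gpu is installed:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if TensorFlow was built with CUDA support and a GPU is visible.
print(tf.test.is_gpu_available())

# Lists every device ops can be placed on; look for a "/device:GPU:0" entry.
print([d.name for d in device_lib.list_local_devices()])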

@Hayes515

@gloriouskilka Hi! You are right. I have switched to an Nvidia GPU, but it took me two days to get it working. I installed the packages with Anaconda3 in a new environment named T3; this way is convenient.
[screenshots: Anaconda environment setup]

The state of my GPU is shown below.
[screenshot: GPU status]

Thank you!

@gloriouskilka

@Hayes515 Yay! You're welcome!

I guess we can close this issue, because it already contains all possible solutions with nice screenshots.

@Rayhane-mamah
Owner

One last thing before closing this: @Hayes515, you may want to keep your 2nd GPU free, as it is holding the model graph for no particular reason. To do this, please add os.environ["CUDA_VISIBLE_DEVICES"] = "0" at the following location:

Tacotron-2/train.py

Lines 36 to 37 in e244457

os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(args.tf_log_level)
run_name = args.name or args.model
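
For example, a minimal sketch of the line to add right above the lines shown (the only requirement is that it runs before TensorFlow initializes its devices):

import os

# Expose only the first GPU to this process so the run does not grab the
# second (display) GPU; this must run before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"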

That will prevent the run from seeing your 2nd GPU; it seems your graphics display is handled by that one, so there you go :) Naturally, if you want to make multiple runs in parallel, you can follow my comment here.

Feel free to close the issue if no other problems are related to it. Thanks for using our work ;)

@yvt

yvt commented Aug 24, 2018

I needed to run the model on CPU for testing purposes (a machine with a GPU is currently occupied by another variation of this model), so I would be glad if it could run on CPU.

It looks like the "channel" part of the transposed convolution input is temporarily inserted here:

#[batch_size, 1, cin_channels, time_length]
c = tf.expand_dims(c, axis=1)
for transposed_conv in self.upsample_conv:
    c = transposed_conv(c)
#[batch_size, cin_channels, time_length]
c = tf.squeeze(c, [1])
with tf.control_dependencies([tf.assert_equal(tf.shape(c)[-1], tf.shape(x)[-1])]):
    c = tf.identity(c, name='control_c_and_x_shape')

And here:

#[batch_size, 1, channels, time_length]
c = tf.expand_dims(c, axis=1)
for upsample_conv in self.upsample_conv:
    c = upsample_conv(c)
#[batch_size, channels, time_length]
c = tf.squeeze(c, [1])

I guess that this issue, the restriction of the CPU implementation of Conv2DTranspose, can be worked around by inserting the new dimension as the last dimension (axis=3, NHWC) instead of as the second dimension (axis=1, NCHW). (Also, don't forget to change data_format to channels_last.) A sketch of that change follows.
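
For concreteness, this is roughly what the first snippet would look like under that idea; it assumes the self.upsample_conv layers are rebuilt with data_format='channels_last', and it is untested:

# c starts as [batch_size, cin_channels, time_length]
# Insert the dummy channel dimension last instead of second:
# [batch_size, cin_channels, time_length, 1]  (NHWC)
c = tf.expand_dims(c, axis=3)
for transposed_conv in self.upsample_conv:
    # each layer must be constructed with data_format='channels_last'
    c = transposed_conv(c)
# back to [batch_size, cin_channels, time_length]
c = tf.squeeze(c, [3])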

I'm not entirely sure, because this is based on the assumption that these are the only places where Conv2DTranspose is used, and I haven't gotten used to this code base yet. Also, I'm not sure how this is fundamentally different from the "fastfix" mentioned by @gloriouskilka. I would really appreciate it if someone could confirm whether this is the right way to go.

@KarolinaPondel

Hi guys,

I'm trying to get WaveNet training working and I keep getting this problem. I can't find its location and don't know how to fix it. Is there an update on this issue? I only have tensorflow-gpu installed, and Tacotron training went through without any problems.

Exiting due to exception: Conv2DCustomBackpropInputOp only supports NHWC.
[[node WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput (defined at /notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py:557) = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/ShapeN, WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/ExpandDims_1, WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Squeeze_grad/Reshape)]]

Caused by op 'WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 130, in main
    wavenet_train(args, log_dir, hparams, args.wavenet_input)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 346, in wavenet_train
    return train(log_dir, args, hparams, input_path)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 230, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 178, in model_train_mode
    model.add_optimizer(global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 557, in add_optimizer
    gradients = optimizer.compute_gradients(self.tower_loss[i])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_grad.py", line 517, in _Conv2DGrad
    data_format=data_format),
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1229, in conv2d_backprop_input
    dilations=dilations, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D', defined at:
  File "train.py", line 138, in <module>
    main()
[elided 2 identical lines from previous traceback]
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 230, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/notebooks/Tacotron-2/wavenet_vocoder/train.py", line 176, in model_train_mode
    feeder.input_lengths, x=feeder.inputs)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 277, in initialize
    y_hat_train = self.step(tower_x[i], tower_c[i], tower_g[i], softmax=False)  #softmax is automatically computed inside softmax_cross_entropy if needed
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 719, in step
    x = conv(x)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/modules.py", line 382, in call
    return super(Conv1D1x1, self).call(inputs, incremental=incremental, convolution_queue=convolution_queue)
  File "/notebooks/Tacotron-2/wavenet_vocoder/models/modules.py", line 319, in call
    outputs = self.layer.call(inputs_)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 384, in call
    return super(Conv1D, self).call(inputs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
    name=self.name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 193, in _conv1d
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
