
03_03_vae_digits_train: TypeError: unsupported format string passed to numpy.ndarray.__format__ #73

jdinkla opened this issue Oct 10, 2020 · 6 comments

jdinkla commented Oct 10, 2020

I am running Ubuntu 18.04 with Python 3.6.9, and when running 03_03_vae_digits_train I encounter the following error:

```python
vae.train(
    x_train
    , batch_size = BATCH_SIZE
    , epochs = EPOCHS
    , run_folder = RUN_FOLDER
    , print_every_n_batches = PRINT_EVERY_N_BATCHES
    , initial_epoch = INITIAL_EPOCH
)
```

I installed with the newest pip via pip install -r requirements.txt and no errors occurred; I also had to install graphviz.

BTW, numpy is 1.17.2 as required:

```
$ pip freeze | grep numpy
numpy==1.17.2
```

```log
Epoch 1/200
1874/1875 [============================>.] - ETA: 0s - loss: 58.4866 - reconstruction_loss: 55.2065 - kl_loss: 3.2801
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-a0cdb3ff19b5> in <module>
      5     , run_folder = RUN_FOLDER
      6     , print_every_n_batches = PRINT_EVERY_N_BATCHES
----> 7     , initial_epoch = INITIAL_EPOCH
      8 )

~/GDL_code/models/VAE.py in train(self, x_train, batch_size, epochs, run_folder, print_every_n_batches, initial_epoch, lr_decay)
    224             , epochs = epochs
    225             , initial_epoch = initial_epoch
--> 226             , callbacks = callbacks_list
    227         )
    228 

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
     64   def _method_wrapper(self, *args, **kwargs):
     65     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 66       return method(self, *args, **kwargs)
     67 
     68     # Running inside `run_distribute_coordinator` already.

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
    874           epoch_logs.update(val_logs)
    875 
--> 876         callbacks.on_epoch_end(epoch, epoch_logs)
    877         if self.stop_training:
    878           break

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
    363     logs = self._process_logs(logs)
    364     for callback in self.callbacks:
--> 365       callback.on_epoch_end(epoch, logs)
    366 
    367   def on_train_batch_begin(self, batch, logs=None):

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
   1175           self._save_model(epoch=epoch, logs=logs)
   1176       else:
-> 1177         self._save_model(epoch=epoch, logs=logs)
   1178     if self.model._in_multi_worker_mode():
   1179       # For multi-worker training, back up the weights and current training

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in _save_model(self, epoch, logs)
   1194                   int) or self.epochs_since_last_save >= self.period:
   1195       self.epochs_since_last_save = 0
-> 1196       filepath = self._get_file_path(epoch, logs)
   1197 
   1198       try:

~/GDL_code/env/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in _get_file_path(self, epoch, logs)
   1242         # `{mape:.2f}`. A mismatch between logged metrics and the path's
   1243         # placeholders can cause formatting to fail.
-> 1244         return self.filepath.format(epoch=epoch + 1, **logs)
   1245       except KeyError as e:
   1246         raise KeyError('Failed to format this callback filepath: "{}". '

TypeError: unsupported format string passed to numpy.ndarray.__format__
```
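
The failing call, per the traceback, is self.filepath.format(epoch=epoch + 1, **logs). A minimal sketch reproducing the same TypeError, assuming one of the logged values is a multi-element numpy array rather than a scalar (values made up):

```python
import numpy as np

loss = np.array([58.49, 55.21])  # a per-batch loss vector instead of a scalar

# Formatting a multi-element ndarray with a non-empty format spec raises:
# TypeError: unsupported format string passed to numpy.ndarray.__format__
"weights-{epoch:03d}-{loss:.2f}.h5".format(epoch=1, loss=loss)
```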

jdinkla commented Oct 10, 2020

On the tensorflow_2 branch.


jdinkla commented Oct 10, 2020

It works on the master branch!

@karaage0703

@jdinkla

I changed this line as shown below:

```diff
- checkpoint_filepath=os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
+ checkpoint_filepath=os.path.join(run_folder, "weights/weights.h5")
```

Then I can run 03_03_vae_digits_train with no error.

I created a Google Colab notebook based on 03_03_vae_digits_train.

I hope this notebook helps you.

@MarkusMiller

Considering the code around this line:

```python
checkpoint_filepath=os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
checkpoint1 = ModelCheckpoint(checkpoint_filepath, save_weights_only = True, verbose=1)
checkpoint2 = ModelCheckpoint(os.path.join(run_folder, 'weights/weights.h5'), save_weights_only = True, verbose=1)
```

replacing "weights/weights-{epoch:03d}-{loss:.2f}.h5" with "weights/weights.h5" is sort of pointless, because checkpoint1 and checkpoint2 would then be exactly the same.

I tried to figure out what exactly caused the problem, but I'm quite unfamiliar with string formatting. I have a rough idea of what {epoch:03d}-{loss:.2f} does (inserting a variable epoch zero-padded to 3 digits and a variable loss with 2 decimal places into the string?), but not why it fails. I'm having the same issue and would be very grateful for a fix. Also on the tensorflow_2 branch.
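
For reference, a quick illustration of what those placeholders do (a minimal standalone sketch, values made up):

```python
# {epoch:03d}: the epoch number, zero-padded to 3 digits
# {loss:.2f}:  the loss, rounded to 2 decimal places
path = "weights/weights-{epoch:03d}-{loss:.2f}.h5".format(epoch=7, loss=55.2065)
print(path)  # weights/weights-007-55.21.h5

# This works for plain Python numbers; the error in this thread appears when
# the value bound to {loss:.2f} is a numpy array instead of a scalar.
```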


rk-ka commented Dec 8, 2020

I faced this same problem. As far as I can tell, the error occurs because the return value of the loss function is now written in the form of a dictionary. To avoid the error, you can remove the final {loss:.2f}. In my case:

```python
checkpoint_filepath=os.path.join(run_folder, "weights/weights-{epoch:02d}.h5")
```

However, in the module 03_04_vae_digits_analysis I found that the weights saved as h5 are not loaded back into the model. Therefore, I save the weights in .ckpt format instead.
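
A minimal sketch of that workaround, assuming Keras's behavior of inferring the TensorFlow checkpoint format from any filepath that does not end in .h5 (the paths here are illustrative):

```python
import os
from tensorflow.keras.callbacks import ModelCheckpoint

run_folder = "run/vae_digits"  # illustrative

# A filepath without the .h5 suffix is saved in the TF checkpoint format.
checkpoint_filepath = os.path.join(run_folder, "weights/weights-{epoch:02d}.ckpt")
checkpoint = ModelCheckpoint(checkpoint_filepath, save_weights_only=True, verbose=1)

# In 03_04_vae_digits_analysis the weights would then be restored with, e.g.:
# model.load_weights(os.path.join(run_folder, "weights/weights-05.ckpt"))
```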

Working on the TF2 branch: https://github.com/kubokoHappy/GDL_code_kuboko
Using TF 2.3 with GPU.

@olegboev

The problem is that the loss value is a vector of batch size, so you need to take its mean.
This fragment:

```python
return {
    "loss": total_loss,
    "reconstruction_loss": reconstruction_loss,
    "kl_loss": kl_loss,
}
```

should be replaced by this:

```python
return {
    "loss": tf.reduce_mean(total_loss),
    "reconstruction_loss": tf.reduce_mean(reconstruction_loss),
    "kl_loss": tf.reduce_mean(kl_loss),
}
```
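
With the means taken, each logged value is a scalar and the original ModelCheckpoint filepath pattern formats cleanly. A quick standalone check (numbers made up):

```python
import numpy as np

total_loss = np.array([58.49, 55.21])   # per-sample losses within a batch
loss = float(np.mean(total_loss))       # scalar, as after tf.reduce_mean

# The original checkpoint pattern now formats without the TypeError:
print("weights/weights-{epoch:03d}-{loss:.2f}.h5".format(epoch=1, loss=loss))
# weights/weights-001-56.85.h5
```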
