[bug] TextVectorization + Sequential model doesn't work #20479

WeichenXu123 · 2024-11-11T11:00:56Z

Tensorflow version:
2.19.0-dev20241108

Keras version:
3.7.0.dev2024111103

Installation command: pip install --pre tf-nightly

Reproducing code:

import numpy as np
import tensorflow as tf


def get_text_vec_model(train_samples):
    from tensorflow.keras.layers import TextVectorization
    VOCAB_SIZE = 10
    SEQUENCE_LENGTH = 16
    EMBEDDING_DIM = 16
    vectorizer_layer = TextVectorization(
        max_tokens=VOCAB_SIZE,
        output_mode="int",
        output_sequence_length=SEQUENCE_LENGTH,
    )
    vectorizer_layer.adapt(train_samples)
    model = tf.keras.Sequential(
        [
            vectorizer_layer,
            tf.keras.layers.Embedding(
                VOCAB_SIZE,
                EMBEDDING_DIM,
                name="embedding",
                mask_zero=True,
            ),
            tf.keras.layers.GlobalAveragePooling1D(),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(1, activation="tanh"),
        ]
    )
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


train_samples = np.array(["this is an example", "another example"], dtype=object)
train_labels = np.array([0.4, 0.2])
model = get_text_vec_model(train_samples)


# Error: ValueError: Invalid dtype: object 
model.fit(train_samples, train_labels, epochs=1)

Error stack:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/weichen.xu/miniconda3/envs/mlflow/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/weichen.xu/miniconda3/envs/mlflow/lib/python3.9/site-packages/optree/ops.py", line 747, in tree_map
    return treespec.unflatten(map(func, *flat_args))
ValueError: Invalid dtype: object

The same code works with keras==3.6.0.

fchollet · 2024-11-11T17:35:09Z

It seems we're no longer detecting object arrays as string arrays, probably because we've upgraded our numpy dependency. Object arrays are ambiguous since they can contain anything, not just strings.

I recommend instead using tf.string tensors, which are explicitly strings and are also much more memory efficient:

train_samples = tf.convert_to_tensor(["this is an example", "another example"])

This would fix your code example.
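
For callers that already hold an object array, a minimal sketch of making the dtype explicit before handing the data to TensorFlow might look like the following. The helper name `to_str_array` is hypothetical and is not part of Keras, TensorFlow, or NumPy; it only illustrates the ambiguity described above by checking that every element really is a `str` before converting.

```python
import numpy as np


def to_str_array(samples):
    """Return a unicode (dtype kind "U") array, or raise if any element is not a str.

    Hypothetical helper for illustration only; not a Keras or TensorFlow API.
    """
    arr = np.asarray(samples)
    if arr.dtype.kind == "U":
        return arr  # already an explicit string array
    if arr.dtype.kind != "O":
        raise TypeError(f"expected strings, got dtype {arr.dtype}")
    # Object arrays can hold anything, so inspect each element.
    bad = [type(x).__name__ for x in arr.ravel() if not isinstance(x, str)]
    if bad:
        raise TypeError(f"non-string elements found: {sorted(set(bad))}")
    return arr.astype(str)


object_samples = np.array(["this is an example", "another example"], dtype=object)
str_samples = to_str_array(object_samples)

print(object_samples.dtype.kind)  # O
print(str_samples.dtype.kind)     # U
```

The resulting array can then be passed to tf.convert_to_tensor as in the line above, which should yield a tf.string tensor. Whether a NumPy unicode array passed directly to model.fit is accepted by a given Keras version is not something this sketch verifies.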

fchollet · 2024-11-11T17:51:19Z

I fixed it at HEAD, regardless.

github-actions bot assigned sachinprasadhs Nov 11, 2024

WeichenXu123 mentioned this issue Nov 11, 2024: Fix test_autolog_text_vec_model test mlflow/mlflow#13745 (merged)

mehtamansi29 added the type:Bug label Nov 11, 2024

fchollet closed this as completed Nov 11, 2024
