parseq: 'torch.Size' object has no attribute 'rank' #1354
-
Bug description

Thanks team for including parseq and vitstr! Also, do we get parseq weights from baudm if we set `pretrained=True`?

Code snippet to reproduce the bug

```python
from typing import Tuple

from PIL import Image
import torchvision.transforms as T

from doctr.models import parseq


def get_transform(img_size: Tuple[int, int], augment: bool = False, rotation: int = 0):
    transforms = []
    if augment:
        transforms.append(rand_augment_transform())  # Assuming you have your own augment function
    if rotation:
        transforms.append(T.RandomRotation(rotation))  # Apply random rotation
    transforms.extend([
        T.Resize((img_size[1], img_size[0]), T.InterpolationMode.BICUBIC),  # note height and width
        T.ToTensor(),
        T.Normalize(0.5, 0.5),  # Normalize for RGB images
    ])
    return T.Compose(transforms)
# Per baudm/parseq: Model expects a batch of images with shape: (B, C, H, W) but if you're using parseq from docTR it's B, H, W, C
img_size = (128, 32)
# Load your PIL image
pil_image = Image.open('<image_here>.png').convert('RGB')
transform = get_transform(img_size)
transformed_image = transform(pil_image)
transformed_image = transformed_image.permute(1, 2, 0) #move channel to the end
# Use the model
end_of_pipeline_detect_model = parseq(pretrained=False)
end_of_pipeline_detect_model(transformed_image.unsqueeze(0))
```

Error traceback

```
AttributeError: Exception encountered when calling layer 'patch_embedding_1' (type PatchEmbedding).
'torch.Size' object has no attribute 'rank'
```

Environment

```
#skip
Deep Learning backend
is_tf_available: True
is_torch_available: True
```
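The traceback comes from a Keras layer, i.e. the TensorFlow implementation of parseq received a `torch.Tensor`, whose `.shape` is a `torch.Size` with no `.rank` attribute. A quick hedged check of which backends docTR can see, assuming the `doctr.file_utils` helpers:

```python
# Hedged sketch: with both backends installed, docTR picks one at import time
# (TensorFlow here, judging by the Keras PatchEmbedding layer in the traceback),
# so feeding that model a torch.Tensor triggers the error above.
from doctr.file_utils import is_tf_available, is_torch_available

print(is_tf_available(), is_torch_available())  # True True, matching the environment block
```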
-
Moving this to a discussion because it's not a bug :)
Hi @temiwale88 👋🏼 ,
If you only want to predict already cropped images you can do the following:
For example:
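A minimal sketch of such a recognition-only call, assuming docTR's `recognition_predictor` and `DocumentFile` APIs; the crop file names are placeholders:

```python
from doctr.io import DocumentFile
from doctr.models import recognition_predictor

# Recognition-only predictor built around the parseq architecture
predictor = recognition_predictor("parseq", pretrained=True)

# Already cropped word images (hypothetical file names)
crops = DocumentFile.from_images(["word_crop_1.png", "word_crop_2.png"])

# One (word, confidence) tuple per crop
print(predictor(crops))
```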
Output:
[('Text', 0.9996758699417114), ('Data', 0.9975508451461792)]
I see you have installed both backends (TF and PT) - this is not recommended for any prod system, only for development and testing.
You can switch between by doing:
torch:
USE_TORCH=1 python3 /path/to/your/script.py
tensorflow:
USE_TF=1 python3 /path/to/your/script.py

Keep in mind the pretrained version of parseq (PyTorch) is only available on the …

All models are pretrained on a mindee internal dataset (~11M real world word crops).
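If you prefer to keep the backend selection inside the script rather than on the command line, a hedged sketch (the variable has to be set before docTR is imported):

```python
import os

# Select the PyTorch backend before the first docTR import; use "USE_TF" instead for TensorFlow
os.environ["USE_TORCH"] = "1"

from doctr.models import recognition_predictor

predictor = recognition_predictor("parseq", pretrained=True)
```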