Exporting OCR Predictor (det+rec) as ONNX (inference on CPU) #1243

zohaib-khan5040 · 2023-07-05T16:30:37Z

zohaib-khan5040
Jul 5, 2023

Hello!

I'm using the ocr_predictor class to run an end-to-end text recognition task but am running into a lot of errors and warnings when I try to perform this. My aim is to export to ONNX to run the end-to-end inference in half precision.

If there are any examples related to exporting such a model, please help me out - I was unable to find anything in the repo. If there are any alternatives for running half precision inference on the CPU, that would be very appreciated (I was unable to run bfloat16 inference with the same model).

The code:

model = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=True
)

x = torch.randn(1, 3, 128, 128, dtype=torch.float32)

export_model_to_onnx(
    model,
    "test",
    dummy_input=x
)

felixT2K · 2023-07-06T07:11:01Z

felixT2K
Jul 6, 2023

Hi @zohaib-khan5040 👋🏼 ,

Using half precision is currently only possible on the main branch:
(bfloat16 can raise some issues - i haven't had the chance to test it on cpu yet)

predictor = (
    ocr_predictor(reco_arch="crnn_vgg16_bn", det_arch="db_resnet50", pretrained=True).half()
)

# with gpu
predictor = (
    ocr_predictor(reco_arch="crnn_vgg16_bn", det_arch="db_resnet50", pretrained=True).cuda().half()
)

# bfloat16 

predictor = (
    ocr_predictor(reco_arch="crnn_vgg16_bn", det_arch="db_resnet50", pretrained=True).to(dtype=torch.bfloat16)
)

The whole ocr_predictor can't be exported to ONNX because there is some logic down in OpenCV / shapely / numpy / etc. which is not exportable.

But you can export the detection and recognition model seperatly (Attention: the models are exported up to the logits there is no postprocessing included - this would be on your own at the moment)

detection:

doctr/tests/pytorch/test_models_detection_pt.py

Line 119 in e04e183

@pytest.mark.parametrize(

recognition:

doctr/tests/pytorch/test_models_recognition_pt.py

Line 115 in e04e183

@pytest.mark.parametrize(

We have on track to build something like a onnx_predictor which will work with the exported models but that's a long term task and unfortunately i don't have the time to work on it currently 😅

0 replies

zohaib-khan5040 · 2023-07-06T19:59:38Z

zohaib-khan5040
Jul 6, 2023
Author

Thank you so much for the respone!

Yes, only after a few hours did I realize that exporting only works for the individual models, not the Predictor as a whole.

Could you please help me with understanding how I could ONNX each of the components of the OCR Predictor? I understand that there are DetectionPredictor and RecognitionPredictor classes that do the work for this higher class, and that they consist of the actual detector/recognition model and the corresponding Preprocessor class.

If there is any example you could direct me to within this repo that could help with ONNXing the entire pipeline, I would appreciate this immensely! 🙏

0 replies

zohaib-khan5040 · 2023-07-06T22:54:16Z

zohaib-khan5040
Jul 6, 2023
Author

If there's any convenient ways of using half precision on the CPU, whether that be converting to ONNX then using OpenVINO or using bfloat16, for the end-to-end pipeline specifically, please let me know here.

I've been trying to convert the DBResNet50 and CRNN models to ONNX but there're so many moving pieces with the huge number of classes and nuances with the output of the detector being logits vs. preds, I'm hoping there's something I'm missing out which is an easier fix. My only goal is to run the Detector + Recognition pipeline in half precision on CPU (ideally through ONNX).

2 replies

felixT2K Jul 10, 2023

Hi @zohaib-khan5040 👋,

I have provided a snippet how you can use float16 | bfloat16 precision with the whole ocr_predicor what's not working with this example ? :) (keep in mind this snippet works only on the main branch)

About ONNX as explained there is currently no "easy" way to export the whole model including the postprocessing. You can export the model itself but pre - and post - processing is on your side. We have on track to provide something like onnx_predictor(reco_arch="crnn_vgg16_bn", det_arch="db_resnet50", device="cpu") in a future release but this is still a open task.

zohaib-khan5040 Jul 11, 2023
Author

Helloo.

Regarding the half-precision matter,

I tried using the .half() method on the model and the model did reduce by half in size (MB). However when I tried to perform inference using this on an image whose dtype=torch.float16, there was still a Type Error which could be due to some casting done outside the models within the Predictor's flow.
When I try to use bfloat16 and Torch's autocast (as is recommended in the forums), the cell takes way too long to run and crashes again - TypeError: Got unsupported ScalarType BFloat16. The code I ran was:

model.to(torch.bfloat16)
with torch.amp.autocast(device_type="cpu", dtype=torch.bfloat16):
    model(img)

Regarding the ONNX matter, I understand that now and am trying to implement it based off the OCRPredictor class so it won't be an issue.

Apologies if I am taking much of your time but I'm struggling to understand the sources of these errors after going through the source code and not finding any dtype conversions. I am re-implementing pieces one by one but would still immensely appreciate your help if I missed out something due to carelessness.

felixT2K · 2023-07-11T14:18:11Z

felixT2K
Jul 11, 2023

Helloo.

Regarding the half-precision matter,

I tried using the .half() method on the model and the model did reduce by half in size (MB). However when I tried to perform inference using this on an image whose dtype=torch.float16, there was still a Type Error which could be due to some casting done outside the models within the Predictor's flow.

When I try to use bfloat16 and Torch's autocast (as is recommended in the forums), the cell takes way too long to run and crashes again - TypeError: Got unsupported ScalarType BFloat16. The code I ran was:
model.to(torch.bfloat16)
with torch.amp.autocast(device_type="cpu", dtype=torch.bfloat16):
    model(img)
Regarding the ONNX matter, I understand that now and am trying to implement it based off the OCRPredictor class so it won't be an issue.

Apologies if I am taking much of your time but I'm struggling to understand the sources of these errors after going through the source code and not finding any dtype conversions. I am re-implementing pieces one by one but would still immensely appreciate your help if I missed out something due to carelessness.

I was able to reproduce this issue (half + cuda works well but on cpu some layers doesn't support fp16 on cpu) i will try to find a solution for this. About bfloat16 hard to handle for me because i don't have a cpu which supports bfloat16 ^^

Have you tested if the autocast works with float16 instead of bfloat16 on cpu ?

2 replies

zohaib-khan5040 Jul 11, 2023
Author

That's true I also found some discussions talking about certain layers being difficult to deal with in that regard.

Regarding the autocast, I used this snippet:

with torch.amp.autocast(device_type="cpu", dtype=torch.float16):
    print("half precision")
    find_size(model)
    model(img)

and got this error on the first line (context manager):
RuntimeError: Currently, AutocastCPU only support Bfloat16 as the autocast_cpu_dtype

felixdittrich92 Jul 11, 2023
Maintainer

Hey @zohaib-khan5040 👋 ,

After stepping a bit deeper i think we will not have success with bfloat16 or float16 on cpu there are to much operations only supported on cuda... unfortunately (but with float16 and cuda all should work as expected).

About the ONNX topic if you want to work on it i would be super happy to help and give support. (This would be also a great benefit for the whole community)

Our long term goal is to provide a docTR build which does not depends on TF or PT (and some other extra dependencies like matplotlib) especially for production usage.

A first draft would be to create a working Prof of Concept (PoC).
Some steps to be done for this would be to encapsulate the pre - and post-processing to work out of the box without TF or PT requirements. Afterwards we could prepare a onnx_predictor similar to the ocr_predictor. The onnx export functions would need some additions to ship required metadata but this should be the easiest part 😅

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exporting OCR Predictor (det+rec) as ONNX (inference on CPU) #1243

{{title}}

Replies: 4 comments 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Exporting OCR Predictor (det+rec) as ONNX (inference on CPU) #1243

zohaib-khan5040 Jul 5, 2023

Replies: 4 comments · 4 replies

felixT2K Jul 6, 2023

zohaib-khan5040 Jul 6, 2023 Author

zohaib-khan5040 Jul 6, 2023 Author

felixT2K Jul 10, 2023

zohaib-khan5040 Jul 11, 2023 Author

felixT2K Jul 11, 2023

zohaib-khan5040 Jul 11, 2023 Author

felixdittrich92 Jul 11, 2023 Maintainer

zohaib-khan5040
Jul 5, 2023

Replies: 4 comments 4 replies

felixT2K
Jul 6, 2023

zohaib-khan5040
Jul 6, 2023
Author

zohaib-khan5040
Jul 6, 2023
Author

zohaib-khan5040 Jul 11, 2023
Author

felixT2K
Jul 11, 2023

zohaib-khan5040 Jul 11, 2023
Author

felixdittrich92 Jul 11, 2023
Maintainer