This release adds support for rotated documents, and extends both the model & dataset zoos.

Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

🙃 😃 Rotation-aware text detection 🙃 😃

It's no secret: the focus of this release was to bring the same level of performance to rotated documents!

docTR is meant to be your best tool for seamless document processing, and it could not get there without handling rotation, a very natural & common alteration of input documents. This large project was split into three parts:

Straightening pages before text detection

We developed a heuristic-based method that estimates the page skew and rotates the page accordingly before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part 🙏

This behaviour can be enabled to avoid retraining the text detection models. However, the heuristic approach has its limits in terms of robustness.
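As a rough illustration of the idea (not docTR's actual heuristic), the sketch below assumes that text-line-like segments have already been extracted from the page. It takes the median of their angles as the skew estimate, then rotates the coordinates back by that angle. The function names `estimate_skew` and `rotate_points`, as well as the segment-based input, are assumptions made for this example.

```python
import numpy as np


def estimate_skew(segments: np.ndarray) -> float:
    """Estimate the page skew (in degrees) as the median angle of line segments.

    `segments` has shape (N, 2, 2): N segments, each a (start, end) pair of (x, y) points.
    """
    deltas = segments[:, 1] - segments[:, 0]
    angles = np.degrees(np.arctan2(deltas[:, 1], deltas[:, 0]))
    # The median is less sensitive to a few stray segments than the mean
    return float(np.median(angles))


def rotate_points(points: np.ndarray, angle_deg: float, center: np.ndarray) -> np.ndarray:
    """Rotate (x, y) points by `angle_deg` around `center`."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    return (points - center) @ rot.T + center


# Hypothetical page: three horizontal text lines, skewed by 7 degrees
center = np.array([50.0, 50.0])
straight = np.array(
    [[[10, 20], [90, 20]], [[10, 50], [90, 50]], [[10, 80], [90, 80]]], dtype=float
)
skewed = rotate_points(straight, 7.0, center)

skew = estimate_skew(skewed)
# Rotating by the opposite angle brings the lines back to horizontal
restored = rotate_points(skewed, -skew, center)
```

A real page would need an extra step to obtain the segments (for instance from contours), which is where most of the robustness limits mentioned above come from.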

Text detection training with rotated images

The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.

Crop orientation resolution

Finally, once the localization candidates have been extracted, there is no guarantee that a given candidate reads from left to right. To remove this doubt, a lightweight image orientation classifier was added to refine the crops that are sent to text recognition!
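A minimal sketch of how such a prediction could be used, assuming a classifier that outputs one of four classes standing for a counter-clockwise rotation of 0, 90, 180 or 270 degrees. The class-to-angle mapping and the `straighten_crop` helper are hypothetical, and the classifier itself is replaced here by a hard-coded prediction.

```python
import numpy as np


def straighten_crop(crop: np.ndarray, predicted_class: int) -> np.ndarray:
    """Undo a counter-clockwise rotation of `predicted_class` * 90 degrees."""
    # np.rot90 with a negative k rotates clockwise, cancelling the predicted rotation
    return np.rot90(crop, k=-predicted_class)


# Hypothetical crop (height 2, width 4) that reads from left to right
upright = np.arange(8).reshape(2, 4)
# Simulate a crop that was rotated by 90 degrees counter-clockwise
rotated = np.rot90(upright, k=1)

# A real pipeline would get this class from the orientation classifier
fixed = straighten_crop(rotated, predicted_class=1)
```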

🦓 A wider pretrained classification model zoo 🦓

Training stability on complex deep learning tasks has largely been improved by transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated 🚀
Those were trained using our synthetic character classification dataset. For more details, cf. Character classification training.

🖼️ New public datasets join the fray

Thanks to @felixdittrich92, the list of supported datasets has considerably grown 🥳
This includes widely popular datasets used for benchmarks on OCR-related tasks. You can find the full list over here 👉 #587

Synthetic text recognition dataset

Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:

generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
you can even pass a list of fonts so that each word's font family is randomly picked among them
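The sampling logic described above can be sketched as follows. This is not the actual WordGenerator implementation: the `sample_word` helper, its parameters and the font file names are hypothetical, and the image rendering step is left out.

```python
import random
from typing import List, Optional, Tuple


def sample_word(
    vocab: str,
    min_chars: int,
    max_chars: int,
    fonts: Optional[List[str]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Optional[str]]:
    """Sample a random word and, if fonts are given, a random font for it."""
    rng = rng or random.Random()
    # Word length is drawn uniformly within [min_chars, max_chars]
    length = rng.randint(min_chars, max_chars)
    # Each character is drawn independently from the vocab
    word = "".join(rng.choice(vocab) for _ in range(length))
    # The font family is picked at random for each word
    font = rng.choice(fonts) if fonts else None
    return word, font


vocab = "abcdefghijklmnopqrstuvwxyz0123456789"
fonts = ["FreeMono.ttf", "FreeSans.ttf"]  # hypothetical font files
rng = random.Random(0)  # seeded for reproducibility
samples = [sample_word(vocab, 3, 8, fonts, rng) for _ in range(5)]
```

Each sampled word would then be rendered with its font to produce the image, and the word itself serves as the recognition target.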

(Figure: sample images generated with font_size=32.)

📑 New notebooks

Two new notebooks have made their way into the documentation:

producing searchable PDFs from docTR analysis results
introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR

Breaking changes

Revamp of classification models

With the retraining of all classification backbones, several changes have been introduced:

Model naming: linknet16 --> linknet_resnet18
Architecture changes: all classification backbones are now available with a classification head.

Enforcing relative coordinates in datasets

In order to unify our data pipelines, we now enforce the conversion to relative coordinates on all datasets!

0.4.1:
`>>> from doctr.datasets import FUNSD`
`>>> ds = FUNSD(train=True, download=True)`
`>>> img, target = ds[0]`
`>>> print(target['boxes'].dtype, target['boxes'].max())`
`(dtype('int64'), 862)`

0.5.0:
`>>> from doctr.datasets import FUNSD`
`>>> ds = FUNSD(train=True, download=True)`
`>>> img, target = ds[0]`
`>>> print(target['boxes'].dtype, target['boxes'].max())`
`(dtype('float32'), 0.98341835)`
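For code that still holds absolute pixel coordinates, the conversion can be sketched as below. This is an illustration rather than docTR's internal code: the `to_relative` helper is hypothetical, and it assumes boxes stored as (xmin, ymin, xmax, ymax) rows.

```python
import numpy as np


def to_relative(boxes: np.ndarray, img_height: int, img_width: int) -> np.ndarray:
    """Convert absolute (xmin, ymin, xmax, ymax) boxes to relative coordinates."""
    rel = boxes.astype(np.float32)
    # x coordinates are divided by the width, y coordinates by the height
    rel[:, [0, 2]] /= img_width
    rel[:, [1, 3]] /= img_height
    # Clip to guard against boxes slightly overflowing the image
    return rel.clip(0, 1)


# Hypothetical box on a 400 x 200 (height x width) image
abs_boxes = np.array([[10, 20, 110, 220]], dtype=np.int64)
rel_boxes = to_relative(abs_boxes, img_height=400, img_width=200)
```

Going back to pixels is the reverse operation: multiply by the image width and height, then round.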

Full changelog

Breaking Changes 🛠

refacto: 🔧 postprocessing with rotated boxes by @charlesmindee in #641
refactor: Refactored LinkNet by @fg-mindee in #733
refactor: Renamed DataLoader arg "workers" into "num_workers" by @fg-mindee in #737
refactor: Unified return_preds flags across all tasks by @fg-mindee in #741
refactor: Introduces img + target transforms in Datasets by @fg-mindee in #750
refactor: refactoring rotated boxes by @charlesmindee in #731
refactor: Enforced relative coordinates for all dataset geometries by @fg-mindee in #775

New Features

SynthText dataset integration by @felixdittrich92 in #624
[notebooks] add export_as_pdfa notebook by @felixdittrich92 in #650
ICDAR2003 dataset integration by @felixdittrich92 in #653
feat: Implements erosion & dilation in PyTorch & TF by @fg-mindee in #669
Rotate page by @Rob192 in #488
feat: Added option to use AMP with TF scripts by @fg-mindee in #682
feat: Added support of FasterRCNN for PyTorch by @fg-mindee in #691
ICDAR2013 dataset integration by @felixdittrich92 in #662
feat: Added LR finder option in PyTorch training scripts by @fg-mindee in #703
feat: Added line reading for source PDFs by @fg-mindee in #707
feat: Added plot_samples support to visualize the images along with the targets by @SiddhantBahuguna in #704
SVHN dataset integration by @felixdittrich92 in #634
feat: Added checkpoint for obj_detection by @SiddhantBahuguna in #713
feat: add classification module for crop orientation by @charlesmindee in #721
feat: Added inference+post processing script for artefact detection by @SiddhantBahuguna in #728
feat: Added latency evaluation scripts for all tasks by @fg-mindee in #746
docs: Added colab link in the Read me for artefact detection by @SiddhantBahuguna in #755
feat: Added LR Finder for TensorFlow scripts by @fg-mindee in #747
feat: Added latency evaluation & benchmark for image classification by @fg-mindee in #757
feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts by @fg-mindee in #758
feat: Added WordGenerator dataset by @fg-mindee in #760
feat: Added dedicated evaluation scripts for text detection by @fg-mindee in #761
feat: Refactored & retrained all classification models by @fg-mindee in #763
feat: add rotated ckpts for pytorch DBNet + fix line resolution for rotated pages by @charlesmindee in #743
feat: Added torchvision photometric augmentations in artefact detection training by @SiddhantBahuguna in #764
feat: Added random noise augmentation to object detection by @SiddhantBahuguna in #654
feat: add rotation option to both detection training scripts by @charlesmindee in #765
feat: Added ChannelShuffle transformation and fixes RandomCrop by @fg-mindee in #768
feat: Added Gaussian Noise implementation in Tensorflow by @SiddhantBahuguna in #771
feat: Added Random Horizontal Flip augmentation by @SiddhantBahuguna in #773
ci: Added release helper actions by @fg-mindee in #776

Bug Fixes

docs: Fixed documentation build by @fg-mindee in #644
fix: 🐛 bug canvas dtype for threshold target by @charlesmindee in #645
fix: 🐛 assume_straight_pages in predictor by @charlesmindee in #647
ci: Fixed silent isort failure by @fg-mindee in #655
fix: Fixed W&B config log by @fg-mindee in #656
fix: Updates Makefile to match CI by @fg-mindee in #661
docs: Fixed typo in the docstrings of metrics by @fg-mindee in #664
fix: rotation arg in training scripts by @charlesmindee in #657
feat: Added missing output classes param in DBNet by @fg-mindee in #666
fix: Fixed LinkNet target & loss computation by @fg-mindee in #670
fix: box angle rectification according to the quadrant by @charlesmindee in #667
fix: rotate_boxes angle by @charlesmindee in #678
fix: Fixed param override of backbone by @fg-mindee in #689
fix: Added missing AMP flags in training scripts by @fg-mindee in #690
fix: Added a 0-sized crop safeguard in split_crops by @fg-mindee in #693
fix: Fixed MASTER recognition architecture by @fg-mindee in #687
fix: Added safeguard for extreme aspect ratio in Resize by @fg-mindee in #695
fix: Fixed W&B logger in object detection training script by @fg-mindee in #697
fix: Fixed geometry utils for polygon <--> rbox conversions by @fg-mindee in #700
fix: Fixed build_target for detection models with rotated targets by @fg-mindee in #698
fix: box computing when assume straight pages is false by @charlesmindee in #720
test: Fixed TF loss unittest by @fg-mindee in #725
fix: Fixed edge cases of DB loss in PyTorch by @fg-mindee in #726
fix: Fixed computation of Mean IoU by @fg-mindee in #734
fix: Fixed detection training script by @fg-mindee in #742
fix: Fixed the bin_thresh of LinkNet by @fg-mindee in #745
test: Increased flexibility of loss test by @fg-mindee in #744
fix: Fixed mask computation of DBNet by @fg-mindee in #753
test: Fixed TensorFlow predictor unittest by @fg-mindee in #767
fix: Fixed the box cropping from RandomCrop by @fg-mindee in #772
ci: Fixed CI training job for TF by @fg-mindee in #770
docs: Fixed README link & update documentation by @fg-mindee in #774
fix: target DB by @charlesmindee in #777

Improvements

style: Fixed isort and typing checks by @fg-mindee in #643
docs: Added TFJS demo ref in README by @fg-mindee in #651
fix: Added automatic worker resolution to remaining training scripts by @fg-mindee in #649
feat: Added rbox_iou function with a memory-savy option by @fg-mindee in #659
style: Cleaned codebase with Codacy hints by @fg-mindee in #665
feat: Added file existence check in DetectionDataset by @fg-mindee in #672
fix: pymupdf version by @charlesmindee in #673
[refactor] SROIE dataset by @felixdittrich92 in #660
fix: target_ar split crops by @charlesmindee in #681
feat: add line resolution for rotated boxes by @charlesmindee in #677
feat: add rboxes rectification in Linknet postprocessing by @charlesmindee in #679
docs: Added minimal docstring sanity check by @fg-mindee in #686
fix: Fixed deprecation warnings from numpy & PyMuPDF by @fg-mindee in #692
refactor: Removed postprocessor from high-level init by @fg-mindee in #688
feat: Added possibility to change the cache dir of datasets by @fg-mindee in #694
Mock Sroie / Funsd / Cord / Synthtext / DocArtefacts / IIIT5K / SVT / IC03 (all ^^) by @felixdittrich92 in #722
refactor: Refactored detection post-processing by @fg-mindee in #724
ci: Fixed CI job name and ignored .idea files by @fg-mindee in #727
feat: integration of the classifier in the ocr predictor by @charlesmindee in #723
test: Switch to a fully mocked PDF for unittests by @fg-mindee in #735
test: Silenced PyMuPDF warnings by @fg-mindee in #740
refactor: Removed contiguous param since it's included in torch>=1.7 by @fg-mindee in #756
feat: add preserve aspect ratio to predictor and vizualisation utils by @charlesmindee in #766
ci: Optimized CI jobs to speed up development process by @fg-mindee in #759
feat: Updated timing to more accurate one by @fg-mindee in #769

Miscellaneous

chore: Applied post release modifications by @fg-mindee in #642

Full Changelog: v0.4.1...v0.5.0

v0.5.0: Skew-aware OCR & extended model/dataset zoo