v0.5.0: Skew-aware OCR & extended model/dataset zoo
This release adds support for rotated documents and extends both the model & dataset zoos.
Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
🙃 😃 Rotation-aware text detection 🙃 😃
It's no secret: the focus of this release was to bring the same level of performance to rotated documents!
docTR is meant to be your best tool for seamless document processing, and it could not do without supporting rotated pages, a very natural & common variation in input documents. This large project was subdivided into three parts:
Straightening pages before text detection
We developed a heuristic-based method to estimate the page skew and rotate the page before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part 🙏
This behaviour can be enabled to avoid retraining the text detection models. However, the heuristic approach has its limits in terms of robustness.
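As an illustration of what such a heuristic can look like, here is a minimal projection-profile sketch. It is not docTR's implementation: `estimate_skew`, its parameters and the scoring rule (variance of the row histogram of ink pixels) are assumptions chosen for clarity. It tries a range of candidate angles and keeps the one for which ink pixels pile up into the sharpest horizontal bands.

```python
import numpy as np


def estimate_skew(binary: np.ndarray, max_angle: float = 10.0, step: float = 0.5) -> float:
    """Return the rotation angle (degrees) that makes text lines horizontal.

    Hypothetical projection-profile heuristic, not docTR's actual method.
    `binary` is a 2D array that is non-zero wherever there is ink.
    """
    ys, xs = np.nonzero(binary)
    # Center the coordinates so that candidate rotations pivot around the page center
    ys = ys - binary.shape[0] / 2
    xs = xs - binary.shape[1] / 2
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-max_angle, max_angle + step, step):
        theta = np.deg2rad(angle)
        # Vertical coordinate of every ink pixel once the page is rotated by `angle`
        rot_y = xs * np.sin(theta) + ys * np.cos(theta)
        # Row histogram (projection profile): aligned text lines give tall, narrow peaks
        profile, _ = np.histogram(rot_y, bins=binary.shape[0])
        score = np.var(profile)
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The page would then be rotated by the returned angle before detection. The fixed angle range, step and bin count are exactly the kind of hand-tuned choices that limit the robustness of a heuristic approach.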
Text detection training with rotated images
The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.
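Training on rotated pages means that each image and its localization targets must be transformed together. The sketch below shows the idea for quarter turns only, using NumPy's `np.rot90` and relative (x, y) polygon coordinates. The helper name `rotate_sample` and the restriction to multiples of 90° are assumptions for illustration (arbitrary angles would also need interpolation and clipping); this is not docTR's training code.

```python
import numpy as np


def rotate_sample(img: np.ndarray, polys: np.ndarray, k: int = 1):
    """Rotate an image and its polygons by k quarter turns counter-clockwise.

    Hypothetical helper: img is (H, W) or (H, W, C); polys holds relative (x, y)
    points in [0, 1] with shape (..., 2), e.g. (N, 4, 2) for N quadrilaterals.
    """
    k %= 4
    # np.rot90 rotates from axis 0 towards axis 1, i.e. counter-clockwise on screen
    out_img = np.ascontiguousarray(np.rot90(img, k=k, axes=(0, 1)))
    out_polys = polys.astype(np.float32)
    for _ in range(k):
        # One counter-clockwise quarter turn maps relative (x, y) to (y, 1 - x)
        out_polys = np.stack([out_polys[..., 1], 1 - out_polys[..., 0]], axis=-1)
    return out_img, out_polys
```

Keeping the targets in relative coordinates is what makes this mapping independent of the image size.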
Crop orientation resolution
Finally, once the localization candidates have been extracted, there is no guarantee that each of them reads from left to right. To remove this doubt, a lightweight image orientation classifier was added to refine the crops that are sent to text recognition!
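Here is a minimal sketch of how the classifier outputs could be used to straighten crops before recognition. It assumes a hypothetical four-class head whose class i means "the crop is rotated by i × 90° counter-clockwise"; that label set, `straighten_crops` and the logits layout (one row per crop) are illustrative assumptions, not docTR's API.

```python
from typing import List, Sequence, Tuple

import numpy as np

# Hypothetical label set: class i means the crop is rotated by i * 90 degrees counter-clockwise
ANGLES = (0, 90, 180, 270)


def straighten_crops(crops: Sequence[np.ndarray], logits: np.ndarray) -> Tuple[List[np.ndarray], List[int]]:
    """Undo the predicted rotation of each crop so that text reads left to right."""
    classes = np.argmax(logits, axis=1)  # one predicted class per crop
    # A negative k makes np.rot90 turn clockwise, cancelling the counter-clockwise rotation
    straight = [np.rot90(crop, k=-int(c)) for crop, c in zip(crops, classes)]
    return straight, [ANGLES[int(c)] for c in classes]
```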
🦓 A wider pretrained classification model zoo 🦓
Training stability in deep learning for complex tasks has largely benefited from transfer learning. Accordingly, OCR models usually rely on a pretrained backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated 🚀
They were trained on our synthetic character classification dataset; for more details, see Character classification training.
🖼️ New public datasets join the fray
Thanks to @felixdittrich92, the list of supported datasets has considerably grown 🥳
This includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here 👉 #587
Synthetic text recognition dataset
Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:
- generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
- you can even pass a list of fonts so that each word's font family is randomly picked among them
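The sampling logic described in this list can be sketched with the standard library alone. This only draws the text and the font choice; rendering the image (e.g. with Pillow at a given font size) is left out. `sample_word` and its parameters are hypothetical names, not the signature of docTR's `WordGenerator`.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_word(
    vocab: str,
    min_chars: int,
    max_chars: int,
    fonts: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Optional[str]]:
    """Draw a random word and, if fonts are given, a random font for it (hypothetical helper)."""
    rng = rng or random.Random()
    length = rng.randint(min_chars, max_chars)  # both bounds are inclusive
    word = "".join(rng.choice(vocab) for _ in range(length))
    # Each word gets its own font family, picked uniformly among the provided ones
    font = rng.choice(fonts) if fonts else None
    return word, font
```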
Below are some samples using font_size=32:
📑 New notebooks
Two new notebooks have made their way into the documentation:
- producing searchable PDFs from docTR analysis results
- introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR
Breaking changes
Revamp of classification models
With the retraining of all classification backbones, several changes have been introduced:
- Model naming: linknet16 --> linknet_resnet18
- Architecture changes: all classification backbones now come with a classification head.
Enforcing relative coordinates in datasets
In order to unify our data pipelines, we enforced the conversion to relative coordinates for all datasets!
For instance, loading the first FUNSD training sample with `from doctr.datasets import FUNSD`, `ds = FUNSD(train=True, download=True)` and `img, target = ds[0]`, then evaluating `target['boxes'].dtype, target['boxes'].max()`, yields:

Version | Output |
---|---|
0.4.1 | `(dtype('int64'), 862)` |
0.5.0 | `(dtype('float32'), 0.98341835)` |
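The conversion itself amounts to dividing x coordinates by the image width and y coordinates by the image height. A minimal sketch, assuming straight boxes stored as (xmin, ymin, xmax, ymax) in pixels; `to_relative` is a hypothetical helper rather than a docTR function, and rotated polygons would need the same scaling applied to every point.

```python
import numpy as np


def to_relative(boxes: np.ndarray, img_height: int, img_width: int) -> np.ndarray:
    """Convert absolute (xmin, ymin, xmax, ymax) pixel boxes to relative float32 coordinates."""
    rel = boxes.astype(np.float32)  # copy, so the caller's array is left untouched
    rel[:, [0, 2]] /= img_width  # x coordinates
    rel[:, [1, 3]] /= img_height  # y coordinates
    return rel
```

Values then lie in [0, 1] regardless of the image size, which is what the float32 output for 0.5.0 reflects.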
Full changelog
Breaking Changes 🛠
- refacto: 🔧 postprocessing with rotated boxes by @charlesmindee in #641
- refactor: Refactored LinkNet by @fg-mindee in #733
- refactor: Renamed DataLoader arg "workers" into "num_workers" by @fg-mindee in #737
- refactor: Unified return_preds flags across all tasks by @fg-mindee in #741
- refactor: Introduces img + target transforms in Datasets by @fg-mindee in #750
- refactor: refactoring rotated boxes by @charlesmindee in #731
- refactor: Enforced relative coordinates for all dataset geometries by @fg-mindee in #775
New Features
- SynthText dataset integration by @felixdittrich92 in #624
- [notebooks] add export_as_pdfa notebook by @felixdittrich92 in #650
- ICDAR2003 dataset integration by @felixdittrich92 in #653
- feat: Implements erosion & dilation in PyTorch & TF by @fg-mindee in #669
- Rotate page by @Rob192 in #488
- feat: Added option to use AMP with TF scripts by @fg-mindee in #682
- feat: Added support of FasterRCNN for PyTorch by @fg-mindee in #691
- ICDAR2013 dataset integration by @felixdittrich92 in #662
- feat: Added LR finder option in PyTorch training scripts by @fg-mindee in #703
- feat: Added line reading for source PDFs by @fg-mindee in #707
- feat: Added plot_samples support to visualize the images along with the targets by @SiddhantBahuguna in #704
- SVHN dataset integration by @felixdittrich92 in #634
- feat: Added checkpoint for obj_detection by @SiddhantBahuguna in #713
- feat: add classification module for crop orientation by @charlesmindee in #721
- feat: Added inference+post processing script for artefact detection by @SiddhantBahuguna in #728
- feat: Added latency evaluation scripts for all tasks by @fg-mindee in #746
- docs: Added colab link in the Read me for artefact detection by @SiddhantBahuguna in #755
- feat: Added LR Finder for TensorFlow scripts by @fg-mindee in #747
- feat: Added latency evaluation & benchmark for image classification by @fg-mindee in #757
- feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts by @fg-mindee in #758
- feat: Added WordGenerator dataset by @fg-mindee in #760
- feat: Added dedicated evaluation scripts for text detection by @fg-mindee in #761
- feat: Refactored & retrained all classification models by @fg-mindee in #763
- feat: add rotated ckpts for pytorch DBNet + fix line resolution for rotated pages by @charlesmindee in #743
- feat: Added torchvision photometric augmentations in artefact detection training by @SiddhantBahuguna in #764
- feat: Added random noise augmentation to object detection by @SiddhantBahuguna in #654
- feat: add rotation option to both detection training scripts by @charlesmindee in #765
- feat: Added ChannelShuffle transformation and fixes RandomCrop by @fg-mindee in #768
- feat: Added Gaussian Noise implementation in Tensorflow by @SiddhantBahuguna in #771
- feat: Added Random Horizontal Flip augmentation by @SiddhantBahuguna in #773
- ci: Added release helper actions by @fg-mindee in #776
Bug Fixes
- docs: Fixed documentation build by @fg-mindee in #644
- fix: 🐛 bug canvas dtype for threshold target by @charlesmindee in #645
- fix: 🐛 assume_straight_pages in predictor by @charlesmindee in #647
- ci: Fixed silent isort failure by @fg-mindee in #655
- fix: Fixed W&B config log by @fg-mindee in #656
- fix: Updates Makefile to match CI by @fg-mindee in #661
- docs: Fixed typo in the docstrings of metrics by @fg-mindee in #664
- fix: rotation arg in training scripts by @charlesmindee in #657
- feat: Added missing output classes param in DBNet by @fg-mindee in #666
- fix: Fixed LinkNet target & loss computation by @fg-mindee in #670
- fix: box angle rectification according to the quadrant by @charlesmindee in #667
- fix: rotate_boxes angle by @charlesmindee in #678
- fix: Fixed param override of backbone by @fg-mindee in #689
- fix: Added missing AMP flags in training scripts by @fg-mindee in #690
- fix: Added a 0-sized crop safeguard in split_crops by @fg-mindee in #693
- fix: Fixed MASTER recognition architecture by @fg-mindee in #687
- fix: Added safeguard for extreme aspect ratio in Resize by @fg-mindee in #695
- fix: Fixed W&B logger in object detection training script by @fg-mindee in #697
- fix: Fixed geometry utils for polygon <--> rbox conversions by @fg-mindee in #700
- fix: Fixed build_target for detection models with rotated targets by @fg-mindee in #698
- fix: box computing when assume straight pages is false by @charlesmindee in #720
- test: Fixed TF loss unittest by @fg-mindee in #725
- fix: Fixed edge cases of DB loss in PyTorch by @fg-mindee in #726
- fix: Fixed computation of Mean IoU by @fg-mindee in #734
- fix: Fixed detection training script by @fg-mindee in #742
- fix: Fixed the bin_thresh of LinkNet by @fg-mindee in #745
- test: Increased flexibility of loss test by @fg-mindee in #744
- fix: Fixed mask computation of DBNet by @fg-mindee in #753
- test: Fixed TensorFlow predictor unittest by @fg-mindee in #767
- fix: Fixed the box cropping from RandomCrop by @fg-mindee in #772
- ci: Fixed CI training job for TF by @fg-mindee in #770
- docs: Fixed README link & update documentation by @fg-mindee in #774
- fix: target DB by @charlesmindee in #777
Improvements
- style: Fixed isort and typing checks by @fg-mindee in #643
- docs: Added TFJS demo ref in README by @fg-mindee in #651
- fix: Added automatic worker resolution to remaining training scripts by @fg-mindee in #649
- feat: Added rbox_iou function with a memory-savvy option by @fg-mindee in #659
- style: Cleaned codebase with Codacy hints by @fg-mindee in #665
- feat: Added file existence check in DetectionDataset by @fg-mindee in #672
- fix: pymupdf version by @charlesmindee in #673
- [refactor] SROIE dataset by @felixdittrich92 in #660
- fix: target_ar split crops by @charlesmindee in #681
- feat: add line resolution for rotated boxes by @charlesmindee in #677
- feat: add rboxes rectification in Linknet postprocessing by @charlesmindee in #679
- docs: Added minimal docstring sanity check by @fg-mindee in #686
- fix: Fixed deprecation warnings from numpy & PyMuPDF by @fg-mindee in #692
- refactor: Removed postprocessor from high-level init by @fg-mindee in #688
- feat: Added possibility to change the cache dir of datasets by @fg-mindee in #694
- Mock Sroie / Funsd / Cord / Synthtext / DocArtefacts / IIIT5K / SVT / IC03 (all ^^) by @felixdittrich92 in #722
- refactor: Refactored detection post-processing by @fg-mindee in #724
- ci: Fixed CI job name and ignored .idea files by @fg-mindee in #727
- feat: integration of the classifier in the ocr predictor by @charlesmindee in #723
- test: Switch to a fully mocked PDF for unittests by @fg-mindee in #735
- test: Silenced PyMuPDF warnings by @fg-mindee in #740
- refactor: Removed contiguous param since it's included in torch>=1.7 by @fg-mindee in #756
- feat: add preserve aspect ratio to predictor and visualization utils by @charlesmindee in #766
- ci: Optimized CI jobs to speed up development process by @fg-mindee in #759
- feat: Updated timing to more accurate one by @fg-mindee in #769
Miscellaneous
- chore: Applied post release modifications by @fg-mindee in #642
Full Changelog: v0.4.1...v0.5.0