MMOCR Release v0.4.0
Highlights
- We release a new text recognition model - ABINet (CVPR 2021, Oral). With dedicated model design and useful data augmentation transforms, ABINet achieves the best performance on irregular text recognition tasks. Check it out!
- We are also working hard to fulfill requests from our community. OpenSet KIE is one result: it extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script that converts WildReceipt to the open-set domain, though the result may not take full advantage of the OpenSet format. For more information, read our tutorial.
- Model APIs can be exposed through TorchServe. See the docs.
Breaking Changes & Migration Guide
Postprocessor
Some refactoring is still in progress. For all text detection models, we unified their decode implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in a future release.
Migration Guide: Find a line like the following in your detection model's config:
text_repr_type=xxx,
and replace it with:
postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx),
Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:
bbox_head=dict(
    type='PANHead',
    text_repr_type='poly',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss')),
Afterwards:
bbox_head=dict(
    type='PANHead',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'),
    postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),
There are other postprocessors, and each takes different arguments. Interested users can find their interfaces or implementations in mmocr/models/textdet/postprocess or through our API docs.
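The migration can also be scripted when many configs need updating. The following is a minimal sketch in plain Python, not part of MMOCR: `migrate_bbox_head` is a hypothetical helper, and deriving the postprocessor name by replacing the `Head` suffix with `Postprocessor` is an assumption based on the `{MODEL_NAME}Postprocessor` pattern above. Pass `postprocessor_type` explicitly when that guess does not hold.

```python
import copy


def migrate_bbox_head(bbox_head, postprocessor_type=None):
    """Return a copy of an old-style bbox_head dict in the new layout.

    Hypothetical helper for illustration only; it is not an MMOCR API.
    """
    new_head = copy.deepcopy(bbox_head)  # leave the caller's dict untouched

    # Nothing to do if the deprecated argument is absent.
    if 'text_repr_type' not in new_head:
        return new_head

    text_repr_type = new_head.pop('text_repr_type')

    if postprocessor_type is None:
        head_type = new_head['type']
        # Assumption: '<Name>Head' maps to '<Name>Postprocessor'.
        if not head_type.endswith('Head'):
            raise ValueError(
                f'Cannot guess a postprocessor for {head_type!r}; '
                'pass postprocessor_type explicitly.')
        postprocessor_type = head_type[:-len('Head')] + 'Postprocessor'

    # Move the argument into the new postprocessor sub-config.
    new_head['postprocessor'] = dict(
        type=postprocessor_type, text_repr_type=text_repr_type)
    return new_head


# The PANet snippet from the migration guide, before the change.
old_head = dict(
    type='PANHead',
    text_repr_type='poly',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'))

new_head = migrate_bbox_head(old_head)
print(new_head)
```

For the PANet example, the returned dict matches the "Afterwards" snippet: text_repr_type is no longer a direct key of bbox_head and appears inside postprocessor instead.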
New Config Structure
We reorganized the configs/ directory by extracting reusable sections into configs/_base_. The directory tree of configs/_base_ is now organized as follows:
_base_
├── det_datasets
├── det_models
├── det_pipelines
├── recog_datasets
├── recog_models
├── recog_pipelines
└── schedules
Most model configs now make full use of the base configs, which makes the overall structure clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical difference, these changes do not break backward compatibility, as the names of model configs remain the same.
New Features
- Support openset kie by @cuhk-hbsun in #498
- Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in #497
- Support Chinese for kie show result by @cuhk-hbsun in #464
- Add TorchServe support for text detection and recognition by @Harold-lkk in #522
- Save filename in text detection test results by @cuhk-hbsun in #570
- Add codespell pre-commit hook and fix typos by @gaotongxiao in #520
- Avoid duplicate placeholder docs in CN by @gaotongxiao in #582
- Save results to json file for kie. by @cuhk-hbsun in #589
- Add SAR_CN to ocr.py by @gaotongxiao in #579
- mim extension for windows by @gaotongxiao in #641
- Support multiple pipelines for different datasets by @cuhk-hbsun in #657
- ABINet Framework by @gaotongxiao in #651
Refactoring
- Refactor textrecog config structure by @cuhk-hbsun in #617
- Refactor text detection config by @cuhk-hbsun in #626
- refactor transformer modules by @cuhk-hbsun in #618
- refactor textdet postprocess by @cuhk-hbsun in #640
Docs
- C++ example section by @apiaccess21 in #593
- install.md Chinese section by @A465539338 in #364
- Add Chinese Translation of deployment.md. by @fatfishZhao in #506
- Fix a model link and add the metafile for SATRN by @gaotongxiao in #473
- Improve docs style by @gaotongxiao in #474
- Enhancement & sync Chinese docs by @gaotongxiao in #492
- TorchServe docs by @gaotongxiao in #539
- Update docs menu by @gaotongxiao in #564
- Docs for KIE CloseSet & OpenSet by @gaotongxiao in #573
- Fix broken links by @gaotongxiao in #576
- Docstring for text recognition models by @gaotongxiao in #562
- Add MMFlow & MIM by @gaotongxiao in #597
- Add MMFewShot by @gaotongxiao in #621
- Update model readme by @gaotongxiao in #604
- Add input size check to model_inference by @mpena-vina in #633
- Docstring for textdet models by @gaotongxiao in #561
- Add MMHuman3D in readme by @gaotongxiao in #644
- Use shared menu from theme instead by @gaotongxiao in #655
- Refactor docs structure by @gaotongxiao in #662
- Docs fix by @gaotongxiao in #664
Enhancements
- Use bounding box around polygon instead of within polygon by @alexander-soare in #469
- Add CITATION.cff by @gaotongxiao in #476
- Add py3.9 CI by @gaotongxiao in #475
- update model-index.yml by @gaotongxiao in #484
- Use container in CI by @gaotongxiao in #502
- CircleCI Setup by @gaotongxiao in #611
- Remove unnecessary custom_import from train.py by @gaotongxiao in #603
- Change the upper version of mmcv to 1.5.0 by @zhouzaida in #628
- Update CircleCI by @gaotongxiao in #631
- Pass custom_hooks to MMCV by @gaotongxiao in #609
- Skip CI when some specific files were changed by @gaotongxiao in #642
- Add markdown linter in pre-commit hook by @gaotongxiao in #643
- Use shape from loaded image by @cuhk-hbsun in #652
- Cancel previous runs that are not completed by @Harold-lkk in #666
Bug Fixes
- Modify algorithm "sar" weights path in metafile by @ShoupingShan in #581
- Fix Cuda CI by @gaotongxiao in #472
- Fix image export in test.py for KIE models by @gaotongxiao in #486
- Allow invalid polygons in intersection and union by default by @gaotongxiao in #471
- Update checkpoints' links for SATRN by @gaotongxiao in #518
- Fix converting to onnx bug because of changing key from img_shape to resize_shape by @Harold-lkk in #523
- Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in #540
- Fix paper field in metafiles by @gaotongxiao in #550
- Unify recognition task names in metafiles by @gaotongxiao in #548
- Fix py3.9 CI by @gaotongxiao in #563
- Always map location to cpu when loading checkpoint by @gaotongxiao in #567
- Fix wrong model builder in recog_test_imgs by @gaotongxiao in #574
- Improve dbnet r50 by fixing img std by @gaotongxiao in #578
- Fix resource warning: unclosed file by @cuhk-hbsun in #577
- Fix bug that same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in #587
- Keep original texts for kie by @cuhk-hbsun in #588
- Fix random seed by @gaotongxiao in #600
- Fix DBNet_r50 config by @gaotongxiao in #625
- Change SBC case to DBC case by @cuhk-hbsun in #632
- Fix kie demo by @innerlee in #610
- fix type check by @cuhk-hbsun in #650
- Remove deprecated image validator in totaltext converter by @gaotongxiao in #661
- Fix change locals() dict by @Fei-Wang in #663
- fix #614: textsnake targets by @HolyCrap96 in #660
New Contributors
- @alexander-soare made their first contribution in #469
- @A465539338 made their first contribution in #364
- @fatfishZhao made their first contribution in #506
- @baudm made their first contribution in #497
- @ShoupingShan made their first contribution in #581
- @apiaccess21 made their first contribution in #593
- @zhouzaida made their first contribution in #628
- @mpena-vina made their first contribution in #633
- @Fei-Wang made their first contribution in #663
Full Changelog: v0.3.0...v0.4.0