MMOCR Release v0.6.0
Highlights
- A new recognition algorithm, MASTER, has been added to MMOCR. It was the winning solution of the "ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX"! A model pre-trained on SynthText and MJSynth is available for testing. Credit to @JiaquanYe
- DBNet++ has been released! It is equipped with a new Adaptive Scale Fusion module for feature enhancement, which helps it achieve a 2% higher h-mean score than its predecessor on the ICDAR2015 dataset.
- Three more dataset converters have been added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) for further information.
- To improve data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable this feature, use the new lmdb_converter.py to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following section and the doc.
- Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now automatically reports mean scores when more than one dataset is evaluated, making it more convenient to compare checkpoints. (Doc)
- Evaluation is now more flexible and customizable. For text detection tasks, you can set the score threshold range in which to search for the best results. (Doc) If too many metrics are flooding your text recognition training log, you can trim them by specifying a subset of metrics in the evaluation config. Check out the Evaluation section for details.
- MMOCR now provides a script to convert the .json labels produced by the popular annotation toolkit Labelme into MMOCR-supported data formats. @Y-M-Y contributed a log analysis tool that helps users better understand the entire training process. Read the tutorial docs to get started.
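As a rough illustration of why a score threshold range helps, the sketch below sweeps candidate thresholds, computes precision, recall and h-mean at each, and keeps the best one. It is not MMOCR's `eval_hmean`: the input format (confidence scores paired with a precomputed match flag) and the function names are hypothetical, and it simplifies real evaluation by treating each prediction's match status as fixed across thresholds.

```python
from typing import List, Tuple


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    predictions: List[Tuple[float, bool]],
    num_gts: int,
    thresholds: List[float],
) -> Tuple[float, float]:
    """Return the (threshold, h-mean) pair with the highest h-mean.

    predictions: hypothetical (score, matched) pairs, where `matched` tells
        whether the predicted box was matched to a ground-truth box.
    num_gts: total number of ground-truth boxes.
    thresholds: candidate score thresholds to sweep.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    best_thr, best_h = thresholds[0], -1.0
    for thr in thresholds:
        # Keep only predictions whose confidence reaches this threshold.
        kept = [matched for score, matched in predictions if score >= thr]
        true_pos = sum(kept)
        precision = true_pos / len(kept) if kept else 0.0
        recall = true_pos / num_gts if num_gts else 0.0
        h = hmean(precision, recall)
        if h > best_h:  # on ties, the threshold listed first is kept
            best_thr, best_h = thr, h
    return best_thr, best_h
```

For instance, with predictions `[(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]`, three ground-truth boxes and thresholds `[0.3, 0.5, 0.7]`, the h-means are 0.75, about 0.67 and 0.8, so 0.7 is selected. A threshold that is too low admits false positives, while one that is too high drops true positives, which is why searching over a range is useful.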
Lmdb Dataset
Reading images or labels from individual files can be slow when the data volume is large, e.g., millions of samples. Besides, in academia, most scene text recognition datasets, including both their images and labels, are stored in lmdb format. To align with mainstream practice and improve data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline, LoadImageFromLMDB.
This section is a quick walkthrough to help you master this update and apply it to your research.
Specifications
To better align with the academic community, MMOCR now requires lmdb datasets to meet the following specifications:
- The key that stores the number of samples in the dataset is `num-samples` instead of `total_number` (deprecated).
- Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.
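To make the key layout concrete, here is a minimal sketch that packs (image bytes, label) pairs into a plain Python dict standing in for an lmdb database, then reads a sample back. It assumes 1-based, nine-digit zero-padded indices (as in `image-000000001`) and UTF-8 encoded keys and values; the function names are hypothetical and not part of MMOCR. With the real `lmdb` package, each pair would instead be written with `txn.put(key, value)` inside an `env.begin(write=True)` transaction.

```python
from typing import Dict, List, Tuple


def image_key(index: int) -> bytes:
    """Key for the image at a 1-based index, e.g. b'image-000000001'."""
    return f"image-{index:09d}".encode("utf-8")


def label_key(index: int) -> bytes:
    """Key for the label at a 1-based index, e.g. b'label-000000001'."""
    return f"label-{index:09d}".encode("utf-8")


def pack_samples(samples: List[Tuple[bytes, str]]) -> Dict[bytes, bytes]:
    """Lay out (image_bytes, text) pairs following the key specification."""
    store: Dict[bytes, bytes] = {}
    for index, (img_bytes, text) in enumerate(samples, start=1):
        store[image_key(index)] = img_bytes
        store[label_key(index)] = text.encode("utf-8")
    # The sample count goes under b'num-samples' (not the deprecated
    # 'total_number'), stored as a decimal string.
    store[b"num-samples"] = str(len(samples)).encode("utf-8")
    return store


def read_sample(store: Dict[bytes, bytes], index: int) -> Tuple[bytes, str]:
    """Fetch the raw image bytes and the decoded label for a 1-based index."""
    num_samples = int(store[b"num-samples"].decode("utf-8"))
    if not 1 <= index <= num_samples:
        raise IndexError(f"index {index} out of range 1..{num_samples}")
    return store[image_key(index)], store[label_key(index)].decode("utf-8")
```

Reading `num-samples` first lets a loader know the dataset length without scanning every key, which is one reason the count is stored explicitly.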
Usage
- First, use an existing academic lmdb dataset if it meets the specifications, or pack your images and annotations into an lmdb dataset with the tool provided by MMOCR.
  - Previously, MMOCR had a function `txt2lmdb` (deprecated) that only supported converting labels to lmdb format. Its output differs considerably from academic lmdb datasets, which usually contain both images and labels. MMOCR now provides a new utility, lmdb_converter, to convert recognition datasets with both images and labels to lmdb format.
  - Say that your recognition data in MMOCR's format are organized as follows (see an example in ocr_toy_dataset):
    ```
    # Directory structure
    ├── img_path
    |   ├── img1.jpg
    |   ├── img2.jpg
    |   └── ...
    └── label.txt (or label.jsonl)

    # Annotation format
    label.txt:
    img1.jpg HELLO
    img2.jpg WORLD
    ...

    label.jsonl:
    {"filename": "img1.jpg", "text": "HELLO"}
    {"filename": "img2.jpg", "text": "WORLD"}
    ...
    ```
  - Then pack these files up:
    ```bash
    python tools/data/utils/lmdb_converter.py {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
    ```
  - Check out tools.md for more details.
- The second step is to modify the configuration files. For example, to train CRNN on the MJ and ST datasets:
  - Set `parser` to `LineJsonParser` and `file_format` to `'lmdb'` in the dataset config:
    ```python
    # configs/_base_/recog_datasets/ST_MJ_train.py
    # (train_img_prefix1 and train_ann_file1 are defined elsewhere in this config)
    train1 = dict(
        type='OCRDataset',
        img_prefix=train_img_prefix1,
        ann_file=train_ann_file1,
        loader=dict(
            type='AnnFileLoader',
            repeat=1,
            file_format='lmdb',
            parser=dict(
                type='LineJsonParser',
                keys=['filename', 'text'],
            )),
        pipeline=None,
        test_mode=False)
    ```
  - Use `LoadImageFromLMDB` in the pipeline:
    ```python
    # configs/_base_/recog_pipelines/crnn_pipeline.py
    train_pipeline = [
        dict(type='LoadImageFromLMDB', color_type='grayscale'),
        ...
    ]
    ```
- You are good to go! Start training, and MMOCR will load data from your lmdb dataset.
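The two annotation formats shown in the first step carry the same information. As a hedged illustration, the sketch below parses `label.txt` lines (a filename, a space, then the transcription) and re-emits them as `label.jsonl` lines with the `filename` and `text` keys that the `LineJsonParser` config reads. It assumes filenames contain no spaces and that everything after the first space is the text; the helper names are hypothetical, and lmdb_converter.py remains the supported way to build the lmdb file itself.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_txt_line(line: str) -> Dict[str, str]:
    """Split 'img1.jpg HELLO' into {'filename': ..., 'text': ...}.

    Assumes the filename has no spaces; the rest of the line (which may
    itself contain spaces) is the transcription.
    """
    filename, text = line.rstrip("\n").split(" ", 1)
    return {"filename": filename, "text": text}


def txt_to_jsonl(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object string per non-empty label.txt line."""
    for line in lines:
        if line.strip():
            # ensure_ascii=False keeps non-Latin transcriptions readable.
            yield json.dumps(parse_txt_line(line), ensure_ascii=False)
```

Emitting proper JSON (double-quoted keys and strings) matters here: each output line must be parseable by `json.loads`, which rejects Python-style single-quoted dicts.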
New Features & Enhancements
- Add analyze_logs in tools and its description in docs by @Y-M-Y in #899
- Add LSVT Data Converter by @xinke-wang in #896
- Add RCTW dataset converter by @xinke-wang in #914
- Support computing mean scores in UniformConcatDataset by @gaotongxiao in #981
- Support loading images and labels from lmdb file by @Mountchicken in #982
- Add recog2lmdb and new toy dataset files by @Mountchicken in #979
- Add labelme converter for textdet and textrecog by @cuhk-hbsun in #972
- Update CircleCI configs by @xinke-wang in #918
- Update Git Action by @xinke-wang in #930
- More customizable fields in dataloaders by @gaotongxiao in #933
- Skip CIs when docs are modified by @gaotongxiao in #941
- Rename Github tests, fix ignored paths by @gaotongxiao in #946
- Support latest MMCV by @gaotongxiao in #959
- Support dynamic threshold range in eval_hmean by @gaotongxiao in #962
- Update the version requirement of mmdet in docker by @Mountchicken in #966
- Replace `opencv-python-headless` with `opencv-python` by @gaotongxiao in #970
- Update Dataset Configs by @xinke-wang in #980
- Add SynthText dataset config by @xinke-wang in #983
- Automatically report mean scores when applicable by @gaotongxiao in #995
- Add DBNet++ by @xinke-wang in #973
- Add MASTER by @JiaquanYe in #807
- Allow choosing metrics to report in text recognition tasks by @gaotongxiao in #989
- Add HierText converter by @Mountchicken in #948
- Fix lint_only in CircleCI by @gaotongxiao in #998
Bug Fixes
- Fix CircleCI main branch accidentally running PR stage test by @xinke-wang in #927
- Fix a deprecation warning about mmdet.datasets.pipelines.formating by @Mountchicken in #944
- Fix a Bug in ResNet plugin by @Mountchicken in #967
- Revert a wrong setting in db_r18 cfg by @gaotongxiao in #978
- Fix TotalText Anno version issue by @xinke-wang in #945
- Update installation step of `albumentations` by @gaotongxiao in #984
- Fix ImgAug transform by @gaotongxiao in #949
- Fix GPG key error in CI and docker by @gaotongxiao in #988
- Update label.lmdb by @Mountchicken in #991
- Correct meta key by @garvan2021 in #926
- Use new image by @gaotongxiao in #976
- Fix Data Converter Issues by @xinke-wang in #955
Docs
- Update CONTRIBUTING.md by @gaotongxiao in #905
- Fix the misleading description in test.py by @gaotongxiao in #908
- Update recog.md for lmdb Generation by @xinke-wang in #934
- Add MMCV by @gaotongxiao in #954
- Add wechat QR code to CN readme by @gaotongxiao in #960
- Update CONTRIBUTING.md by @gaotongxiao in #947
- Use QR codes from MMCV by @gaotongxiao in #971
- Renew dataset_types.md by @gaotongxiao in #997
New Contributors
Full Changelog: v0.5.0...v0.6.0