Release MMOCR Release v0.6.0 · open-mmlab/mmocr

Highlights

A new recognition algorithm MASTER has been added into MMOCR, which was the championship solution for the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
DBNet++ has been released now! A new Adaptive Scale Fusion module has been equipped for feature enhancement. Benefiting from this, the new model achieved 2% better h-mean score than its predecessor on the ICDAR2015 dataset.
Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) to explore further information.
To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. Doc
Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. (Doc) If too many results are flooding your text recognition train log, you can trim it by specifying a subset of metrics in evaluation config. Check out the Evaluation section for details.
MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit Labelme to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read tutorial docs to get started.

Lmdb Dataset

Reading images or labels from files can be slow when data are excessive, e.g. on a scale of millions. Besides, in academia, most of the scene text recognition datasets are stored in lmdb format, including images and labels. To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline LoadImageFromLMDB.
This section is intended to serve as a quick walkthrough for you to master this update and apply it to facilitate your research.

Specifications

To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:

The parameter describing the data volume of the dataset is num-samples instead of total_number (deprecated).
Images and labels are stored with keys in the form of image-000000001 and label-000000001, respectively.

Usage

Use existing academic lmdb datasets if they meet the specifications; or the tool provided by MMOCR to pack images & annotations into a lmdb dataset.

Previously, MMOCR had a function txt2lmdb (deprecated) that only supported converting labels to lmdb format. However, it is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility lmdb_converter to convert recognition datasets with both images and labels to lmdb format.

Say that your recognition data in MMOCR's format are organized as follows. (See an example in ocr_toy_dataset).

# Directory structure

├──img_path
|      |—— img1.jpg
|      |—— img2.jpg
|      |—— ...
|——label.txt (or label.jsonl)

# Annotation format

label.txt:  img1.jpg HELLO
            img2.jpg WORLD
            ...

label.jsonl:    {'filename':'img1.jpg', 'text':'HELLO'}
                {'filename':'img2.jpg', 'text':'WORLD'}
                ...

Then pack these files up:

python tools/data/utils/lmdb_converter.py  {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}

Check out tools.md for more details.

The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:

Set parser as LineJsonParser and file_format as 'lmdb' in dataset config

# configs/_base_/recog_datasets/ST_MJ_train.py
train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'],
        )),
    pipeline=None,
    test_mode=False)

Use LoadImageFromLMDB in pipeline:

# configs/_base_/recog_pipelines/crnn_pipeline.py
train_pipeline = [
    dict(type='LoadImageFromLMDB', color_type='grayscale'),
    ...

You are good to go! Start training and MMOCR will load data from your lmdb dataset.

New Features & Enhancements

Add analyze_logs in tools and its description in docs by @Y-M-Y in #899
Add LSVT Data Converter by @xinke-wang in #896
Add RCTW dataset converter by @xinke-wang in #914
Support computing mean scores in UniformConcatDataset by @gaotongxiao in #981
Support loading images and labels from lmdb file by @Mountchicken in #982
Add recog2lmdb and new toy dataset files by @Mountchicken in #979
Add labelme converter for textdet and textrecog by @cuhk-hbsun in #972
Update CircleCI configs by @xinke-wang in #918
Update Git Action by @xinke-wang in #930
More customizable fields in dataloaders by @gaotongxiao in #933
Skip CIs when docs are modified by @gaotongxiao in #941
Rename Github tests, fix ignored paths by @gaotongxiao in #946
Support latest MMCV by @gaotongxiao in #959
Support dynamic threshold range in eval_hmean by @gaotongxiao in #962
Update the version requirement of mmdet in docker by @Mountchicken in #966
Replace opencv-python-headless with open-python by @gaotongxiao in #970
Update Dataset Configs by @xinke-wang in #980
Add SynthText dataset config by @xinke-wang in #983
Automatically report mean scores when applicable by @gaotongxiao in #995
Add DBNet++ by @xinke-wang in #973
Add MASTER by @JiaquanYe in #807
Allow choosing metrics to report in text recognition tasks by @gaotongxiao in #989
Add HierText converter by @Mountchicken in #948
Fix lint_only in CircleCI by @gaotongxiao in #998

Bug Fixes

Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in #927
Fix a deprecate warning about mmdet.datasets.pipelines.formating by @Mountchicken in #944
Fix a Bug in ResNet plugin by @Mountchicken in #967
revert a wrong setting in db_r18 cfg by @gaotongxiao in #978
Fix TotalText Anno version issue by @xinke-wang in #945
Update installation step of albumentations by @gaotongxiao in #984
Fix ImgAug transform by @gaotongxiao in #949
Fix GPG key error in CI and docker by @gaotongxiao in #988
update label.lmdb by @Mountchicken in #991
correct meta key by @garvan2021 in #926
Use new image by @gaotongxiao in #976
Fix Data Converter Issues by @xinke-wang in #955

Docs

Update CONTRIBUTING.md by @gaotongxiao in #905
Fix the misleading description in test.py by @gaotongxiao in #908
Update recog.md for lmdb Generation by @xinke-wang in #934
Add MMCV by @gaotongxiao in #954
Add wechat QR code to CN readme by @gaotongxiao in #960
Update CONTRIBUTING.md by @gaotongxiao in #947
Use QR codes from MMCV by @gaotongxiao in #971
Renew dataset_types.md by @gaotongxiao in #997

New Contributors

@Y-M-Y made their first contribution in #899

Full Changelog: v0.5.0...v0.6.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MMOCR Release v0.6.0