Chemical Segmentation Training Data Generation

Set up the training data generation

Download the PubLayNet dataset

wget -O training_data_generation/PubLayNet_PDF.tar.gz https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/PubLayNet_PDF.tar.gz
Unpack the PubLayNet dataset. After unpacking, the dataset should be located at training_data_generation/publaynet/.
Using the Kaggle API (https://github.com/Kaggle/kaggle-api), download the COCO 2017 dataset. Its images serve as random (non-chemical) images.

kaggle datasets download -d awsaf49/coco-2017-dataset
Unpack the COCO dataset. The images should be located at training_data_generation/random_images/. We used the images from the train subset.
Download the SMILES list from https://zenodo.org/record/5155037#.Y6r-9HbMK38 and save it as smiles.txt in training_data_generation/.
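
Before starting the generation, it can help to confirm that all three inputs ended up where the steps above expect them. The following is a minimal sketch, not part of the repository: the function name find_missing_inputs and the constant EXPECTED_PATHS are hypothetical, and the check only looks at the three paths named above (it does not validate the contents of the datasets).

```python
from pathlib import Path

# Paths named in the setup steps, relative to the repository root.
# (Hypothetical constant, introduced only for this sketch.)
EXPECTED_PATHS = (
    "training_data_generation/publaynet",
    "training_data_generation/random_images",
    "training_data_generation/smiles.txt",
)


def find_missing_inputs(repo_root="."):
    """Return the expected paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for relative in EXPECTED_PATHS:
        path = root / relative
        if path.is_dir():
            # A directory counts as present only if it holds at least one entry.
            if not any(path.iterdir()):
                missing.append(relative)
        elif not (path.is_file() and path.stat().st_size > 0):
            # Anything else must be an existing, non-empty file.
            missing.append(relative)
    return missing


if __name__ == "__main__":
    missing = find_missing_inputs()
    if missing:
        print("Missing or empty:")
        for relative in missing:
            print("  " + relative)
    else:
        print("All training data inputs are in place.")
```

Run from the repository root, the script lists any path that is still missing or empty, so a failed download or an archive unpacked into the wrong directory shows up before the generation itself is launched.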
