This is a starter kit for the League of Nations archives digitization challenge on crowdAI.
This challenge is an image classification problem. The training set contains 4692 images, each belonging to one of two classes, english or french. The test set contains 14216 images, and for each of them you are supposed to predict the class the image belongs to.
The datasets are available in the Dataset section of the challenge page. Following the links there gives you two files:
train.tar.gz
test.tar.gz
train.tar.gz expands into a folder containing two subfolders, of the form:
.
└── train
    ├── en (contains *.jpg images)
    └── fr (contains *.jpg images)
The folders en and fr contain the .jpg images belonging to the respective class.
For the rest of this starter kit, you are encouraged to download both files, extract them, and place the contents in the data/ directory so that the directory structure looks like:
.
└── data
    ├── test_images (contains *.jpg images)
    └── train
        ├── en (contains *.jpg images)
        └── fr (contains *.jpg images)
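Once the files are in place, a quick check of the layout can catch extraction mistakes early. The following is a minimal sketch, assuming the directory structure shown above; the helper name `count_images` is hypothetical and not part of the challenge tooling.

```python
import glob
import os


def count_images(data_dir="data"):
    """Return the number of .jpg files found in each expected folder."""
    folders = {
        "train/en": os.path.join(data_dir, "train", "en"),
        "train/fr": os.path.join(data_dir, "train", "fr"),
        "test_images": os.path.join(data_dir, "test_images"),
    }
    # glob returns an empty list for a missing folder, so the count is 0.
    return {
        name: len(glob.glob(os.path.join(path, "*.jpg")))
        for name, path in folders.items()
    }


if __name__ == "__main__":
    counts = count_images()
    for name, n in counts.items():
        print("{}: {} images".format(name, n))
    # This starter kit states 4692 training images and 14216 test images.
    print("train total: {}".format(counts["train/en"] + counts["train/fr"]))
```

If the printed totals differ from 4692 (training) and 14216 (test), the archives were probably extracted to a different location than the one assumed here.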
The predictions should be a valid CSV file with 14216 rows (one for each of the images in the test set), and the following headers :
filename, prob_en, prob_fr
where:
filename: filename of a single test file
prob_en: the confidence, in [0,1], that this image belongs to the class english
prob_fr: the confidence, in [0,1], that this image belongs to the class french
The sum of prob_en and prob_fr for a single row should not exceed 1 (the softmax in the sample script below makes each row sum to 1).
You can use the script below to generate a sample submission, which will be saved as random_prediction.csv.
#!/usr/bin/env python
import glob
import os

import numpy as np


def softmax(x):
    """Compute softmax values for the scores in x."""
    e_x = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e_x / e_x.sum(axis=0)


# Header row, followed by one row of random probabilities per test image
LINES = ["filename,prob_en,prob_fr"]
for _file_path in glob.glob("data/test_images/*.jpg"):
    probs = softmax(np.random.rand(2))
    LINES.append("{},{},{}".format(
        os.path.basename(_file_path),
        probs[0],
        probs[1]
    ))

with open("random_prediction.csv", "w") as fp:
    fp.write("\n".join(LINES))
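Before uploading, it may help to check a prediction file against the format rules described earlier. The sketch below is not the official crowdAI grader; the function name `validate_submission` and the tolerance `tol` are hypothetical, and it assumes that row sums of up to 1 (plus a small floating-point error) are acceptable.

```python
import csv

EXPECTED_HEADER = ["filename", "prob_en", "prob_fr"]


def validate_submission(path, expected_rows=14216, tol=1e-6):
    """Return a list of problems found in the CSV at `path` (empty if none)."""
    problems = []
    with open(path, newline="") as fp:
        rows = list(csv.reader(fp))
    if not rows:
        return ["file is empty"]
    header = [h.strip() for h in rows[0]]
    if header != EXPECTED_HEADER:
        problems.append("unexpected header: {}".format(header))
    body = rows[1:]
    if len(body) != expected_rows:
        problems.append(
            "expected {} rows, found {}".format(expected_rows, len(body)))
    for line_no, row in enumerate(body, start=2):
        if len(row) != 3:
            problems.append("line {}: expected 3 columns".format(line_no))
            continue
        try:
            p_en, p_fr = float(row[1]), float(row[2])
        except ValueError:
            problems.append("line {}: non-numeric probability".format(line_no))
            continue
        if not (0.0 <= p_en <= 1.0 and 0.0 <= p_fr <= 1.0):
            problems.append(
                "line {}: probability outside [0, 1]".format(line_no))
        # Assumption: sums up to 1 (plus rounding error) are acceptable.
        if p_en + p_fr > 1.0 + tol:
            problems.append(
                "line {}: probabilities sum to more than 1".format(line_no))
    return problems
```

Calling `validate_submission("random_prediction.csv")` returns an empty list when no problem is detected, and a list of human-readable messages otherwise.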
Then you can submit on crowdAI by going to the challenge page and clicking on Create Submission, and then uploading the file by clicking on Browse file at the bottom of the screen. Finally, your submission should either be accepted, or an error will be shown.
Best of Luck
Sharada Mohanty [email protected]