
Where is the Visual Wake Word test set? #135

Open
LucaUrbinati44 opened this issue Jan 12, 2023 · 5 comments

LucaUrbinati44 commented Jan 12, 2023

I would like to evaluate the pretrained MobileNet model on the preprocessed COCO2014 test set, but I cannot find this preprocessed test set anywhere in the repo. Where can I find it? For the other three datasets (AD, IC, KS) the test sets are already provided in the repo.

I suspect I have to generate it myself using this script with dataType='test2014', because this should be the same script that was used to create the training+validation dataset used for training, which can be downloaded here.
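For reference, my understanding of what that script does is roughly the sketch below (a minimal sketch assuming a pycocotools-style workflow and the 0.5% person-area threshold from the Visual Wake Words paper; dataType and the annotation path are just placeholders):

```python
from pycocotools.coco import COCO

# Sketch: derive person / non-person labels from COCO annotations.
# dataType and the annotation path are placeholders.
dataType = 'val2014'
coco = COCO(f'annotations/instances_{dataType}.json')
person_id = coco.getCatIds(catNms=['person'])[0]

labels = {}
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    img_area = info['height'] * info['width']
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id,
                                        catIds=[person_id], iscrowd=None))
    # "person" if any person box covers more than 0.5% of the image area
    labels[info['file_name']] = int(any(a['area'] / img_area > 0.005
                                        for a in anns))
```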

Moreover, the paper entitled "MLPerf Tiny Benchmark" mentions this test set for the VWW problem in Section 4.1.

Finally, why is there no test.py (or evaluated.py) script to run the model on the test set, while such scripts exist for the other three datasets (AD, IC, KS)?

Thank you,
Regards,
Luca Urbinati

@colbybanbury
Contributor

Good question!

MS-COCO does not publish the labels (aka annotations) for the test set and holds competitions oriented around the test set. This means that Visual Wake Words does not contain an explicit test set.

It's traditionally best practice to use the val set as the test set and, if needed, to hold out a small percentage of the training set for validation. MLPerf Tiny should potentially move to adopt this practice, including an update to the paper.
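As a rough sketch of what that split could look like with a Keras ImageDataGenerator pipeline (directory names, image size, and the 10% fraction here are illustrative assumptions, not a final recipe):

```python
import tensorflow as tf

# Hold out 10% of the training images for validation and keep the
# COCO-val-derived images as the test set (directory names are assumptions).
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, validation_split=0.1)

train_gen = datagen.flow_from_directory(
    'vww_train', target_size=(96, 96), batch_size=32,
    class_mode='categorical', subset='training')
val_gen = datagen.flow_from_directory(
    'vww_train', target_size=(96, 96), batch_size=32,
    class_mode='categorical', subset='validation')

test_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255).flow_from_directory(
        'vww_test_from_coco_val', target_size=(96, 96), batch_size=32,
        class_mode='categorical', shuffle=False)
```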

@cskiraly and @jeremy-syn, who currently owns the VWW benchmark? I'm happy to help make the change if needed.

@LucasFischer123

Hi @colbybanbury @LucaUrbinati44

Any news on this issue?

Thanks

Lucas

@LucaUrbinati44
Author

LucaUrbinati44 commented Mar 20, 2023

Hi @LucasFischer123,

Short answer
We "solved" it by using 10% of the whole dataset as "validation set" during training (according to the train_vww.py script) and then using these 1000 images for testing.

Long answer
We discovered that these 1000 images are also contained in the provided dataset.
So, as a first experiment, we removed those 1000 images from the dataset, used the remaining images to train a floating-point model from scratch with train_vww.py (without changing anything in the training script), and then ran inference on the 1000 held-out images. The resulting accuracy was around 83%, lower than the 86% reported in the paper.

Then, as a second experiment, we trained the model from scratch again, this time on the whole dataset, i.e. without removing the 1000 images. This time the test accuracy on the 1000 images was 86%, matching the paper.

Since the second experiment reproduced the result of the paper, we decided to go with this second “solution” (see “Short answer”).

However, we know that this procedure is not 100% correct, since the model sees the 1000 images both during training and during testing.
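For anyone who wants to reproduce this, the evaluation boils down to something like the following sketch (assuming the Keras workflow of train_vww.py; the directory and model file names are placeholders, not our exact setup):

```python
import tensorflow as tf

# Evaluate a trained model on a directory of held-out person / non-person
# images (directory and model file names below are placeholders).
test_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255).flow_from_directory(
        'vww_test_1000', target_size=(96, 96), batch_size=50,
        class_mode='categorical', shuffle=False)

model = tf.keras.models.load_model('trained_models/vww_96.h5')
loss, acc = model.evaluate(test_gen)
print(f'accuracy on the held-out images: {acc:.4f}')
```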

Thus, we hope the organizers can fix this issue soon, both in the repo instructions and in the paper.

Thank you all,
Luca Urbinati and Marco Terlizzi

@NilsGraf

NilsGraf commented Aug 23, 2023

Hi @LucaUrbinati44 @colbybanbury @LucasFischer123 @cskiraly and @jeremy-syn

I had a similar question on how to evaluate accuracy. I created this Jupyter notebook, which you can run in your browser (or use this script if you prefer running locally).

This script downloads the dataset from Silabs and runs both TFLite reference models (int8 model and float model) on the 1000 images listed in y_labels.csv to measure their accuracy. I get the results below:

float accuracy: 85.2   
int8 accuracy : 85.9  
image count   : 1000  

Does this look correct?

BTW, I get 86.0% for the int8 accuracy (instead of 85.9%) when I run on an M1 MacBook instead of Colab.
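The core of the accuracy loop is something like the sketch below, not a copy of the notebook (the model filename and the filename,label layout of y_labels.csv are assumptions here, and the resize-to-96x96, scale-to-[0,1] preprocessing should be matched to whatever the notebook actually does):

```python
import csv
import numpy as np
import tensorflow as tf
from PIL import Image

# Minimal accuracy loop for the int8 reference model. Assumptions: the model
# filename, a y_labels.csv laid out as "filename,label", and 96x96 RGB inputs
# rescaled to [0, 1] before quantization.
interpreter = tf.lite.Interpreter(model_path='vww_96_int8.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
scale, zero_point = inp['quantization']

correct = total = 0
with open('y_labels.csv') as f:
    for fname, label in csv.reader(f):
        img = np.asarray(Image.open(fname).convert('RGB').resize((96, 96)),
                         dtype=np.float32) / 255.0
        q = np.round(img / scale + zero_point).astype(inp['dtype'])
        interpreter.set_tensor(inp['index'], q[np.newaxis, ...])
        interpreter.invoke()
        pred = int(np.argmax(interpreter.get_tensor(out['index'])[0]))
        correct += int(pred == int(label))
        total += 1

print(f'int8 accuracy: {100.0 * correct / total:.1f}%  ({total} images)')
```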

@NilsGraf

One more note: for the int8 accuracy, a few of the test cases in y_labels.csv produce a probability of exactly 0.5 (i.e. a signed int8 value of 0, or an unsigned int8 value of 128). In my script I assume that a person probability of exactly 0.5 indicates a person. Changing this to non-person reduces the int8 accuracy by 0.3%.
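To make the tie-break concrete, a tiny sketch (assuming the usual int8 softmax output quantization of scale = 1/256 and zero_point = -128, which is an assumption about these particular models):

```python
# A signed int8 output of 0 (128 as unsigned) dequantizes to exactly 0.5,
# assuming output quantization scale = 1/256 and zero_point = -128.
scale, zero_point = 1.0 / 256, -128
q_person = 0
prob_person = scale * (q_person - zero_point)  # = 0.5

# The tie-break for these borderline images:
person_if_inclusive = prob_person >= 0.5   # 0.5 counted as "person"
person_if_exclusive = prob_person > 0.5    # 0.5 counted as "non-person"
```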
