Inconsistency between records and records_shuffled #34

Open
haiyangdeperci opened this issue Feb 26, 2021 · 0 comments

It seems that some tf.record files are present in the records_shuffled directory but not in records. I believe this discrepancy is unintended. Of the 10532 tf.record files in records_shuffled, only 10448 are also in records. You can investigate the 84 missing records with the following excerpt:

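import tensorflow as tf

def fetchFileNames(dir_names):
    # Collect every file path directly under each of the given directories.
    filepaths = []
    for name in dir_names:
        filepaths += tf.io.gfile.glob(f"{name}/*")
    return filepaths


# List the per-category directories first, then the files inside them.
record_dirs = tf.io.gfile.glob("gs://objectron/v1/records/*")
record_filepaths = fetchFileNames(record_dirs)
shuffled_dirs = tf.io.gfile.glob("gs://objectron/v1/records_shuffled/*")
shuffled_filepaths = fetchFileNames(shuffled_dirs)

# Currently passes: records holds fewer files than records_shuffled.
assert len(record_filepaths) < len(shuffled_filepaths)

# Rewrite the shuffled paths to their records equivalents so the sets are comparable.
shuffled_filepaths = [fp.replace("_shuffled", "") for fp in shuffled_filepaths]
record_filepaths = set(record_filepaths)
shuffled_filepaths = set(shuffled_filepaths)
# Paths that exist under records_shuffled but have no counterpart under records.
missing = shuffled_filepaths - record_filepaths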

These are the missing filepaths:

{'gs://objectron/v1/records/camera/camera_test-00137-of-00163',
 'gs://objectron/v1/records/laptop/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00169-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00953-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00526-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00192-of-00819',
 'gs://objectron/v1/records/camera/camera_train-00434-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00080-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00174-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00101-of-00163',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00152-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00833-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00284-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00228-of-00322',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00063-of-00322',
 'gs://objectron/v1/records/bottle/bottle_train-00215-of-00920',
 'gs://objectron/v1/records/shoe/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00213-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00539-of-00552',
 'gs://objectron/v1/records/bottle/bottle_train-00273-of-00920',
 'gs://objectron/v1/records/camera/camera_train-00140-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00463-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00036-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00025-of-00819',
 'gs://objectron/v1/records/cup/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00247-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00013-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00252-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00408-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00440-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00148-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00010-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00152-of-00163',
 'gs://objectron/v1/records/chair/chair_train-00947-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00249-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00207-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00647-of-01106',
 'gs://objectron/v1/records/camera/summary.txt',
 'gs://objectron/v1/records/camera/camera_train-00040-of-00552',
 'gs://objectron/v1/records/chair/chair_train-01068-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01087-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01048-of-01106',
 'gs://objectron/v1/records/camera/camera_test-00018-of-00163',
 'gs://objectron/v1/records/chair/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00095-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00361-of-01106',
 'gs://objectron/v1/records/camera/camera_train-00474-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00452-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00282-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00237-of-00819',
 'gs://objectron/v1/records/chair/chair_train-01097-of-01106',
 'gs://objectron/v1/records/bottle/bottle_train-00746-of-00920',
 'gs://objectron/v1/records/camera/camera_train-00256-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00103-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00444-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00904-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00160-of-00322',
 'gs://objectron/v1/records/chair/chair_train-01090-of-01106',
 'gs://objectron/v1/records/camera/camera_train-00073-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00050-of-00819',
 'gs://objectron/v1/records/camera/camera_train-00111-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00049-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00480-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01023-of-01106',
 'gs://objectron/v1/records/cereal_box/summary.txt',
 'gs://objectron/v1/records/camera/camera_train-00509-of-00552',
 'gs://objectron/v1/records/book/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00115-of-00819',
 'gs://objectron/v1/records/bottle/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00068-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00134-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00873-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00197-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00350-of-01106',
 'gs://objectron/v1/records/camera/camera_test-00038-of-00163',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00073-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00304-of-00552',
 'gs://objectron/v1/records/bike/summary.txt',
 'gs://objectron/v1/records/camera/camera_test-00046-of-00163',
 'gs://objectron/v1/records/camera/camera_train-00396-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00062-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00255-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00048-of-00163'}
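
The set above mixes shard files (names containing "-of-") with summary.txt entries, so it can help to separate the two and group the shards by category. Below is a minimal sketch of one way to do that in plain Python. The helper name group_missing and the "-of-" test for recognising a shard are assumptions made for illustration, and the sample holds only four paths copied from the set above.

```python
from collections import defaultdict


def group_missing(paths):
    """Split paths into per-category shard names and non-shard files.

    Assumes the layout <root>/<category>/<filename> and that shard
    filenames contain "-of-" (as in camera_test-00137-of-00163).
    """
    shards = defaultdict(list)
    other = []
    for path in paths:
        # The last two path components are the category and the filename.
        category, filename = path.rsplit("/", 2)[-2:]
        if "-of-" in filename:
            shards[category].append(filename)
        else:
            other.append(path)
    return dict(shards), other


# Four paths copied from the missing set, for illustration only.
sample = [
    "gs://objectron/v1/records/camera/camera_test-00137-of-00163",
    "gs://objectron/v1/records/laptop/summary.txt",
    "gs://objectron/v1/records/chair/chair_train-00953-of-01106",
    "gs://objectron/v1/records/camera/camera_train-00434-of-00552",
]

shards, other = group_missing(sample)
for category, names in sorted(shards.items()):
    print(category, len(names))
print("non-shard files:", other)
```

Passing the full missing set instead of sample should give a per-category count of absent shards and a separate list of the summary.txt paths.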

Ideally, the assertion in the excerpt above would fail, and the two directories in the bucket would contain the same number of records.
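
Equal counts alone would not prove the two directories match, since each could hold files the other lacks. A stricter check compares the listings in both directions. The following is only a sketch under assumptions: the helpers relative and compare_listings are hypothetical, and the two toy listings are made up for illustration rather than read from the bucket.

```python
def relative(paths, root):
    """Return each path relative to root, ignoring paths outside it."""
    prefix = root.rstrip("/") + "/"
    return {p[len(prefix):] for p in paths if p.startswith(prefix)}


def compare_listings(records, shuffled,
                     records_root="gs://objectron/v1/records",
                     shuffled_root="gs://objectron/v1/records_shuffled"):
    """Return (only_in_shuffled, only_in_records) as sorted relative paths."""
    rec = relative(records, records_root)
    shuf = relative(shuffled, shuffled_root)
    return sorted(shuf - rec), sorted(rec - shuf)


# Toy listings (illustrative, not real bucket contents).
records = [
    "gs://objectron/v1/records/chair/chair_train-00000-of-01106",
]
shuffled = [
    "gs://objectron/v1/records_shuffled/chair/chair_train-00000-of-01106",
    "gs://objectron/v1/records_shuffled/camera/camera_test-00137-of-00163",
]

only_in_shuffled, only_in_records = compare_listings(records, shuffled)
print("only in records_shuffled:", only_in_shuffled)
print("only in records:", only_in_records)
```

With real listings, the directories would be consistent only when both returned lists are empty.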
