-
I am trying to understand how BucketingSampler works with tar data. The efficiency of tar files comes from reading them sequentially, with no random access. But if they are read sequentially, how can you ensure that all samples in a batch have similar lengths (to avoid padding)?
Replies: 2 comments 1 reply
-
-
Thanks @pzelasko. So it keeps a buffer of samples of different sizes in RAM and then selects from it; each batch is already close to what is needed because it is drawn from a single bucket. Smart!
Search for `DynamicBucketingSampler`. It holds an in-memory buffer for data that is partitioned into `num_buckets`.
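To make the idea concrete, here is a minimal sketch of buffer-based bucketing over a sequentially read stream. It is not the library's actual implementation; the function `bucketed_batches`, its parameters, and the `(id, duration)` sample format are all hypothetical names chosen for illustration. It assumes fixed duration boundaries for the buckets, whereas a real sampler might estimate them from the data.

```python
import bisect
import random
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Sample = Tuple[str, float]  # (sample id, duration in seconds); illustrative format


def bucketed_batches(
    stream: Iterable[Sample],
    boundaries: Sequence[float],  # sorted duration edges; assumed known up front
    buffer_size: int,             # max number of samples held in memory
    max_duration: float,          # total duration budget per batch
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Yield batches of similar-length samples from a sequential-only stream."""
    rng = random.Random(seed)
    it = iter(stream)
    buckets: List[Deque[Sample]] = [deque() for _ in range(len(boundaries) + 1)]
    buffered = 0
    exhausted = False

    def refill() -> None:
        # Read sequentially until the buffer is full or the stream ends,
        # placing each sample in the bucket that matches its duration.
        nonlocal buffered, exhausted
        while not exhausted and buffered < buffer_size:
            try:
                sample = next(it)
            except StopIteration:
                exhausted = True
                return
            buckets[bisect.bisect_right(boundaries, sample[1])].append(sample)
            buffered += 1

    refill()
    while buffered > 0:
        # Pick one non-empty bucket at random and fill a batch from it only,
        # so every sample in the batch has a similar duration.
        bucket = rng.choice([b for b in buckets if b])
        batch: List[Sample] = []
        total = 0.0
        # Always take at least one sample so the loop makes progress.
        while bucket and (not batch or total + bucket[0][1] <= max_duration):
            sample = bucket.popleft()
            batch.append(sample)
            total += sample[1]
            buffered -= 1
        yield batch
        refill()


# Usage: a fake "tar" stream that can only be read front to back.
demo_rng = random.Random(42)
demo_stream = ((f"utt{i}", demo_rng.uniform(1.0, 20.0)) for i in range(30))
for demo_batch in bucketed_batches(demo_stream, [5.0, 10.0, 15.0], 10, 40.0):
    print([round(duration, 1) for _, duration in demo_batch])
```

The trade-off in this sketch is the buffer size: a larger buffer gives each bucket more candidates and so tighter batches, at the cost of more memory and a longer warm-up before the first batch.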