fix chunking equal number of videos for each thread. #40

Open
wants to merge 1 commit into
base: master

pulinagrawal

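Previously, the videos list was split into chunks of size NUM_THREADS, so the number of chunks depended on the length of the list rather than on the number of threads. Now the list is split into exactly NUM_THREADS chunks of nearly equal size.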
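To make the difference concrete, here is a minimal sketch. The diff itself is not shown here, so the file names, the value of `NUM_THREADS`, and the slicing expression for the old behavior are all assumptions chosen for illustration, not the repository's actual code.

```python
# Hypothetical inputs, for illustration only.
NUM_THREADS = 4
videos = [f"{i}.mp4" for i in range(10)]

# Assumed old behavior: chunks of SIZE NUM_THREADS.
# The number of chunks is ceil(len(videos) / NUM_THREADS), not NUM_THREADS.
old_chunks = [
    videos[i : i + NUM_THREADS] for i in range(0, len(videos), NUM_THREADS)
]
print(len(old_chunks))                # 3
print([len(c) for c in old_chunks])   # [4, 4, 2]

# Intended behavior: exactly NUM_THREADS chunks whose sizes differ by at most 1.
q, r = divmod(len(videos), NUM_THREADS)
target_sizes = [q + 1] * r + [q] * (NUM_THREADS - r)
print(target_sizes)                   # [3, 3, 2, 2]
```

With 10 videos and 4 threads, the old scheme yields only 3 chunks, so one thread would have nothing to do; with more videos it would yield more chunks than threads.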


YodaEmbedding commented Jul 17, 2020

That's one way, but I think the following is nicer, since each thread processes a contiguous slice of the list, which makes it easier to trace any errors that occur during processing:

def split(xs, n):
    """Yields n roughly even-sized chunks from xs."""
    size = len(xs)
    q = size // n
    r = size % n
    offset = 0
    for i in range(r):
        yield xs[offset : offset + (q + 1)]
        offset += q + 1
    for i in range(n - r):
        yield xs[offset : offset + q]
        offset += q
    assert offset == size

One should also sort the list of files beforehand via:

import os

video_list = os.listdir(VIDEO_ROOT)
video_list.sort(key=lambda x: int(x.split(".")[0]))  # numeric sort by file name stem
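As a small illustration of why the numeric key matters: `os.listdir` returns entries in arbitrary order, and a plain string sort would place `10.mp4` before `2.mp4`. The file names below are hypothetical, and the sketch assumes every name has a purely numeric stem (the key raises `ValueError` otherwise).

```python
# Hypothetical file names standing in for os.listdir(VIDEO_ROOT).
names = ["10.mp4", "2.mp4", "1.mp4"]

# Default sort compares strings character by character.
lexicographic = sorted(names)
print(lexicographic)  # ['1.mp4', '10.mp4', '2.mp4']

# Sorting on the integer value of the stem gives the intended order.
numeric = sorted(names, key=lambda x: int(x.split(".")[0]))
print(numeric)        # ['1.mp4', '2.mp4', '10.mp4']
```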

EDIT: On the other hand, with your method the dataset is processed mostly "in order", assuming the threads stay roughly in sync.
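The trade-off can be sketched as follows. The PR's diff is not shown here, so `strided` is only an assumption about what "your method" does (each thread takes every n-th element); `split` is a compact variant of the generator above. The simulation assumes all threads advance in lockstep, one item per round.

```python
def split(xs, n):
    """Yield n contiguous, roughly even-sized chunks from xs."""
    q, r = divmod(len(xs), n)
    offset = 0
    for i in range(n):
        size = q + 1 if i < r else q  # the first r chunks get one extra item
        yield xs[offset : offset + size]
        offset += size


def strided(xs, n):
    """Hypothetical interleaved chunking: chunk i gets items i, i+n, i+2n, ..."""
    return [xs[i::n] for i in range(n)]


xs = list(range(8))
n = 4

contiguous = list(split(xs, n))
interleaved = strided(xs, n)
print(contiguous)   # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(interleaved)  # [[0, 4], [1, 5], [2, 6], [3, 7]]

# Items handled in each round if the threads stay in sync.
rounds_contiguous = list(zip(*contiguous))
rounds_interleaved = list(zip(*interleaved))
print(rounds_contiguous)   # [(0, 2, 4, 6), (1, 3, 5, 7)]
print(rounds_interleaved)  # [(0, 1, 2, 3), (4, 5, 6, 7)]
```

Under these assumptions, interleaved chunks finish items 0 to 3 before touching 4 to 7, while contiguous chunks keep each thread's work in one traceable range.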


2 participants