
Investigate parallel_upload_from_S3 chunking logic #273

Open
alanking opened this issue Sep 17, 2024 · 0 comments
```python
# Separating file into chunks of equal size for each thread
bytes_per_thread = total_bytes // num_threads
# Creating a range of bytes for each thread
chunk_ranges = [
    range(j * bytes_per_thread, (j + 1) * bytes_per_thread)
    for j in range(num_threads - 1)
]
# The last thread's range should end at the size of the file: total_bytes
chunk_ranges.append(range((num_threads - 1) * bytes_per_thread, total_bytes))
```
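For concreteness, here is a small worked example of the snippet above, using hypothetical values (`total_bytes = 10`, `num_threads = 3`) to show what `chunk_ranges` actually holds:

```python
# Hypothetical values chosen only to illustrate the chunking logic.
total_bytes = 10
num_threads = 3

bytes_per_thread = total_bytes // num_threads  # 10 // 3 == 3
chunk_ranges = [
    range(j * bytes_per_thread, (j + 1) * bytes_per_thread)
    for j in range(num_threads - 1)
]
chunk_ranges.append(range((num_threads - 1) * bytes_per_thread, total_bytes))

# chunk_ranges == [range(0, 3), range(3, 6), range(6, 10)]
# Note the last range absorbs the remainder (4 bytes instead of 3).
```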

The use of `range` objects in the `chunk_ranges` list is a little confusing: only the start and stop offsets are ever needed, but a `range` suggests that each individual byte offset will be iterated.

The ranges determine the number of threads to use and the byte range each thread is responsible for, so that the threads can be dispatched in a loop:

```python
# Loop to open an Io object for each of the threads and call the thread copy function
# The initial thread uses the Io from above
for thread_id, byte_range in enumerate(chunk_ranges):
```

Please investigate to see whether this can be improved for clarity.

Created because of this review comment: #267 (comment)

Labels: Development
