
Investigate parallel_upload_from_S3 chunking logic #273

Open
alanking opened this issue Sep 17, 2024 · 0 comments
```python
# Separating file into chunks of equal size for each thread
bytes_per_thread = total_bytes // num_threads
# Creating a range of bytes for each thread
chunk_ranges = [
    range(j * bytes_per_thread, (j + 1) * bytes_per_thread)
    for j in range(num_threads - 1)
]
# The last thread's range should end at the size of the file: total_bytes
chunk_ranges.append(range((num_threads - 1) * bytes_per_thread, total_bytes))
```
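For concreteness, here is a small worked example of the snippet above, using hypothetical values (`total_bytes = 10`, `num_threads = 3`) to show what `chunk_ranges` actually holds:

```python
# Hypothetical values chosen only to illustrate the chunking logic.
total_bytes = 10
num_threads = 3

bytes_per_thread = total_bytes // num_threads  # 10 // 3 == 3
chunk_ranges = [
    range(j * bytes_per_thread, (j + 1) * bytes_per_thread)
    for j in range(num_threads - 1)
]
chunk_ranges.append(range((num_threads - 1) * bytes_per_thread, total_bytes))

# chunk_ranges == [range(0, 3), range(3, 6), range(6, 10)]
# Note the last range absorbs the remainder (4 bytes instead of 3).
```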

The use of `range` objects in the `chunk_ranges` list is a little confusing: only the start and stop offsets are ever needed, but a `range` suggests that each individual byte offset will be iterated.

The ranges determine the number of threads to use and the byte range each thread is responsible for, so that the threads can be dispatched in a loop:

```python
# Loop to open an Io object for each of the threads and call the thread copy function
# The initial thread uses the Io from above
for thread_id, byte_range in enumerate(chunk_ranges):
```

Please investigate to see whether this can be improved for clarity.

Created because of this review comment: #267 (comment)

Labels: Development
