I have a list of ~37,000 blob names stored in a txt file.
I'd like to download them, making use of azcopy's parallel processing.
Here is what I tried:
Pass all blob names, separated by semicolons, to azcopy's --include-path parameter as documented here
$ azcopy --version
azcopy version 10.27.1
$ include_path=$(paste -sd';' blob_list.txt)
$ azcopy copy \
"https://${storage_account_name}.blob.core.windows.net/${container_name}?${sas_key}"\
"$dest_dir" \
--include-path "$include_path"
/usr/bin/azcopy: Argument list too long
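As far as I can tell, that "Argument list too long" error is the kernel refusing to exec azcopy because the joined argument list exceeds ARG_MAX, not something azcopy itself controls. A quick sanity check (just a sketch, using the blob_list.txt from above):

getconf ARG_MAX            # upper bound on the total size of argv plus the environment
wc -c < blob_list.txt      # the joined --include-path string is roughly this many bytes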
I can't use --include-pattern because the blob names are too varied to be reduced to a reasonable number of patterns.
I can loop over all the lines in my blob list, but then it's not parallel.
For some reason ChatGPT thinks I can pass "@blob_list.txt" to the --from-to parameter, but that doesn't match anything in the documentation. I can also see someone on Stack Overflow trying to use azcopy jobs create, but that appears to have been deprecated since at least version 10.
You can filter either by name or by date/time (for example, all files updated after a given time), if that is possible in your case.
If filters are not possible, then running a script and firing multiple azcopy commands is the only option here.
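For what it's worth, here is a rough, untested sketch of that kind of script. It reuses the variable names from the original post (storage_account_name, container_name, sas_key, dest_dir) and assumes ~2000 names per chunk keeps each --include-path argument well under ARG_MAX; each azcopy invocation still parallelizes its own transfers.

#!/usr/bin/env bash
set -euo pipefail

max_jobs=4                            # number of azcopy processes to run concurrently
split -l 2000 blob_list.txt chunk_    # one file per ~2000 blob names

for chunk in chunk_*; do
  include_path=$(paste -sd';' "$chunk")
  azcopy copy \
    "https://${storage_account_name}.blob.core.windows.net/${container_name}?${sas_key}" \
    "$dest_dir" \
    --include-path "$include_path" &
  # throttle: once max_jobs are running, wait for one of them to finish
  while (( $(jobs -rp | wc -l) >= max_jobs )); do wait -n; done
done
wait                                  # let the remaining azcopy jobs finish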
I'm not sure if you mean looping over each file individually. That is not really feasible for tens of thousands of files, as azcopy spends a few seconds starting up before each download begins.
How to loop over batches of files with azcopy is also not obvious at all.
Using xargs to chunk the list of filenames and looping over those chunks causes weird behaviour in azcopy, where it prints hundreds if not thousands of lines of "INFO: Discarding incorrectly formatted input message" per iteration.
xargs -a filelist.txt | tr ' ' ';' | while read -r chunk; do
  azcopy copy \
    "https://${storage_account_name}.blob.core.windows.net/${container_name}?${sas_key}" \
    "$dest_dir" \
    --include-path "$chunk"
done
Surprisingly, output from an echo placed before the azcopy call inside the while body does not get printed - I think that is related to issue #974
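One workaround worth trying (an assumption on my part, not confirmed against azcopy's actual stdin handling): redirect azcopy's stdin to /dev/null inside the loop so it cannot read from the pipeline feeding the while read, e.g.:

xargs -a filelist.txt | tr ' ' ';' | while read -r chunk; do
  azcopy copy \
    "https://${storage_account_name}.blob.core.windows.net/${container_name}?${sas_key}" \
    "$dest_dir" \
    --include-path "$chunk" < /dev/null   # keep azcopy off the pipeline's stdin
done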