
Performance: Large files in general

John Rusk [MSFT] edited this page May 28, 2020 · 4 revisions

When running an AzCopy job that contains a small number of large files, AzCopy may sometimes misreport the nature of the performance bottleneck. Specifically, if MD5 hashes are being computed for the content of the files, AzCopy may report disk as the bottleneck when the real limit is the CPU cost of computing the MD5 hashes. This misreporting happens because hashing and file I/O are deliberately tightly coupled in AzCopy, to ensure that the hash reflects exactly what was read from or written to the disk.

If you are transferring only one, two, or three large files, and AzCopy displays a message that says "Disk may be limiting speed", take a look at the actual speed you are seeing. If it is about 3 or 4 Gbps per file, then you may be experiencing this problem, or it may be a genuine disk bottleneck. If the speed is substantially lower than 3 or 4 Gbps per file, then the issue is almost certainly a real disk bottleneck, rather than anything to do with MD5 hashing.
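One way to tell the two cases apart, outside of AzCopy, is to compare the raw read speed of the disk with the speed when the same stream must also be MD5-hashed. This is a rough sketch using standard Unix tools (dd and md5sum from coreutils); the test file and its path are hypothetical, not something AzCopy creates:

```shell
# Create a throwaway 256 MB test file (hypothetical path).
dd if=/dev/zero of=/tmp/azcopy-md5-test.bin bs=1M count=256 2>/dev/null

# Raw sequential read: limited by the disk (or the page cache, on a re-read).
dd if=/tmp/azcopy-md5-test.bin of=/dev/null bs=1M

# The same read, but throttled by single-core MD5 hashing.
dd if=/tmp/azcopy-md5-test.bin bs=1M | md5sum

rm /tmp/azcopy-md5-test.bin
```

dd reports its throughput on stderr. If the second command is markedly slower than the first, the hashing step, not the disk, is the ceiling on that machine.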

There's no way to make the MD5 computation itself go faster, because computing an MD5 hash is an inherently sequential process, so it doesn't lend itself to being parallelized.
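To see why, here is a small illustration using md5sum (not AzCopy itself): MD5's internal state chains through every block of input in order, so the digest of a whole input cannot be assembled from the digests of its parts. That is what prevents spreading one file's hash across CPU cores.

```shell
# MD5 of the full input:
printf '%s' "hello world" | md5sum
# → 5eb63bbbe01eeed093cb22bb8f5acdc3  -

# Hashing the two halves independently yields unrelated digests; they
# cannot be recombined into the digest above, so the hashing work for a
# single file cannot be split up and parallelized.
printf '%s' "hello " | md5sum
printf '%s' "world"  | md5sum
```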

However, if you don't need MD5 hashes for integrity checks, you can turn them off. See the documentation for the --put-md5 and --check-md5 flags.
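As a hedged illustration (the account, container, and SAS token below are placeholders, and the local path is hypothetical), turning hashing off might look like this:

```shell
# Upload: simply omit --put-md5, so no hash is computed or stored.
azcopy copy "/data/bigfile.bin" "https://<account>.blob.core.windows.net/<container>?<SAS>"

# Download: skip the integrity check explicitly.
azcopy copy "https://<account>.blob.core.windows.net/<container>/bigfile.bin?<SAS>" "/data/" --check-md5=NoCheck
```

Check the AzCopy documentation for the exact flag defaults in your version before relying on this.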