Explorer Zip uncompress very slow #91
Unfortunately, third-party tools for ZIP archives are quite a mixed bag. A high-quality tool that is fast, secure, user-friendly, and free does not seem to exist.
See this thread; it turns out the ancient code from 1998 isn't very efficient: it's reading one byte at a time.
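For illustration, here is a minimal Python sketch (not the actual Explorer code; the file name is a placeholder) of why per-byte reads are so much slower than chunked reads: every read call has fixed overhead, and reading one byte at a time pays that overhead millions of times.

```python
import time

def read_one_byte_at_a_time(path):
    total = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered: every read(1) hits the OS
        while f.read(1):
            total += 1
    return total

def read_in_chunks(path, chunk_size=64 * 1024):
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # one call covers 64 KB
            total += len(chunk)
    return total

for fn in (read_one_byte_at_a_time, read_in_chunks):
    start = time.perf_counter()
    n = fn("sample.bin")  # any few-MB test file; name is hypothetical
    print(f"{fn.__name__}: {n} bytes in {time.perf_counter() - start:.2f}s")
```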
There are a number of core performance problems with the ZIP code. The one related to "Move" operations (mentioned just above) is more fully described here: https://textslashplain.com/2021/06/02/leaky-abstractions/
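A hedged sketch, in Python rather than Explorer's actual code, of the pattern that post describes: staging each entry in a temporary location and then moving it into place, versus extracting directly. The extra pass roughly doubles the filesystem work per file, and a cross-volume move degrades to a full copy.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def extract_via_temp(archive: str, dest: str) -> None:
    """The slow pattern: stage every entry in a temp dir, then move it."""
    dest_path = Path(dest)
    with zipfile.ZipFile(archive) as zf, tempfile.TemporaryDirectory() as tmp:
        for info in zf.infolist():
            if info.is_dir():
                continue
            staged = zf.extract(info, tmp)        # write #1: into the temp dir
            target = dest_path / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(staged, str(target))      # write #2: a full copy if temp is on another volume

def extract_direct(archive: str, dest: str) -> None:
    """The obvious alternative: write each file once, in place."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```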
I would say libarchive fits the job. It has already been part of Windows since https://techcommunity.microsoft.com/t5/containers/tar-and-curl-come-to-windows/ba-p/382409.
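For reference, the bundled tar.exe is bsdtar built on libarchive, and it auto-detects archive formats including zip, so it already offers a fast extraction path. A small sketch of driving it from Python instead of Explorer (the paths are placeholders):

```python
import os
import subprocess

def extract_with_tar(archive: str, dest: str) -> None:
    os.makedirs(dest, exist_ok=True)
    # -x extract, -f archive file, -C change to destination directory
    subprocess.run(["tar.exe", "-xf", archive, "-C", dest], check=True)

extract_with_tar(r"C:\temp\flutter.zip", r"C:\temp\out")
```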
See also #27; that ticket has a good sample zip file to work with. My personal experience, on my work Dell Precision laptop with an SSD, unzipping a 50 MB file with 25k files inside:
The gap is... yawning. I disabled all the AV and the Windows time came down to 6m. It might be that BitLocker is affecting the speed (or one of the other security agents).
I doubt BitLocker is affecting performance to that degree: on modern CPUs (iirc Haswell+, aka Intel Core 4th gen) the perf hit should be <1%, since the encryption can use specialized CPU instructions (I think I read this on Anandtech somewhere). On less-than-modern CPUs, I'd guess 5 or maybe 10% at worst. Definitely not two orders of magnitude!
I tried unzipping the #27 Flutter zip above; it took around 20m. Inside a Linux VM on the same laptop, a few seconds. These kinds of basic slowdowns are the most valuable and most likely the hardest to fix in Windows.
Well, this is not 100% related to Windows Zip, but I thought I'd post it here for others that might have a similar issue. I got our IT department to temporarily disable BeyondTrust, as the Avecto Defendpoint Service (defendpointservice.exe) was consuming a large amount of CPU during unzip, and particularly during delete and copy operations. The difference was dramatic on my Dell Precision laptop with a 2TB SSD. This was most obvious when deleting or copying the files once unzipped:
It also impacts ZIP speed, but not as dramatically; I suspect the previously noted poor implementation of Windows Explorer Zip somewhat masks the BT issue. Interestingly, I have a separate antivirus installed, Cortex, which seems to have a much smaller filesystem performance impact than BT. Oh, the irony!
Yikes. I always avoided doing any IO-heavy tasks under the shell for reasons like this. It would be nice to see this overhauled, perhaps with support for some modern compression algorithms such as zstd. Time for @PlummersSoftwareLLC to come out of retirement.
I'd say the zip tools need a proper overhaul, not only performance fixes, as the UX alone is bad enough to make me install 3p tools. The absolute minimum would be a 7-zip context menu equivalent; adding smart compression that skips compression of files where it doesn't reduce the size enough, plus double-click-to-extract (with an option to remove the source file when done), would be another step. And I absolutely never wanted the wizard that adds a few extra clicks to my flow.
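For concreteness, a minimal sketch of this "smart compression" idea using Python's zipfile and zlib (the threshold is arbitrary): deflate the file once, and fall back to storing it uncompressed if the savings are too small. Note that this naive version pays the deflate cost twice, which is exactly the objection raised in the next comment.

```python
import zipfile
import zlib

THRESHOLD = 0.95  # keep deflate only if output is under 95% of input size (arbitrary)

def smart_add(zf: zipfile.ZipFile, path: str, arcname: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    deflated = zlib.compress(data, 6)
    if len(deflated) < len(data) * THRESHOLD:
        # Worth it; but writestr deflates again internally, so this naive
        # version spends the deflate CPU twice.
        zf.writestr(arcname, data, compress_type=zipfile.ZIP_DEFLATED)
    else:
        # Not worth it: store the raw bytes (e.g. video, installers).
        zf.writestr(arcname, data, compress_type=zipfile.ZIP_STORED)

# Hypothetical usage:
# with zipfile.ZipFile("out.zip", "w") as zf:
#     smart_add(zf, "movie.mp4", "movie.mp4")
```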
God no. The current implementation is already problematic in enough ways. Making compression inherently slower by forcing an additional large section read after throwing out CPU time on a deflation operation, just because the result didn't match an arbitrary (probably hard-coded 'sane') compression ratio, deployed across the most potato spinning disks, Intel Shitrons, and the likes, just to save a few ms-to-seconds when deflating massive blobs on high-end machines, is stupid. Do you want to be responsible for the hack that throws away and recalculates archive dictionaries over a late and very much arbitrary deflate/store selection? Also note that an optimal multi-threaded implementation would have an even harder time rolling back.

It's a nice idea, but in reality it's not practical: it's going to require way too many man-hours to implement, the UX is going to be too hard for the average normie to understand, and it's going to make the likes of your mom compressing family photos more stressed, because their budget laptop will be wasting resources to meet an arbitrary store-instead-of-deflate threshold instead of just archiving with the intent to compress.

PS: do you have any data whatsoever that suggests this method of 'smart' compression would benefit anyone, or is it just an idea?
Regarding data: I performed a quick and dirty experiment using Bandizip to compress my Downloads folder (4 946 928 KB), containing mostly poorly compressible files (installers, videos, and so on), with what they call High Speed Archiving active and not, using Deflate set to the maximum available compression (except for the HSA switch):
With this out of the way, I decided to try some well-compressible data, like a source tree with almost no binaries, 1 029 669 KB:
After that I added some video files and archives to the mix:
So in a mixed bag it definitely works and is worth it; for everything else, it looks like the way they determine whether to use Deflate or Store is not efficient enough to be worth it, at least on my machine. To dive deeper, I tried constraining the process to a single thread while compressing my Downloads:
So it looks like sometimes it's worth it and sometimes not. Possibly with improved detection it could be made worth it almost every time, but I'm not intending to push this idea any further. It does seem that the weaker machines you are worried about would be the least punished in the cases where it doesn't work as well.
The problem I see with this is that the detection requires you to run the deflate algorithm anyway. One could probably buffer the raw streams to solve the double-read issue; however, you're still stuck with this problem of "detection." The only non-hack way to solve this would be to accept the CPU overhead of compressing once and buffer the output plus the changes to the compression dictionary before committing. Most compressors don't scale to multiple threads, and the non-hack solution would toll the CPU just as much. Worse, I still suspect those few multi-threaded compressors would suffer the most if their dictionaries have to be thrown away to account for dropped chunks on late file omission.

From my tests with zstd, flushing at the end of each substream doesn't obliterate the compression ratio (+1GB; end: 18GB ±1GB; total in: ~30GB raw assets), which might be indicative of the extent to which this could scale.

My concern with these numbers is that they all come from proprietary software whose baseline and implementation efficiencies are unknown; we have no idea what they did to fudge those numbers. Needless to say, I would suspect Bandizip of inefficiently compressing files, delegating throwaway compression operations to an otherwise wasted thread to solve the store/deflate question, and using preemptive hacks to throw away likely-uncompressible files (not so smart; you wouldn't want this advertised as the "smart" method uninformed users will treat as the de facto compress button).

I think we should be focused on the state of parallelization in the world of archives rather than cheap hacks to throw away probably-uncompressible files. Considering accurate detection could be as expensive as the deflation itself, those resources would be better spent compressing another chunk or file. Off the top of my head, regarding containers: I believe the constraints of the containers are a bit of a problem if you wanted to take multi-threaded compression seriously. An entity as large as Microsoft could look into rolling their own to solve this UX issue once and for all. Even an OSP solution to solve this issue once and for all would be nice ("15 competing standards", I know).
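To make the "detection" trade-off concrete, here is a hedged sketch of the kind of cheap heuristic being criticized above: deflate only a small sample of each file to guess compressibility. Sample size and threshold are arbitrary, and, as the comment argues, a sample may not represent the rest of the stream.

```python
import zlib

def looks_compressible(path: str, sample_size: int = 64 * 1024,
                       threshold: float = 0.9) -> bool:
    """Guess whether a file is worth deflating by compressing a small prefix."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False
    ratio = len(zlib.compress(sample, 1)) / len(sample)
    # Already-compressed media tends to sit near 1.0; text sits far below it.
    return ratio < threshold
```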
The zip container, despite all its shortcomings, is used for basically everything nowadays and is the go-to container format for anyone intending to create their own container. But since there already exists a decent MT-capable algorithm and container, it would be nice to have support for that too.
It is kind of hilarious when I have to wait ~10 minutes to extract a specific folder from a 1GB zip file. The Explorer window shows transfer rates of a few KB/s. On a 6-core laptop with NVMe.
It's also kind of ridiculous that numerous folks at Microsoft know about this, and they talk and complain about it on Twitter, and yet none of them has just gone in and fixed it. Even changing the … (If nobody at Microsoft is going to take action on the issues filed in this repo, why does this repo even exist?)
Actually, this will probably be the reason to migrate to Linux. This is not the first time Linux is faster, but in my new project all the data is in zipped CSVs, updated daily. I can't believe unzipping is so slow.
7-zip is a lot faster at uncompressing zip files than Windows Explorer.
@nmoreaud, are you running a version of Windows 11 that supports decompressing via libarchive?
@Bosch-Eli-Black I thought so but I was wrong. I'll wait for the update. |
@nmoreaud Sounds good :) Would be awesome if this update also made ZIP files much faster! :D |
Libarchive support was added to File Explorer in the Sept 2023 / 23H2 update for extraction. This complements the support added at the command line via tar.exe.
I would keep this issue open, as apparently PowerShell's Expand-Archive is still faster than Explorer.exe's extraction (even with the current Win11 build, 22631.3007): https://www.reddit.com/r/PowerShell/comments/1972k40/why_is_expandarchive_significantly_faster_than/
With this zip: https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/2023-12/R/eclipse-java-2023-12-R-win32-x86_64.zip
Build 22621.2715
Appreciate the metrics! |
Extracting https://github.com/dotnet/msbuild/archive/refs/tags/v17.10.4.zip (10 MB, ~2500 files)
Windows 11 Pro 23H2 22631.3593
@AdamBraden @AvriMSFT Please could you reopen the issue? |
Re-opening to track File Explorer zip perf. Was incorrectly closed earlier. |
Tagging @DHowett for visibility. |
Perhaps not directly related, but I tried using the tar.gz decompression support in the Windows 11 File Explorer today on https://github.com/Kitware/CMake/releases/download/v3.29.8/cmake-3.29.8-linux-x86_64.tar.gz; Windows wanted to spend 1hr 45 minutes on it. Extracting the equivalent Windows build https://github.com/Kitware/CMake/releases/download/v3.29.8/cmake-3.29.8-windows-x86_64.zip took 5 seconds. I'd have instead posted this comment on #27, but that issue was incorrectly closed as fixed, and then locked to additional comments.
I don't think #27 was related to your issue, because it was testing with …. Unless tar.exe or ….

Edit: I went ahead and used your two test cases locally (Microsoft Windows [Version 10.0.22631.4317]), and my measurement says tar.exe is still wildly faster than Explorer, but somehow .zip files are even slower than .tar.gz in Windows Explorer.
Windows Explorer extracted the .zip file in about 6m 20s, and the .tar.gz in about 25 seconds. I also noticed that the unzip Explorer window shows the graphical progress bar, file countdown, etc., but for .tar.gz I only got a "Copying file to directory" display, even when expanding the details. So my current conclusion is that the fancier the UI, the longer the extraction takes. (At 1 minute into the .zip extraction) (About 10 seconds into the .tar.gz extraction)

Is there any chance that zip is somehow using some old (pre-libarchive) code in Windows Explorer, and so didn't get the benefits intended by #91 (comment), perhaps? This machine has been continuously upgraded from Windows 10 1607, so maybe some legacy code path or registry key is messing with me.

I also tried turning off Windows Defender Real-time Protection, and the command-line tool got about 15% faster (4 seconds for both), while the Explorer extractions got significantly faster (10-15 seconds for .tar.gz, and 3m 45s for .zip). So there's definitely still something about .zip in Windows Explorer. And also something about Windows Defender (which is what #27 turned out to be). And a little bit about Windows Explorer specifically.

For completeness, I tried PowerShell's Expand-Archive: …

Edit: Dang, looks like the PowerShell progress bar actually improves performance. Or I have measurement jitter, I guess...
On the same system, the Eclipse installer from January takes 3.12 seconds to extract with tar, 11.36s with Expand-Archive, and 2m 16s with Windows Explorer. It's a larger archive (7-8x) but with far fewer files (about 3:1), so I suspect Windows Explorer extraction is scaling with file count, while tar and Expand-Archive are scaling with archive size.

Dumb idea: with the More Details tab closed, Windows Explorer extracted the Eclipse zip file in 1m 53s, which is... 15% faster? The cmake zip file extracted in 5m 51s, which is about 8% faster? (I tried the cmake tar.gz file, which might have been faster, but it's already fast enough that a 10% improvement could be test jitter. And it uses a different dialog box.)

Oh no. No no no. Not even. Is Windows Explorer trying to show every filename in the dialog, and hence being limited by the WinUI update rate? (Even if that's somehow the case, it's not the majority of the performance difference, but it's still not a desirable situation.)
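One way to test the scaling hypothesis: generate two archives of similar total size but very different file counts, then time each extractor against both. A sketch; the file counts and sizes are arbitrary, chosen to mimic the ~50 MB / 25k-file archive mentioned earlier in the thread.

```python
import os
import zipfile

def make_test_zip(path: str, file_count: int, file_size: int) -> None:
    # os.urandom payloads are incompressible, so archive size tracks input size
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(file_count):
            zf.writestr(f"f{i:05d}.bin", os.urandom(file_size))

# Similar total payload (~50 MB), wildly different file counts:
make_test_zip("many_small.zip", file_count=25_000, file_size=2_000)
make_test_zip("few_large.zip", file_count=25, file_size=2_000_000)
```

If Explorer's times diverge sharply between the two while tar.exe's stay close, that would support the per-file-overhead theory.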
I believe the whole process needs an overhaul from a UX perspective; in many cases, going through the Explorer wizard takes long enough that I wouldn't bother using it even if it were faster than 7-zip. And if that overhaul takes care of some GUI-related slowdowns, that'd be even better.
Yeah, the 7-zip Explorer integration (particularly the dialog-free options) is terrific. I recall that the right-click-and-drag actions needed to extract into a temporary directory first and then move into place due to some limitation of the Explorer integration, but I don't know if that's still the case (or maybe it never was, and I'm misremembering...). That said, I know the Windows design direction has been to reduce the amount of right-click stuff, amongst other reasons because it has a performance cost, and with sufficient extensions active the right-click menu takes time to visibly appear. I thought the abbreviated menu plus the "Show More Options" menu was intended to address that, but even that list is growing, and at least on my machine there is a visible gap between right-click and the abbreviated menu appearing. Anyway, UX is wandering off from the repo topic, and is a very open-ended discussion in itself.
Issue description
The zip uncompression in Windows Explorer is not very performant. Depending on the zip archive, it is so painfully slow that it is unusable. It does not utilize modern multi-core CPUs and does IO in incredibly inefficient ways.
This is a very well-known issue (https://twitter.com/BruceDawson0xB/status/1399931424585650180, https://devblogs.microsoft.com/oldnewthing/20180515-00/?p=98755) and has been for years. There is no real diagnosis needed here: the library doing the unzipping is from 1998, and nobody at Microsoft knows how it works.
Developers work with zip files very often. Want to quickly look at an archive of zipped log files from something? Might as well make a coffee while you wait.
For me the process is often like this: start the unzip from Explorer, get frustrated that it takes so long, open 7-zip (or similar) and uncompress there again; the third-party tool finishes the unzip while Explorer is still at around 10% done.
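To illustrate the multi-core point above: zip entries are compressed independently, so extraction parallelizes naturally across entries. A minimal Python sketch, not Explorer's code; CPython's zlib releases the GIL during decompression, so plain threads get real parallelism here.

```python
from concurrent.futures import ThreadPoolExecutor
import zipfile

def extract_parallel(archive: str, dest: str, workers: int = 8) -> None:
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    # Split entries across workers; each worker gets its own handle,
    # since a shared ZipFile handle is not safe for concurrent reads.
    chunks = [names[i::workers] for i in range(workers)]

    def extract_chunk(chunk):
        with zipfile.ZipFile(archive) as zf:
            for name in chunk:
                zf.extract(name, dest)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(extract_chunk, chunks))
```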
Steps to reproduce
Uncompress a zip file, especially one containing many small files
Expected Behavior
Unzip should be very fast, to enable a developer to stay in the flow and not be slowed down by unnecessary wait times.
Actual Behavior
The zip uncompression tool in Explorer is so painfully slow that for most devs it is unusable.
Windows Build Number
10.0.19043.0
Processor Architecture
AMD64
Memory
8GB
Storage Type, free / capacity
512GB SSD
Relevant apps installed
None
Traces collected via Feedback Hub
I can if needed