Git checkout of large repositories is very slow #87
Comments
Thanks for filing! Good news: we're actively investigating the performance of Git on Windows and are working with internal teams here at Microsoft and external teams at GitHub to diagnose the problem and find solutions for reducing the performance gaps between Windows and Mac/Linux. I will update this thread as fixes are identified and completed. In the meantime, can you repro the issue on Windows and capture a trace via Feedback Hub? We can use this data to better understand how Git performs in the wild and fuel our investigations. |
My testing, cloning Chromium locally between 2 SSDs (Intel 660p & WD SN550):
Windows Terminal, PowerShell Core window: Days : 0
Windows Terminal, Ubuntu WSL: real 82m14.742s
Seems like WSL does NOT help Git performance on Windows. |
@itoleck are you using WSL2? If so, cross-OS filesystem performance is slow; see https://docs.microsoft.com/en-us/windows/wsl/compare-versions#performance-across-os-file-systems |
@AvriMSFT posted feedback hub capture at https://aka.ms/AAcdkct. Although not sure how useful that is, as it's just even |
Ping, the label here still says "Needs Author Feedback", but I'm not sure what other information I could provide. |
Hi @kaidokert and @itoleck This is a bit tangential to the actual issue, but: the next Git release (2.32.0) will come with the "parallel checkout" feature, which allows I haven't tested it on WSL, but I'm getting around 2x speedup when cloning the linux repository with Git for Windows on an SSD. If you would be interested in testing it out before the release, you'd need to compile Git on your machine. The code for this feature has already landed in the As one of the developers of this feature, I'd be very interested in any feedback you have about it. So if you do run the tests with parallel checkout or have any questions/suggestions about it, please let me know :) |
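For anyone who wants to try parallel checkout without changing their global config, here is a minimal Python sketch. It assumes Git 2.32 or later and the `checkout.workers` setting described in Git's own documentation (the setting name does not appear in this thread). `checkout_command` and `run_checkout` are hypothetical helper names used only for illustration.

```python
import subprocess

def checkout_command(ref, workers=0):
    """Build a `git checkout` command with parallel checkout enabled for this run only.

    Assumption: Git >= 2.32, where `checkout.workers` sets the number of
    checkout workers; a value below 1 asks Git to use one worker per logical core.
    """
    return ["git", "-c", f"checkout.workers={workers}", "checkout", ref]

def run_checkout(ref, workers=0, repo_dir="."):
    """Run the checkout inside repo_dir and raise if Git exits with an error."""
    subprocess.run(checkout_command(ref, workers), cwd=repo_dir, check=True)
```

Passing the setting with `-c` keeps the experiment scoped to a single command, which makes before/after timings easier to compare.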
It does seem like majority of the slowness here doesn't actually come from |
Hey @kaidokert I didn't see this called out in the template but do you have a Windows Defender exclusion enabled for this folder? It would be interesting to see how the cloning timing might vary with defender out of the picture. |
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. |
@asklar Does Windows Defender normally examine files when they're being permanently deleted? Just curious; not too familiar with how antivirus works under the hood 🙂 |
I don't work on Defender so I don't know for sure, but I wouldn't be surprised 🤷 :) |
@asklar Haha, okay, thanks 😄 |
I have Defender turned off for the entire drive, yes. |
Update: The filesystem and performance teams are working on some major improvements to git which we think will directly impact your issue. Once fixes are in, I'll update the thread :). |
Note that running If you're in Linux/WSL, clone repos locally in the Linux filesystem. If your files are in Windows, run |
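A quick way to check which side of the WSL boundary a repo lives on is sketched below in Python. It assumes WSL's default automount root of `/mnt` (this can be changed in `/etc/wsl.conf`, in which case the pattern would need adjusting). `is_on_windows_mount` is a hypothetical helper name, not part of WSL or Git.

```python
import re
from pathlib import PurePosixPath

# Assumption: default WSL automount root (/mnt/<drive letter>).
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")

def is_on_windows_mount(path):
    """Return True if an absolute WSL path looks like a mounted Windows drive, e.g. /mnt/c/src."""
    return bool(_WINDOWS_MOUNT.match(str(PurePosixPath(path))))
```

If this returns True for a repo you are working on from inside WSL2, every Git operation crosses the OS filesystem boundary mentioned earlier in the thread.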
Periodic ping for any interesting updates on this issue ? |
There have already been some updates made to Git, so hopefully the original benchmarks are improved. More updates coming at Build next month ;) |
I'll do a re-test on Azure VMs side by side on Linux and Windows. Any particular base images to recommend for verifying improvements ? |
I had Terraform deploy 2 identical HW VMs in Azure, gist with config here. Both free tier Standard B1s VMs, same disk config. Windows Server 2022 Azure Datacenter edition. All commands below were run in the respective temporary directories, assuming those should be decently fast out of the box. I used a smaller and lesser-known repo as a benchmark just to cut down on the wait time. It's ~2 GB.
Updated timings:
Windows:
About 6x slower.
Switch branch, Linux:
Windows:
Again about 6-7x slower.
Linux delete directory:
Windows:
~80x slower.
If there are any obvious tunings or tweaks that should be done for disk performance, I'd be really happy to know. If there's a faster disk config, I'd also be happy to try it out: I have this in Terraform and can redeploy different VMs/disks with a push. Hoping to make the repo publicly available on GitHub. Of course, maybe B1s aren't the most representative for performance because they get throttled. They are free though. |
I don't know how much can be done with Windows Server, but certainly for client the best first thing to do is to run on a separate volume (partition on the same drive is fine, even a mounted VHDX is an improvement, just don't use paths starting with But I'd say going from the 25x+ difference to a 7x difference is about what we'd expect right now. A non-OS disk might be better, but we've been looking more at real client machines than at virtual servers. Antivirus scanners also have a large impact, which we believe we've reduced, though again, it should already have been less significant on Server, so the improvement will be smaller. |
These are certainly better results, yes. Thank you! As I mentioned, I set this up as a Terraform repo, so I can easily test with Win 11 desktop rather than server (e.g. win11-22h2-pro SKU / Windows-11 MicrosoftWindowsDesktop) as well. I'll try the tips on structuring the disks better; at the moment I'm simply doing
and using c:\temp from there. But I'll mount a separate drive and see what that does. Thanks for the tips! |
@zooba Can you elaborate on why doing Git operations on a non-OS partition would be faster? |
In brief (and I believe we have more detailed documentation on this coming), the system drive will have additional file system filter drivers installed in order to do certain tasks, such as OneDrive sync and system file protection. And a filter driver intercepts every file system operation on a volume to see if it needs to do anything. Generally this is quick, but not doing it is even quicker. So on a clean volume, you'll have far fewer file system filters in the way, which means that overhead is reduced. |
@zooba, Wow, very interesting! Thanks 🙂 |
@zooba, I took your recommendation and split my Interestingly, copying or deleting large folders via command line on Thank you very much for the tip! ^_^ |
BTW, if anyone else wants to try this, I'd recommend checking out https://learn.microsoft.com/en-us/windows/dev-drive/#what-should-i-put-on-my-dev-drive Those instructions are for the new DevDrive feature in Windows 11, but I think they also apply well to what one should put on a non-OS partition in Windows 10 🙂 |
Although DevDrive is only available on Windows 11, ReFS is available on Windows 10, so I figured I'd give it a shot. @zooba In my tests, copying files via
I would have figured the copy-on-write mechanism would have made copying on ReFS super fast. Maybe it's because the drive has BitLocker enabled? (BitLocker is enabled on both drives) OS: Windows 10 Enterprise, Build 19045 |
It does, but I'm not sure it's automatically enabled (and I'd be surprised if We're not done with perf work yet - getting devs onto ReFS is just the first step - so you can expect future updates to have more improvements over time. We did also ship a few perf improvements to ReFS specifically with the Dev Drive update, so you won't have those on Win10. |
ReFS block cloning support built into the copy engine is enabled in the latest Windows Insiders Preview (WIP) - Canary-External release. |
Windows Build Number
10.0.18363.0
Processor Architecture
AMD64
Memory
200 Gb
Storage Type, free / capacity
SSD 200GB/ 1TB
Relevant apps installed
git version 2.31.1.windows.1
Traces collected via Feedback Hub
N/A
Issue description
Checking out large repos even from a local mirror is slow, compared to Linux / Mac.
Even more importantly, so is switching branches / tags.
Steps to reproduce
Let's download a sample well-known repo, about 23 Gb
git clone --mirror https://github.com/chromium/chromium.git chromium-mirror
Now let's check out a source tree from local mirror:
Powershell, NTFS drive:
Just over 12 minutes.
On Linux, ext4, similar hardware
About 24 seconds.
Now, let's check out a bit older tag:
Powershell:
15 minutes to switch a tag.
On Linux:
Again, about 22 seconds
Finally, let's delete these experiment directories:
Powershell:
7 minutes
Linux:
5 seconds
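To put the numbers above side by side, here is a small Python sketch that computes the slowdown for each step. It uses only the rounded timings quoted in the text ("just over 12 minutes", "about 24 seconds", and so on), so the exact figures from the original command output may differ slightly.

```python
# Rounded timings quoted in the steps above, in seconds: (Windows, Linux).
timings = {
    "checkout from mirror": (12 * 60, 24),
    "switch tag": (15 * 60, 22),
    "delete directories": (7 * 60, 5),
}

def slowdown(windows_s, linux_s):
    """How many times longer the Windows run took than the Linux run."""
    return windows_s / linux_s

ratios = {step: slowdown(w, l) for step, (w, l) in timings.items()}
for step, ratio in ratios.items():
    print(f"{step}: ~{ratio:.0f}x slower on Windows")
```

With these rounded inputs the ratios come out at roughly 30x, 41x and 84x.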
Expected Behavior
Would expect checkout speed on similar disks to be at least on the same order of magnitude.
Actual Behavior
The operations in this example are about 25-30x slower, on almost identical hardware.
Of course, the problem doesn't seem inherent to Git. It looks like the same I/O problem seen when working with large directory trees with many files, e.g. the Node node_modules issues ( #21 ) and others ( #17 #27 ), as evidenced by the fact that rm -r took over 60x longer.
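Since the slowdown also shows up for plain deletes, the pattern can be reproduced without Git at all. Below is a minimal Python sketch that creates a tree of many small files and then removes it, timing both phases. The function name `time_tree_churn`, the directory name `churn-test`, and the default file counts and sizes are arbitrary choices for illustration, not measurements or tooling from this issue.

```python
import os
import shutil
import tempfile
import time

def time_tree_churn(root, dirs=50, files_per_dir=100, size=256):
    """Create dirs * files_per_dir small files under root, then delete them.

    Returns (create_seconds, delete_seconds).
    """
    tree = os.path.join(root, "churn-test")
    payload = b"x" * size

    start = time.perf_counter()
    for d in range(dirs):
        sub = os.path.join(tree, f"dir{d:04d}")
        os.makedirs(sub)  # also creates the top-level tree directory on first use
        for f in range(files_per_dir):
            with open(os.path.join(sub, f"file{f:04d}.txt"), "wb") as fh:
                fh.write(payload)
    create_s = time.perf_counter() - start

    start = time.perf_counter()
    shutil.rmtree(tree)  # rough equivalent of rm -r
    delete_s = time.perf_counter() - start
    return create_s, delete_s

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        create_s, delete_s = time_tree_churn(tmp)
        print(f"create: {create_s:.2f}s  delete: {delete_s:.2f}s")
```

Passing a different `root` (for example a directory on another volume) gives a rough comparison between disks or filesystems on the same machine. The absolute numbers depend heavily on caching and antivirus settings.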