Slow requests for static Media files when a lot of them are requested at the same time under Azure App Services (Web Apps) #14859
(Referring to OrchardCore/src/OrchardCore.Modules/OrchardCore.Media/Services/MediaFileStoreResolverMiddleware.cs, lines 101 to 104, at a3a2b07:)
It might be good to keep it in the cache for a minimum amount of time, at least to be sure the cached file is recognized as existing (I saw that in
Otherwise, I would suggest using a regular async semaphore with regular double checks. |
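(For illustration, a minimal sketch of the suggested "regular async semaphore with double checks" approach; MediaCacheWorker and the other names are hypothetical, not the actual middleware code.)

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class MediaCacheWorker
{
    // A single semaphore serializes all cache initializations; a per-path
    // semaphore would allow different files to be cached concurrently.
    private readonly SemaphoreSlim _semaphore = new(1, 1);
    private readonly ConcurrentDictionary<string, bool> _cached = new(StringComparer.OrdinalIgnoreCase);

    public async Task EnsureCachedAsync(string path, Func<string, Task> copyToCacheAsync)
    {
        // First check, without locking.
        if (_cached.ContainsKey(path)) return;

        await _semaphore.WaitAsync();
        try
        {
            // Second check after acquiring the lock: another request may have
            // cached the file while we were waiting.
            if (!_cached.ContainsKey(path))
            {
                await copyToCacheAsync(path);
                _cached[path] = true;
            }
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
```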
Hmm, I'm uncertain about the whole worker code here.
// Locks are used to ensure that only a single thread can initialize a System.Lazy`1
// instance in a thread-safe manner. Effectively, the initialization method is executed
// in a thread-safe manner (referred to as Execution in the field name). Publication
// of the initialized value is also thread-safe in the sense that only one value
// may be published and used by all threads. If the initialization method (or the
// parameterless constructor, if there is no initialization method) uses locks internally,
// deadlocks can occur.
Will this still work if the supplied delegate closes over different variables in different requests? It seems to me that it's not guaranteed here that it'll recognize both as being the same initialization method. Do you think I can get some more data with further tracing here before using a profiler in prod? |
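(For context, a purely illustrative sketch of the kind of per-path lazy worker pattern being questioned here, assuming the workers are kept in a ConcurrentDictionary keyed by path; with such a setup only the first delegate stored for a key ever runs, so closures over different variables only matter if the key does not uniquely identify the work. This is not the actual OrchardCore code.)

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class LazyWorkerCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _workers =
        new(StringComparer.OrdinalIgnoreCase);

    public Task EnsureCachedAsync(string path, Func<string, Task> copyToCacheAsync)
    {
        // Each request passes a delegate that closes over its own 'path'.
        // GetOrAdd keeps a single Lazy<Task> per key, and
        // LazyThreadSafetyMode.ExecutionAndPublication makes its factory run
        // only once; delegates supplied by later requests for the same key are
        // simply discarded.
        var worker = _workers.GetOrAdd(
            path,
            key => new Lazy<Task>(
                () => copyToCacheAsync(key),
                LazyThreadSafetyMode.ExecutionAndPublication));

        return worker.Value;
    }
}
```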
Good catch about the closure, I think; as I can see, there is at least one error.
Should be replaced by
Worth just trying this first; I will submit a PR. |
Adding this here because it's not related to the PR anymore. I did some testing with the code under #14869 and added some tracing. It turns out that what's slow isn't happening in
I fired up the Application Insights profiler, and this was the hot path:
The whole thread time here was 10 s, and a lot of time is added all around the place to contribute to that, but in the end, it comes down to
Drilling into this further with framework dependencies enabled, we can see this:
Note that
Hmm, can it be that the Media Cache is actually making things slower, since the file I/O of an Azure App Service is slow? I also checked the App Service's IO metrics, but there is nothing really, with it maxing out at <6 MB/s. |
I haven't used Azure App Service for a while, but as I remember, depending on the plan (dedicated VM or not), it used a network file system whose access times may depend on the usage of other apps and may not be so good, particularly when writing a file, as we do for caching. |
That's a great point, and yes, that's indeed the case with App Services. I'm looking into using a local drive for such caches to see if it makes a difference. |
It seems that we'd need to put the whole |
So there are multiple approaches to this, perhaps depending on whether you run Windows or Linux, because why would it be simple:
Further useful docs:
Next, I'll try each of the caching options though I think 2 GB of Local Cache will be too low for anything useful. |
Very good discussion, thanks. We might need some options to put the caches in specific locations on App Service. We do recommend putting App_Data in a persistent folder when running in Docker. The cached files should be placed on local (ephemeral) physical disks. |
It seems that we'd only need to set
I wanted to check if our app will fit into the 2 GB Local Cache limit, but every attempt of mine to get the size of
I've done some more profiling, and charts like this consistently come up as the hot path:
BTW, sometimes I also see JIT being on the hot path, but that's something I'd expect:
Sometimes events-related things again:
And:
This again suggests IO issues to me. |
First, I tried out App Cache by setting the App Service configuration
So, I tried to set
I'm not sure what to try next, since apparently, the local drive is slower than the network one, which makes little sense to me (it doesn't make sense that writing a couple dozen 200 KB images would cause an IO bottleneck either). In the meantime, it turned out that 2 GB will definitely be a lot less than what we need, so Local Cache is out of the question. I have the following ideas next:
|
This has been designed this way; we can't stream the content of the files we serve. One repro though could be to set up a site with static files and a static file provider for each different location, then to run a test on each of these endpoints. I would expect it to behave the same way, but I worry that it would not show up as slow as you are describing, or more people would have mentioned it. |
I see, so we always need local files. Do you mean, to test static file performance as raw as possible in multiple Azure regions? I'm not sure I have that kind of motivation in me :). |
@Piedone no, I meant testing the different disks/folders we can use locally with just static files, no orchard middleware. In the same region/subscription you are already using. Just to repro the problem: |
I'll see what I can do. For now, I did some testing with file mounts, following the docs. All of the tests below are with the same App Service as before, without any traffic apart from me.
So, this is not a solution. However, I continued testing the Premium tier, because (since you're closer to renting a piece of hardware than with the default App Service storage) you ought to get more consistent performance.

Opening a page after a cache purge with 26 resized images (https://ikwileentaart.nl/gebakjes) causes, according to the metrics of the file share, almost 2k transactions (and ~3 MB egress/ingress), while a page with 7 resized images (https://ikwileentaart.nl/mini-gebakjes) causes 735 transactions (1 MB egress/ingress). Opening the same pages once the caches are warm causes 286 transactions (900 KB egress, 55 KB ingress) and 155 transactions (675 KB egress, 35 KB ingress), respectively. This seems like excessive storage usage in terms of transactions, which, as I understand, are basically file operations:

I also checked simply opening the Media Library admin (also after a cache purge, but having visited it before, so JIT compilation is not a factor), to rule out anything custom. Note that the images there are thumbnails and thus also resized. Loading the 10 images on a page there usually has a latency of around 1 s each, with up to 5 s. This causes 450-600 transactions per page (with 0.5-1 MB egress and ingress). Opening the second page of the Media Library (with an empty browser cache but with a warm Asset cache), just to be sure that the only thing that happens is loading those 10 images, causes 113 transactions (with 140 KB egress that corresponds to the file size of the images loaded, and 20 KB ingress for some reason). The third page with 8 images caused 91 transactions. The server-side latencies (previously I talked about client-side latency, i.e. what you see in a browser) of these requests were around 50-150 ms (averaging at 97 ms). While this is not huge, we're talking about 10-25 KB images that should be served a lot faster by an idle server/storage IMO.

Finally, I also loaded a single, unresized but at that point uncached 10 KB image (https://ikwileentaart.nl/media/VADERDAGCAKE.jpg), which took 86 ms (server-side latency) with 17 storage transactions. https://ikwileentaart.nl/media/Unknown-21.jpeg took 40 ms and 8 transactions, BTW. Doing the same again for the now cached images took 10 transactions and 56 ms, and 7 transactions and 63 ms, respectively. Keep in mind that the only thing this Web App did was serve these images.

So, it seems that OC is doing something excessive, because a handful of small files causes a large number of transactions. I can't comment much on the image resizing that ImageSharp does, but for the simple asset caching I'd expect perhaps 3 operations per file (one for the existence check, one to write it to the cache, and one to load it), but the metrics rather show at least 8. For reads, I'd expect at most 2 (existence check, load), but I see around 7. Now, I'm not sure if the number of transactions is the bottleneck here, but I can only see that and egress/ingress as metrics, and the latter two certainly don't seem like something that should even register. Most probably the storage transaction latency dominates (which is supposed to be single-digit ms for Premium storage). In this sense, storage here is more like database access: it's less of an issue what you do in a transaction, and more how many transactions you have, because the latency of each transaction will add up.
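For a rough, back-of-the-envelope sense of scale (arithmetic from the numbers above, not an additional measurement): if a cold page view really issues ~2,000 storage transactions and each one costs even 2-5 ms of latency, that is roughly 4-10 s of cumulative storage latency; only the portion that overlaps through parallelism is hidden, so multi-second page loads are plausible from per-transaction latency alone, before any throttling.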
Since Azure App Services use shared network drives, it's not just that you're accessing shared slow HDDs, but those are also across a network (and if you use Premium storage with SSDs, those are still across a network). Even with local HDDs (which is not unusual for a simple webserver) you can expect ~5 ms latencies at least, so these can quickly add up. |
For info, for image resizing, related to the
For example, to check if the cached image has expired and needs to be re-processed. But when there is no command (e.g. resizing commands), normally the
This, knowing that our middleware to manage the |
I'm not sure what to try next. It seems that without diving into how ImageSharp manages file caching, and how ASP.NET serves static files, we can't really optimize file access.

While I'm quite sure, I'm not 100% sure that file IO is indeed our problem, or whether simply raw single-file caching/loading performance is. The latter doesn't look good, but if it were the only issue, then we'd see consistent ~100 ms response times for file requests (though I'd want no more than 10 ms). A large part of the time we do see that, but a lot of the time it's rather 1-2 or 10-20 s, so there should be something else too.

My hunch is that there's some storage usage burst throttling going on behind the scenes. I didn't find any info about the storage of an App Service, but for file shares, which they kind of use, this is a documented feature (see "burst credits" for Premium storage, and "Max IOPS" for standard storage). Then this would cause these huge latencies on pages when a bunch of images are loaded all at once. This would kind of explain the issue, since e.g. the above-mentioned 2000 storage operations for a cold page view would go over a standard file share's max IOPS for 100 ms (which is "1,000 or 100 requests per 100 ms", whatever this "or" means). So, in the worst case we'd be delayed for 2 s for such a page view, which is in line with what we see (and keep in mind that this is a single page view; a lot else is happening on the web server storage-wise, also for other requests).

Why is local storage even slower? I didn't find info about what kind of VM an App Service runs on, but Standard App Services most probably use HDDs (and Premium definitely uses SSDs). I have no idea which HDD tier would be used, but looking at the size table, most probably we'd get one with around 500 max IOPS. This being shared among everything that happens on that VM (and running Windows, IIS, and everything else apart from serving that one request) can be exhausted immediately. The other app where we see this issue runs on the Premium P3v3 plan, and thus its backing SSD might have the same performance as a 64, or at most 128 GB Premium SSD (since the free local storage on such App Services starts at around 62 GB). This would offer at most 500 IOPS as a baseline, with 3500 IOPS bursting. This is much lower than premium file shares, but due to the lower latency it can still be interesting.

Next, I'll check out whether using a local webroot will help on a Premium App Service. |
BTW, we consistently see low performance (with 4-5 s server response times) for
The stats of the slowest requests for us are almost all static files. |
Sorry for not being able to help more, except for sharing some hypothetical thoughts. Yes, all middlewares use the file system: our middleware, the IS middleware (the way we configure it), and the static file middleware. This is because we assume that the file system is faster than the blob storage, without taking into account Azure App Service, which is only considered a specific case among others. But my feeling is that OC should be well adapted to Azure App Service because it is often used. So maybe we should provide the ability to override the cache configurations, to use for example |
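(For illustration only, and not the configuration that was elided above: one way such an override could look, assuming the site uses ImageSharp.Web's physical file-system cache and that its options can be configured from the app. ConfigureImageSharpCacheFolder and the folder value are hypothetical; whether OrchardCore exposes this path is an assumption.)

```csharp
using Microsoft.Extensions.DependencyInjection;
using SixLabors.ImageSharp.Web.Caching;

public static class ImageSharpCacheConfiguration
{
    // Points the ImageSharp.Web physical cache at a specific folder, e.g. one
    // on the App Service's local (ephemeral) disk instead of the shared
    // network file system. Depending on the ImageSharp.Web version, the value
    // may be interpreted relative to the web root.
    public static IServiceCollection ConfigureImageSharpCacheFolder(
        this IServiceCollection services, string cacheFolder)
        => services.Configure<PhysicalFileSystemCacheOptions>(options =>
        {
            options.CacheFolder = cacheFolder;
        });
}
```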
Those could definitely help with the resizing use case and we could try them. We still need to do something with the simple static file access use case. Perhaps a similar Blob Storage-specific implementation would help there. |
Did some further testing:

Higher file share IOPS: I thought about increasing the size of the premium file share from 100 GB, since this would also increase the max IOPS from 3100 and the burst IOPS from 10000. However, I didn't do this, because even at 1000 GB the burst IOPS would stay the same, and the base max IOPS would only increase to 4000.

Linux: I did some perf testing with Linux App Services before, in 2021, and found that Linux on the same tier served requests with 25% larger latency on average (in a production scenario for DotNest, running it for about a week). Nevertheless, I wanted to see if it makes any difference now. While the file share type used by Linux and Windows App Services seems to be the same kind (though I'm not sure about the file system), the locally used file system is surely different (EXT4 vs NTFS) even if the hardware is the same, so I figured there might be better performance (since EXT4 is really good with a lot of small files). Here are the results. I used the Code publish model, i.e. not Docker, on S1, the same as for the Windows testing.

Premium file share (see Windows results here): It's roughly the same but a bit slower (1-6 s responses), and the reason is that, interestingly, it issues a lot more storage transactions. The previously mentioned https://ikwileentaart.nl/gebakjes page with 26 resized images causes 5910 transactions instead of ~2k under Windows, while https://ikwileentaart.nl/mini-gebakjes produces ~700 ms responses with 1.7k storage transactions (vs 735 under Windows). The increased storage usage is very curious. Egress/ingress is in the same ballpark. I also repeated the Media Library test: The first page seemed slightly faster, with images loading in 1-2 s (1k transactions vs <600 on Windows). The second page was slower, with 1-5 s responses and 2k transactions...

Local temp folder (see Windows results here): This doesn't seem to be as well supported as under Windows, since Kudu doesn't display storage usage for the local folder, nor do the docs go into any details (as opposed to Windows). Anyway, I used a subfolder of
This, just as with Windows, was much slower, with resized files loading in 1-55 s, mostly closer to the latter (and even for small pages 1-6 s). The whole process was visibly throttled.

I also tried App Cache without any special further config (i.e. OC used the default webroot). This seems to be the same perf as the Windows default. |
Sorry, I didn't have much time, but just to be sure I'm following you: as I understand, the biggest issue is when many images are resized, but there is still an issue even if the images have no resizing commands and even if they are already cached. It would be interesting to know how non-media static files are served, but maybe you already tried it.
Not sure at all, but I may have found something; first, though, are you sure that the
What I saw is in OrchardCore/src/OrchardCore/OrchardCore.Mvc.Core/ShellFileVersionProvider.cs, lines 99 to 108, at 644a8d9:
So maybe all or part of the problem is that you have many file watchers; if that's the case, we could find a way not to use them, or at least change the way these files are watched.
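(For reference, a generic sketch of the file-watcher pattern being discussed: a computed file version is cached with an expiration token from IFileProvider.Watch, so each watched path keeps a watcher alive for the lifetime of the cache entry. FileVersionCache is a hypothetical name; this is not the actual ShellFileVersionProvider code.)

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.FileProviders;

public class FileVersionCache
{
    private readonly IMemoryCache _cache;
    private readonly IFileProvider _fileProvider;

    public FileVersionCache(IMemoryCache cache, IFileProvider fileProvider)
    {
        _cache = cache;
        _fileProvider = fileProvider;
    }

    public string GetOrAddVersion(string path, Func<string, string> computeVersion)
        => _cache.GetOrCreate(path, entry =>
        {
            // The change token registers a watcher on the underlying provider;
            // with many distinct files this means many active watchers.
            entry.AddExpirationToken(_fileProvider.Watch(path));
            return computeVersion(path);
        });
}
```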
|
No worries, thank you for helping, JT. Yes, the biggest issue is when many (>10) images are resized simultaneously, like when loaded on one page (which is the case for any gallery/brochure-like page with thumbnails). A smaller, but still similarly pronounced issue is when such images are simply loaded at the same time, when their backing storage is Azure Blob Storage. While I didn't specifically test non-media static files, no such files are there in the top 100 slowest URLs for DotNest. In the tests I've elaborated above,

Hmm, interesting idea about the file watchers. While I can definitely see these becoming a problem when the app runs for a while, note that in my tests the performance issue is immediately apparent when opening a page with many images after a new app start.

I wanted to see what the actual storage operations are. So, under the storage account's diagnostics settings, I added one config that enabled all the logging. Here is what accessing an uncached image (

This is after it was cached:

Not much time is spent, but it's 13 transactions, roughly what I've seen earlier. If this were a single request, then no problem; however, if dozens like this are issued simultaneously, the app will get throttled.

A single resized image is requested for the first time directly via its URL (

After it was cached:

I then also clicked around on the Media Library admin, opening its first page, then the other two too, with 28 resized images being shown and freshly resized. This, as before, caused some requests to take seconds.

So, all in all, a lot of storage transactions. Note that these are not just file read/writes. |
Yes, many operations; it's interesting to see all the operations, for example on
I tried to analyze the simplest use case of non-resized and already cached media files. In the end, all static file providers are involved until one finds a file; we can see this, there are operations on
The
So for me, file exists or directory exists =>
I pass
So checking if a file exists and then reading it =>
So under
In fact, I think that a transaction may be related to multiple operations. Anyway, there are many operations, the weirdest being the |
Interesting... We do need to do something about this though, since it's quite a peculiarity that we can serve complex content pages at a high r/s within tens of ms, but serving a 10 KB image routinely takes seconds (an issue that can only be alleviated by using a CDN, which is a good practice in any case, but still). @sebastienros would you be able to kindly ask some App Services people to chime in here? |
Another workaround is to use Response Caching in a reverse proxy, or Output Caching within the app, to cache /media requests (taking care not to cache authenticated requests while still caching the ones setting cookies). This would essentially add an in-memory layer of caching to Media files. A better approach would be to do that directly, within the middleware, instead. |
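(As a rough sketch of the output-caching variant, assuming ASP.NET Core's built-in output caching middleware in .NET 7 or later; the path check and the five-minute duration are placeholders, not recommendations, and the middleware's default rules, such as skipping authenticated requests, still apply.)

```csharp
// Program.cs sketch: cache /media responses in memory via output caching.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOutputCache(options =>
{
    options.AddBasePolicy(policy => policy
        // Only consider requests under /media for caching.
        .With(context => context.HttpContext.Request.Path.StartsWithSegments("/media"))
        .Expire(TimeSpan.FromMinutes(5)));
});

var app = builder.Build();

// Must run before the middleware that serves the media files so their
// responses flow through the cache.
app.UseOutputCache();

// ... OrchardCore / media pipeline here ...

app.Run();
```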
We could have something like a |
That can help with reads (i.e. when unresized Media is accessed, or a cached resized image) but not writes (including accessing resized images for the first time), since
Since there are existing in-memory
Or alternatively, at a smaller scale, we could replace
The
Hmm, blanket output caching for /media might just be easier. |
Hmm, perhaps a cached webroot
I'll check this out. |
Well, no... I think I'm only now starting to really grasp how the whole thing works, in the myriad of providers and middlewares. When it comes to standard unresized Media files, the
For resized images,
So... For some in-memory caching, we'd need the following:
An alternative is still (in-memory) output caching. |
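(Illustrative only, and not the list that was elided above: one building block of such in-memory caching could be a decorator over IFileProvider that keeps the bytes of small files in IMemoryCache and falls back to the inner provider, e.g. the physical media cache, on a miss. MemoryCachedFileProvider and the size limit are hypothetical; invalidation is omitted.)

```csharp
using System;
using System.IO;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.FileProviders;
using Microsoft.Extensions.Primitives;

public class MemoryCachedFileProvider : IFileProvider
{
    private const long MaxCachedFileSize = 256 * 1024; // Only cache small files.

    private readonly IFileProvider _inner;
    private readonly IMemoryCache _cache;

    public MemoryCachedFileProvider(IFileProvider inner, IMemoryCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public IFileInfo GetFileInfo(string subpath)
    {
        var fileInfo = _inner.GetFileInfo(subpath);
        if (!fileInfo.Exists || fileInfo.IsDirectory || fileInfo.Length > MaxCachedFileSize)
        {
            return fileInfo;
        }

        // Cache the file contents keyed by path; expiration/invalidation
        // (e.g. via Watch tokens) is deliberately left out of this sketch.
        var bytes = _cache.GetOrCreate(subpath, _ =>
        {
            using var stream = fileInfo.CreateReadStream();
            using var memory = new MemoryStream();
            stream.CopyTo(memory);
            return memory.ToArray();
        });

        return new InMemoryFileInfo(fileInfo, bytes);
    }

    public IDirectoryContents GetDirectoryContents(string subpath) => _inner.GetDirectoryContents(subpath);

    public IChangeToken Watch(string filter) => _inner.Watch(filter);

    private sealed class InMemoryFileInfo : IFileInfo
    {
        private readonly IFileInfo _inner;
        private readonly byte[] _bytes;

        public InMemoryFileInfo(IFileInfo inner, byte[] bytes)
        {
            _inner = inner;
            _bytes = bytes;
        }

        public bool Exists => true;
        public long Length => _bytes.Length;
        public string PhysicalPath => null; // No physical path; consumers fall back to CreateReadStream().
        public string Name => _inner.Name;
        public DateTimeOffset LastModified => _inner.LastModified;
        public bool IsDirectory => false;
        public Stream CreateReadStream() => new MemoryStream(_bytes, writable: false);
    }
}
```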
I'm testing the Azure Blob cache of ImageSharp. It seems quite useful, so we could have a feature for it: #15016. |
I did some longer experiments with DotNest with various IS cache approaches. I tried combinations of S1 and P0V3 tier Azure App Services, storing the wwwroot (as well as the ImageSharp cache) either in the standard way locally (which for App Services means using the Standard-tier file share it uses under the hood) or in mounted separate Standard or Premium (SSD) file shares, or the ImageSharp cache in separate Standard/Premium Blob Storage accounts (see #15016).

Results: You get the best performance by putting the wwwroot folder, including the ImageSharp cache under it, onto a Premium file share, mounted under that folder path in the App Service. Slow requests did still happen, but only occasionally, and those seem to be due to the CPU of the webserver being a bottleneck (when resizing images), or the Blob Storage account used for Media storage. Moving the latter to a Premium tier would most probably eliminate the IO issues completely, without significantly impacting the costs. |
Not much to do in OC here, in the end. You can do the following to make Media requests faster, all of it basically throwing money at the problem; I don't see any glaring opportunity for optimization:
|
Describe the bug
We're seeing strangely slow requests for static Media files. This can't be explained by slow storage, and while both sites I've seen these on use Azure Blob Storage, I ruled out that being slow (and the requests are also slow if the files are already cached on the server's file system).
Perhaps related: #1634.
To Reproduce
I don't have a 100% repro, but this seems to be the rough playbook when the issue happens:
Note that while the linked pages use image resizing, I've seen exactly the same issue on another site that loads images without any resizing.
The issue seems to correlate with increased CPU usage, but well below the server's capacity (<10%). Other metrics are either uncorrelated/normal or show the effect (like increased client receive time), not a probable cause. So, it doesn't seem to be simply that there are too many requests coming in and the server is too low-spec to handle them.
I ruled out recent shell restarts with tracing. I added tracing to measure the runtime of the bodies of IMediaFileStore.GetFileInfoAsync(string path), GetFileStreamAsync(string path), GetFileStreamAsync(IFileStoreEntry fileStoreEntry), and IMediaFileStoreCacheFileProvider.IsCachedAsync(string path), SetCacheAsync(Stream stream, IFileStoreEntry fileStoreEntry, CancellationToken cancellationToken). Nothing was slow enough (at most ~200 ms sometimes, but nothing in the order of magnitude of seconds, let alone tens of seconds). Something seems to throttle requests. Perhaps the lazy workers?
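(For reference, a minimal sketch of the kind of timing wrapper such tracing typically uses; Traced.MeasureAsync and the warning threshold are illustrative, not the exact tracing code that was used.)

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public static class Traced
{
    // Wraps the body of a measured operation and logs a warning when it takes
    // longer than the given threshold.
    public static async Task<T> MeasureAsync<T>(
        string operationName, Func<Task<T>> body, ILogger logger, int warnAboveMilliseconds = 100)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await body();
        }
        finally
        {
            stopwatch.Stop();
            if (stopwatch.ElapsedMilliseconds > warnAboveMilliseconds)
            {
                logger.LogWarning(
                    "{Operation} took {Elapsed} ms.", operationName, stopwatch.ElapsedMilliseconds);
            }
        }
    }
}
```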
Expected behavior
Static Media files are served within milliseconds of the underlying storage's latency, so for locally cached files <1ms.