Remove recursive locking from variable caching #52
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The current Pressflow6 has a recursive call to variables_init trying to obtain a lock. It has a bailout at 50. But when high burst traffic is received by a site, this recursion causes very long response times on sites that do a lot of variable setting. Each recursive call has a 30sec timeout. Adding a watchdog to our site shows recursion levels >30 deep during some bot bursts. Since this is called at shutdown, it causes many worker threads to stay busy and ultimately causing 503 varnish timeout errors on a site.
The attempt of this patch is to add some ideas from the following Drupal.org threads: http://drupal.org/node/973436 , http://drupal.org/node/1595070. These threads have interesting discussion of the problem, but tend to get sidetracked into D8 implementation. Because this is a very serious problem for a couple of our sites, I wanted to bring a more practical solution back to the table for Pressflow users. Porting this patch to Drupal7 should be straight-forward.
My approach is to completely remove the lock recursion. After a variable_set or variable_del, a static variable is marked, and at shutdown it will attempt to acquire a lock to write-thru the Drupal cache. However, if the lock cannot be easily acquired, it simply clears the cache. This potentially adds delay to the next request where variable_init will need to rebuild the cache, but that response time is better capped them any sort of recursion lock delay.
I've done jmeter benchmarks on a site with and without this patch and on normal sites I don't see any noticeable differences. However if I construct a test page that does a lot of variable_sets, I find the performance of that page to be significantly better with this patch (factor of 2 speed boost) and I no longer get any busy/locked worker threads and no longer experience the very long response times that led to 503 errors.