Reduce number of DB queries #118
Conversation
Thoughts on this method? I'm seeing multiple db queries on an initial population of the jobs, then a single query for most page views thereafter.
Branch updated from ba3705b to 8ac9687.
I'd say it makes more sense to be querying this data within MySQL; I'm not sure why doing this in PHP is preferable?
Would love to see some benchmarks, as well as peak memory usage with large jobs tables.
@rmccue this is how I interpreted your comment here: fetching / populating the full array of jobs once, then querying from the non-persistent cache. I'll try and figure out a way to benchmark this.
Yes, but I wouldn't necessarily implement that within this function, since this is a generic querying function.
I've updated it to just do the generic query on the …
It's not really a solid benchmark test, but using …
@@ -382,16 +382,39 @@ function pre_get_scheduled_event( $pre, $hook, $args, $timestamp ) {
	}

	$jobs = Job::get_jobs_by_query( [
		'args'  => null,
		'hook'  => null,
		'limit' => 100,
What happens when there are 100 jobs in the table with hook `foo` and we're looking for the hook `bar` that happens to be at position 101?
I was wondering the same, but it's a problem we'd see in other situations. I guess that number was chosen because by default we're only looking for the `waiting` and `running` statuses.

See also:
https://github.com/humanmade/Cavalcade/blob/master/inc/connector/namespace.php#L294-L306
https://github.com/humanmade/Cavalcade/blob/master/inc/connector/namespace.php#L420-L430
@rmccue able to comment on this? There is at least a filter, so if a site has more than 100 combinations of hooks & args that are upcoming or running, you could filter the limit via `cavalcade.get_jobs_by_query.args`.
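For example (a hedged sketch only, not code from this PR): a site with many distinct hook & args combinations could raise that ceiling through the filter. This assumes `cavalcade.get_jobs_by_query.args` passes the query arguments array, including the `limit` key, to the callback.

```php
// Illustrative only: bump the default limit of 100 for sites that expect more
// than 100 upcoming/running hook & args combinations. Assumes the filter
// receives the query args array as its first parameter.
add_filter( 'cavalcade.get_jobs_by_query.args', function ( $args ) {
	$args['limit'] = 500; // Arbitrary higher ceiling for illustration.
	return $args;
} );
```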
Thanks for looking into this. Would be curious if there's a way to profile the change before and after but that sounds positive!
@kadamwhite I've done a basic benchmark locally but I'm not completely sure on the best way to do this. Would be good to figure out some test with a baseline performance fixture to compare changes against... Something I need to look into; benchmarking isn't something I see done that often in WP plugin land.
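One possible shape for such a fixture, sketched under the assumption that it runs inside a WordPress bootstrap (e.g. via `wp eval-file`); the hook name and iteration count are placeholders, and this is not the benchmark referred to above:

```php
// Rough timing harness for repeated scheduled-event lookups.
// 'example_hook' is a placeholder hook name.
$start = microtime( true );
for ( $i = 0; $i < 1000; $i++ ) {
	wp_next_scheduled( 'example_hook' );
}
printf(
	"1000 lookups: %.4fs, %d DB queries, %.2f MB peak memory\n",
	microtime( true ) - $start,
	get_num_queries(),
	memory_get_peak_usage() / 1024 / 1024
);
```

Comparing those numbers before and after the change would give a crude baseline to track in CI.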
Keen to get back to this one, especially while we're looking for every performance improvement we can get.
Looks fine since I last reviewed. I don't understand the benchmarks without comment: is the first one the current setup, and if so, does it take longer but process fewer requests?
Apart from whether it's an improvement, the code is OK.
@svandragt it was the same number of requests for each run; the first one is indeed before making the change, so with this change it completed 100 requests 9 seconds faster. I'll have a look into how we can put some benchmarks into the CI tests.
Had a chat through with @kovshenin, who ran some deeper analysis of the DB queries and edge cases we could expect. There's a balance to be had between the indexes used, the number of queries run, and the amount of data a query may return over the network. Front-loading all the results in the way this PR does could hit problems in the following scenarios:
On balance the current approach is the best we'll get. Given the primary problem we're aiming to have an impact on is TTFB for end-user page requests, there is an alternative approach we could take, which would be to use the …
Ah, nuts, had been hoping this would scale.
That makes sense, and feels like a good compromise.
It is a good compromise, though I'd be careful returning `true` for all events. Instead maybe have an explicit list of events that you expect to be scheduled at all times, and let the checks for those only run within wp-admin.
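A hedged sketch of that suggestion, assuming WordPress core's `pre_get_scheduled_event` filter; the hook names, priority, and stub event object are illustrative and not part of this PR:

```php
// Short-circuit scheduled-event lookups for an explicit allowlist of hooks on
// front-end requests only; wp-admin keeps doing real checks. Registered at an
// early priority on the assumption that later callbacks respect a non-null $pre.
add_filter( 'pre_get_scheduled_event', function ( $pre, $hook, $args, $timestamp ) {
	if ( is_admin() ) {
		return $pre;
	}

	// Events we expect to always be scheduled (illustrative list).
	$always_scheduled = [ 'example_cleanup_hook', 'example_sync_hook' ];

	if ( ! in_array( $hook, $always_scheduled, true ) ) {
		return $pre;
	}

	// Return a stub event so wp_next_scheduled() reports a timestamp
	// without a database lookup.
	return (object) [
		'hook'      => $hook,
		'timestamp' => $timestamp ?: time() + HOUR_IN_SECONDS,
		'schedule'  => 'hourly',
		'args'      => $args,
		'interval'  => HOUR_IN_SECONDS,
	];
}, 5, 4 );
```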
Will note on the Altis Cloud issue I raised here: humanmade/altis-cloud#702
Fixes #117
The DB query fetches pretty much everything, just varying on statuses and site ID, which won't change unless someone needs to query the jobs directly.
The resulting array is then filtered in PHP instead, so the trade-off is an O(n) process in PHP rather than additional requests to the DB each time `wp_next_scheduled()` is called.
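To make that trade-off concrete, here is a simplified, non-authoritative sketch of the pattern rather than the actual Cavalcade implementation: the query keys and `Job::get_jobs_by_query()` come from the diff above, while the wrapper function, namespace context, and job field names are assumptions.

```php
// Simplified illustration: one query per request, then an O(n) scan in PHP per
// lookup. Assumes this runs where the Job class resolves and that job objects
// expose hook/args properties.
function example_find_scheduled_job( $hook, $args ) {
	static $jobs = null;

	// Single DB query to fetch the upcoming/running jobs for this request.
	if ( null === $jobs ) {
		$jobs = Job::get_jobs_by_query( [
			'args'  => null,
			'hook'  => null,
			'limit' => 100,
		] );
	}

	// Linear scan instead of an extra query per wp_next_scheduled() call.
	foreach ( $jobs as $job ) {
		if ( $job->hook === $hook && $job->args === $args ) {
			return $job;
		}
	}

	return null;
}
```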