
Performance issues if checking for schedules during every page load #117

Closed
roborourke opened this issue Aug 26, 2022 · 11 comments

@roborourke (Contributor)

It's a common pattern in WP to check for a scheduled job and add it if it doesn't exist, e.g.:

add_action( 'init', function () {
    if ( ! wp_next_scheduled( 'my_hook' ) ) {
        wp_schedule_event( time(), 'hourly', 'my_hook' );
    }
} );

Typically the list of cron tasks is stored in the options table, which is autoloaded, so the performance impact of doing this check on every invocation of WP is minimal.

When using Cavalcade we can't rely on any persistent object caching for the database queries, so there are additional uncached reads on every page load for every event being checked. These also hit a significantly larger db table than the single value in wp_options.

Ideally we can solve this performance issue and avoid making multiple db queries for this common pattern.

@roborourke (Contributor, Author)

Could we leave the option in the wp_options table intact, and use that data for checking whether an event is scheduled?

Potentially flaky, though, and it wouldn't reference the real source of truth...

@roborourke (Contributor, Author)

Maybe set an autoloaded option for each scheduled event and check for that before hitting the Cavalcade jobs table. It could be keyed on hook and args - the issue is the timestamp argument. Maybe bypass the option check if the timestamp isn't null?
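A minimal sketch of that idea. All function, helper, and option names here are hypothetical; `cavalcade_query_next_job()` stands in for the real jobs-table lookup:

```php
// Hypothetical sketch: cache each event's next run time in an autoloaded
// option, and only hit the Cavalcade jobs table on a cache miss.
function cavalcade_next_scheduled_cached( $hook, array $args = [] ) {
    $key = 'cavalcade_next_' . md5( $hook . serialize( $args ) );

    // Autoloaded option read: comes from the alloptions array already
    // loaded for this request, so no extra query on most page loads.
    $cached = get_option( $key );
    if ( false !== $cached ) {
        return (int) $cached;
    }

    // Cache miss: fall back to the real source of truth (hypothetical helper).
    $timestamp = cavalcade_query_next_job( $hook, $args );
    if ( $timestamp ) {
        // The third argument marks the option as autoloaded.
        update_option( $key, $timestamp, true );
    }
    return $timestamp;
}
```

The open question from above still applies: this only works cleanly for lookups by hook and args, and a non-null timestamp would need to bypass the option check.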

@svandragt commented Aug 30, 2022

Was looking into this for a little bit (distractions). Without Cavalcade, WordPress gets the crons from an option within _get_cron_array() (docs). Cavalcade uses a db query inside the same pre_reschedule_event hook (source) and pre_get_scheduled_event, so perhaps it should move to a model where it updates the option AFTER the cron event completes? Then it could also use the option at init time, and would only need a db call if the option doesn't exist yet.

@roborourke (Contributor, Author)

The option is completely bypassed currently; I'm not sure I follow the benefit or process of updating the option after the cron event completes. Can you elaborate a bit more?

@johnbillion (Member)

When wp_next_scheduled() is called, could Cavalcade fetch all events with one query (SELECT * FROM cavalcade_jobs) and cache the result for the current page load? This would mean O(1) query per page load instead of O(n). Probably need to invalidate the cache if wp_schedule_event() is called.

@roborourke (Contributor, Author)

The jobs table gets absolutely massive, as each run is a new row. We could perhaps do it with a WHERE clause for nextrun > NOW() at least, but yeah - calls to wp_schedule_event() will be an issue.

@johnbillion (Member)

Ah yeah, in that case it would need a clause to only fetch future events. The cron option used by default in WordPress contains all future events, so the data stored for future events in the cavalcade_jobs table should be of similar size.
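A rough sketch of that combined idea - one query per page load, restricted to future events, with invalidation when something new is scheduled. The function name is hypothetical, and the `nextrun`/`status` columns are assumptions about Cavalcade's schema:

```php
// Hypothetical sketch: memoize all future jobs for the current page load.
function cavalcade_get_future_jobs( $invalidate = false ) {
    static $jobs = null;

    if ( $invalidate ) {
        $jobs = null;
        return null;
    }

    if ( null === $jobs ) {
        global $wpdb;
        // Only fetch future events, so the result set stays comparable in
        // size to WordPress's own cron option rather than the full history.
        $jobs = $wpdb->get_results(
            "SELECT * FROM {$wpdb->base_prefix}cavalcade_jobs WHERE nextrun > NOW() AND status = 'waiting'"
        );
    }

    return $jobs;
}

// Invalidate the memoized list whenever a new event is scheduled.
add_filter( 'pre_schedule_event', function ( $pre ) {
    cavalcade_get_future_jobs( true );
    return $pre;
}, 0 );
```

Subsequent wp_next_scheduled() checks in the same request could then scan the memoized array instead of issuing a query each.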

@rmccue (Member) commented Sep 5, 2022

Not sure if I addressed this elsewhere, but: Cavalcade's design intentionally queries the database for this data rather than using the object cache (valuing freshness over performance), and database queries are not a problem per se, so we need to make sure any solution is worth the cost. I'm assuming from this issue that we've directly identified this as an actual performance bottleneck, but I don't recall seeing the discussion anywhere.

Querying for all pending jobs (potentially deferred to the first wp_next_scheduled() check) makes sense to me, as the marginal query time for getting all jobs isn't much higher than for getting one; in the majority of cases the connection overhead is more likely to dominate than the query time, I imagine (for n < 10k).

We may only want to do that for recurring jobs rather than all jobs, since those are where wp_next_scheduled() makes sense to call. This would also cut the size of the data down, at the likely cost of two DB queries for a missing job versus the current one (since we'd need to fall back to querying individually if a job isn't in the pre-fetched data). It may improve the common case nonetheless.
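As a sketch, that two-tier lookup could look something like this (both helpers are hypothetical stand-ins for real query code):

```php
// Hypothetical sketch: prefetch recurring jobs on first lookup, and fall
// back to a per-job query for anything not in the prefetched set.
function cavalcade_next_scheduled( $hook, array $args = [] ) {
    static $recurring = null;

    if ( null === $recurring ) {
        // One query, deferred until wp_next_scheduled() is first used.
        $recurring = cavalcade_fetch_recurring_jobs(); // hypothetical helper
    }

    foreach ( $recurring as $job ) {
        if ( $job->hook === $hook && $job->args == $args ) {
            return (int) $job->nextrun;
        }
    }

    // Missing from the prefetched set: second query covers one-off jobs,
    // which is the two-queries-for-a-missing-job cost noted above.
    return cavalcade_query_single_job( $hook, $args ); // hypothetical helper
}
```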

> The cron option used by default in WordPress contains all future events so the data stored for future events in the cavalcade_jobs table should be of similar size.

That's an assumption that may not hold; Cavalcade is designed to be able to scale beyond WP's limits, so we wouldn't want to assume the array would still be small.

It's not likely to be at the too-big-for-PHP-memory size, but it could be at the too-big-for-one-query size.

@rmccue (Member) commented Sep 5, 2022

Looks like this was ultimately a performance regression from when we switched to the preflight filters in https://github.com/humanmade/Cavalcade/pull/91/files; previously, we populated the full array once per page load, and the non-persistent cache was then used for follow-ups.

Restoring that functionality would likely fix this issue.
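If that behaviour were restored, it might look roughly like this: populate the future jobs array once per page load into a non-persistent (per-request) cache group, so repeated lookups skip the database. The group and key names are hypothetical, not Cavalcade's actual ones:

```php
// Hypothetical sketch of the pre-#91 model: one query per page load, then
// a non-persistent cache group serves all follow-up lookups in-process.
wp_cache_add_non_persistent_groups( [ 'cavalcade-jobs' ] );

function cavalcade_get_jobs() {
    $jobs = wp_cache_get( 'all-future', 'cavalcade-jobs' );

    if ( false === $jobs ) {
        global $wpdb;
        $jobs = $wpdb->get_results(
            "SELECT * FROM {$wpdb->base_prefix}cavalcade_jobs WHERE nextrun > NOW()"
        );
        wp_cache_set( 'all-future', $jobs, 'cavalcade-jobs' );
    }

    return $jobs;
}
```

Because the group is non-persistent, the data is refetched on every request, preserving the freshness-over-performance trade-off described above.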

@roborourke (Contributor, Author)

Ah nice. Not totally sure how to go about that with the get_jobs_by_query() implementation... Filtering the array after the actual db query?

@roborourke (Contributor, Author)

I don't think this plugin is the right place for it, but I've written up an alternative approach that avoids the problem described here on frontend requests, which we could add to Altis Cloud and others could implement if they wish:

Closing this out, as the current setup is, on balance, the best trade-off between efficiency and scalability.
