
Performance issues if checking for schedules during every page load #117

Closed
roborourke opened this issue Aug 26, 2022 · 11 comments

@roborourke (Contributor)

It's a common pattern in WP to check for a scheduled job and add it if it doesn't exist, e.g.:

add_action( 'init', function () {
    if ( ! wp_next_scheduled( 'my_hook' ) ) {
        wp_schedule_event( time(), 'hourly', 'my_hook' );
    }
} );

Typically the list of cron tasks is stored in the options table, which is autoloaded, so the performance impact of doing this check on every invocation of WP is minimal.

When using Cavalcade we can't rely on any persistent object caching for the database queries, so there are additional uncached reads on every page load for every event being checked. These also hit a significantly larger db table than the single value in wp_options.

Ideally we can solve this performance issue and avoid making multiple db queries for this common pattern.

@roborourke (Contributor, Author)

Could we leave the option in the wp_options table intact, and use that data for checking whether an event is scheduled?

Potentially flaky, though, and it wouldn't reference the real source of truth...

@roborourke (Contributor, Author)

Maybe set an autoloaded option for each scheduled event and check for that before hitting the Cavalcade jobs table. It could be keyed on hook and args - the issue is the timestamp argument. Maybe bypass the option check if the timestamp isn't null?
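A minimal sketch of that idea. All function, helper, and option names here are hypothetical; `cavalcade_query_next_job()` stands in for the real jobs-table lookup:

```php
// Hypothetical sketch: cache each event's next run time in an autoloaded
// option, and only hit the Cavalcade jobs table on a cache miss.
function cavalcade_next_scheduled_cached( $hook, array $args = [] ) {
    $key = 'cavalcade_next_' . md5( $hook . serialize( $args ) );

    // Autoloaded option read: comes from the alloptions array already
    // loaded for this request, so no extra query on most page loads.
    $cached = get_option( $key );
    if ( false !== $cached ) {
        return (int) $cached;
    }

    // Cache miss: fall back to the real source of truth (hypothetical helper).
    $timestamp = cavalcade_query_next_job( $hook, $args );
    if ( $timestamp ) {
        // The third argument marks the option as autoloaded.
        update_option( $key, $timestamp, true );
    }
    return $timestamp;
}
```

The open question from above still applies: this only works cleanly for lookups by hook and args, and a non-null timestamp would need to bypass the option check.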

@svandragt commented Aug 30, 2022

Was looking into this for a little bit (distractions). Without Cavalcade, WordPress gets the crons from an option within _get_cron_array() (docs). Cavalcade uses a db query inside the same pre_reschedule_event hook (source) and pre_get_scheduled_event, so perhaps it should move to a model where it updates the option AFTER the cron event completes? Then it could also use the option at init time, and would only need a db call if the option doesn't exist yet.

@roborourke (Contributor, Author)

The option is completely bypassed currently; I'm not sure I follow the benefit or process of updating the option after the cron event completes. Can you elaborate a bit more?

@johnbillion (Member)

When wp_next_scheduled() is called, could Cavalcade fetch all events with one query (SELECT * FROM cavalcade_jobs) and cache the result for the current page load? This would mean O(1) query per page load instead of O(n). Probably need to invalidate the cache if wp_schedule_event() is called.

@roborourke (Contributor, Author)

The jobs table gets absolutely massive, as each run is a new row. We could perhaps do it with a WHERE clause for nextrun > NOW() at least, but yeah - calls to wp_schedule_event() will be an issue.

@johnbillion (Member)

Ah yeah, in that case it would need a clause to only fetch future events. The cron option used by default in WordPress contains all future events, so the data stored for future events in the cavalcade_jobs table should be of similar size.
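A rough sketch of that combined idea - one query per page load, restricted to future events, with invalidation when something new is scheduled. The function name is hypothetical, and the `nextrun`/`status` columns are assumptions about Cavalcade's schema:

```php
// Hypothetical sketch: memoize all future jobs for the current page load.
function cavalcade_get_future_jobs( $invalidate = false ) {
    static $jobs = null;

    if ( $invalidate ) {
        $jobs = null;
        return null;
    }

    if ( null === $jobs ) {
        global $wpdb;
        // Only fetch future events, so the result set stays comparable in
        // size to WordPress's own cron option rather than the full history.
        $jobs = $wpdb->get_results(
            "SELECT * FROM {$wpdb->base_prefix}cavalcade_jobs WHERE nextrun > NOW() AND status = 'waiting'"
        );
    }

    return $jobs;
}

// Invalidate the memoized list whenever a new event is scheduled.
add_filter( 'pre_schedule_event', function ( $pre ) {
    cavalcade_get_future_jobs( true );
    return $pre;
}, 0 );
```

Subsequent wp_next_scheduled() checks in the same request could then scan the memoized array instead of issuing a query each.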

@rmccue (Member) commented Sep 5, 2022

Not sure if I addressed this elsewhere, but: Cavalcade's design intentionally queries the database for this data rather than using the object cache (valuing freshness over performance), and database queries are not a problem per se, so we need to make sure any solution is worth the cost. I'm assuming from this issue that we've directly identified this as an actual performance bottleneck, but I don't recall seeing the discussion anywhere.

Querying for all pending jobs (potentially deferred to the first wp_next_scheduled() check) makes sense to me, as the marginal query time for getting all jobs isn't much higher than for getting one; in the majority of cases the connection overhead is more likely to dominate than the query time, I imagine (for n < 10k).

We may only want to do that for recurring jobs rather than all jobs, since those are where wp_next_scheduled() makes sense to call. This would also cut the size of the data down, at the likely cost of two DB queries for a missing job versus the current one (since we'd need to fall back to querying individually if a job isn't in the pre-fetched data). It may improve the common case nonetheless.
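As a sketch, that two-tier lookup could look something like this (both helpers are hypothetical stand-ins for real query code):

```php
// Hypothetical sketch: prefetch recurring jobs on first lookup, and fall
// back to a per-job query for anything not in the prefetched set.
function cavalcade_next_scheduled( $hook, array $args = [] ) {
    static $recurring = null;

    if ( null === $recurring ) {
        // One query, deferred until wp_next_scheduled() is first used.
        $recurring = cavalcade_fetch_recurring_jobs(); // hypothetical helper
    }

    foreach ( $recurring as $job ) {
        if ( $job->hook === $hook && $job->args == $args ) {
            return (int) $job->nextrun;
        }
    }

    // Missing from the prefetched set: second query covers one-off jobs,
    // which is the two-queries-for-a-missing-job cost noted above.
    return cavalcade_query_single_job( $hook, $args ); // hypothetical helper
}
```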

> The cron option used by default in WordPress contains all future events so the data stored for future events in the cavalcade_jobs table should be of similar size.

That's an assumption that may not hold; Cavalcade is designed to be able to scale beyond WP's limits, so we wouldn't want to assume the array would still be small.

It's not likely to be at the too-big-for-PHP-memory size, but it could be at the too-big-for-one-query size.

@rmccue (Member) commented Sep 5, 2022

Looks like this was ultimately a performance regression from when we switched to the preflight filters in https://github.com/humanmade/Cavalcade/pull/91/files; previously, we populated the full array once per page load, and the non-persistent cache was then used for follow-ups.

Restoring that functionality would likely fix this issue.
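If that behaviour were restored, it might look roughly like this: populate the future jobs array once per page load into a non-persistent (per-request) cache group, so repeated lookups skip the database. The group and key names are hypothetical, not Cavalcade's actual ones:

```php
// Hypothetical sketch of the pre-#91 model: one query per page load, then
// a non-persistent cache group serves all follow-up lookups in-process.
wp_cache_add_non_persistent_groups( [ 'cavalcade-jobs' ] );

function cavalcade_get_jobs() {
    $jobs = wp_cache_get( 'all-future', 'cavalcade-jobs' );

    if ( false === $jobs ) {
        global $wpdb;
        $jobs = $wpdb->get_results(
            "SELECT * FROM {$wpdb->base_prefix}cavalcade_jobs WHERE nextrun > NOW()"
        );
        wp_cache_set( 'all-future', $jobs, 'cavalcade-jobs' );
    }

    return $jobs;
}
```

Because the group is non-persistent, the data is refetched on every request, preserving the freshness-over-performance trade-off described above.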

@roborourke (Contributor, Author)

Ah nice. Not totally sure how to go about that with the get_jobs_by_query() implementation... Filtering the array after the actual db query?

@roborourke (Contributor, Author)

I don't think this plugin is the right place for it, but I've written up an alternative approach that avoids the problem described here on frontend requests, which we could add to Altis Cloud and others could implement if they wish:

Closing this out, as the current setup is, on balance, the best trade-off between efficiency and scalability.
