Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update recalc logic to be resumable #1005

Open
wants to merge 2 commits into
base: bugfix
Choose a base branch
from
Open

Conversation

pupi1985
Copy link
Contributor

Here are some notes that should help understand these changes.

Reindexing issues

It is incorrect to use these caches: cache_qcount, cache_acount and cache_ccount to count the amount of posts that need to be reindexed. This is because those caches count the amount of individual posts that are in NORMAL (not HIDDEN and not QUEUED) status. The thing is that a question could be hidden and its answers not. This would lead to the question not being counted in the cache_qcount but their answers being counted in the cache_acount.

I think whether this is an expected behavior or not, falls more on the requirements definition side rather than the bug-fixing side. The real issue is that when it comes to indexing status, a hidden question is not indexed and their children are not indexed either. So using the sum of all cache_[qac]count does not really match the amount of indexed posts.

In order to get the amount of indexed posts, we should use a similar filter to the one it is being used by qa_db_posts_get_for_reindexing(). My revised version took 1-2 seconds to run in a server with almost 3 million posts so I'd say it is fair in the context of the whole reindexing process.

General counting issues

There were a few processes, in addition to the one mentioned above, that were not properly counting the amount of work to be done. I added some additional code to take care of this that is backwards compatible. EG: you can check the changes made to qa_db_count_posts() and how it is being called from the StepRefillEvents class.

Regarding changes in qa_db_users_get_for_recalc_points()

The way of getting batches of elements is based on numeric IDs. This means, if the last processed ID in a batch was 105, the next batch would start searching in 106 (regardless of whether that ID exists). So fetching the next processable ID means just adding an integer.

The problem with user IDs is that they can be strings. So incrementing in one unit just won't work. This was solved by fetching one more record in the batch and starting from there. However, this could have been solved differently: instead of tracking the next processable ID and include it in the next batch, just track the last processed ID and exclude it from the next batch. In pseudo code, this means instead of doing WHERE id >= next_processable_id just do WHERE id > last_processed_id.

This is not relevant with numeric IDs. However, it is for string IDs. This is because it would avoid fetching an extra record in every batch. It is not a big deal, but the qa_db_users_get_for_recalc_points() is constantly fetching an extra record, and even forces the call to the function to have an increased $count, which results in subqueries to fetch additional records that, when added with UNIONs, might result in even more records that will be fetched unnecessarily. So the code of this function (and the call to it) was rewritten in this new way which should save some unnecessary record fetches.

Changes to how blobs are managed

Apparently, there was a bug when incrementing blob ids, which is a step performed when moving blogs to disk or DB. These IDs are BIGINTs in MySQL, which are unsigned 64-bit values. PHP integers are signed. So, even in 64-bit systems, it is not possible to reference half of the possible values and, incrementing the blob id in 1, could trigger an overflow.

So this one was another candidate to be restructured to use the last_processed_id logic as there is no need to increment anything in PHP. And it only requires a minor change to qa_db_get_next_blob_in_db().

Language changes

Yes, there were a lot. I standardized many of the strings to a format that includes the name of the process. I kept the deprecated ones, anyway.

No-JS versions

I've learned there is a qa-include/pages/admin/admin-recalc.php file. Apparently, it is a sort of controller of admin/recalc?dorecountposts=1. It is intended to be a no-js version for recounting posts. The recalc process is not completely integrated to working in this way.

Considering even crawlers in 2024 are using JS by default, I think there is no point in keeping this page alive and even having to maintain the code for browsers with JS disabled. Note there are many other features that rely on JS to be enabled (e.g. voting).

I'm not removing no-JS support from Q2A, but rather just removing it from the recalc process.

Cache trimming/clearing

When fetching the last page of the elements, there was a bug which allowed for more elements to be fetched. Not a big deal, but I fixed it.

I also changed the language from "deleted" to "processed" as not all elements will be deleted (specially for the process that deletes only expired items).

Why this refactor and not the one in 1.9

First of all, this is not (just) a refactor, but rather adding features and fixing bugs. This is mainly adding the ability to resume processes that might take one or two days to finish in big sites and they always fail, for whatever reason, without a way to resume them.

Secondly, the refactor in 1.9, even though it is quite OOP, still takes care of the state as a string, which ends up being the heart of the whole recalc solution. I've removed the state as such, and kept track of it server-side and with the appropriate data types. Also, in 1.9, even though there are OOP classes that handle most of the logic of the steps, there are also huge case statements (

private function stageLength()
) that break the modularity, or even IF statements referencing steps in generic transitions (
$this->next = (QA_FINAL_EXTERNAL_USERS && ($newOperation == 'dorecalcpoints_recalc')) ? '' : 0;
).

Lastly, the process with steps architecture I suggest, I think it reflects real life better: there exists processes, with easy-to-configure steps, and each step has a setup phase to prepare and provide information of what's going to happen, and then the execution phase, in which it executes as many times as needed until it finishes.

In conclusion, I think this approach is better than the original recalc approach and also the "mechanical" refactor that currently exists in 1.9

Random comments

  • I changed the UI without changing CSS (just a bit that was not needed anymore).
  • Whenever a recalculation is needed (e.g. after changing hotness values), it is no longer triggered in the generated page, but rather, a warning is shown asking the user to execute the process manually (same as the database initialization)
  • The only exception to the previous item is the caching recalcs, which had the UI refreshed, but was kept where it was as it seemed more practical to be there
  • A not-complex next step could be automatically resuming processes after getting a bad HTTP response
  • Some screenshots:
  • Caching:
    20240812_131002
  • Recalculation needed warning:
    20240812_131132
  • New admin/stats section
    20240812_132000
  • Sorry, I know it's a lot of code 😩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant