Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: created a new CLI cmd to backfill missing author terms for posts. #1060

Merged
merged 6 commits into from
Oct 9, 2024

Conversation

eddiesshop
Copy link
Contributor

Description

This is a new, more efficient, command to help backfill author term data for any posts that are missing it. The new command was based off of the old one. The key differences between that command and this one are:

  1. If, for some reason, the command were to get killed while it was backfilling data, re-running wp co-authors-plus create-terms-for-posts would always start back from the first post. For sites of a certain stature, this could create a situation where this command would never finish executing. Instead, this new command only looks for posts that are missing the requisite author term in the relationships table via SQL.
  2. Since only posts that are missing the author term are pulled, once the missing data is created, any subsequent command execution means that you will be processing a smaller and smaller list of posts that need to be addressed, unlike the existing command.

Deploy Notes

Are there any new dependencies added that should be taken into account when deploying to WordPress.org?
No, this is an entirely new command.

Steps to Test

  1. Check out PR.
  2. Take a DB backup.
  3. Execute this command with wp co-authors-plus create-author-terms-for-posts
  4. Once complete, run the old command: wp co-authors-plus create-terms-for-posts. Notice that no new author terms are being created.

@leogermani
Copy link
Contributor

@GaryJones do you know why we have these failing tests here? I see that we no longer support php 7.1 (

* Requires PHP: 7.4
) and for some reason these integration tests are not running in other open PRs

@GaryJones
Copy link
Contributor

do you know why we have these failing tests here?

Since https://github.com/Automattic/Co-Authors-Plus/blob/develop/.github/workflows/integrate.yml use PHP 7.4, it looks like this PR branch should be targetted to develop (not main, since that's our production release branch).

@GaryJones GaryJones removed their request for review October 8, 2024 09:05
@leogermani leogermani changed the base branch from main to develop October 8, 2024 11:46
Copy link
Contributor

@leogermani leogermani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks and tests well.

Can we just add a comment to the command description highlighting the difference between this and the old command?

I also think we could add a comment in the old command, saying that we should give preference to this new and more robust version of it.

And a NIT, not blocking: since the batching occurs on run time, there is no real difference from a user perspective running the command with or without it, from a functional point of view. The only thing that can happen is an OOM if you run without batching... In other words, there is no reason to have an option to run this command without doing batches. Is there? We could leave just the option for the batch size and always run in batches... WDYT?:

The comments are meant to clarify the key differences between the two commands, and that the new one should be preferred over the old one.
@eddiesshop
Copy link
Contributor Author

Can we just add a comment to the command description highlighting the difference between this and the old command?

I also think we could add a comment in the old command, saying that we should give preference to this new and more robust version of it.

Great idea! Done here.

And a NIT, not blocking: since the batching occurs on run time, there is no real difference from a user perspective running the command with or without it, from a functional point of view. The only thing that can happen is an OOM if you run without batching... In other words, there is no reason to have an option to run this command without doing batches. Is there? We could leave just the option for the batch size and always run in batches... WDYT?:

That batching approach really comes from a L&I point of view. Often times, our migration commands can be interrupted due to memory constraints or we can cause MySQL binlog replication issues. Although not a perfect approach, tackling a certain amount of records at a time usually mitigates those problems. I don't mind removing it though, since the command should operate just fine if it were restarted due to an interruption during execution.

@leogermani
Copy link
Contributor

That batching approach really comes from a L&I point of view. Often times, our migration commands can be interrupted due to memory constraints or we can cause MySQL binlog replication issues. Although not a perfect approach, tackling a certain amount of records at a time usually mitigates those problems. I don't mind removing it though, since the command should operate just fine if it were restarted due to an interruption during execution.

I don't mean removing the batching approach. I mean making it the default and only approach. There is no benefit in not using it

@eddiesshop
Copy link
Contributor Author

I obviously need some more coffee today!

I think we shouldn't remove the batched flag, but we should make it true by default. That way, if anyone wants to let it rip, they have that option.

@leogermani leogermani merged commit d5f9404 into develop Oct 9, 2024
13 of 15 checks passed
@leogermani leogermani deleted the add/author_term_backfill_command branch October 9, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants