Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: added migration script for migration of historic data #3299

Merged
merged 19 commits into from
Sep 25, 2024

Conversation

RafaelJohn9
Copy link
Contributor

@RafaelJohn9 RafaelJohn9 commented Aug 16, 2024

Fix #2196

It migrates data from the previous years up to (2020/1/ 7, 9:25:11:829125)

Copy link
Collaborator

@mathjazz mathjazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Left a few comments below.

pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved

for translation in Translation.objects.filter(date__lt=end_date):
for attr, action_type in traslation_info.items():
if getattr(translation, attr) is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check that any date is before end_date in order to avoid duplicate entries. It could be that translation was created before we started using ActionLog, but it could have been approved afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

researched on this and saw that django.db.models.Q is a good way to do this
You'll see it in my latest update
Let me know what you think about it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Translation.date is before end_date and Translation.approved_date isn't, we'll still create the (duplicate) ActionLog item.

It should be sufficient to check Translation.date only in the QuerySet (as we did before) and then make sure that getattr(translation, attr) is before end_date.

@RafaelJohn9
Copy link
Contributor Author

Hello @mathjazz I was trying a way not to hardcode the actiontypes but right now they are just leading to Attribute errors. Is there a way to achieve this or should I just hardcode the action types

| Q(unrejected_date__lt=end_date)
)

for translation in translations:
for attr, action_type in traslation_info.items():
if getattr(translation, attr) is not None:
actions_to_log.append(
ActionLog(
action_type=action_type,
created_at=translation.date,
Copy link
Contributor Author

@RafaelJohn9 RafaelJohn9 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 42 I also noticed that since this is actionlog the creation date is not neccessarilly the creation of the translation date but it varies with the action type:

  • approved date
  • unapproved date
  • rejected date
  • unrejected date
  • creation date

Should I also change this to match that?

Copy link
Collaborator

@mathjazz mathjazz Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, created_at should indeed be the corresponding action date.

@RafaelJohn9
Copy link
Contributor Author

Hello @mathjazz made the changes that you recommended, Looking forward to your review

Copy link
Contributor Author

@RafaelJohn9 RafaelJohn9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   traslation_info = {
    translation_info = {

Would've gone unnotice for centuries and beyond 😅 thanks for that... Vs code auto giving variables made me miss that, might use vim next time 😅

@RafaelJohn9
Copy link
Contributor Author

Fix changes you requested :)

Copy link
Collaborator

@mathjazz mathjazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, last two comments, then we're ready to test this with real data.

pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
@RafaelJohn9
Copy link
Contributor Author

done 👍

Copy link
Collaborator

@mathjazz mathjazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code on stage and it turns out it was too slow to actually complete.

Inline I provided an idea how to improve performance and another edge case that we need to take into account.

pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
@RafaelJohn9
Copy link
Contributor Author

Fixed them 👍

Copy link
Collaborator

@mathjazz mathjazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to optimize the code further, because it still doesn't complete. I have some ideas and I'll do a bit of testing on stage first, so we don't run in circles for too long.

date = getattr(translation, attr)
user_id = getattr(translation, action_user)

# Skip logging 'translation:approved' if approved_date is the same as the date
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comments are more useful when they describe the whys, not the whats.

This comment just says what the code does in an English language, but it's not clear why we are not logging such actions. Better comments would be something along the lines of If approved_date and date values are the same, translation was directly submitted as approved, in which case we only log translation:created action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I agree that comments should be more focused on explaining the reasoning behind the code rather than just describing its functionality.

I’ll update the comments to clarify why we handle logging in this way. It should make the code’s intent clearer and provide better context for future readers.

Thanks again for pointing this out!

@RafaelJohn9
Copy link
Contributor Author

We'll need to optimize the code further, because it still doesn't complete. I have some ideas and I'll do a bit of testing on stage first, so we don't run in circles for too long.

I was thinking a potential reason why it may be slow is due to creating the objects in bulk at the end instead of single processing each? Though this is just a theory, I'll have to prove it first, then I get back to you

@flodolo
Copy link
Collaborator

flodolo commented Aug 29, 2024

We'll need to optimize the code further, because it still doesn't complete.

I'm curious: what is killing the migration? Is it a setting on Heroku, a setting in Django? Can it be overridden?

@mathjazz
Copy link
Collaborator

mathjazz commented Aug 29, 2024

I'm curious: what is killing the migration? Is it a setting on Heroku, a setting in Django? Can it be overridden?

Could be a setting, but I don't think this is the right approach, at least not right now.

It took at least 30 minutes before the process was killed, and that's with bulk_create() action commented out the. Such long migrations are risky and most likely require app downtime.

We should try to make the code faster first.

),
}

# date to end migration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# date to end migration
# Timestamp of the first Translation created after the introduction of ActionLog

pontoon/actionlog/migrations/0004_auto_20240816_1045.py Outdated Show resolved Hide resolved
@mathjazz
Copy link
Collaborator

mathjazz commented Sep 24, 2024

@RafaelJohn9 r? 7ac05b9

Without this change, create_at was always set to "now", i.e. to the date and time when the migration was executed:
https://docs.djangoproject.com/en/5.1/ref/models/fields/#django.db.models.DateField.auto_now_add

Copy link
Contributor Author

@RafaelJohn9 RafaelJohn9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this fixes it 👍

Copy link
Contributor Author

@RafaelJohn9 RafaelJohn9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+

@mathjazz mathjazz merged commit 82a2f4b into mozilla:main Sep 25, 2024
2 checks passed
@RafaelJohn9 RafaelJohn9 deleted the migrate_historic_data branch September 25, 2024 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate historic data into ActionLog table
3 participants