Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates Unicode data files in a background task #7581

Merged

Conversation

Sesquipedalian
Copy link
Member

@Sesquipedalian Sesquipedalian commented Oct 27, 2022

Based on an idea that came up in previous discussion, this PR refactors the code from update_unicode_data.php into a background task so that SMF can update its Unicode data files itself.

Pros:

  • There is no longer any need to include potentially bulky amounts of changed Unicode data in SMF patch files.
  • Forums can keep up to date with the latest version of Unicode automatically and without waiting for the next SMF patch release.
  • Less work for us.

Cons:

  • ... 🤔 ... 🤔 ... 🤔 ...... nope, nothing comes to mind.

Testing

I've tried everything I can think of to break this code, and it just keeps working no matter what I do.
So now it is your turn! 😉

  1. Checkout this PR (obviously).
  2. In the admin panel, run the Weekly Maintenance scheduled task.
  3. Wait for a minute or so in order to allow the background task plenty of time to complete.
  4. Inspect the content of ./Sources/Unicode/Metadata.php to verify that SMF_UNICODE_VERSION has been updated to 15.1.0.0.

Notes

  1. The update_unicode_data.php file still exists, but it now just impersonates cron.php in order to load and run the Update_Unicode background task on demand.
  2. Although patches no longer need to include any updates to Unicode data, the install and upgrade packages should, so update_unicode_data.php should still be run as part of preparing a new release.

@Sesquipedalian
Copy link
Member Author

I figured out a way to make it even more proof against timeouts. Not that I have actually encountered any timeouts in testing, but extra defences don't hurt.

Signed-off-by: Jon Stovell <[email protected]>
@Sesquipedalian
Copy link
Member Author

Latest commit adds compatibility with #7574.

Signed-off-by: Jon Stovell <[email protected]>

# Conflicts:
#	other/update_unicode_data.php
@Sesquipedalian
Copy link
Member Author

I would love to see this merged before the next version of Unicode is released in September. Otherwise we will have to make another huge patch. (And then another the next September, and again the next, and the next, and the next...)

@Sesquipedalian
Copy link
Member Author

Unicode 15.1 has been released now. I really, really, really don't think we should put the raw data directly in a patch like we did last time. So, could someone please review this? The release of a new version of Unicode actually makes testing easier; I've updated the instructions in the original post accordingly.

@jdarwood007
Copy link
Member

My only concern is that it appears to write to the sources directory. We have instructed forums in the past that they can keep that readonly except for updates, which we supported ftp to make it writeable.
Is there a workaround we can do?

@Sesquipedalian
Copy link
Member Author

I could add a notification in the error log if any of the Sources/Unicode/*.php files are not writable, with instructions about how to fix it. Then the admin can just follow those instructions to make them writable and let the update run again.

@jdarwood007
Copy link
Member

That seems to be the easiest option at least for 2.1.

@Sesquipedalian
Copy link
Member Author

I didn't actually give instructions about how to make files writable, because that's too long for an error message. But this should do the job.

@Sesquipedalian Sesquipedalian merged commit b422e7a into SimpleMachines:release-2.1 Sep 21, 2023
8 checks passed
@Sesquipedalian Sesquipedalian deleted the unicode_auto_update branch September 21, 2023 02:08
@sbulen
Copy link
Contributor

sbulen commented Sep 21, 2023

Hmmm.... As noted earlier, I did not want to do this in this fashion.

Our pacman approach has served us well for quite some time. I see no reason to deviate from it.

The cons, as I see them, are that any changes aren't backed out if a patch causes problems. It assumes the code & all changes are perfect & never need to be backed out via our normal pacman method.

So now we have two paths to changes, and no path for reverting these particular changes should issues arise.

@sbulen
Copy link
Contributor

sbulen commented Sep 21, 2023

I thought this was clear in the last discussion.

Now I will need to consider reverting this...

@Sesquipedalian
Copy link
Member Author

You are incorrect in that regard. These are data files, pure and simple. The inputs are extremely well documented and stable, and all this does is process the data into a more convenient format for PHP usage.

Seriously, look at the data files. They contain nothing but arrays of strings. There is no logic in them.

@Sesquipedalian
Copy link
Member Author

Sesquipedalian commented Sep 21, 2023

You are also incorrect to think that the updated Unicode data should be revertable. It emphatically should not. Not ever. Never.

Why? Security. If a forum updates to a new version of Unicode and starts accepting characters that were added in that versions, but then downgrades to a previous version of Unicode, those characters become broken, illegal data that SMF no longer expects to be present in user supplied strings. A carefully crafted attack could definitely exploit this. The Unicode annex reports about security talk about this issue explicitly, and say that Unicode versions should never be downgraded after they've been updated.

In case it's not clear, I say this all with the utmost respect and the friendliest of tones, but I will go to mat on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping SMF code reorganization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants