utfmb4 database #5108

Closed
Underdog-01 opened this issue Nov 1, 2018 · 22 comments · May be fixed by #6409

Comments

@Underdog-01
Contributor

Underdog-01 commented Nov 1, 2018

Description

With a database that has been created as e.g. utf8mb4, the SMF 2.1 installer creates all tables as 3-byte UTF8, which does not conform to the database's default setting/structure.

Steps to reproduce

  1. Create utf8mb4_unicode_ci database
  2. Change the database type in the Settings.php file to utf8mb4 (set the $db_mb4 variable to true)
  3. Run the SMF 2.1 installer
  4. Use a database client (e.g. phpMyAdmin) to view the structure of the tables that SMF 2.1 created

Environment (complete as necessary)

  • Version/Git revision: Current
  • Database Type: MySQL / MariaDB
  • Database Version: MySQL 8.0.12 / MariaDB 10.3.10
  • PHP Versions: PHP 5.6 - PHP 7.2.11

Additional information/references

The settings variable should have the behavior of forcing the SMF 2.1 installer to create tables as utf8mb4. In my opinion it is possible to read the database default value and use that instead of relying on the variable in Settings.php. In either case the installation process fails to create the tables as utf8mb4.
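The "read the database default" idea can be sketched as a small decision function. This is only an illustration in Python, not SMF's installer code: `pick_install_charset` and its parameters are hypothetical names, and the fallback rules are an assumption about how such a check might behave. The query string uses the standard MySQL system variables `@@character_set_database` and `@@collation_database`.

```python
# Query an installer could run first; both are standard MySQL system variables.
DEFAULT_CHARSET_QUERY = "SELECT @@character_set_database, @@collation_database"

MB4_MIN_VERSION = (5, 5, 3)  # utf8mb4 was introduced in MySQL 5.5.3


def pick_install_charset(db_default_charset, server_version, db_mb4_setting=None):
    """Return the charset the installer should create tables with (hypothetical helper).

    db_default_charset: value of @@character_set_database, e.g. "utf8mb4"
    server_version:     tuple such as (8, 0, 12)
    db_mb4_setting:     optional override mirroring $db_mb4 (True/False/None)
    """
    if server_version < MB4_MIN_VERSION:
        return "utf8"  # server cannot store 4-byte characters at all
    if db_mb4_setting is not None:
        return "utf8mb4" if db_mb4_setting else "utf8"
    return "utf8mb4" if db_default_charset.lower() == "utf8mb4" else "utf8"


print(pick_install_charset("utf8mb4", (8, 0, 12)))
print(pick_install_charset("latin1", (8, 0, 12)))
```

Here an explicit setting wins over the detected default, and the detected default is used only when no setting is given; the opposite precedence would be equally easy to express.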

@albertlast
Collaborator

The setup process doesn't read the settings file.

The user journey behind this setting is:

  • that you install your board in mysql with mb3
  • then use phpMyAdmin to change all tables to mb4
  • change the settings to mb4
  • run the entity converter
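The "change all tables to mb4" step above amounts to one `ALTER TABLE ... CONVERT TO CHARACTER SET` statement per table. A minimal sketch that generates those statements follows; `conversion_statements` is a hypothetical helper, and the table names are only examples using the `smf_` prefix.

```python
def conversion_statements(tables, collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE ... CONVERT statement per table name."""
    statements = []
    for table in tables:
        # Quote the identifier and escape any backticks inside it.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


for sql in conversion_statements(["smf_messages", "smf_topics"]):
    print(sql)
```

Back up the database before running anything like this. On older InnoDB row formats, indexed VARCHAR(255) columns can exceed the 767-byte key limit once each character may take four bytes.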

From my side (I'm the guy who introduced the mb4 option in settings), the current implementation is enough.

@Underdog-01
Contributor Author

In my opinion it would be better to install as 4-byte initially where the database default setting is mb4.
The entity converter is a great option either way but I would implement adhering to the db default.

What about mod installation behavior?
If the forum has already been converted to mb4 & a mod uses smcFunc to create its tables, are those tables initially mb4 or must the entity converter be run yet again?

@albertlast
Collaborator

This feature is only sugar,
with no real need to be there (as you noticed yourself, you can save mb4 information in an mb3 environment when you convert it).
If you think it introduces too many issues,
we can remove it, and you would have to use Postgres as the RDBMS when you really want an mb4 environment.

@Underdog-01
Contributor Author

Underdog-01 commented Nov 1, 2018

Actually I see a problem with sorting when it uses HTML entities instead of the proper character code that is only native to utf8mb4 unicode. Database queries will not be sorted properly when dealing with e.g. names, topic titles, etc. when those use 4-byte characters that have been converted to decimal entities for 3-byte storage.
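The sorting concern can be reproduced outside the database. In this rough sketch, Python's code-point ordering stands in for a database collation (real MySQL collations order text differently, so treat it as an analogy), and `to_entities` is a hypothetical stand-in for the entity conversion, not SMF's actual function.

```python
def to_entities(text):
    """Replace every character above U+FFFF (4 bytes in UTF-8) with a decimal entity."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch for ch in text)


titles = ["zebra", "\U0001F600 party", "apple"]

# Order when the real characters are compared.
native_order = sorted(titles)
# Order when the stored (entity-encoded) form is compared instead.
stored_order = sorted(titles, key=to_entities)

print(native_order)  # the emoji title sorts last
print(stored_order)  # the emoji title sorts first, because "&" precedes letters
```

The stored form of the emoji title begins with `&#128512;`, so any comparison sees an ampersand and digits rather than the character the user typed.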

@albertlast
Collaborator

This is the MySQL way of life.

@Underdog-01
Contributor Author

This is perhaps the pre-2010 MySQL way of life.

@MissAllSunday
Contributor

@albertlast do you think this is a valid issue? Why do you think installing in mb4 when the database is already set to it isn't a feasible feature to add?

@albertlast
Collaborator

Nope,
in my eyes the existing solution is enough.

@MissAllSunday
Contributor

Cool, the next question would be: how common is it to encounter an already created DB set to mb4 / full utf8?

If this is something we can easily detect when installing/upgrading and SMF runs smoothly on either 3 or 4 bytes then I would prefer to add support for full utf8 when the db is already ready for it.

@albertlast
Collaborator

From my POV,
you will see this at release.

If it is really needed, something could be added later on.

@sbulen
Contributor

sbulen commented Sep 17, 2019

I would bet that virtually 100% of all testing thus far has been in utf8, not utf8mb4, since our install script hard-codes to utf8:
https://github.com/SimpleMachines/SMF2.1/blob/e5b77f4133d81524fda48ad5ad7452ce7032fd77/other/install.php#L1177

utf8mb4 really is the way to go. In 2.1.x, I think we should mirror how we supported utf8 in 2.0.x:

  • allow a utf8mb4 option upon install
  • provide a utf8mb4 conversion utility under forum maintenance (for upgraded forums)
  • provide an entities conversion utility

The problem is it's completely untested. So the question is WHEN...

I recommend this be in the Final release at the earliest... Maybe even a follow on.

@MissAllSunday
Contributor

Yep, I also think we should go with mb4 and if there is an easy way for us to take advantage of an already set up DB then we should use it.

So yeah, at minimum we should offer support for installing SMF on mb4 when it's available.

@MissAllSunday MissAllSunday added this to the Final milestone Sep 23, 2019
@chopinesque

@albertlast
Collaborator

SMF 2.1 supports PHP 5.3 and MySQL 5.0,
both things that Drupal doesn't do... they are smart.

@live627
Contributor

live627 commented Jan 6, 2020

This requires MySQL 5.5.3 or newer.

https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql

TL;DR: Four byte encoding instead of three. Good for emoji.
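To make the three-versus-four-byte distinction concrete, here is a short sketch that measures how many bytes individual characters take in UTF-8 and flags text that would not fit in a 3-byte column. The helper names `utf8_width` and `needs_utf8mb4` are hypothetical.

```python
def utf8_width(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text):
    """True if any character lies above U+FFFF, i.e. needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


# ASCII letter, accented Latin letter, euro sign, emoji (U+1F600).
for sample in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(repr(sample), utf8_width(sample), needs_utf8mb4(sample))
```

Only the last sample needs four bytes; everything at or below U+FFFF fits in MySQL's 3-byte `utf8`.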

I vote to wait for 2.2. Or the next big version.

@albertlast
Collaborator

A big project like Drupal can do it,
but a small project like SMF can't...

@MissAllSunday
Contributor

I don't mind raising the minimum MySQL version to 5.5.x if that's what's needed to get full utf8 support. This is something we will be seeing a lot of support requests for in the next few years, so having it in 2.1 seems like a good way to avoid all those support requests later.

@albertlast
Collaborator

The issue is that you call this 2.1 instead of 3 and
hurt yourself with this idea,
because it limits you.
Looking at the amount of changes, 2.1 is not 2.1 any more.

@MissAllSunday
Contributor

Nope. This is 2.1, and 2.1 is in feature freeze as far as I can see. However, I'm inclined to add support for true utf8 on MySQL, since it will be very common in the coming years, if it isn't already.

@live627
Contributor

live627 commented Jan 6, 2020

having it on 2.1 seems like a good thing to do to avoid all those support requests later.

oh yes we'll need to address... wait

2.0 has mb4 support from simply using $smcFunc['htmlspecialchars'] (I think 2.1 does as well, not sure)

@MissAllSunday
Contributor

Indeed, what this issue asks for is support for DBs that have already been created as utf8mb4. On those DBs we should use what's already there for us to use.

Using htmlspecialchars to convert to entities is a nice workaround but not something we should rely on.
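For context, undoing that workaround once the tables can hold 4-byte characters means turning the stored decimal entities back into real characters. The sketch below shows the idea only; `restore_supplementary` is a hypothetical name and this is not SMF's entity converter.

```python
import re

NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def restore_supplementary(text):
    """Turn decimal entities for code points above U+FFFF back into real characters."""
    def replace(match):
        code_point = int(match.group(1))
        if 0xFFFF < code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)  # leave other entities (e.g. &#38;) untouched

    return NUMERIC_ENTITY.sub(replace, text)


print(restore_supplementary("Party &#128512; &#38; cake"))
```

Restricting the replacement to code points above U+FFFF keeps entities that exist for other reasons, such as an escaped ampersand, exactly as they were stored.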

@MissAllSunday
Contributor

#6409

@sbulen sbulen modified the milestones: RC4, 2.1.5 Jul 15, 2023
@sbulen sbulen mentioned this issue Jul 15, 2023

6 participants