Anonimatron handles this problem (in general, not just for emails) by maintaining a list of synonyms in a separate file. The synonyms file maps input production data to anonymized output data, so it should be treated as sensitive production data.
The big advantage of the synonyms file is that it allows consistency across tables, and it allows one to keep the same anonymized test names across multiple anonymizations, which can be very helpful for the non-production QA team.
It would be a simple check to see whether the generated email is already in the synonyms file. So a non-elegant fix would be: if the generated email is in the synonyms file, rerun Faker until it comes up with a unique email that is not in the file.
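A minimal sketch of that retry loop, written in Python for brevity rather than in the plugin's PHP. Everything here is hypothetical: `fake_email` is a stand-in for a Faker call with a deliberately tiny pool so collisions actually occur, `synonyms` is an in-memory dict that would be loaded from and saved to the synonyms file, and `max_tries` is an arbitrary safety limit. It is not Anonimatron's implementation.

```python
import random


def fake_email(rng):
    """Stand-in for a Faker call; the pool has only 4 * 10 * 2 = 80 addresses."""
    user = rng.choice(["anna", "ben", "carla", "dirk"])
    domain = rng.choice(["example.com", "example.org"])
    return f"{user}{rng.randint(0, 9)}@{domain}"


def anonymize_email(original, synonyms, used, rng, max_tries=1000):
    """Reuse the stored synonym for `original`, or generate one not used yet."""
    if original in synonyms:
        # Same input always maps to the same output, also across tables and runs.
        return synonyms[original]
    for _ in range(max_tries):
        candidate = fake_email(rng)
        if candidate not in used:
            synonyms[original] = candidate
            used.add(candidate)
            return candidate
    raise RuntimeError("no unused address found; the generator's pool is too small")


rng = random.Random(0)
synonyms = {}                    # original -> anonymized (the synonyms file, in memory)
used = set(synonyms.values())    # anonymized addresses that are already taken
originals = [f"subscriber{i}@real.example" for i in range(50)]
anonymized = [anonymize_email(o, synonyms, used, rng) for o in originals]
```

Because the mapping is persisted, the uniqueness check survives between requests and between anonymization runs, which per-process state cannot do. The retry loop gets slower as the pool fills up, so the limit and the error case matter for large tables.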
We anonymize MailPoet subscribers like this:
(Posted full snippet, but only `email` is relevant here.)
It works for small subscriber lists. For ~6,500 subscribers and more, however, it seems that `Faker` no longer generates unique email addresses. Since MailPoet requires the `email` column to be `UNIQUE`, importing an anonymized database fails with the error `Duplicate entry '[email protected]' for key 'email'`.
In fact, for the ~6,500 entities, `Faker` seems to generate ~10 email addresses twice, according to the error messages.
So I checked `Faker`'s Modifiers (https://github.com/fzaninotto/Faker/#modifiers) and, for testing, changed line 136 of `wp-migrate-db-anonymization/includes/Config/Rule.php` (in cca4ad8) so that it goes through `unique()`. (It could be simplified by always using `unique()`, but I'm not sure whether that might have unwanted side effects.)
That reduced the number of duplicates, but there were still some. Even updating `Faker` to v1.8 (which introduced more German email providers, fzaninotto/Faker#1320, see #25) did not solve it (and is no solution for other languages). And even if an export ran without creating duplicates, we couldn't say for sure that it would work for larger record sets.
I'm not entirely sure why `Faker` still creates duplicates despite the `unique()` call, but I think it might be related to the fact that the plugin is bootstrapped every time `admin-ajax.php` is called, which also reinitializes `Faker` every time. If that's true, I have no idea how to deal with it. @polevaultweb, do you?
Thanks!
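
To illustrate the suspected mechanism (it does not confirm that this is what actually happens in the plugin), here is a small Python sketch rather than PHP. `run_request` simulates one request whose uniqueness tracking lives only in memory, the way a per-process "seen" list would, and `POOL` is a deliberately small hypothetical address pool. `stateless_email` shows one possible workaround under the assumption that a hash-derived address is acceptable: it needs no state between requests. That function and its `secret` parameter are illustrative names, not part of Faker or of the plugin.

```python
import hashlib
import hmac
import random

POOL = [f"user{i}@example.org" for i in range(200)]  # small pool, to force collisions


def run_request(seed, batch_size):
    """One simulated request: duplicates are rejected only within this call."""
    rng = random.Random(seed)
    seen = set()  # forgotten when the request ends
    out = []
    while len(out) < batch_size:
        candidate = rng.choice(POOL)
        if candidate not in seen:
            seen.add(candidate)
            out.append(candidate)
    return out


# Four requests of 50 rows each: unique inside every request, not across them.
batches = [run_request(seed, 50) for seed in range(4)]
per_request_unique = all(len(set(b)) == len(b) for b in batches)
all_emails = [e for b in batches for e in b]
cross_request_duplicates = len(all_emails) - len(set(all_emails))


def stateless_email(original, secret="change-me"):
    """Derive the fake address from a keyed hash of the real one (no shared state)."""
    digest = hmac.new(
        secret.encode("utf-8"), original.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # 16 hex characters = 64 bits, so accidental collisions are extremely unlikely.
    return f"user-{digest[:16]}@example.org"
```

The trade-off of the hash variant is that the addresses no longer look like real names, and the `secret` has to be kept out of the anonymized export, since the output is otherwise reproducible from a guessed input.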