Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mailchimp upsert contacts/update tags in batch #124

Merged
merged 1 commit into from
Jul 13, 2023

Conversation

VSydor
Copy link
Contributor

@VSydor VSydor commented Apr 19, 2023

No description provided.

@VSydor VSydor requested a review from brmeyer April 19, 2023 15:55
@VSydor VSydor force-pushed the feature/mailchimp-batch-upsert branch 3 times, most recently from 7b5250e to 5c540d1 Compare April 19, 2023 17:21
@@ -32,6 +47,10 @@ public class MailchimpEmailService extends AbstractEmailService {

private static final Logger log = LogManager.getLogger(MailchimpEmailService.class);

private static final Integer BATCH_REQUEST_OPERATIONS_SIZE = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me on how we landed on 50 as the size? MC API limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - just decided to go with 50. Can be any reasonable value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you've seen any defined limits in the doc? If not, 50 is still a gigantic improvement, but curious how high we can crank this up...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to example described here:
https://mailchimp.com/developer/marketing/guides/run-async-requests-batch-endpoint/

It is possible to have 1000 records in single batch (!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I would still use a smaller amount as a batch size so that we don't wait too long till contacts are inserted/updated before processing the tags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 500 as a batch size (replicated from GetMembers method)

@@ -222,6 +383,63 @@ protected Map<String, Object> getCustomFields(String listId, CrmContact crmConta
return customFieldMap;
}

protected Map<String, Map<String, Object>> getCustomFields(String listId, List<CrmContact> crmContacts, MailchimpClient mailchimpClient,
EnvironmentConfig.EmailPlatform mailchimpConfig) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there something that made this additional method necessary, instead of calling the original getCustomFields from a loop?

Copy link
Contributor Author

@VSydor VSydor May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to create custom fields for all contacts at once (instead of for each contact separately).
But please let me know if I should switch to original method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I just mean most of this code (and the one in the other comment I made) duplicates the original, right? Helper methods might DRY it up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely - the general idea was to adopt existing methods to batch-like processing (ex, get all tags in 1 shot)
and then remove original ones (after/during code-review phase).
Please let me know which original ones you want to keep (getCustomFields?) and I will clean it all up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched back to original method (wrapped in for-each)

List<CrmContact> crmContacts,
Map<String, List<String>> activeTags,
Map<String, List<String>> inactiveTags,
MailchimpClient mailchimpClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto -- was this additional method needed, as opposed to using the original updateTags called from a loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to call update-in-batch for all contacts at once (instead of many update-tags calls)

@brmeyer
Copy link
Contributor

brmeyer commented Apr 28, 2023

Functionally, looks great! Just some design pattern questions. I'm going to give this a test run on Save the Storks to see how it handles a sync of 55k+...

@VSydor VSydor force-pushed the feature/mailchimp-batch-upsert branch 6 times, most recently from 2c730ca to 03498d5 Compare May 2, 2023 13:56
@VSydor VSydor requested a review from brmeyer May 2, 2023 14:00
@brmeyer
Copy link
Contributor

brmeyer commented May 5, 2023

@VSydor Hey! This is working well so far -- giving it a shot with our own HubSpot->Mailchimp sync. One thing I noticed that isn't necessarily hurting anything, but would be good to fix:

[WARN ] 2023-05-04 22:24:22.316 [Thread-13] MailchimpEmailService - Failed to deserialize batch operations! Cannot deserialize value of type `java.lang.String` from Object value (token `JsonToken.START_OBJECT`)
 at [Source: (String)"{"status_code":200,"response":{"list_id":"9c168e9869","_links":[{"targetSchema":"https://us1.api.mailchimp.com/schema/3.0/Definitions/Lists/Members/Response.json","method":"GET","rel":"self","href":"https://us1.api.mailchimp.com/3.0/lists/9c168e9869/members/1ddf93db4d70f42a3a78cf49e3f26fd2"},{"schema":"https://us1.api.mailchimp.com/schema/3.0/Paths/Lists/Members/Collection.json","targetSchema":"https://us1.api.mailchimp.com/schema/3.0/Definitions/Lists/Members/CollectionResponse.json","method":""[truncated 3064 chars]; line: 1, column: 2578] (through reference chain: com.impactupgrade.nucleus.client.MailchimpClient$BatchOperation["response"]->com.impactupgrade.nucleus.client.MailchimpClient$BatchOperationResponse["merge_fields"]->java.util.LinkedHashMap["ADDRESS"])

Integer newAttemptCount = attemptCount + 1;
batchOperations = getBatchOperations(mailchimpClient, mailchimpConfig, batchStatusId, newAttemptCount);
} else {
log.info("Batch '{}' finished! (finished/total) {}/{}", batchStatus.finished_operations, batchStatus.total_operations);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VSydor This line looks to be missing an argument -- the last {} is showing up in the logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - missed batch id as one of log params

if (batchStatus.errored_operations > 0) {
log.warn("Errored operations count: {}", batchStatus.errored_operations);
}
//TODO: decide if we need to download/parse batch operations (error responses have only generic info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VSydor How hard would it be to do this? Not a huge deal, but it's sometimes helpful to see what specifically failed. Normally it's something like an invalid email address, but at times there is context helpful for debugging...

Copy link
Contributor Author

@VSydor VSydor May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done, actually: added logging for error responses.
However, error response has only generic info:
{ "type": "https://mailchimp.com/developer/marketing/docs/errors/", "title": "Invalid Resource", "status": 400, "detail": "Please provide a valid email address.", "instance": "" }
and does not contain email reference or contact id.

Logging eror responses as:
Failed Batch Operation (status: detail): 400: Please provide a valid email address.

@brmeyer
Copy link
Contributor

brmeyer commented May 5, 2023

@VSydor A few last comments, but functionally, it's working great!!!

@VSydor VSydor force-pushed the feature/mailchimp-batch-upsert branch 4 times, most recently from b93afe7 to f597adb Compare May 10, 2023 09:58
@VSydor VSydor requested a review from brmeyer May 10, 2023 09:59
@brmeyer brmeyer force-pushed the feature/mailchimp-batch-upsert branch 2 times, most recently from eeb40df to 8d05bf4 Compare June 19, 2023 16:26
@brmeyer brmeyer force-pushed the feature/mailchimp-batch-upsert branch from 8d05bf4 to d174b0f Compare July 12, 2023 21:32
@brmeyer brmeyer merged commit f355223 into master Jul 13, 2023
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants