Performance degradation in large deployments after upgrade from PSMDB operator 1.14.0 to 1.16.1 #1635
Comments
Hey @MatzeScandio - thanks for sharing. We will have a look.
Hi @MatzeScandio, I need to know more about your clusters. For example, do you use PITR? Can you provide one of your CRs?
Hi @spron-in - thanks for your reply. To answer your questions:
Hi @hors - thanks for your reply. We currently have PITR disabled.
This is after applying the workaround and disabling the cleanup of old backups. Before the workaround the backup task block was:
Report
Performance degradation in large deployments after upgrade from PSMDB operator 1.14.0 to 1.16.1
More about the problem
Analysis
We identified an unusually high number of calls to the backup API (/apis/psmdb.percona.com/v1/namespaces/mongodb/perconaservermongodbbackups) as the main contributor to this behaviour.
The actual bug seems to be in pkg/controller/perconaservermongodb/backup.go#L145.
oldScheduledBackups() also happens 90 times.
We are unsure why it didn't happen before version 1.16.1, as the backup code is mostly unchanged.
Workaround
Rename each backup task to have a unique identifier.
Disable the cleanup for each backup task by setting keep=0 and write a custom k8s cronjob that deletes any psmdb-backup older than 30 days.
This works if there is no need for individual retention periods per db.
It eliminates the API requests altogether, speeding up the reconcile calls significantly.
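A cleanup job like the one described in the workaround could select expired backups along these lines. This is a minimal sketch under assumptions, not the actual job: it assumes the backup objects have already been fetched as a Kubernetes list in JSON form, and the function name `expired_backups` and its parameters are illustrative. Deleting the selected objects is left out.

```python
from datetime import datetime, timedelta, timezone


def expired_backups(items, now=None, max_age_days=30):
    """Return the names of backup objects older than max_age_days.

    `items` is the "items" list of a Kubernetes list response. Each entry
    is assumed to carry metadata.name and metadata.creationTimestamp in
    the usual "2024-08-01T02:00:00Z" form.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for item in items:
        meta = item["metadata"]
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            expired.append(meta["name"])
    return expired
```

Because the cutoff is global, this only fits the case noted above where no database needs its own retention period.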
Steps to reproduce
keep attribute to something above 0
/apis/psmdb.percona.com/v1/namespaces/mongodb/perconaservermongodbbackups?labelSelector=ancestor%3Ddaily%2Ccluster%3D<db-name>
With just 5 databases and a limited number of backups this will of course not result in a slowdown, but you will be able to see the repeated calls to the API endpoint.
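When searching API server or audit logs for these requests, it helps to know how the query string is put together. The sketch below uses only the Python standard library to percent-encode a label selector into the form shown above. The helper name `backup_list_path` and the example values are illustrative; the path and the `ancestor`/`cluster` labels are taken from the request in this report.

```python
from urllib.parse import quote


def backup_list_path(namespace, ancestor, cluster):
    """Build the list-backups request path for one backup task of one cluster."""
    # The label selector is a comma-separated list of key=value pairs;
    # "=" and "," are percent-encoded as %3D and %2C in the query string.
    selector = f"ancestor={ancestor},cluster={cluster}"
    return (
        f"/apis/psmdb.percona.com/v1/namespaces/{namespace}"
        f"/perconaservermongodbbackups?labelSelector={quote(selector, safe='')}"
    )
```

Counting log lines that match this path per cluster gives a rough measure of how often the list call repeats.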
Alternatively
keep attribute to something above 0
Versions
Anything else?