Resources required to delete a pool affect subsequent test run #117
Many benchmarks delete and recreate the test pool between runs. However, the Ceph command to delete a pool returns immediately, and the work of deleting the pool's objects takes place in the background. Unfortunately, experience has shown that the disk and CPU resources consumed while deleting the objects are significant enough to influence the test results for the subsequent run.

One way to avoid the problem is to have the cluster.rmpool() function wait until the disk and CPU utilization on the OSD nodes drop to a reasonable level before returning to the caller. I will be issuing a pull request with this change.
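For illustration, here is a minimal sketch of the proposed flow. The helper names (wait_for_quiesce, max_disk_util_pct, max_osd_cpu_pct) and the use of local iostat/ps sampling are assumptions for this sketch, not CBT's actual implementation, which would need to run these checks on every OSD node (e.g. over pdsh):

```python
import subprocess
import time


def max_osd_cpu_pct():
    """Highest %CPU of any ceph-osd process on this node (0 if none)."""
    out = subprocess.run(['ps', '-C', 'ceph-osd', '-o', '%cpu='],
                         capture_output=True, text=True).stdout
    return max([float(v) for v in out.split()] or [0.0])


def max_disk_util_pct():
    """Highest %util across all disks on this node (needs sysstat).

    The first iostat report covers the time since boot, so take two
    reports one second apart and parse only the last one.
    """
    out = subprocess.run(['iostat', '-dxk', '1', '2'],
                         capture_output=True, text=True).stdout
    last_report = out.split('Device')[-1].strip().splitlines()
    utils = [float(line.split()[-1])          # %util is the last column
             for line in last_report[1:] if line.split()]
    return max(utils or [0.0])


def wait_for_quiesce(disk_util_max=3.0, osd_cpu_max=3.0, poll_secs=5):
    """Block until background pool deletion stops consuming resources."""
    while max_disk_util_pct() >= disk_util_max:
        time.sleep(poll_secs)
    while max_osd_cpu_pct() >= osd_cpu_max:
        time.sleep(poll_secs)


def rmpool(pool):
    # 'ceph osd pool delete' returns immediately; the objects are
    # removed asynchronously by the OSDs.
    subprocess.check_call(['ceph', 'osd', 'pool', 'delete', pool, pool,
                           '--yes-i-really-really-mean-it'])
    wait_for_quiesce()  # keep the cleanup from bleeding into the next run
```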
ASBishop pushed a commit to ASBishop/cbt that referenced this issue on Sep 27, 2016:
When deleting a pool, it may take a while for the OSD nodes to delete the objects in the pool. This change makes CBT wait until the OSD nodes quiesce in order to ensure they are idle before starting the next test run. Quiescing is done by waiting until the maximum disk utilization for any disk falls below 3% across a 30-second window, and waiting until the maximum CPU utilization for any ceph-osd process falls below 3%.

Closes ceph#117
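The 30-second window criterion amounts to requiring that every sample in a full window stay below the threshold. A minimal sketch, assuming one %util sample per tick from a caller-supplied function such as max_disk_util_pct above:

```python
import collections
import time


def disks_quiesced(sample_fn, util_max=3.0, window_secs=30):
    """Return once max %util has stayed below util_max for a full window."""
    window = collections.deque(maxlen=window_secs)
    while True:
        window.append(sample_fn())  # newest sample evicts the oldest
        if len(window) == window_secs and max(window) < util_max:
            return
        time.sleep(1)
```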
ASBishop pushed a commit to ASBishop/cbt that referenced this issue on Oct 25, 2016:
When deleting a pool, it may take a while for the OSD nodes to delete the objects in the pool. This change makes CBT wait until the OSD nodes quiesce in order to ensure they are idle before starting the next test run. Quiescing is done by waiting until the maximum disk utilization for any disk falls below a threshold, and waiting until the maximum CPU utilization for any ceph-osd process falls below a threshold. The thresholds can be tuned using the following cluster configuration parameters (the default values are listed):

    cluster:
      quiesce_disk_util_max: 3
      quiesce_disk_window_size: 30
      quiesce_osd_cpu_max: 3

If quiesce_disk_util_max or quiesce_osd_cpu_max is zero then the corresponding disk/CPU quiescing operation is skipped.

Closes ceph#117
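A hedged sketch of how those tunables could be consumed, reusing the helpers from the sketches above; the cluster section layout follows the commit message, while the function names and PyYAML usage are illustrative assumptions:

```python
import time

import yaml  # PyYAML


def load_quiesce_settings(config_path):
    """Read the quiesce tunables (with their defaults) from a CBT config."""
    with open(config_path) as f:
        cluster = (yaml.safe_load(f) or {}).get('cluster', {})
    return (cluster.get('quiesce_disk_util_max', 3),
            cluster.get('quiesce_disk_window_size', 30),
            cluster.get('quiesce_osd_cpu_max', 3))


def quiesce(config_path):
    disk_util_max, window_secs, osd_cpu_max = load_quiesce_settings(config_path)
    if disk_util_max:  # zero skips the disk check entirely
        disks_quiesced(max_disk_util_pct, disk_util_max, window_secs)
    if osd_cpu_max:    # zero skips the CPU check entirely
        while max_osd_cpu_pct() >= osd_cpu_max:
            time.sleep(5)
```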
ASBishop pushed a commit to ASBishop/cbt that referenced this issue on Nov 3, 2016:
When deleting a pool, it may take a while for the OSD nodes to delete the objects in the pool. This change makes CBT wait until the OSD nodes quiesce in order to ensure they are idle before starting the next test run. Quiescing is done by waiting until the maximum disk utilization for any disk falls below a threshold, and waiting until the maximum CPU utilization for any ceph-osd process falls below a threshold. The thresholds can be tuned using the following cluster configuration parameters (the default values are listed):

    cluster:
      quiesce_disk_util_max: 3
      quiesce_disk_window_size: 30
      quiesce_osd_cpu_max: 3

If quiesce_disk_util_max or quiesce_osd_cpu_max is zero then the corresponding disk/CPU quiescing operation is skipped.

Closes ceph#117
(cherry picked from commit 3d442c7)