Commit
Allow OSD nodes to quiesce after deleting a pool
When a pool is deleted, it may take a while for the OSD nodes to delete the objects in the pool. This change makes CBT wait until the OSD nodes quiesce, to ensure they are idle before the next test run starts. Quiescing is done by waiting until the maximum disk utilization for any disk falls below 3% across a 30-second window, and until the maximum CPU utilization for any ceph-osd process falls below 3%. Closes ceph#117
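The quiesce condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `wait_for_quiesce`, the two sampler callables, and the timeout are hypothetical names and assumptions introduced here. In practice each sampler would wrap whatever measures disk utilization and per-process ceph-osd CPU on the OSD nodes.

```python
import time
from typing import Callable, Iterable


def wait_for_quiesce(
    sample_disk_util: Callable[[], Iterable[float]],  # hypothetical: % util per disk
    sample_osd_cpu: Callable[[], Iterable[float]],    # hypothetical: % CPU per ceph-osd
    threshold_pct: float = 3.0,     # threshold from the commit message
    poll_interval_s: float = 30.0,  # pause between polls; matches the 30 s window
    timeout_s: float = 3600.0,      # assumption: give up eventually
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the busiest disk and the busiest ceph-osd process are both
    below threshold_pct. Returns the number of polls taken."""
    polls = 0
    waited = 0.0
    while True:
        polls += 1
        # Only the maximum matters: one busy disk or OSD means "not idle yet".
        max_disk = max(sample_disk_util(), default=0.0)
        max_cpu = max(sample_osd_cpu(), default=0.0)
        if max_disk < threshold_pct and max_cpu < threshold_pct:
            return polls
        if waited >= timeout_s:
            raise TimeoutError(
                f"not quiesced after {waited:.0f}s "
                f"(max disk {max_disk:.1f}%, max ceph-osd CPU {max_cpu:.1f}%)"
            )
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Taking the threshold as a parameter keeps it easy to adjust per cluster. The timeout is an addition of this sketch, so that a cluster with steady background activity cannot block a test run indefinitely.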
c45a356
When I run "iostat -dxyz ALL 3 1" I get no output, so I don't see how this particular command works. I'm using RHEL7.2, though; maybe it needs some newer and sexier Linux distro ;-)
However, this command works on RHEL7.2:
c45a356
This command too is problematic at large scale - any Ceph cluster that is doing scrubbing may take a long time to get to the point where this command exits:
I do not want to disable scrubbing on my cluster because I want to understand how it will perform in the real world, at least as a baseline. I'm watching ceph with "ceph -w" and it wasn't doing anything at all except scrubbing. So I guess I'd use a higher threshold than 5% of 1 core. Maybe 15%? When I deleted a pool, they all got real busy real fast.