Address high IOPs usage of the Gnocchi Ceph pool #1381

Merged
44 changes: 38 additions & 6 deletions gnocchi/storage/ceph.py
@@ -15,6 +15,7 @@
# License for the specific language governing permissions and limitations
# under the License.
import collections
import daiquiri

from oslo_config import cfg

@@ -42,6 +43,11 @@

rados = ceph.rados

LOG = daiquiri.getLogger(__name__)

DEFAULT_RADOS_BUFFER_SIZE = 8192
MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE = {}


class CephStorage(storage.StorageDriver):
WRITE_FULL = False
@@ -88,6 +94,11 @@ def _store_metric_splits(self, metrics_keys_aggregations_data_offset,
for key, agg, data, offset in keys_aggregations_data_offset:
name = self._get_object_name(
metric, key, agg.method, version)
metric_size = len(data)

MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE[name] = metric_size

chungg (Member) commented:
Probably not worth adding a check here, but I think this is storing more than just the unaggregated/raw measures object size.


rafaelweingartner (Contributor, Author) replied:
Yes, @chungg, the mapping will not be synced across all the MetricD agents. However, that is not an issue. In the worst case scenario we would just execute one extra read, as the mappings in the different agents will not get out of sync by a large factor.
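
To make that concrete, here is a toy sketch of the read loop's behavior (not the driver code; read_object and read_chunk are made-up names for illustration): the loop only stops on an empty read, so a missing or stale size entry never truncates data; it only costs extra read round trips.

```python
# Toy illustration, not the driver code. The loop mirrors the chunked read in
# this patch: it keeps reading until it gets an empty chunk, so correctness
# never depends on the cached size, only the number of round trips does.
def read_object(read_chunk, buffer_size):
    offset = 0
    content = b''
    reads = 0
    while True:
        data = read_chunk(offset, buffer_size)
        reads += 1
        if not data:
            break
        content += data
        offset += len(data)
    return content, reads

payload = b'x' * 20000  # pretend this is the unaggregated RADOS object

def read_chunk(offset, length):
    return payload[offset:offset + length]

print(read_object(read_chunk, 8192)[1])          # no map entry yet: 4 reads
print(read_object(read_chunk, len(payload))[1])  # exact cached size: 2 reads
```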

The following picture shows the IOPs usage of a Gnocchi setup that has been running for months now with this patch applied. As you can see, there are more writes than reads, which is the opposite of what happened before this patch.
[screenshot: IOPs usage with the patch applied]

Before this patch, this was the behavior we had:
[screenshot: IOPs usage before the patch]

> A potential concern may be that, depending on the number of metrics assigned to a worker, the MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE lookup may get large and consume (too much) memory? Maybe it makes sense to only store metrics we know have objects larger than 8192?

I agree; I will make this change.
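
A minimal sketch of that follow-up change, assuming the constants from this diff (_remember_object_size is a hypothetical helper name, not code from this commit):

```python
DEFAULT_RADOS_BUFFER_SIZE = 8192
MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE = {}

def _remember_object_size(name, data):
    # Only keep sizes that actually exceed the default read buffer, so the
    # in-memory map does not grow with metrics whose objects are small anyway.
    metric_size = len(data)
    if metric_size > DEFAULT_RADOS_BUFFER_SIZE:
        MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE[name] = metric_size
    else:
        # The object now fits in the default buffer; drop any stale entry.
        MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE.pop(name, None)
```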

> One more thought: what happens if we increase the read buffer globally? Does hardcoding a 16384+ buffer size make performance worse if the object is smaller?

That is a good question. It seems that on the Ceph side we could not reach a conclusion. Moreover, by putting this burden on the operator, they would just bump numbers without understanding what is going on in the system. That is why we decided to use a smarter approach and record the latest size that was written to a given RADOS object.

BTW, we did some analysis, and we found RADOS objects with sizes of 10M, 20M, and 40M. That is why a single global configuration would probably not help operators much.
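
As a rough back-of-the-envelope estimate (my own arithmetic, not measurements from this setup), the chunked read loop issues about ceil(object_size / buffer_size) + 1 rados read calls, the last one being the empty read that ends the loop:

```python
import math

def estimated_read_ops(object_size, buffer_size=8192):
    """Rough count of rados read calls made by the chunked read loop."""
    if object_size == 0:
        return 1  # a single empty read ends the loop immediately
    return math.ceil(object_size / buffer_size) + 1

# A 40M unaggregated object with the old fixed 8192-byte buffer:
print(estimated_read_ops(40 * 1024 * 1024))                    # 5121 reads
# The same object once its size is known from the map:
print(estimated_read_ops(40 * 1024 * 1024, 40 * 1024 * 1024))  # 2 reads
```

So whether a hardcoded 16384+ buffer helps depends entirely on the object-size distribution, which is why recording the per-object size works better here.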

> Probably not worth adding a check here, but I think this is storing more than just the unaggregated/raw measures object size.

Yes, it is. This is on purpose. Depending on how you use Gnocchi (for instance, with CloudKitty), you are constantly affecting the same split. That is why we also added this here.

LOG.debug("Storing time series size [%s] for metric [%s]",
metric_size, name)
if offset is None:
self.ioctx.write_full(name, data)
else:
@@ -153,7 +164,14 @@ def _get_splits_unbatched(self, metric, key, aggregation, version=3):
try:
name = self._get_object_name(
metric, key, aggregation.method, version)
return self._get_object_content(name)

metric_size = MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE.get(
name, DEFAULT_RADOS_BUFFER_SIZE)

LOG.debug("Reading metric [%s] with buffer size of [%s].",
name, metric_size)

return self._get_object_content(name, buffer_size=metric_size)
except rados.ObjectNotFound:
return

@@ -206,9 +224,16 @@ def _build_unaggregated_timeserie_path(metric, version):

def _get_or_create_unaggregated_timeseries_unbatched(
self, metric, version=3):
metric_name = self._build_unaggregated_timeserie_path(metric, version)
metric_size = MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE.get(
metric_name, DEFAULT_RADOS_BUFFER_SIZE)

LOG.debug("Reading unaggregated metric [%s] with buffer size of [%s].",
metric_name, metric_size)

try:
contents = self._get_object_content(
self._build_unaggregated_timeserie_path(metric, version))
metric_name, buffer_size=metric_size)
except rados.ObjectNotFound:
self._create_metric(metric)
else:
@@ -218,14 +243,21 @@ def _get_or_create_unaggregated_timeseries_unbatched(

def _store_unaggregated_timeseries_unbatched(
self, metric, data, version=3):
self.ioctx.write_full(
self._build_unaggregated_timeserie_path(metric, version), data)

def _get_object_content(self, name):
metric_name = self._build_unaggregated_timeserie_path(metric, version)
metric_size = len(data)

MAP_UNAGGREGATED_METRIC_NAME_BY_SIZE[metric_name] = metric_size
LOG.debug("Storing unaggregated time series size [%s] for metric [%s]",
metric_size, metric_name)
self.ioctx.write_full(metric_name, data)

def _get_object_content(self, name, buffer_size=DEFAULT_RADOS_BUFFER_SIZE):
offset = 0
content = b''

while True:
data = self.ioctx.read(name, offset=offset)
data = self.ioctx.read(name, length=buffer_size, offset=offset)
if not data:
break
content += data