Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace TenantIndex, BlockMeta and CompactedBlockMeta with proto near-equivalents #4072

Draft
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Sep 11, 2024

What this PR does:

Here we replace the TenantIndex, BlockMeta and CompactedBlockMeta with structs which are generated from proto. This allows us to convert the backend to proto-only in the future without the translation between a backend struct and an internal struct, since the internal structs here are being replaced.

The replacement structs are not identical. Of note, the BlockMeta struct has several fields which have no proto equivalent. In order to overcome this, several types have been defined in the proto as bytes, which allows the use of the customtype gogo proto option to point to a specific type. Those types must conform to a specific interface, and so methods have been added where necessary.

  • backend.DedicatedColumns updated to include proto marshal and unmarshal methods
  • backend.Encoding updated to include prto marshal and unmarshal methods
  • backend.BlockMeta has a few type changes
    • TotalObjects changes int -> int32
    • Size becomes a funcion and the old size is moved to Size_
    • CompactionLevel changes uint8 -> uint32
    • BloomShardCount changes uint16 -> uint32
    • ReplicationFactor changes uint8 -> uint32

A new UUID package has been added to wrap the google UUID package and provide
the necessary interface for use with proto.

Currently for all three files (TenantIndex, BlockMeta, and CompactedBlockMeta), each follow the same logic:

  • write both proto and json to the backend
  • read proto first, and fall back to json
    • a failure to read the proto is non-fatal
    • a failure to read the json is fatal

This will allow us to read old blocks that were written before this change, and also roll back if necessary. In a future release, we can stop writing the json to the backend and only read/write proto.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala zalegrala force-pushed the jsonpbTenantIndex branch 3 times, most recently from 513e33b to 3c10c0f Compare September 11, 2024 17:52
@zalegrala zalegrala changed the title wip: Replace TenantIndex, BlockMeta and CompactedBlockMeta with proto near-equivalents Replace TenantIndex, BlockMeta and CompactedBlockMeta with proto near-equivalents Sep 11, 2024
@@ -222,7 +222,8 @@ PROTOC = docker run --rm -u ${shell id -u} -v${PWD}:${PWD} -w${PWD} ${DOCKER_PRO
PROTO_INTERMEDIATE_DIR = pkg/.patched-proto
PROTO_INCLUDES = -I$(PROTO_INTERMEDIATE_DIR)
PROTO_GEN = $(PROTOC) $(PROTO_INCLUDES) --gogofaster_out=plugins=grpc,paths=source_relative:$(2) $(1)
PROTO_GEN_WITH_VENDOR = $(PROTOC) $(PROTO_INCLUDES) -Ivendor -Ivendor/github.com/gogo/protobuf --gogofaster_out=plugins=grpc,paths=source_relative:$(2) $(1)
PROTO_GEN_WITH_VENDOR = $(PROTOC) $(PROTO_INCLUDES) -Ivendor -Ivendor/github.com/gogo/protobuf -Ipkg/tempopb --gogofaster_out=plugins=grpc,paths=source_relative:$(2) $(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this need to change?

@@ -2,6 +2,8 @@ syntax = "proto3";

package tempopb;

option go_package = "github.com/grafana/tempo/pkg/tempopb";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this need to change?

pkg/uuid/uuid.go Outdated
@@ -0,0 +1,87 @@
// Package uuid provides a UUID type that can be used in protocol buffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to tempodb/backend? it would resolve a lot of the google_uuid vs uuid conflict

if err != nil {
return 0, err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for perf: data[0] = byte(e)

return nil, nil
}

return json.Marshal(dcs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a comment about why we are doing this add and the dedicated columns "interning"

// TenantIndex implements backend.Reader
func (r *reader) TenantIndex(ctx context.Context, tenantID string) (*TenantIndex, error) {
reader, size, err := r.r.Read(ctx, TenantIndexName, KeyPath([]string{tenantID}), nil)
ctx, span := tracer.Start(ctx, "reader.TenantIndex")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussions yesterday and some thought i'm more on board with leaving the individual block metas json, and only swapping the tenant index over to proto.

i think this would be accomplished by removing all the special backend logic as well as the above blockMetaPB function. Allow that code to take your new BlockMeta struct and marshal to/from json as always.

this would be a huge win for resource usage in compactors/queriers/frontends and dodge the more challenging elements of this PR.

}

out := &TenantIndex{}
err = out.Unmarshal(bytesPb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps make this a method on the tenant index itself to mirror the existing marshal/unmarshal?

https://github.com/grafana/tempo/blob/main/tempodb/backend/tenantindex.go#L58

also, it looks like we have no compression here? i think we should attempt to use zstd as a replacement for gzip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants