feat(profiling): add a functions summaries dataset #6293

viglia · 2024-09-12T09:27:17Z

For both transaction-based and continuous profiling, we'd like a way to provide examples of the profiles that generated a given metric.

The idea is that when we visualize profile function metrics, a user browsing, for example, the slowest functions can click on one of them and see a list of (clickable) profiles associated with those values.

For transaction-based profiling we used to store the example along with the metrics directly in the functions dataset.

Since we have deprecated the functions dataset and moved our metrics to the generic metrics dataset, we now need a way to store these examples for both transaction-based and continuous profiling.

This PR adds a new dataset called functions_summaries that serves this purpose.

I'm trying to keep the PR small. The processor and consumer will be added in separate PRs.

Zylphrex · 2024-09-12T14:31:30Z

snuba/snuba_migrations/functions_summaries/0001_functions_summaries_create_table.py

+                table_name=local_table_name,
+                columns=columns,
+                engine=table_engines.MergeTree(
+                    order_by="(project_id, end_timestamp)",


We should think about this ORDER BY a little more. One of the primary use cases will be finding examples by function (probably by fingerprint), so we'd want to optimize for that. I'm not sure yet what other queries this will need to support, but ordering by timestamp does not optimize for the main use case.

viglia · 2024-09-12

Yes, for searching examples by function I think it makes sense to have fingerprint in there.

I'd still keep the timestamp as well though, but after the fingerprint in that case.

Something like: order_by="(project_id, fingerprint, end_timestamp)".
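
To illustrate why the position of fingerprint in the key matters, here is a minimal Python sketch that models a sorted table as a sorted list of tuples. It is not ClickHouse or Snuba code; the rows and the helper `examples_for_function` are made up for illustration. With a key like (project_id, fingerprint, end_timestamp), all rows for one function sit next to each other and can be located with a binary search on the key prefix, whereas with (project_id, end_timestamp) they are interleaved with rows for other functions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (project_id, fingerprint, end_timestamp, profile_id).
rows = [
    (1, 42, 1005, "p3"),
    (1, 7, 1001, "p1"),
    (1, 42, 1002, "p2"),
    (2, 42, 1003, "p4"),
    (1, 7, 1004, "p5"),
]

# Model ORDER BY (project_id, fingerprint, end_timestamp) as a sorted list.
by_function = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
prefix_keys = [(r[0], r[1]) for r in by_function]


def examples_for_function(project_id, fingerprint):
    """Return profile ids for one function via a range lookup on the key prefix."""
    lo = bisect_left(prefix_keys, (project_id, fingerprint))
    hi = bisect_right(prefix_keys, (project_id, fingerprint))
    return [r[3] for r in by_function[lo:hi]]


# Model ORDER BY (project_id, end_timestamp): rows of one function are scattered.
by_time = sorted(rows, key=lambda r: (r[0], r[2]))
positions = [i for i, r in enumerate(by_time) if (r[0], r[1]) == (1, 42)]

print(examples_for_function(1, 42))  # contiguous slice of the sorted list
print(positions)                     # non-adjacent indexes in the time-ordered list
```

Keeping the timestamp as the last key component means the examples inside each function's range stay ordered by time.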

For transaction-based profiling, transaction_name would probably need to be considered as well, but it is optional (we won't have it for continuous profiling), so it can't be part of the primary key.

If we really need transaction_name for performance reasons in the transaction-based use case, I wonder whether it would make sense to define the column as required and set it to a hard-coded default value for continuous profiling.
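
The default-value idea could look roughly like the sketch below. The placeholder string, the helper names, and the key order that includes transaction_name are all hypothetical; nothing in this PR defines them.

```python
from typing import Optional, Tuple

# Hypothetical placeholder for continuous profiles, which have no transaction.
# The real value, if this approach were adopted, is not decided in this thread.
CONTINUOUS_PLACEHOLDER = "<continuous>"


def normalize_transaction_name(transaction_name: Optional[str]) -> str:
    """Map a missing transaction name to the placeholder so the value is never null."""
    return CONTINUOUS_PLACEHOLDER if transaction_name is None else transaction_name


def sort_key(
    project_id: int,
    transaction_name: Optional[str],
    fingerprint: int,
    end_timestamp: int,
) -> Tuple[int, str, int, int]:
    """Build an illustrative key in which transaction_name is always a plain string."""
    return (
        project_id,
        normalize_transaction_name(transaction_name),
        fingerprint,
        end_timestamp,
    )


print(sort_key(1, None, 42, 1000))
print(sort_key(1, "/api/users", 42, 1000))
```

The trade-off is that queries on continuous profiles would have to know about the placeholder value.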

github-actions · 2024-09-13T07:39:18Z

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration functions_examples : 0001_functions_examples_create_table
Local op: CREATE TABLE IF NOT EXISTS functions_examples_local (project_id UInt64, profile_id UUID, is_continuous UInt8, transaction_name LowCardinality(Nullable(String)), fingerprint UInt64, name String, package String, thread_id String, min Float64, max Float64, sum Float64, count UInt64, start_timestamp DateTime, end_timestamp DateTime, platform LowCardinality(String), environment LowCardinality(Nullable(String)), release LowCardinality(Nullable(String)), retention_days UInt16) ENGINE ReplicatedMergeTree('/clickhouse/tables/functions_examples/{shard}/default/functions_examples_local', '{replica}') ORDER BY (project_id, fingerprint, start_timestamp) PARTITION BY (retention_days, toMonday(end_timestamp)) TTL end_timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS functions_examples_dist (project_id UInt64, profile_id UUID, is_continuous UInt8, transaction_name LowCardinality(Nullable(String)), fingerprint UInt64, name String, package String, thread_id String, min Float64, max Float64, sum Float64, count UInt64, start_timestamp DateTime, end_timestamp DateTime, platform LowCardinality(String), environment LowCardinality(Nullable(String)), release LowCardinality(Nullable(String)), retention_days UInt16) ENGINE Distributed(`cluster_one_sh`, default, functions_examples_local);
-- end forward migration functions_examples : 0001_functions_examples_create_table




-- backward migration functions_examples : 0001_functions_examples_create_table
Local op: DROP TABLE IF EXISTS functions_examples_local;
Distributed op: DROP TABLE IF EXISTS functions_examples_dist;
-- end backward migration functions_examples : 0001_functions_examples_create_table
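
For readers less familiar with the PARTITION BY and TTL clauses in the generated SQL above, the Python sketch below mimics what they compute for a single row. It assumes that toMonday rounds a timestamp down to the Monday of its week and that the TTL expression gives the moment after which a row becomes eligible for removal. The helper names and the 90-day retention value are made up for illustration.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def to_monday(ts: datetime) -> date:
    """Round a timestamp down to the Monday of its week, like toMonday()."""
    d = ts.date()
    return d - timedelta(days=d.weekday())


def partition_key(retention_days: int, end_timestamp: datetime) -> Tuple[int, date]:
    """Mirror PARTITION BY (retention_days, toMonday(end_timestamp))."""
    return (retention_days, to_monday(end_timestamp))


def ttl_deadline(retention_days: int, end_timestamp: datetime) -> datetime:
    """Mirror TTL end_timestamp + toIntervalDay(retention_days)."""
    return end_timestamp + timedelta(days=retention_days)


# 2024-09-12 was a Thursday, so the row lands in the partition for Monday 2024-09-09.
end_ts = datetime(2024, 9, 12, 9, 27, 17)
print(partition_key(90, end_ts))
print(ttl_deadline(90, end_ts))
```

Rows with the same retention period that end in the same week share a partition.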

add a functions summaries dataset

a578ab3

viglia self-assigned this Sep 12, 2024

github-actions bot added the migrations label Sep 12, 2024

viglia added 2 commits September 12, 2024 14:03

add functions_summaries to the storage_sets in CLUSTER

03c8760

add function columns

7b1a636

Zylphrex reviewed Sep 12, 2024

phacops reviewed Sep 12, 2024

address feedback

b6a82fc

* add start_timestamp
* add fingerprint to the order by
* remove one of the ID columns and add is_continuous

phacops reviewed Sep 12, 2024

viglia added 3 commits September 13, 2024 09:03

remove nullable modifier from profile_id column

ca2406f

rename dataset as functions_examples

2b24639

fix loader name

2052013

getsentry deleted a comment from github-actions bot Sep 13, 2024

rename folder

7ee61a0
