Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(profiling): add a functions summaries dataset #6293

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

viglia
Copy link
Contributor

@viglia viglia commented Sep 12, 2024

For both transaction-based and continuous profiling, we'd like to add a way to provide examples of profiles which generated some metrics.

The idea is, when we're visualizing profile functions metrics, the user, for example while browsing the slowest functions, would be able to click on them and see a list of (clickable) profiles associated with those values.

For transaction-based profiling we used to store the example along with the metrics directly in the functions dataset.

Since we have deprecated the functions dataset and moved our metrics to the generic metrics dataset, we now need a solution for storing these examples, both for the transaction-based and the continuous profiling.

This PR would add a new dataset called functions_summaries that serves this purpose.

Trying to keep the PR small. Processor and consumer will be added in separate PRs

@viglia viglia self-assigned this Sep 12, 2024
table_name=local_table_name,
columns=columns,
engine=table_engines.MergeTree(
order_by="(project_id, end_timestamp)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about this ORDER BY a little more as one of the primary use cases will be to find examples by function (probably by fingerprint) so we'd want to optimize for these cases. Not sure what other queries this will need to support yet but ordering by timestamp is not optimizing for the main use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for searching examples by function I think it makes sense to have fingerprint in there.

I'd still keep the timestamp as well though, but after the fingerprint in that case.

Something like: order_by="(project_id, fingerprint, end_timestamp).

For the transaction-based profiling the transaction_name would probably need to be considered, but transaction_name is optional as we won't have that for continuous profiling and so can't be part of the primary key.

If we think that for the transaction-based use case, for perf. reasons, we really need transaction_name, I wonder if it'd make sense to define that as required and just set it to a default hard-coded value for continuous profiling.

* add start_timestamp
* add fingerprint to the order by
* remove one of the ID columns and add is_continuous
@getsentry getsentry deleted a comment from github-actions bot Sep 13, 2024
Copy link

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration functions_examples : 0001_functions_examples_create_table
Local op: CREATE TABLE IF NOT EXISTS functions_examples_local (project_id UInt64, profile_id UUID, is_continuous UInt8, transaction_name LowCardinality(Nullable(String)), fingerprint UInt64, name String, package String, thread_id String, min Float64, max Float64, sum Float64, count UInt64, start_timestamp DateTime, end_timestamp DateTime, platform LowCardinality(String), environment LowCardinality(Nullable(String)), release LowCardinality(Nullable(String)), retention_days UInt16) ENGINE ReplicatedMergeTree('/clickhouse/tables/functions_examples/{shard}/default/functions_examples_local', '{replica}') ORDER BY (project_id, fingerprint, start_timestamp) PARTITION BY (retention_days, toMonday(end_timestamp)) TTL end_timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS functions_examples_dist (project_id UInt64, profile_id UUID, is_continuous UInt8, transaction_name LowCardinality(Nullable(String)), fingerprint UInt64, name String, package String, thread_id String, min Float64, max Float64, sum Float64, count UInt64, start_timestamp DateTime, end_timestamp DateTime, platform LowCardinality(String), environment LowCardinality(Nullable(String)), release LowCardinality(Nullable(String)), retention_days UInt16) ENGINE Distributed(`cluster_one_sh`, default, functions_examples_local);
-- end forward migration functions_examples : 0001_functions_examples_create_table




-- backward migration functions_examples : 0001_functions_examples_create_table
Local op: DROP TABLE IF EXISTS functions_examples_local;
Distributed op: DROP TABLE IF EXISTS functions_examples_dist;
-- end backward migration functions_examples : 0001_functions_examples_create_table

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants