New Performance Issue Types #37083

nanoburd · 2022-07-26T21:13:46Z

nanoburd
Jul 26, 2022

We are soliciting ideas for possible performance issues that we can deliver this year.

Some problem constraints to get your creative juices going!

1) Problem can be confidently detected by Sentry! Introduce more signal than noise.

~80% sure when we detect it, it's a clear performance problem
Think about leveraging insights that can be easily added or that we have access to like suspect spans, tags, duration, etc.

2) The problem detected is actionable; otherwise it's just noise again.

~60% sure we know the general action a developer should take
OK if it's more than one suggested action (i.e. look at this tag or maybe your hardware)

3) The problem detected is VALUABLE to a developer.

What's a common performance problem developers across languages will eventually have to deal with?
Alternatively, what's a SUPER ANNOYING performance problem for a specific use-case or framework?

Or hey, challenge the constraints we're putting around this problem. =)

If you have an idea, reply with:

How do we detect your performance issue?
What action would a developer take to fix this issue?
How impactful do you think this is to a developer? Specific or generic? What languages or frameworks would this apply to?

dcramer · 2022-07-26T21:18:24Z

dcramer
Jul 26, 2022
Maintainer

To help aid this conversation, the example that started it all: O(N) queries

Often consumers of ORMs will generate a SQL query in a loop without even knowing it. These are generally trivial to detect, but are often ignored or missed until capacity of the system (the database in this case) becomes an issue. They're typically easy to fix, so detecting them early is critical to maintaining stability.

Similar in nature, duplicate queries can also be caught this way. If you execute the same call N times in a request lifecycle, there are sometimes ways to consolidate those calls. This is usually more relevant when the query returns large data sets (total bytes) or takes a long time. Otherwise the duplicate calls can be hard to optimize, as they generally live in fairly different sections of the application.
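To make the O(N) pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the `run` helper, and the function names are all made up for illustration; `run` simply records each statement, standing in for what a db span would capture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

queries = []  # every SQL statement issued through run()


def run(sql, params=()):
    """Execute a query and record it (hypothetical stand-in for a db span)."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the authors, then one more per author (the O(N) pattern)."""
    result = {}
    for author_id, name in run("SELECT id, name FROM author ORDER BY id"):
        rows = run(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query():
    """The same data fetched with a single JOIN."""
    sql = (
        "SELECT a.name, b.title FROM author a "
        "JOIN book b ON b.author_id = a.id ORDER BY a.id, b.id"
    )
    result = {}
    for name, title in run(sql):
        result.setdefault(name, []).append(title)
    return result
```

Both functions return the same mapping, but the first issues one query per author on top of the initial one, while the second always issues exactly one.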

4 replies

antonpirker Jul 27, 2022
Collaborator

In my years as a Django developer, this was always the first optimization: looking at Django Debug Toolbar to find the views with the most SQL queries and optimizing them.
If those views showed up in Sentry Issues automatically, it would be really cool.

(And I guess counting total queries (or duplicate queries) in a transaction in the (Python) SDK, then creating an Issue for this and hooking it to the transaction, should be easy, and we would not need to change anything on the server for this to work. The performance issue would just look like any other issue in Sentry, though.)

mwarkentin Jul 27, 2022
Collaborator

Along similar lines, N+1 API calls (eg. a list view which calls GET for each object) can cause similar performance problems and might indicate that you need to include more data in the LIST call or provide some sort of bulk fetch endpoint.

gggritso Aug 11, 2022
Collaborator

@mwarkentin N+1 API calls should be doable, we're exploring this! You can see the details, including a sample app, in Notion. To your point, we're looking for something like this:

The general plan here is to look for > N concurrent http spans:

N to be determined (maybe 3 or more?)
concurrency criteria TBD, something about threshold of start time, maybe something about threshold of end time
similarity criteria TBD, something about similarity of start time, maybe something about similarity of duration, maybe something about similarity of URL
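A rough sketch of the "more than N concurrent http spans" idea, assuming concurrency is judged only by how close the start times are. The `Span` shape, the function name, and both thresholds are hypothetical, since the real criteria are still TBD as noted above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    op: str
    description: str
    start_ms: float
    end_ms: float


def find_concurrent_http_groups(spans, min_count=3, start_window_ms=5.0):
    """Group http spans whose start times fall within start_window_ms of the
    first span in the group; keep groups with at least min_count spans.
    Both thresholds are placeholder assumptions."""
    http = sorted(
        (s for s in spans if s.op.startswith("http")),
        key=lambda s: s.start_ms,
    )
    groups, current = [], []
    for span in http:
        # A span that starts too late closes the current group.
        if current and span.start_ms - current[0].start_ms > start_window_ms:
            if len(current) >= min_count:
                groups.append(current)
            current = []
        current.append(span)
    if len(current) >= min_count:
        groups.append(current)
    return groups
```

Similarity of URL or duration could be layered on top by comparing `description` or `end_ms - start_ms` within each group.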

gggritso Aug 11, 2022
Collaborator

N+1 queries should appear as a subset of #37083 (comment), since we're expecting to see multiple consecutive db spans for the SELECTS. More details there

AbhiPrasad · 2022-07-26T21:19:37Z

AbhiPrasad
Jul 26, 2022
Collaborator

An easy performance issue many front-end developers will understand immediately is excess component re-renders on a page. For example, a sidebar component re-renders multiple times even though its content stays the same. We track ui.update spans for many frameworks (e.g. React), and we can expand that functionality as needed. If we detect that a pageload/navigation has mass re-renders of the same component, we alert users about it.

A developer would ideally put in some kind of memoization or change how context is propagated to address this issue. This might be pretty specific to vdom browser frameworks, but perhaps we can abstract it out to UI's in general, since we generally use the same ui.update span operations across frameworks.

It's important to note that this has the chance of false positives, so we'll have to put some thought into how to avoid that.

Perhaps this also leads us to further integrate with things like https://reactjs.org/docs/profiler.html, but I think we have some room to explore here.

2 replies

0Calories Aug 12, 2022
Collaborator

Update: This is something that the team has decided to take on, and we are aiming to release an excess component re-render issue type for EA on Sept 15. The first iteration will focus on React, but we will likely expand it out to other SDKs in the future.

In the meantime, we have added detection of a new issue type which will be beneficial for frontend devs running any of our JS SDKs: Long Task Spans!

Long Tasks are described as "any uninterrupted period where the main UI thread is busy for 50 ms or longer". The ability to detect Long Tasks and generate ui.long-task spans in transactions will be available in the next JS SDK release. When we see Long Task spans in a transaction, we will detect an issue if the cumulative duration of these spans exceeds 500ms (this value is not set in stone and we'll tweak it as we see fit).
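The cumulative-duration rule can be sketched in a few lines. The span dictionaries and the `duration_ms` key are assumptions for illustration; only the `ui.long-task` op and the 500 ms starting threshold come from the description above.

```python
LONG_TASK_OP = "ui.long-task"
DEFAULT_THRESHOLD_MS = 500  # starting value from the post; expected to be tuned


def has_long_task_issue(spans, threshold_ms=DEFAULT_THRESHOLD_MS):
    """Return True when the summed duration of long-task spans in one
    transaction exceeds the threshold."""
    total_ms = sum(s["duration_ms"] for s in spans if s["op"] == LONG_TASK_OP)
    return total_ms > threshold_ms
```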

Here's an example of a transaction where this issue was detected:

k-fish Aug 15, 2022
Maintainer

The SDK changes for long tasks should be in the next release of sentry-javascript (PR), so users will start receiving long tasks in their JavaScript project transactions irrespective of performance issues, but it also means that we can start detecting them as performance issues 👍

lucas-zimerman · 2022-07-26T23:58:32Z

lucas-zimerman
Jul 26, 2022

Should a request canceled by the user (a lost connection, for example) be considered a failed transaction?

Another point is timeouts on the backend: you may just be seeing a symptom of the problem on the transaction, and a developer may need to investigate further whether the issue lies in that specific transaction with specific user inputs or is a chain reaction from another part of their system not working. In this case, maybe we could introduce new types, for example: database_timeout, database_deadlock,...

2 replies

mitsuhiko Jul 27, 2022
Maintainer

Network flakiness or partially broken apps due to users smashing cancel buttons / refresh are so common that at the very least we should specifically mark such transactions so that we can tell them apart.

k-fish Aug 15, 2022
Maintainer

We do have statuses on http spans for our JS browser SDKs, so if we detected status != ok on any http call within the transaction, we could either raise this as an issue or, as Armin mentioned, at the very least mark the transaction as having failed http calls in some way.

For the issues side, there are specific classes of failures we might be able to detect (eg. timeouts) since we can check for durations as well.
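As a sketch of what that could look like, assuming each span is a plain dictionary with `op`, `status`, and `duration_ms` keys (a made-up shape, not the SDK's actual payload) and an arbitrary timeout cut-off:

```python
def summarize_http_failures(spans, timeout_ms=30_000):
    """Collect http spans whose status is not "ok", and flag the ones whose
    duration suggests a timeout. timeout_ms is a placeholder assumption."""
    failed = [
        s for s in spans
        if s["op"].startswith("http") and s.get("status", "ok") != "ok"
    ]
    likely_timeouts = [s for s in failed if s["duration_ms"] >= timeout_ms]
    return {
        "has_failed_http": bool(failed),
        "failed": failed,
        "likely_timeouts": likely_timeouts,
    }
```

The `has_failed_http` flag covers the "at least mark the transaction" case, while `likely_timeouts` is one example of a more specific failure class.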

mitsuhiko · 2022-07-27T12:00:46Z

mitsuhiko
Jul 27, 2022
Maintainer

Backend related:

For me, a quite useful callout I wish I would get is when an API call (or an entire transaction) deviates from the baseline a lot. Quite often that is obviously caused by something like a bad database query (can be an N+1 query), but it can also just be because the input to the backend call is larger than anticipated.

For instance, from my own experience at Sentry, we have customers who upload 5 source maps, and we have customers who upload 50,000 source maps. For the latter, a quite common experience is that this not only slows down but can eventually cause failures. I wish Sentry called that out before it becomes fatal. So the generalized version of this is understanding when an endpoint does work in relation to large and seemingly unbounded input data. We have similar issues with some internal endpoints that scale badly with large project counts in organizations, for instance. I wish I were alerted to this instead of having to look manually.
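One simple way to express "deviates from the baseline a lot" is to compare a duration against the median of recent durations for the same endpoint. This is only an illustrative heuristic, not something Sentry is described as doing; the function name, `factor`, and `min_samples` are arbitrary assumptions.

```python
from statistics import median


def deviates_from_baseline(duration_ms, history_ms, factor=3.0, min_samples=5):
    """Return True when duration_ms is more than `factor` times the median
    of the historical durations. With too little history, report nothing."""
    if len(history_ms) < min_samples:
        return False
    return duration_ms > factor * median(history_ms)
```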

1 reply

k-fish Aug 15, 2022
Maintainer

For context for others who might not be familiar: for problems like the source maps 5 -> 50,000 case you mentioned, we already have a passive way to find out that the number of source maps n is what is hurting performance, namely our suspect spans feature with its default cumulative duration sort. In fact, internally for our sentry.tasks.store.process_event task transaction, the top suspect spans are JavaScriptStacktraceProcessor.fetch_file.http and JavaScriptStacktraceProcessor.fetch_sourcemap for this exact reason.

As mentioned, calling it out to a user is a lot more valuable. Pulling it out into a performance issue to proactively help users find these sorts of problems is definitely possible. It should be caught by the broader identical span detection @gggritso mentioned below, since the key factor is that any case of O(n) behaviour is likely a repeated operation, so long as there is a span wrapping each of the n. In terms of the quality of that issue, we would likely need to do some tuning, or we could potentially switch over to an alert-style issue on spans with metrics.

mwarkentin · 2022-07-27T13:52:17Z

mwarkentin
Jul 27, 2022
Collaborator

I'm not sure if we currently capture enough information to be able to alert on this, but another database use case would be detecting full table scans which could indicate a missing index or poorly designed query.

My guess is we'd probably need some way to trigger an EXPLAIN query or capture the number of rows returned by a query to detect this.
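As an illustration of the EXPLAIN route, here is a small sketch against SQLite using Python's built-in sqlite3 module. SQLite's `EXPLAIN QUERY PLAN` describes a full table scan with a detail string beginning with "SCAN" and an indexed lookup with "SEARCH"; other databases expose this differently, and the table, index, and helper names here are made up.

```python
import sqlite3


def is_full_scan(conn, sql, params=()):
    """Return True if SQLite's query plan contains a full scan step.
    Each plan row is (id, parent, notused, detail)."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return any(row[3].startswith("SCAN") for row in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
query = "SELECT id FROM users WHERE email = ?"

# Without an index on email, the lookup has to scan the whole table.
scan_before_index = is_full_scan(conn, query, ("a@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index in place, the plan switches to an index search.
scan_after_index = is_full_scan(conn, query, ("a@example.com",))
```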

0 replies

gggritso · 2022-07-27T18:18:23Z

gggritso
Jul 27, 2022
Collaborator

One idea from the Performance team is waterfall span detection. If duplicate non-overlapping spans are appearing in sequential order, that's a performance issue. One example of this is Python code that makes sequential slow calls to a service, when it should be using an async approach like futures. Another example is JS code that makes sequential network calls to fetch information, when it should be using Promise.all.
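For the Python case, the difference looks roughly like this. `slow_call` is a hypothetical stand-in for a slow service request, simulated with a sleep, and the worker count is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call(item, delay=0.05):
    """Hypothetical stand-in for a slow call to a service."""
    time.sleep(delay)
    return item * 2


def fetch_sequentially(items):
    """Each call waits for the previous one: a waterfall of spans."""
    return [slow_call(item) for item in items]


def fetch_concurrently(items):
    """Run the calls as futures on a thread pool so the waits overlap."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(slow_call, items))
```

Both functions return results in the same order, but the concurrent version takes roughly the time of one call instead of the sum of all of them.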

1 reply

gggritso Aug 11, 2022
Collaborator

We have a simple version of this running right now. Details are in Notion but here's the gist for how we're detecting this:

iterate through each span, one by one
for every span, see if it's following (but not overlapping) a previous span of the same op
continue until the chain is broken
if the chain has a span length > N and total duration > M (both thresholds TBD) report a performance issue
this is probably going to be a good way to catch N+1 queries as described above
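The steps above can be sketched as follows. The `Span` shape and the default values for N (`min_length`) and M (`min_total_ms`) are placeholders, since both thresholds are still TBD.

```python
from dataclasses import dataclass


@dataclass
class Span:
    op: str
    start_ms: float
    end_ms: float


def find_sequential_chains(spans, min_length=3, min_total_ms=30.0):
    """Find chains of same-op spans that follow each other without
    overlapping, keeping chains with more than min_length spans and more
    than min_total_ms of total duration."""

    def qualifies(chain):
        total_ms = sum(s.end_ms - s.start_ms for s in chain)
        return len(chain) > min_length and total_ms > min_total_ms

    chains, chain = [], []
    for span in sorted(spans, key=lambda s: s.start_ms):
        follows = (
            bool(chain)
            and span.op == chain[-1].op
            and span.start_ms >= chain[-1].end_ms
        )
        if follows:
            chain.append(span)
        else:
            # The chain is broken: report it if it qualifies, then restart.
            if qualifies(chain):
                chains.append(chain)
            chain = [span]
    if qualifies(chain):
        chains.append(chain)
    return chains
```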

It looks something like this when detected:

We're expecting to find multiple related but different issues this way:

O(N) queries
UI render chains
non-bulk inserts

etc.

gggritso · 2022-07-27T18:21:00Z

gggritso
Jul 27, 2022
Collaborator

A small but maybe powerful idea is to detect identical spans. One example is front-end code that makes multiple GET requests to the same URL during page load. This often happens in more complicated codebases as the code grows and people stop auditing the requests that happen during initial load.

1 reply

gggritso Aug 11, 2022
Collaborator

We're detecting a simple version of this, too. This issue is pretty straight-forward for us:

iterate through every span in the transaction
fingerprint each span by concatenating the op and description
see if there are any fingerprints with > 1 span

If yes, and the total duration of those spans exceeds some threshold, report a performance issue. More details in Notion.

This one is interesting because it catches truly identical spans. This will catch rare examples like multiple GET requests to the exact same URL. It'll also catch some more common examples, like parametrized queries, if they're sufficiently long.
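A minimal sketch of that fingerprinting approach, assuming spans are dictionaries with `op`, `description`, and `duration_ms` keys. The key names, the `:` separator, and the duration threshold are illustrative assumptions.

```python
from collections import defaultdict


def find_identical_spans(spans, min_total_ms=100.0):
    """Group spans by a fingerprint built from op and description, and
    return the groups with more than one span whose total duration
    exceeds min_total_ms."""
    groups = defaultdict(list)
    for span in spans:
        fingerprint = f"{span['op']}:{span['description']}"
        groups[fingerprint].append(span)
    return {
        fingerprint: group
        for fingerprint, group in groups.items()
        if len(group) > 1
        and sum(s["duration_ms"] for s in group) > min_total_ms
    }
```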

gggritso · 2022-07-27T18:23:27Z

gggritso
Jul 27, 2022
Collaborator

One other idea is to allow developers to manually create performance issues. We can add UI buttons to convert suspicious events, areas of graphs, or other performance objects to issues manually. e.g., a developer spots that the LCP graph has an unusual bump, and presses a button that reports that as a performance issue. @k-fish also pointed out that this is a good research tactic: we can see what people report as performance issues, and then try to automate that later.

0 replies

mjq-sentry · 2022-07-28T19:25:48Z

mjq-sentry
Jul 28, 2022

I can't confidently speak to all the criteria above yet, but for our mobile SDKs I wonder if it would be useful to highlight likely-performance-degrading patterns like IO happening on the main thread. If we see it happening in the same trace as an ANR or slow/frozen frames, then it would be especially likely to be relevant. I don't know how much more useful this is than a plain ANR/slow frame alert, but at least we might be able to hint at causes and solutions this way? (I also don't know if the SDKs currently know which thread a particular span comes from, which I imagine would be a prerequisite.)

0 replies

alexjillard · 2022-08-02T17:42:04Z

alexjillard
Aug 2, 2022
Maintainer

@bruno-garcia brought up an issue to the team in a chat today about shader compilation in Flutter causing animations to render poorly.
getsentry/sentry-dart#971.

0 replies

antonpirker · 2022-08-02T18:55:27Z

antonpirker
Aug 2, 2022
Collaborator

For mobile applications: if there is massive overdraw, tell the user that this is making the app slow. See: https://developer.android.com/topic/performance/rendering/overdraw

We could also support other rendering-related problems; see https://developer.android.com/topic/performance/rendering/

0 replies

gggritso · 2022-08-11T20:24:01Z

gggritso
Aug 11, 2022
Collaborator

We're working on detecting one more specific issue type: slow database spans. See Notion for details. We're going to raise any egregiously slow database span as a performance issue if it exceeds a certain threshold.

0 replies

mjq-sentry · 2022-08-16T02:07:31Z

mjq-sentry
Aug 16, 2022

Just surfacing another issue type we've been looking at: slow assets. (For employees, here's the Notion doc).

There are two things we've been experimenting with detecting: large uncompressed assets (first pass: #37633, on hold) and slow render-blocking assets (first pass: #37826). Since render-blocking asset detection relies on heuristics (we don't currently know for sure that an asset is render-blocking with what our SDKs currently collect) we'll be validating the detector's accuracy before creating any actual performance issues from it.

0 replies
