
use gzip to compress self profile json #1966

Closed · wants to merge 1 commit

Conversation

@s7tya (Contributor) commented Aug 22, 2024:

We're currently not compressing the results from the /perf/processed-self-profile endpoint, which sometimes leads to downloads of over 800 MB. To reduce load times, this PR adds gzip compression, which Perfetto supports. Compressing an 850 MB JSON file takes about 4.4 seconds, but the compressed output is only about 30 MB, so this should noticeably improve performance.

Also, I'm using `headers::ContentType::from("application/gzip".parse::<mime::Mime>().unwrap())` because the mime crate doesn't provide a constant for application/gzip and appears to be unmaintained (see hyperium/mime#136).

@Kobzol (Contributor) left a comment:

Thank you! Left a few comments.

site/Cargo.toml (outdated; conversation resolved)
@@ -127,6 +127,8 @@ pub async fn handle_self_profile_processed_download(
ContentType::from("image/svg+xml".parse::<mime::Mime>().unwrap())
} else if output.filename.ends_with("html") {
ContentType::html()
} else if output.filename.ends_with("gz") {
    ContentType::from("application/gzip".parse::<mime::Mime>().unwrap())
@Kobzol (Contributor) commented:
I don't think this is the correct way to do it. If we return gzipped data from a web server, we should keep the normal file extension (in this case .json) and the MIME type corresponding to it, and then separately set Content-Encoding: gzip to indicate that the body is compressed. See https://stackoverflow.com/a/59999751/1107768.

@s7tya (Contributor, author) commented:

I thought there was a difference between the two approaches: Content-Encoding lets browsers decompress the content automatically, whereas with Content-Type: application/gzip the responsibility falls on the Perfetto side. By using Content-Encoding, we could even switch from gzip to Brotli, since almost all modern browsers support it. If we didn't have to worry about WebKit, we could even use zstd, though that might be a bit overkill.

@s7tya (Contributor, author) commented:

And currently we download the blob and then move to Perfetto's website, so the two approaches work like this:

1. compress (server) → fetch (client) → decompress (client) → PAGE TRANSITION → load (Perfetto)
2. compress (server) → fetch (client) → PAGE TRANSITION → decompress & load (Perfetto)

In this case, I thought it would be nice to reduce the time we wait on the RLO page after clicking the "query trace" button, since there's no animation or other indicator that something is happening.

@Kobzol (Contributor) commented Aug 23, 2024:

Yeah, there is indeed a difference! We load the data that we pass to Perfetto using "normal browser logic", i.e. the fetch call, which should do the decompression for us in the background (hopefully using optimized C/C++ code).

I get your idea about this:

compress (server) → fetch (client) → decompress (client) → PAGE TRANSITION → load (perfetto)
compress (server) → fetch (client) → PAGE TRANSITION → decompress & load (perfetto)

That seems interesting. So the question is whether it is better to go to Perfetto faster and let it un-gzip the data with its own decompression (presumably implemented in some C++ → JS way, hopefully in WebAssembly?), or to let the browser decompress the data using its native code (which I expect to be faster than Perfetto's) and only then go to Perfetto. With the second approach, we could even use e.g. Brotli, as you said (we already have support for it in the website).

Without measuring this, I think that the browser decompression approach could be faster, especially if we switch to Brotli. If that is the case, we could resolve the slightly longer wait time until Perfetto opens by adding some loading indicator, that shouldn't be that difficult.

Could you please do a small benchmark to evaluate the two approaches? Ideally with as large a file as possible 😆 You could download the 600 MiB cargo trace locally and simply load it from disk (then compress and return it) from a local endpoint, to avoid overloading our live server. The goal of the benchmark would be to evaluate:

  1. What time it takes to: return .gz from server, pass it to Perfetto, let it decompress it
  2. What time it takes to: return .json from server with Content-Encoding: gzip, let the browser decompress it, then pass it to Perfetto

Ideally, for both cases, it would be great to know both the duration of the actual file download (which will include also decompression time with approach 2), and also the end-to-end duration between clicking and the trace being fully loaded in Perfetto (you can use a stopwatch xD).

@s7tya (Contributor, author) commented:

Alright, I've created new PR #1968 because the approaches differ quite a bit, so I've marked the related conversations resolved.
Also, Brotli is much faster than the current gzip setup: compression takes only about 0.5 s.

site/src/self_profile.rs (conversation resolved)
@@ -37,9 +40,15 @@ pub fn generate(
ProcessorType::Crox => {
let opt = serde_json::from_str(&serde_json::to_string(&params).unwrap())
.context("crox opts")?;
let data = crox::generate(self_profile_data, opt).context("crox")?;

let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
@Kobzol (Contributor) commented:

Could you please experiment with the compression levels (there are also the constructors Compression::fast() and Compression::best()) to see the performance/output-size trade-off? Maybe we can find better options than the defaults.

site/src/self_profile.rs (conversation resolved)
@s7tya s7tya marked this pull request as draft August 23, 2024 10:20
@s7tya s7tya closed this Aug 23, 2024
@s7tya s7tya deleted the gzip-profile branch October 11, 2024 12:00