Improving Compile Times And/Or Add Pluggable Deserializable Types #481

indiv0 · 2022-05-15T09:02:08Z

Hey! First let me say that I love this project. Clearly a lot of thoughtful design work has been put in. Kudos to all the contributors. Sorry about the wall of text in advance.

My scenario is this: I’ve noticed that even debug incremental recompiles of an absolutely barebones lambda_http app take upwards of 15 seconds. That doesn’t sound like much but multiply this by 10x lambdas in a workspace and it gets out of hand pretty quick.

I did some cursory investigation (the results of which I can post later). My investigation indicates that the slowness comes during the codegen_crate and subsequent LLVM optimize steps of the build.

Checking the amount of emitted LLVM IR, I see that it’s in the 4M+ lines range! This is a lot of code for LLVM to optimize, so it explains why both the steps take so long.

I took a look at why the IR is being generated but I couldn’t find any obvious pathological cases. It appears that the majority of the IR is serde::Deserialize impls on the aws_lambda_events types used by this crate.

I took a look at those types, and they have a heck of a lot of #[serde(deserialize_with=“…”)] on their fields. That attribute isn’t too expensive on its own, but multiplied across every field of every struct -- that’s a lot of codegen. There was also some duplicate code because #[derive(Deserialize)] implements visit_seq and visit_map. The former is unnecessary because we’re dealing with JSON and it is strictly invalid in most cases to allow users to deserialize a JSON array into a struct.

(Side-note: there was a proposal to share the deserialize_with wrappers between visit_seq and visit_map, but it was closed as wontfix)

So I just cut the visit_seq code out entirely via a custom #[derive(Deserialize)] impl. That plus stripping down this crate’s LambdaRequest enum to only cover the cases I care about let me bring the LLVM IR count down from 4.5M to about 3.5.

That was it for the low-hanging fruit. The remaining deserialization code seems necessary, but is a bit bloated, purely because serde is very generous with error handling logic. I could probably write it out by hand to be more efficient, or improve the existing codegen of the #[derive(Deserialize)] macro.

But I’ve been rethinking my approach. The problem is that we’re re-running the #[derive(Deserialize)] macro every time the lambda binary compiles. There are some efforts to disable derives when running cargo check. This is clever for the purposes of improving type hint resolution times, etc. But I don’t think it’s sufficient. What’s needed is a true way to cache the results of proc macro evaluations -- if said proc macros are pure. This finally brings me to my first question: Does anyone know of a way to cache the results of a proc-macro evaluation between builds?

Alternatively: Is it possible to force rustc to evaluate a proc macro for a struct, before said struct is used in a leaf crate?

cc @LegNeato @calavera as maintainers of the aws-lambda-events crate, any ideas? Perhaps we could Selectively cargo-expand the Deserialize impls in the aws-lambda-events crate? It would result in a much larger source file, but could cut out at least some of the crate_codegen time in incremental builds. Not sure how much but it’d be a decent chunk of the 40% it seems to take now.

cc @calavera @nmoutschen as maintainers of this crate, any ideas? Perhaps we could refactor things a bit so that users can provide their own APIGW types and deserialize on their end? That way there’s a bit more flexibility. For example I could fork aws-lambda-events, strip it to just the types I need, and hand roll Deserialize impls. At the moment I’m locked in to the types this crate provides. I understand that this crate does this because it handles a lot of the boilerplate when it comes to talking to AWS, but I’m fine with my crate handling some of that.

The text was updated successfully, but these errors were encountered:

indiv0 · 2022-05-15T09:11:38Z

Another option is to look at crates like https://github.com/dtolnay/miniserde or https://github.com/not-fl3/nanoserde. Disabling the deserializer derives on the aws-lambda-events types and replacing them with a hand-rolled nanoserde impl is probably the dream in terms of compile times, but I realize that this trades compile times for development effort on behalf of the crate authors. So unless it could be done with codegen at the time the dependencies are built, it's probably not viable.

calavera · 2022-05-15T17:14:37Z

I definitely agree with you. The reason why aws_lambda_events have so many deserialization attributes is because there is not a canonical source of truth that really defines if fields are required or optional, so we have to try to cover every single scenario that we can think of. I know AWS is working on improving this, but it's not an easy task since it requires coordination from every service team that publishes lambda events.

Perhaps we could refactor things a bit so that users can provide their own APIGW types and deserialize on their end?

You can already do this by using the lambda_runtime directly. If you don't care about about having an http::Request object, you can just do this:

use lambda_runtime::{service_fn, Error, LambdaEvent};

pub(crate) async fn my_handler(event: LambdaEvent<YourOwnAPIGWRequestStruct>) -> Result<(YourOwnAPIGWResponseStruct, Error> {
  let object = event.payload.;

  // Do something useful with the object here
  
  let resp = ...
  Ok(resp)
}

Feel free to bring ideas and suggestions to the aws_lambda_events repository. I'll really appreciate your help to make the codebase better.

calavera · 2022-05-15T17:18:21Z

Something else to point out, is that the apigw events in aws_lambda_events are no longer generated using the Go lambda event structures, so they can be optimized as much as we want 😉

indiv0 · 2022-05-15T22:12:23Z

You can already do this by using the lambda_runtime directly. If you don't care about about having an http::Request object, you can just do this:

Brilliant! Somehow I missed that possibility. I'll take a look to see how much that helps. Thank you.

Something else to point out, is that the apigw events in aws_lambda_events are no longer generated using the Go lambda event structures, so they can be optimized as much as we want 😉

I didn't know that, that's great. Should provide a lot more flexibility when it comes to optimizing.

I definitely agree with you. The reason why aws_lambda_events have so many deserialization attributes is because there is not a canonical source of truth that really defines if fields are required or optional, so we have to try to cover every single scenario that we can think of. I know AWS is working on improving this, but it's not an easy task since it requires coordination from every service team that publishes lambda events.

That's a shame. Is there any way aws-lambda-events could become the source of truth? Is it possible to make every field mandatory, then turn them optional over time, as we find out which fields are required? It'd be a major breaking API change. That gives me another, possibly insane idea...

The aws-lambda-events crate has features to enable/disable types per-service. Could make that more granular? Down to a per-struct or per-field level? That way users bring in only as much serialization code as they require.

A more sane option would be to make the serde dependency in aws-lambda-events optional. Users can continue using the crate as normal, with no breaking changes. Users who want to optimize for compile times can disable this feature. They can then implement serialization manually, for the types they use, via remote derive. The benefit is that users don't need to re-implement the types. aws-lambda-events remains the source of truth for type definitions. Users just opt-in to manual serialization/deserialization.

Feel free to bring ideas and suggestions to the aws_lambda_events repository. I'll really appreciate your help to make the codebase better.

Absolutely, happy to help. Modifying serde_derive to add an attribute that makes visit_seq optional is a good first step. This is the relevant issue upstream: serde-rs/serde#1587. I'll open a PR with my implementation of it. Once that's merged we can enable the attribute on the fields in aws-lambda-types.

P.S. For anyone who is curious on digging into this for themselves, here are some tangential issues/posts I found while investigating the problem. Most of them aren't actual bugs per se but they include very useful scripts/information on how to profile this stuff:

Serialize/Deserialize implementations are very slow to compile serde-rs/json#313
Migration to serde from rustc-serialize causes binary to grow significantly serde-rs/serde#286
- especially this comment by dtolnay
Investigate compile time of serde crate serde-rs/serde#1146
- especially this comment by matklad
Option to show the type parameters of every instantiation of a function dtolnay/cargo-llvm-lines#7
- especially this comment by matklad
Investigate possibly-pathological amounts of LLVM IR with cargo-llvm-lines servo/servo#26713
Update bincode, reduce code size of script by ~11% servo/servo#27063 (comment)
- especially this comment ZoeyR on they cut the serialization code in bincode by a factor of 4x
Who drops all the Arcs ? rust-lang/rust-analyzer#10065
- especially this referenced zulip chat that explains how the rust-analyzer team discovered and fixed an issue that caused hundreds of duplicate monomorphizations

There are also some existing PRs to serde/serde_json by Marwes/dtolnay that might help with reducing the amount of generated IR, if merged:

indiv0 · 2022-05-15T22:21:55Z

cc @matklad. It seems you were on a similar crusade last year. I'm curious to know what conclusions you came to regarding speeding up serde_derive usage.

matklad · 2022-05-16T05:12:39Z

Yeah, in rust-analyzer we also noticed that serde adds a lot of code. For us, that’s not a problem as this happens in the leaf crate, not in the API crate.

Alternatively: Is it possible to force rustc to evaluate a proc macro for a struct, before said struct is used in a leaf crate?

I think proc macros are evaluated once, eagerly, in the crate which defines the struct. But if proc macro generates a bunch of generic functions, those fns than are monimorphised lazily, with duplication across leaf crates. So another avenue for exploration is to find a trick to force monomorohisstion happen upstream somehow, something along these lines

#[derive(Serialize)]
pub struct Foo {…}

pub fn Foo2json(foo: &Foo) -> String {…}

If the consumer of the above crate used Foo via Serialize interface, compile times will suffer. Using Foo2json non-generic function should be fine though!

matklad · 2022-05-16T11:14:53Z

Also, I feel that "JSON serialization takes a lot time to compile" is a problem with hits pretty-much the whole Rust web services ecosystem and, subjectively, it has smaller visibility than it deserves.

So I am wondering if some t-compiler folks at AWS might want to take a look here? cc @pnkfelix

calavera · 2022-05-16T15:19:08Z

Before we go though the route of big optimizations, a short term, specific, work that we can do, is to identify exactly which fields are required in those events, and which are optional, and remove as much serde attributes as possible. The reason why there are so many is because the Rust events are based on the Go events, and the language semantics are very different. See this issue for example:

calavera/aws-lambda-events#22

We could look into other event packages, like the Java one, which might be more stricts, and strip as many serde attributes as possible.

LegNeato · 2022-05-20T04:04:38Z

Sorry I've been MIA. I'd love to see what the dumbest / straightforward path of just expanding the macros and checking in the generated code before we get fancy.

calavera · 2022-05-28T16:39:19Z

My scenario is this: I’ve noticed that even debug incremental recompiles of an absolutely barebones lambda_http app take upwards of 15 seconds. That doesn’t sound like much but multiply this by 10x lambdas in a workspace and it gets out of hand pretty quick.

hey @indiv0, I got curious this morning, so I took a look at how long it takes to compile for me locally. I noticed that for a simple app, we're bringing all the features from Tokio and Tower, which might not be necessary. The codegen phase for aws_lambda_events is only about 2 seconds on my laptop. There is definitely room for improvement, and we might be able to reduce compilation time with some adjustments in the dependencies that we're bringing. Can you share your cargo timings information so I can take a look?

I put my timings output in this URL so you can take a look: https://starlit-buttercream-cf1ff0.netlify.app/
You can generate a similar one with cargo lambda build --release --timings.

calavera · 2022-06-21T01:24:45Z

I did a basic experiment over the weekend to measure how much time aws_lambda_events adds to the compilation of basic http functions. I removed all the fields from the apigw and alb events, and compilation time for a function already compiled dropped from 18 seconds to 9 seconds. This is a significant drop that we should try to fix, however, I don't believe we can just save 9 seconds, after all, I removed all the information from the events.

These are some ideas that we could explore:

Separate serialization from deserialization: Right now events include both attributes Serialize, and Deserialize, even though request events only need Deserialize, and response events only need Serialize.
Remove as many deserialize_with as possible.
Implement Serialize, and Deserialize manually.

rimutaka · 2022-06-21T01:54:50Z

@calavera , this 2-way Ser/Deser is one of those bugs that were there for long enough to become a feature. This PR added the same for Context to enable stubbing out the lambda and running it locally via an SQS proxy - the payload itself already had both.

If we are to make it one way only, it would be great to still have the other way as an optional feature. A few of my clients rely on it for development. If you do remove it we should still be able to use remote derive, but that can be painful.

calavera · 2022-06-21T15:14:08Z

If we are to make it one way only, it would be great to still have the other way as an optional feature.

Yes, that should definitely be available. I was thinking something in the line of how Serde itself works, were you have to import serde::de to access de-serialization options, and serde::ser to access serialization options. That should probably be a discussion in the aws_lambda_events repo.

calavera · 2022-07-02T20:30:04Z

I realized this morning that there is a much simple solution, implemented in #497. lambda_http compiles events for several endpoints, when in reality, people only put a function behind a specific endpoint. At least I've never seen a lambda function behind an ALB, and also an APIGW.

#497 introduces feature flags to remove the event types that you're not interested in. By choosing one event type warm compilation time drops from 18 to 4.5 seconds.

indiv0 changed the title ~~Improving Compile Times And/OR Pluggable Deserializable Types~~ Improving Compile Times And/Or Add Pluggable Deserializable Types May 15, 2022

calavera mentioned this issue May 28, 2022

Reduce dependency features #486

Merged

2 tasks

calavera mentioned this issue Jul 2, 2022

Feature flags for lambda_http #497

Merged

2 tasks

calavera closed this as completed in #497 Jul 6, 2022

smklein mentioned this issue Nov 13, 2023

Build Speed proposal: Cut out the serde dependency when communicating with steno? oxidecomputer/omicron#4484

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving Compile Times And/Or Add Pluggable Deserializable Types #481

Improving Compile Times And/Or Add Pluggable Deserializable Types #481

indiv0 commented May 15, 2022 •

edited

Loading

indiv0 commented May 15, 2022 •

edited

Loading

calavera commented May 15, 2022

calavera commented May 15, 2022

indiv0 commented May 15, 2022

indiv0 commented May 15, 2022 •

edited

Loading

matklad commented May 16, 2022

matklad commented May 16, 2022

calavera commented May 16, 2022

LegNeato commented May 20, 2022

calavera commented May 28, 2022 •

edited

Loading

calavera commented Jun 21, 2022

rimutaka commented Jun 21, 2022

calavera commented Jun 21, 2022

calavera commented Jul 2, 2022

Improving Compile Times And/Or Add Pluggable Deserializable Types #481

Improving Compile Times And/Or Add Pluggable Deserializable Types #481

Comments

indiv0 commented May 15, 2022 • edited Loading

indiv0 commented May 15, 2022 • edited Loading

calavera commented May 15, 2022

calavera commented May 15, 2022

indiv0 commented May 15, 2022

indiv0 commented May 15, 2022 • edited Loading

matklad commented May 16, 2022

matklad commented May 16, 2022

calavera commented May 16, 2022

LegNeato commented May 20, 2022

calavera commented May 28, 2022 • edited Loading

calavera commented Jun 21, 2022

rimutaka commented Jun 21, 2022

calavera commented Jun 21, 2022

calavera commented Jul 2, 2022

indiv0 commented May 15, 2022 •

edited

Loading

indiv0 commented May 15, 2022 •

edited

Loading

indiv0 commented May 15, 2022 •

edited

Loading

calavera commented May 28, 2022 •

edited

Loading