Replies: 10 comments 10 replies
-
I need to move on for now, but if someone wants to pick this up and investigate further, that would be much appreciated!
-
There is also this tool, which reports duplicated functions in the output binary: https://github.com/davidlattimore/duplicate-function-checker.
-
A really interesting initiative! I had a play with it.
-
It might be worth taking a look at #1559 again, given its prevalence nowadays.
-
fwiw
-
If we undertake this, we should have a good framework to measure the impact.
We'll also need to agree on what we value more highly if any of these goals conflict. Rustc has a pretty good performance dashboard for comparing commits, which could be a good reference. In general, it is not easy to get consistent measurements unless someone has a dedicated machine for this (with locked CPU frequency, always the same rustc version, etc.). But this is a very high bar -- we can totally start without all of it.
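As a minimal sketch of what the measurement side could start as -- just tracking artifact size on disk, nothing like the full rustc-perf dashboard (the path in the comment is hypothetical):

```rust
use std::fs;

// Report the on-disk size of a built artifact in MiB.
// A first data point for tracking binary-size changes between
// commits; the path (e.g. target/release/examples/3d_scene)
// would come from your CI setup.
fn artifact_size_mib(path: &str) -> std::io::Result<f64> {
    let bytes = fs::metadata(path)?.len();
    Ok(bytes as f64 / (1024.0 * 1024.0))
}
```

Logging this per commit on one dedicated machine already catches regressions, even before tackling the harder problem of stable compile-time numbers.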
-
I know that a lot of people care about binary size, but from my point of view it doesn't really matter at all. Game sizes will always be dominated by assets, not code, and so I don't think it is worthwhile to spend time trying to cut down on binary size. Compile times are much more interesting to optimize, as long as there is no cost in terms of ergonomics or performance.
-
As part of my own investigation: the file size has progressively grown, but there's no single PR responsible for a large spike.
-
As far as compile times are concerned, the biggest speedup I got was from switching clang over to use mold; dynamic linking actually made it worse... at least with mold. It's been a while since I played around with this (and compile times have stayed snappy). Not optimising at all is also a compile-time killer, presumably due to code size exploding at the IR level if basic stuff like dead code elimination doesn't run. In `Cargo.toml`:

```toml
[profile.dev]
opt-level = 1

[profile.dev.package."*"]
opt-level = 3

[profile.release]
opt-level = 3
strip = "debuginfo"

[profile.release-lto]
inherits = "release"
lto = true
```

Switch packages you're currently working on over as well, if needed. And in case you're on NixOS, enabling mold looks like this in `flake.nix`:

```nix
{
  description = "bevy dev";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/23.11";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    let supportedSystems = [ "x86_64-linux" ];
    in flake-utils.lib.eachSystem supportedSystems (system:
      let pkgs = import nixpkgs { inherit system; };
      in {
        inherit pkgs;
        devShell = with pkgs; mkShell.override {
          stdenv = stdenvAdapters.useMoldLinker clangStdenv; # <---- Here be moldy magic
        } rec {
          nativeBuildInputs = [
            pkg-config
          ];
          buildInputs = [
            udev alsa-lib
            gtk3 # panic handler msgbox
            vulkan-loader
            xorg.libX11 xorg.libXcursor xorg.libXi xorg.libXrandr
            libxkbcommon wayland
          ];
          LD_LIBRARY_PATH = lib.makeLibraryPath buildInputs;
        };
      }
    );
}
```
-
We could make the non-generic portion generic on the size of the bundle, instead of the bundle itself, and type-erase it at the boundary. A naive implementation would use a type-erased representation.
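To illustrate the general shape of that idea, here is a toy sketch (not Bevy's actual bundle machinery -- `Debug` stands in for whatever trait the real code would erase):

```rust
use std::fmt::Debug;

// Before: the whole body is monomorphized once per `T`.
fn join_debug_generic<T: Debug>(items: &[T]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("{item:?}, "));
    }
    out
}

// After: only this thin shim is generic; it erases `T` at the
// boundary so the loop below is compiled exactly once.
fn join_debug<T: Debug>(items: &[T]) -> String {
    let erased: Vec<&dyn Debug> = items.iter().map(|i| i as &dyn Debug).collect();
    join_debug_erased(&erased)
}

// Non-generic portion, shared by every `T` (at the cost of
// dynamic dispatch and an extra allocation for the erased slice).
fn join_debug_erased(items: &[&dyn Debug]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("{item:?}, "));
    }
    out
}
```

The trade-off is the obvious one: the shared path pays for dynamic dispatch (and here, an extra `Vec`), so whether the code-size win is worth it depends on how hot the call site is.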
-
Random thing I got sidetracked on: I think it's worthwhile to go through all of our generic / monomorphized code to try to find areas where we can "hoist out" the generic parts to cut down on our codegen, which would ideally reduce both binary sizes and compile times.

`cargo bloat` is a tool that lets you see which functions are the "biggest". However, to my knowledge, it doesn't have the ability to combine the data for all functions with the same name / path (i.e. all outputs produced for a given generic function). So I built a simple tool that reads the JSON output of `cargo bloat`, combines functions with the same name, and sorts by total size.
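A rough sketch of what that combining step can look like (the field names and the symbol-name normalization here are illustrative, not the actual tool or `cargo bloat`'s exact JSON schema):

```rust
use std::collections::HashMap;

// One entry from a `cargo bloat` dump: a (possibly monomorphized)
// symbol name and its size in bytes. Illustrative fields only.
struct Row {
    name: String,
    size: u64,
}

// Strip a trailing `::h<hex>` symbol hash and any `<...>` generic
// arguments, so all monomorphizations of a function share one key.
fn base_name(name: &str) -> String {
    let name = match name.rfind("::h") {
        Some(i)
            if !name[i + 3..].is_empty()
                && name[i + 3..].chars().all(|c| c.is_ascii_hexdigit()) =>
        {
            &name[..i]
        }
        _ => name,
    };
    let mut out = String::new();
    let mut depth = 0usize;
    for c in name.chars() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(c),
            _ => {}
        }
    }
    out
}

// Sum sizes per base name and sort by combined size, descending.
fn combine(rows: &[Row]) -> Vec<(String, u64)> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    for row in rows {
        *totals.entry(base_name(&row.name)).or_insert(0) += row.size;
    }
    let mut sorted: Vec<_> = totals.into_iter().collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    sorted
}
```

A normalization this crude will also merge across trait impls (a `<Foo as Trait>::apply` symbol loses its `<Foo as Trait>` prefix), which matches the caveat noted in the results below.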
I dumped the `cargo bloat` results for `3d_scene`, then ran those results through my tool linked above.

Here are the top results (sorted smallest to largest) for `3d_scene`. The tuple is `(name, total size in bytes, percentage of .text section)`. Note that this methodology will also combine results across trait impls, so for each case, consider whether it is combining results across trait impls or across generic function impls (e.g. `PartialReflect::apply` is combined across all trait impls, while `EntityWorldMut::insert_with_caller` is combined across all generic function impls). The total size of the `3d_scene` release binary (not stripped) is `82.2 MiB`. The size of the `.text` section (the "actual binary code") is `32.5 MiB`:

Top 25-ish Combined Results For 3d_scene
From there, I chose to try naively optimizing `EntityWorldMut::insert_with_caller` by factoring out the relevant parts that aren't generic. This is almost certainly not viable as-is, given that it relies on allocating a Box / using dynamic dispatch, but it does help make a point. This saved `0.5 MiB` in a release build (which is, interestingly, slightly larger than the reported `0.373 MiB` for `EntityWorldMut::insert_with_caller`). I could not detect meaningful compile time differences. I suspect the win is there, but it is small enough that it is covered by noise for small sample sizes.

Naive "hoisted" insert optimization

Now obviously saving `0.6%` binary size isn't a drastic win, but this does show that we have the ability to optimize linear-scaling / highly trafficked APIs. This is also a "death by 1000 papercuts" situation, where we won't see big progress unless we optimize many cases. I suspect there is enough room here to make meaningful progress.

We also need to make sure that we aren't hamstringing performance when we do this. My naive `EntityWorldMut::insert_with_caller` change would show up on benchmarks.