Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor performance improvements #110

Merged
merged 4 commits into from
Jan 25, 2024
Merged

Conversation

Colecf
Copy link
Contributor

@Colecf Colecf commented Jan 22, 2024

  • Pass owned strings to id_from_canonical instead of references
  • Precalculate the length of evaluated strings, and reserve space in the result string
  • Only hash strings once in id_from_canonical
  • Switch some HashMaps to FxHashMaps

These changes give about a 20% performance improvement to bench_load_synthetic, and save a couple seconds off of loading the android ninja files. Most of this is from the switch to FxHashMap, the other changes are only around a 5% improvement.

If you have concerns about any of these commits let me know and I'll take those ones out of the PR.

id_from_canonical ideally takes owned strings
instead of references to avoid a copy.
src/eval.rs Outdated
let mut result = 0;
for (i, env) in envs.iter().enumerate() {
if let Some(v) = env.get_var(v.as_ref()) {
result = v.calc_evaluated_length(&envs[i + 1..]);
Copy link
Owner

@evmar evmar Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better as return v.calc_eval... with a 0 return at the end of the loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

/// need to create a new id, but would also be possible to create a version
/// of this function that accepts string references that is more optimized
/// for the case where the entry already exists. But so far, all of our
/// usages of this function have an owned string easily accessible anyways.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case where this might come up is when loading .n2_db, where the string paths are stored canonicalized so we know they don't need any mutation as we parse them. But I think that's only worth really thinking about if/when it comes time to speed up that codepath. (This comment is just refreshing my memory on this.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, though currently read_str in db.rs reads owned strings. Using references would also be a tradeoff that requires keeping the db in memory, but we could do it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it works is the db is read fully at startup and all the paths are mapped to ids as they're read, so it doesn't really need an owned string in there. But I think it's also probably not the slow part, yet...

@@ -76,10 +65,19 @@ impl Loader {
loader
}

/// Convert a path string to a FileId. For performance reasons
/// this requires an owned 'path' param.
fn path(&mut self, mut path: String) -> FileId {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad, I went to all the effort to make this area reuse a single String buffer, but I understand why it must be done. RIP

/// for the case where the entry already exists. But so far, all of our
/// usages of this function have an owned string easily accessible anyways.
pub fn id_from_canonical(&mut self, file: String) -> FileId {
// TODO: so many string copies :<
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the comment here was a reminder to myself around my struggle to not have so many string copies here. The hashmap is mapping string -> File, and file has a .name field which is a string, so in theory you don't need a second copy of that string as the hashmap key. But I couldn't get all the lifetimes to work when I tried that. Fixing that would mean we no longer need two copies of every path string and would also save a lot of allocations on the load path...

So that we can do one memory allocation for them.
Use HashMap.entry() instead of a lookup + insert.
FxHashMap has a faster hashing algorithm,
at the expense of not being resistent to DOS
attacks.
@evmar evmar merged commit 668d9ab into evmar:main Jan 25, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants