feat: optimize reading a workspace's files (#6281)
# Description

## Problem

Reading all workspace files is slow.

## Summary

**TL;DR: reading all `.nr` files in the noir-contracts project now takes
10ms instead of 1.5 seconds.**

Using `flamegraph` I noticed that reading a workspace's files was slow. For
example, in `noir-contracts` it was taking 1.5 seconds to do that.

The first change in this PR is to use the `walkdir` crate, which brings two
benefits (a minimal usage sketch follows this list):
- It reduces the time to 1.3 seconds (not much, but it's something)
- It removes the need for us to write and maintain that directory-walking
code ourselves (we even had tests for walking a dir)
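
For reference, here is a minimal sketch of that `walkdir` usage (my own
illustration, not the PR's exact code — the real implementation is in
`tooling/nargo/src/lib.rs` below):

```rust
// Sketch: collect all `.nr` files under a directory using the `walkdir`
// crate instead of a hand-rolled recursive walk.
use std::path::{Path, PathBuf};
use walkdir::WalkDir;

fn noir_files_in(dir: &Path) -> Vec<PathBuf> {
    WalkDir::new(dir)
        .into_iter()
        .filter_map(Result::ok) // skip entries we fail to read
        .filter(|entry| entry.file_type().is_file())
        .map(walkdir::DirEntry::into_path)
        .filter(|path| path.extension().map_or(false, |ext| ext == "nr"))
        .collect()
}
```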

Still, 1.3 seconds seemed too slow. At first I looked for alternatives to
`std::fs::read_to_string`, but that already seems to be the most efficient
thing to use. So I created a separate project that essentially ran `walkdir`
on `noir-contracts` and read all the files, and it took around 10ms (!!). So
there had to be something else going on...
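
That separate project was essentially the following (a hedged sketch: the
hard-coded `noir-contracts` path and the timing code are illustrative, not
part of this PR):

```rust
// Standalone benchmark sketch: walk the tree, read every `.nr` file with
// std::fs::read_to_string, and print how long the whole pass took.
use std::time::Instant;
use walkdir::WalkDir;

fn main() {
    let start = Instant::now();
    let mut total_bytes = 0usize;
    // "noir-contracts" is a placeholder path to the project being measured.
    for entry in WalkDir::new("noir-contracts").into_iter().filter_map(Result::ok) {
        if !entry.file_type().is_file() {
            continue;
        }
        let path = entry.into_path();
        if path.extension().map_or(false, |ext| ext == "nr") {
            let source = std::fs::read_to_string(&path).expect("could not read file");
            total_bytes += source.len();
        }
    }
    println!("read {total_bytes} bytes in {:?}", start.elapsed());
}
```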
 
It turns out that if we have a workspace with projects A, B and C, with both
A and B depending on C, C's files were read twice. So the next thing I did
was to avoid reading a file's source if it was already in the file manager.
That reduces the time to 340ms. But the separate project was taking 10ms to
do the same! There was still something else...
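
The idea behind that file-manager check, in isolation (a toy stand-in for the
real `fm::FileManager`; the actual change adds `FileManager::has_file`, shown
in the diff below):

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

// Toy stand-in: only hit the filesystem for paths we have not stored yet,
// so files shared between dependencies are read at most once.
struct ToyFileManager {
    sources: HashMap<PathBuf, String>,
}

impl ToyFileManager {
    fn add_file(&mut self, path: &Path) {
        if self.sources.contains_key(path) {
            return; // already read this file for another package
        }
        let source = std::fs::read_to_string(path)
            .unwrap_or_else(|_| panic!("could not read file {path:?} into string"));
        self.sources.insert(path.to_path_buf(), source);
    }
}
```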

Well, we still walked C's directories twice (in the above example). So the
next change is to keep track of which packages we have already added files
for, and not traverse those directories more than once. With that, adding
all the files for noir-contracts now takes around 10ms.

(The difference is so big because in noir-contracts we have many projects
depending on many others, resulting in some projects' files being read, say,
10 times instead of once.)
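
The second fix on its own, as a toy sketch (it mirrors the
`processed_entry_paths` set added in the diff below; the function name here
is made up):

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

// Remember each package's entry path so that a package shared by several
// dependents has its directory walked (and its files added) only once.
fn add_package_files_once(entry_path: &Path, processed_entry_paths: &mut HashSet<PathBuf>) {
    // `insert` returns false if the path was already in the set.
    if !processed_entry_paths.insert(entry_path.to_path_buf()) {
        return;
    }
    // ... walk `entry_path`'s parent directory and add each `.nr` file ...
}
```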

## Additional Context



## Documentation

Check one:
- [x] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist

- [x] I have tested the changes locally.
- [x] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
asterite authored Oct 11, 2024
1 parent 87137d8 commit b54ed26
Showing 4 changed files with 56 additions and 94 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions compiler/fm/src/lib.rs
@@ -100,6 +100,11 @@ impl FileManager
self.id_to_path.get(&file_id).map(|path| path.as_path())
}

pub fn has_file(&self, file_name: &Path) -> bool {
let file_name = self.root.join(file_name);
self.name_to_id(file_name).is_some()
}

// TODO: This should accept a &Path instead of a PathBuf
pub fn name_to_id(&self, file_name: PathBuf) -> Option<FileId> {
self.file_map.get_file_id(&PathString::from_path(file_name))
4 changes: 1 addition & 3 deletions tooling/nargo/Cargo.toml
@@ -27,15 +27,13 @@ rayon.workspace = true
jsonrpc.workspace = true
rand.workspace = true
serde.workspace = true
walkdir = "2.5.0"

[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
noir_fuzzer.workspace = true
proptest.workspace = true

[dev-dependencies]
# TODO: This dependency is used to generate unit tests for `get_all_paths_in_dir`
# TODO: once that method is moved to nargo_cli, we can move this dependency to nargo_cli
tempfile.workspace = true
jsonrpc-http-server = "18.0"
jsonrpc-core-client = "18.0"
jsonrpc-derive = "18.0"
139 changes: 49 additions & 90 deletions tooling/nargo/src/lib.rs
@@ -13,7 +13,10 @@ pub mod ops;
pub mod package;
pub mod workspace;

use std::collections::{BTreeMap, HashMap};
use std::{
collections::{BTreeMap, HashMap, HashSet},
path::PathBuf,
};

use fm::{FileManager, FILE_EXTENSION};
use noirc_driver::{add_dep, prepare_crate, prepare_dependency};
@@ -23,6 +26,7 @@ use noirc_frontend::{
};
use package::{Dependency, Package};
use rayon::prelude::*;
use walkdir::WalkDir;

pub use self::errors::NargoError;

@@ -58,8 +62,14 @@ pub fn insert_all_files_for_workspace_into_file_manager_with_overrides(
file_manager: &mut FileManager,
overrides: &HashMap<&std::path::Path, &str>,
) {
let mut processed_entry_paths = HashSet::new();
for package in workspace.clone().into_iter() {
insert_all_files_for_package_into_file_manager(package, file_manager, overrides);
insert_all_files_for_package_into_file_manager(
package,
file_manager,
overrides,
&mut processed_entry_paths,
);
}
}
// We will pre-populate the file manager with all the files in the package
@@ -72,44 +82,74 @@ fn insert_all_files_for_package_into_file_manager(
package: &Package,
file_manager: &mut FileManager,
overrides: &HashMap<&std::path::Path, &str>,
processed_entry_paths: &mut HashSet<PathBuf>,
) {
if processed_entry_paths.contains(&package.entry_path) {
return;
}
processed_entry_paths.insert(package.entry_path.clone());

// Start off at the entry path and read all files in the parent directory.
let entry_path_parent = package
.entry_path
.parent()
.unwrap_or_else(|| panic!("The entry path is expected to be a single file within a directory and so should have a parent {:?}", package.entry_path));

// Get all files in the package and add them to the file manager
let paths = get_all_noir_source_in_dir(entry_path_parent)
.expect("could not get all paths in the package");
for path in paths {
for entry in WalkDir::new(entry_path_parent) {
let Ok(entry) = entry else {
continue;
};

if !entry.file_type().is_file() {
continue;
}

if !entry.path().extension().map_or(false, |ext| ext == FILE_EXTENSION) {
continue;
};

let path = entry.into_path();

// Avoid reading the source if the file is already there
if file_manager.has_file(&path) {
continue;
}

let source = if let Some(src) = overrides.get(path.as_path()) {
src.to_string()
} else {
std::fs::read_to_string(path.as_path())
.unwrap_or_else(|_| panic!("could not read file {:?} into string", path))
};

file_manager.add_file_with_source(path.as_path(), source);
}

insert_all_files_for_packages_dependencies_into_file_manager(package, file_manager);
insert_all_files_for_packages_dependencies_into_file_manager(
package,
file_manager,
overrides,
processed_entry_paths,
);
}

// Inserts all files for the dependencies of the package into the file manager
// too
fn insert_all_files_for_packages_dependencies_into_file_manager(
package: &Package,
file_manager: &mut FileManager,
overrides: &HashMap<&std::path::Path, &str>,
processed_entry_paths: &mut HashSet<PathBuf>,
) {
for (_, dep) in package.dependencies.iter() {
match dep {
Dependency::Local { package } | Dependency::Remote { package } => {
insert_all_files_for_package_into_file_manager(
package,
file_manager,
&HashMap::new(),
overrides,
processed_entry_paths,
);
insert_all_files_for_packages_dependencies_into_file_manager(package, file_manager);
}
}
}
@@ -143,84 +183,3 @@ pub fn prepare_package<'file_manager, 'parsed_files>(

(context, crate_id)
}

// Get all Noir source files in the directory and subdirectories.
//
// Panics: If the path is not a path to a directory.
fn get_all_noir_source_in_dir(dir: &std::path::Path) -> std::io::Result<Vec<std::path::PathBuf>> {
get_all_paths_in_dir(dir, |path| {
path.extension().map_or(false, |extension| extension == FILE_EXTENSION)
})
}

// Get all paths in the directory and subdirectories.
//
// Panics: If the path is not a path to a directory.
//
// TODO: Along with prepare_package, this function is an abstraction leak
// TODO: given that this crate should not know about the file manager.
// TODO: We can clean this up in a future refactor
fn get_all_paths_in_dir(
dir: &std::path::Path,
predicate: fn(&std::path::Path) -> bool,
) -> std::io::Result<Vec<std::path::PathBuf>> {
assert!(dir.is_dir(), "directory {dir:?} is not a path to a directory");

let mut paths = Vec::new();

if dir.is_dir() {
for entry in std::fs::read_dir(dir)? {
let entry = entry?;
let path = entry.path();
if path.is_dir() {
let mut sub_paths = get_all_paths_in_dir(&path, predicate)?;
paths.append(&mut sub_paths);
} else if predicate(&path) {
paths.push(path);
}
}
}

Ok(paths)
}

#[cfg(test)]
mod tests {
use crate::get_all_paths_in_dir;
use std::{
fs::{self, File},
path::Path,
};
use tempfile::tempdir;

fn create_test_dir_structure(temp_dir: &Path) -> std::io::Result<()> {
fs::create_dir(temp_dir.join("sub_dir1"))?;
File::create(temp_dir.join("sub_dir1/file1.txt"))?;
fs::create_dir(temp_dir.join("sub_dir2"))?;
File::create(temp_dir.join("sub_dir2/file2.txt"))?;
File::create(temp_dir.join("file3.txt"))?;
Ok(())
}

#[test]
fn test_get_all_paths_in_dir() {
let temp_dir = tempdir().expect("could not create a temporary directory");
create_test_dir_structure(temp_dir.path())
.expect("could not create test directory structure");

let paths = get_all_paths_in_dir(temp_dir.path(), |_| true)
.expect("could not get all paths in the test directory");

// This should be the paths to all of the files in the directory and the subdirectory
let expected_paths = vec![
temp_dir.path().join("file3.txt"),
temp_dir.path().join("sub_dir1/file1.txt"),
temp_dir.path().join("sub_dir2/file2.txt"),
];

assert_eq!(paths.len(), expected_paths.len());
for path in expected_paths {
assert!(paths.contains(&path));
}
}
}
