Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix directory targets with empty subdirs #11226

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ElectreAAS
Copy link
Collaborator

@ElectreAAS ElectreAAS commented Dec 18, 2024

Incorporates and replaces #11213, replaces #10931 and #11203, fixes #10609, #11117, #11214.
Work mostly done by myself, with various helpful insights by @maiste and @art-w.

The core of the changes are in dune_targets.ml, where I changed the representation of targets (Targets.Produced.t) from a flat structure to a hierarchical one, mirroring the file system. This makes all traversals easier to reason about, and prevents problems of hitting dirA/dirB/fileB before dirA/fileA, along with the intended feature of actually processing directories properly.

Other noteworthy changes are:

  • Local.Target.create now also creates directories, not just files
  • Build_system.file_exists now also checks for directories, not just files
  • Dune_cache_storage.Artifacts.Metadata_entry.{to,of}_sexp have slightly changed semantics, I added the possibilty to have no digest but instead a simple "<dir>" to indicate a directory. I wouldn't be surprised to hear this has long ramifications. Same concern for Targets.Produced.{digest,to_dyn}.

I'll add for the record that a lot of functions named do_something_file also work on directories, but not all, and that is rather confusing. If I'm not the only one with this sentiment I could do a refactor to make this clear everywhere.

@ElectreAAS ElectreAAS changed the title WIP: directory targets with empty subdirs WIP: fix directory targets with empty subdirs Dec 18, 2024
@ElectreAAS
Copy link
Collaborator Author

ElectreAAS commented Dec 21, 2024

Below are the initial text of the PR, and a comment I made after working on this for a week. They are now outdated, I'm just keeping them here for posterity.

This PR is related to... a lot of things actually.

For some context, let's take this test file from #11116 & #11117:
Why isn't the empty directory output/child restored from cache like output/file?

I'm not really sure, and the original author of this piece of code (rleshchinskiy) doesn't seem very active in this project at this time.
What I am sure of is that breaking the logic of collect is easy to do accidentally, but that at least fixes one problem.

Debugging this is tricky, as a lot of tests fail with a cryptic

Error: Is a directory
-> required by _build/default/a
[1]

I'll note however that the problems likely originates in either #9407, #9470, or #9535.


Belated update before the holidays:
I changed the internal representation of targets (Targets.Produced.t) to be hierarchical - it felt like the right move to detect broken logic, and was mentioned in the comments as something to be done at some point.
From that change I saw that the 'empty-dir' test passed, but a whole lot of other tests didn't. I thought I could work backwards from that to a point where all would pass, but the explanation was harsh: I had broken the caching mechanism, and the only reason the 'empty-dir' test passed was because nothing was cached and everything was rerun 🤦.
When I finally fixed everything else, obviously the 'empty-dir' test didn't pass (the cache worked like it did before my changes, wow!).
At this point in time I have pushed my changes, and they don't break anything, but also don't fix anything...
My hunch now is that this traversal only stores files as metadata, which means that this entries list only contains metadata about files, which means they're the only things getting restored. I tried making a of_files_plus_dirs to be used at that call site, but couldn't get it to work so far. Also I don't know how to generate digests of empty directories to actually store them in metadata...

Hope this progress report isn't too arcanic for reviewers 🙂

@maiste maiste added the shared-cache Shared artefacts cache label Dec 23, 2024
@ElectreAAS ElectreAAS force-pushed the cache-empty-dir branch 2 times, most recently from ed32398 to c181da2 Compare December 24, 2024 14:40
@ElectreAAS ElectreAAS force-pushed the cache-empty-dir branch 7 times, most recently from f720346 to 9852aad Compare January 10, 2025 15:59
@ElectreAAS ElectreAAS marked this pull request as ready for review January 10, 2025 16:05
@ElectreAAS ElectreAAS changed the title WIP: fix directory targets with empty subdirs Fix directory targets with empty subdirs Jan 10, 2025
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Signed-off-by: Ambre Austen Suhamy <[email protected]>
Copy link
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot I had some review half-started so let me just submit what's there.

@@ -856,9 +857,12 @@ end = struct
rleshchinskiy: Is this digest ever used? [build_dir] discards it and do we
(or should we) ever use [build_file] to build directories? Perhaps this could
be split in two memo tables, one for files and one for directories. *)
(* ElectreAAS: Tentative answer to above comments: a lot of functions are called
[build_file] or [create_file] even though they also handle directories.
Also yes this digest is used by [Exported.build_dep] defined above. *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe while at it, you could rename it to be something neutral? If something is called "file" I would not expect it to be a directory and it something is called a "directory" I would not expect it to be a file.

Maybe "directory entry"/"dirent"? Or "inode", though that might imply a bit too much. Or something else.

@@ -2,17 +2,20 @@

open Import

(** Build a file. *)
(** Build a target, which may be a file or a directory. *)
val build_file : Path.t -> unit Memo.t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest renaming this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that's what I did initially, but this functions is used in a whole lot of places, and that made the diff rather heavy.
Since this is unrelated to this PR, maybe I'll put this in another one? Or do you think this is relevant?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid renaming things, as this will complicate upstreaming of our internal changes.

Note that build_file and build_dir functions in our internal code base do indeed build files and directories, respectively, so there won't be any such confusion.

(** Build a file and read its contents with [f]. The execution of [f] is not memoized, so
call sites should be careful to avoid duplicating [f]'s work. *)
val with_file : Path.t -> f:(Path.t -> 'a) -> 'a Memo.t

(** Build a file and read its contents. Like [with_file ~f:Io.read_file] but memoized. *)
val read_file : Path.t -> string Memo.t

(** Return [true] if a file exists or is buildable *)
(** Return [true] if a file or directory exists or is buildable. *)
val file_exists : Path.t -> bool Memo.t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also renaming this. It would be very strange if I can call file_exists, get true and then call read_file and it fails since it is not a file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. My initial rename was path_exists since that's the entry type. read_path would also be nice but obviously more involved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug engine shared-cache Shared artefacts cache
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[regression] promotion of directory targets has trouble with directories starting in @
4 participants