Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve walkdir docstring #55476

Merged
merged 3 commits into from
Aug 19, 2024
Merged

Improve walkdir docstring #55476

merged 3 commits into from
Aug 19, 2024

Conversation

nathanrboyer
Copy link
Contributor

I was not able to understand how to use walkdir with the current docstring. It was not clear to me that root changes each iteration. I thought root would stay fixed to the input and dirs would iterate.

I also think the second example could use more explanation since I still do not understand it as I explain in this discourse post:
https://discourse.julialang.org/t/find-all-files-named-findthis-csv-in-nested-subfolders-of-rootfolder/118096. Hopefully a reviewer can help with that.

@giordano giordano added docs This change adds or pertains to documentation filesystem Underlying file system and functions that use it labels Aug 12, 2024
@IanButterworth
Copy link
Member

I was also confused by this. Is root the standard name for this? Maybe just dir would be a better variable name?

@nathanrboyer
Copy link
Contributor Author

Yeah root seems like a bad name based on Wikipedia:

"Every finite tree structure has a member that has no superior. This member is called the "root" or root node."

I think dir would be even more confusing though since that name is already used for iterating through dirs. What about path?

for (path, dirs, files) in walkdir(".")
    println("Directories in $path")
    for dir in dirs
        println(joinpath(path, dir)) # path to directories
    end
    println("Files in $path")
    for file in files
        println(joinpath(path, file)) # path to files
    end
end

@IanButterworth
Copy link
Member

Seems like an improvement already but what about dirpath?

@nathanrboyer
Copy link
Contributor Author

nathanrboyer commented Aug 16, 2024

To clarify that the path will point to a directory? Not sure I like that much.


The input variable may also need some attention I guess since it is also currently called dir, and you wouldn't want to write for (dir, dirs, files) in walkdir(dir) and have each dir mean something different. These names each need to make sense relative to each other. I try to organize my thoughts on each below with some name options.

Input:
I'm tempted to call this root since the iterator will never reach above it, but walkdir(dir) jibes so well I think it has to stay dir or similar. This directory, being an input, is fixed to one value.

Options: dir, topdir, rootdir

First Output:
This is the main iterator. It is a string of the path to the current directory in the file tree during the iteration.
Issues with the current root name:

  1. it is not clear that this value descends and changes
  2. it is not clear that this is the main "walking" value and the only full path

Options: path, dirpath, currentdir, node, location, currentloc

Second Output:
This is a vector of directories directly inside the first output location and no deeper.
The difficulty here is in keeping clear how different this value is from the input and first output even though all are directories:

  1. These are only the names of directories: not a full path.
  2. This is a collection of directories passed together each iteration.
  3. It is not all subdirectories of the input; it is only the ones one level deeper than the current first output.

Options: dirs, subdirs, dirnames, <first output>_dirs

Third Output:
This is a vector of files directly inside the first output location and not deeper. Considerations are basically the same as those for the second output, and its name should match the convention of the second output.

Options: files, subfiles, filenames, <first output>_files

All Together:

My pick right now would probably be for (currentdir, subdirs, subfiles) in walkdir(dir) or for (path, dirnames, filesnames) in walkdir(dir), keeping in mind that these names are mostly just for explanation in the docstring. Actual users will be free to write their own shorter LHS variables. (For both the second and third outputs, I would name a nested loop variable, iterating through the vector, the singular version of the vector variable in the same way it is now e.g. for subdir in subdirs or for dirname in dirnames.)

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(path, dirs, files) LGTM* I'd be happy to merge as is (pending the Channel comment above).

Two reservations I have about (currentdir, subdirs, subfiles) are

  1. subfiles is not a common term
  2. currentdir is two separate words that should be joined with an underscore, not concatenated.

*Is a clear and strict improvement on what we have in master now

base/file.jl Outdated
The directory tree can be traversed top-down or bottom-up.
If `walkdir` or `stat` encounters a `IOError` it will rethrow the error by default.
A custom error handling function can be provided through `onerror` keyword argument.
`onerror` is called with a `IOError` as argument.
The returned iterator is implemented as a [`Channel`](@ref).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The returned iterator is implemented as a [`Channel`](@ref).

This sounds like an implementation detail. By specifying it, we commit to not changing it. We should only do this if there is a reason to guarantee we will continue using Channels to support this function.

Copy link
Contributor Author

@nathanrboyer nathanrboyer Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another Discourse user recommended I add that, but I am happy to get rid of it. I imagine I should then remove the output from the Example that mentions Channel too. (The output was suppressed before.)

I do want to add some sort of note that explains why first(itr) returns a different result every time since that is provided as example usage and is unintuitive to me, but I don't understand it myself. "Because it's a Channel, add that" is the previous advice I received.

@LilithHafner
Copy link
Member

LGTM. Merging now, willing to review further improvements in subsequent PRs if desired.

@LilithHafner LilithHafner merged commit bec4702 into JuliaLang:master Aug 19, 2024
5 of 7 checks passed
@nathanrboyer
Copy link
Contributor Author

Follow up here: #55541

LilithHafner added a commit that referenced this pull request Aug 20, 2024
This is a follow-up documentation PR to #55476. I believe the second
example in the `walkdir` docstring is still unintuitive since the result
changes each time. I attempted a simple explanation, but I don't really
know what I'm talking about. Hopefully someone else can explain what is
happening better. Some additional discussion in [this Discourse
post](https://discourse.julialang.org/t/find-all-files-named-findthis-csv-in-nested-subfolders-of-rootfolder/118096).

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>
KristofferC pushed a commit that referenced this pull request Sep 12, 2024
I was not able to understand how to use `walkdir` with the current
docstring. It was not clear to me that `root` changes each iteration. I
thought `root` would stay fixed to the input and `dirs` would iterate.
KristofferC pushed a commit that referenced this pull request Sep 12, 2024
This is a follow-up documentation PR to #55476. I believe the second
example in the `walkdir` docstring is still unintuitive since the result
changes each time. I attempted a simple explanation, but I don't really
know what I'm talking about. Hopefully someone else can explain what is
happening better. Some additional discussion in [this Discourse
post](https://discourse.julialang.org/t/find-all-files-named-findthis-csv-in-nested-subfolders-of-rootfolder/118096).

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation filesystem Underlying file system and functions that use it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants