GH-125413: Add pathlib.Path.dir_entry attribute #125419

Open
wants to merge 3 commits into
base: main
@barneygale (Contributor) commented Oct 13, 2024

Add a `Path.dir_entry` attribute. In any path object generated by `Path.iterdir()`, it stores an `os.DirEntry` object corresponding to the path; in other cases it is `None`.

This can be used to retrieve the file type and attributes of directory children, often without incurring further system calls.
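For context, here is a minimal sketch of what an `os.DirEntry` provides, using `os.scandir()` directly rather than the attribute proposed in this PR. The helper name `classify_children` is hypothetical and exists only for illustration.

```python
import os
import tempfile


def classify_children(directory):
    """Map each child name to 'dir', 'file', or 'other' using os.DirEntry."""
    kinds = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir()/is_file() can usually answer from information gathered
            # during the directory scan, so most calls need no extra stat().
            if entry.is_dir(follow_symlinks=False):
                kinds[entry.name] = "dir"
            elif entry.is_file(follow_symlinks=False):
                kinds[entry.name] = "file"
            else:
                kinds[entry.name] = "other"
    return kinds


# Build a small throwaway tree and classify its children.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    result = classify_children(tmp)
```

With the proposed attribute, the same checks would presumably be reachable from each path yielded by `Path.iterdir()`; the exact behaviour is defined by the PR, not by this sketch.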

Under the hood, we use `dir_entry` in our implementations of `PathBase.glob()`, `PathBase.walk()` and `PathBase.copy()`, the last of which also provides the implementation of `Path.copy()`, resulting in a modest speedup when copying local directory trees.
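To illustrate why reusing directory entries can help a recursive copy, here is a rough sketch built on `os.scandir()` and `shutil.copyfile()`. It is not the pathlib implementation; the function name `copy_tree` is hypothetical, and metadata preservation and error handling are left out.

```python
import os
import shutil
import tempfile


def copy_tree(src, dst):
    """Recursively copy src into dst, choosing the branch from each DirEntry."""
    os.makedirs(dst, exist_ok=True)
    with os.scandir(src) as entries:
        for entry in entries:
            target = os.path.join(dst, entry.name)
            # The file type usually comes from the scan itself, so deciding
            # whether to recurse does not require a separate stat() per child.
            if entry.is_dir(follow_symlinks=False):
                copy_tree(entry.path, target)
            else:
                shutil.copyfile(entry.path, target, follow_symlinks=False)


# Copy a small throwaway tree and record what arrived at the destination.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(os.path.join(src, "nested"))
    with open(os.path.join(src, "nested", "note.txt"), "w") as f:
        f.write("copied")
    copy_tree(src, dst)
    with open(os.path.join(dst, "nested", "note.txt")) as f:
        copied_text = f.read()
    top_level = sorted(os.listdir(dst))
```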


📚 Documentation preview 📚: https://cpython-previews--125419.org.readthedocs.build/

@barneygale
Contributor Author

Copying is a little faster (roughly 3% in this run):

$ ./python -m timeit -s "from pathlib import Path" "Path('Doc').copy('Doc2', dirs_exist_ok=True, preserve_metadata=True)"
5 loops, best of 5: 70.7 msec per loop  # before
5 loops, best of 5: 68.7 msec per loop  # after

Contributor

@picnixz picnixz left a comment

I'll review tests when I'm not sleepy.

Contributor

@ncoghlan ncoghlan left a comment

Code that accesses dir_entry is explicitly saying "potentially stale values are OK", so what if we defined it as being lazily populated, rather than as being None unless it was set externally before being accessed?

This would have the added benefit that the required-for-technical-reasons slot on PurePathBase would be called _dir_entry, and we could define the public read-only property on PathBase like:

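@property
def dir_entry(self):
    if self._dir_entry is not None:
        return self._dir_entry
    # Cache in the private slot; the public property is read-only.
    # os.DirEntry.from_path() is the proposed new helper, not an existing API.
    self._dir_entry = dir_entry = os.DirEntry.from_path(self)
    return dir_entry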

It would need a new helper in os.DirEntry that accepted an os.PathLike parameter and created a populated directory entry instance for it, but that seems like a potentially useful feature anyway.
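The lazy-population idea can be tried today with a small stand-alone sketch. Everything here is an assumption made for illustration: `LazyEntryPath` is a hypothetical class, not pathlib code, and `find_dir_entry` stands in for the proposed helper by scanning the parent directory, since `os.DirEntry` currently has no public constructor.

```python
import os
import tempfile


def find_dir_entry(path):
    """Stand-in for the proposed helper: scan the parent for a matching entry."""
    path = os.fspath(path)
    parent, name = os.path.split(os.path.abspath(path))
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.name == name:
                return entry
    raise FileNotFoundError(path)


class LazyEntryPath:
    """Hypothetical path wrapper whose dir_entry is filled in on first access."""

    __slots__ = ("_path", "_dir_entry")

    def __init__(self, path, dir_entry=None):
        self._path = os.fspath(path)
        self._dir_entry = dir_entry  # may be pre-populated by a directory scan

    def __fspath__(self):
        return self._path

    @property
    def dir_entry(self):
        # Populate the private slot once, then return the cached entry.
        if self._dir_entry is None:
            self._dir_entry = find_dir_entry(self)
        return self._dir_entry


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "w") as f:
        f.write("x")
    p = LazyEntryPath(target)
    first = p.dir_entry
    second = p.dir_entry
    same_object = first is second
    entry_name = first.name
    entry_is_file = first.is_file()
```

Scanning the parent directory costs more than a single `stat()` call, so a real helper would presumably build the entry some other way. The sketch only shows the caching behaviour of the property.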

@bedevere-app

bedevere-app bot commented Oct 23, 2024

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.
