Suffix in pathlib is not behaving like a file extension #121347
Comments
How are you defining "file extension"?
The common denominator I see is: "The file extension defines what kind of file it is." Some definitions I have found:
That's only true on Windows. On other operating systems, file extensions are an indicator and nothing more. I can rename an … Microsoft's definition ("three- or four-character extension") excludes some valid extensions like …
The point I'm getting at is that there is no standard definition of a file extension! Since #82805 was resolved, pathlib's suffix splitting works exactly like …
I get what you are saying. But even on Linux, that depends on the implementation. E.g. … As there is no standard definition, I'd suggest either defining it, or avoiding the term altogether and using the implementation as the definition, e.g. "the last segment separated by a dot". Even though it's not authoritatively defined, I'd argue that the above …
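Taking "the last segment separated by a dot" as the definition, the rule can be written down explicitly. The sketch below uses a hypothetical helper, last_dot_segment (not part of pathlib); it assumes a leading dot marks a hidden file rather than a suffix and, for simplicity, treats a name ending in a dot as having no suffix. For the sample names it agrees with PurePath.suffix.

```python
from pathlib import PurePath


def last_dot_segment(name: str) -> str:
    """Return the last dot-separated segment of *name*, including the dot.

    Hypothetical helper: a dot at position 0 (as in ".bashrc") does not start
    a suffix, and a name ending in a dot is treated as having no suffix.
    """
    i = name.rfind(".")
    # 0 < i rules out "no dot" (-1) and a leading dot only;
    # i < len(name) - 1 rules out a trailing dot.
    if 0 < i < len(name) - 1:
        return name[i:]
    return ""


for name in ["report.pdf", "archive.tar.gz", ".bashrc", "Mr. Smith resume"]:
    # Print the helper's answer next to pathlib's for comparison.
    print(repr(name), repr(last_dot_segment(name)), repr(PurePath(name).suffix))
```

Note that nothing in this definition excludes whitespace, which is why "Mr. Smith resume" gets the suffix ". Smith resume".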
The Windows shell API currently supports permanently associating a programmatic identifier (ProgID) with any file extension that does not include whitespace characters and that has a length from 1 to 198 characters (not including the dot). Thus the API supports ".a", ".so", and ".patch" as normal file extensions. If a file has no extension, or if the extension is longer than 198 characters or contains whitespace characters, then the API displays an open-with dialog that allows opening the file with an application just once instead of setting a permanent association.
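The rule described above can be expressed as a small predicate. is_shell_associable is a hypothetical helper that encodes the rule only as stated in this comment; it does not call any Windows API, and it assumes Python's str.isspace() is a close enough stand-in for what Windows treats as whitespace.

```python
def is_shell_associable(extension: str) -> bool:
    """Check an extension against the rule as described (hypothetical helper).

    *extension* includes its leading dot, e.g. ".patch". Rule as stated:
    1 to 198 characters excluding the dot, and no whitespace characters.
    """
    if not extension.startswith("."):
        return False
    body = extension[1:]
    if not 1 <= len(body) <= 198:
        return False
    # Assumption: str.isspace() approximates the shell's notion of whitespace.
    return not any(ch.isspace() for ch in body)


for ext in [".a", ".so", ".patch", ". Smith resume", "."]:
    print(repr(ext), is_shell_associable(ext))
```

Under this rule, the ". Smith resume" that pathlib reports as a suffix would not count as an extension.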
Hmm, that does give some legitimacy to the idea of forbidding whitespace in file extensions in pathlib.
I think the ideal behavior would be: if you try to create a suffix containing whitespace, raise an error; if you access a suffix and it contains whitespace, it's not a suffix but just a regular part of the filename.
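The proposed behavior can be sketched as two wrappers around PurePath. strict_suffix and strict_with_suffix are hypothetical names, not pathlib API, and "whitespace" is assumed to mean any character for which str.isspace() is true.

```python
from pathlib import PurePath


def _has_space(text: str) -> bool:
    return any(ch.isspace() for ch in text)


def strict_suffix(path: PurePath) -> str:
    """Hypothetical: path.suffix, or "" if that suffix contains whitespace."""
    suffix = path.suffix
    return "" if _has_space(suffix) else suffix


def strict_with_suffix(path: PurePath, suffix: str) -> PurePath:
    """Hypothetical: like with_suffix, but whitespace-aware.

    Creating a suffix that contains whitespace raises ValueError; an existing
    "suffix" containing whitespace is kept as part of the name.
    """
    if suffix and (not suffix.startswith(".") or suffix == "."):
        raise ValueError(f"Invalid suffix {suffix!r}")
    if _has_space(suffix):
        raise ValueError(f"Invalid suffix {suffix!r}: contains whitespace")
    old = strict_suffix(path)
    stem = path.name[: -len(old)] if old else path.name
    return path.with_name(stem + suffix)


p = PurePath("Mr. Smith resume for review v1")
print(repr(strict_suffix(p)))         # ''
print(strict_with_suffix(p, ".pdf"))  # Mr. Smith resume for review v1.pdf
```

With this rule, ordinary names such as "report.txt" still get their suffix replaced as before.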
The existing behaviour has some advantages:
So I'm weakly -1 on this idea, but I'm happy to hear opinions from others.
I respectfully disagree. I'd argue based on the … But yes, as experienced developers, you and I know about that pitfall, so we can account for it. Probably avoid using … As with …:

```python
from pathlib import Path

files = [
    Path("Mr. Smith resume for review v1"),
    Path("Mr. Smith resume for review v2"),
    Path("Mr. Smith resume for review v3"),
]
for path in files:
    path.rename(path.with_suffix(".pdf"))
```

Now you'll have deleted/overwritten v1 and v2 and only have v3, which is now called …

But I just checked, and it looks like the docs were updated in the meantime: #106650. That wording is much better and describes the behavior more accurately.
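One way to guard against the pitfall in the rename loop is to compute every target first and refuse to proceed if two sources map to the same name. plan_suffix_renames is a hypothetical helper, shown as a minimal sketch: it does no filesystem access, so it does not catch collisions with files already on disk. Per the pathlib docs, Path.rename silently replaces an existing target on Unix (given permission) and raises FileExistsError on Windows.

```python
from collections import defaultdict
from pathlib import PurePath


def plan_suffix_renames(paths, suffix):
    """Map each path to path.with_suffix(suffix), refusing collisions.

    Hypothetical helper: computes targets only, so it can run before any
    Path.rename call.
    """
    targets = defaultdict(list)
    for path in paths:
        targets[path.with_suffix(suffix)].append(path)
    clashes = {t: srcs for t, srcs in targets.items() if len(srcs) > 1}
    if clashes:
        raise ValueError(f"several files would be renamed to the same target: {clashes}")
    return {srcs[0]: target for target, srcs in targets.items()}


files = [PurePath(f"Mr. Smith resume for review v{i}") for i in (1, 2, 3)]
try:
    plan_suffix_renames(files, ".pdf")
except ValueError as exc:
    print(exc)  # all three targets are the same PurePath("Mr.pdf")
```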
Bug report
Bug description:
According to the docs here, a suffix is defined as:
But pathlib doesn't behave as expected, as this example on Python 3.12.4 illustrates:
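For instance, with a name where the last dot is followed by spaces (an assumed example, not necessarily the reporter's), PurePath splits at that dot regardless of what follows it:

```python
from pathlib import PurePath

for name in ["Mr. Smith resume", "notes v1.2 final"]:
    p = PurePath(name)
    # suffix/stem split at the last dot, even when spaces follow it.
    print(repr(p.suffix), repr(p.stem), p.suffixes)

# Output:
# '. Smith resume' 'Mr' ['. Smith resume']
# '.2 final' 'notes v1' ['.2 final']
```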
This is particularly problematic for methods like with_suffix. According to the docs here, that should:
But as established above, that results in:
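Because with_suffix replaces whatever suffix returns, everything after the last dot is dropped, spaces included. A quick comparison using PurePath only (no filesystem access):

```python
from pathlib import PurePath

# Ordinary cases: only the last dot-segment is replaced.
print(PurePath("report.txt").with_suffix(".pdf"))        # report.pdf
print(PurePath("archive.tar.gz").with_suffix(".zip"))    # archive.tar.zip
# A dot followed by spaces also counts as a suffix, so most of the name is lost.
print(PurePath("Mr. Smith resume").with_suffix(".pdf"))  # Mr.pdf
```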
I've characterized this as a bug because it doesn't work as described, but it could also be treated as a documentation issue.
I see a few ways:
… add_suffix, or an argument to with_suffix, but I know that has been discussed before and was ultimately decided against. Also, this is a matter of how a suffix is defined, not how it's processed.
CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows
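The add_suffix idea mentioned in the report (appending a suffix instead of replacing one) can be approximated today with with_name. add_suffix below is a hypothetical helper, not part of pathlib; its dot check is a simplified stand-in for the validation with_suffix performs.

```python
from pathlib import PurePath


def add_suffix(path: PurePath, suffix: str) -> PurePath:
    """Append *suffix* to the final path component (hypothetical helper).

    Nothing is removed from the existing name, so dots and spaces
    already in it are left untouched.
    """
    if not suffix.startswith(".") or suffix == ".":
        raise ValueError(f"Invalid suffix {suffix!r}")
    return path.with_name(path.name + suffix)


print(add_suffix(PurePath("Mr. Smith resume for review v1"), ".pdf"))
# Mr. Smith resume for review v1.pdf
print(add_suffix(PurePath("archive.tar"), ".gz"))
# archive.tar.gz
```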