-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] Clarify usage of include_package_data/package_data/exclude_package_data
on package data files
#4643
Conversation
include-package-data
on inclusion of package data files
docs/userguide/datafiles.rst
Outdated
@@ -35,16 +37,34 @@ For example, if the package tree looks like this:: | |||
├── data1.txt | |||
└── data2.txt | |||
|
|||
and you supply this configuration: | |||
When at least one of the following conditions are met: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The include_package_data
keyword would only have an effect when condition is met, perhaps it's good to state these upfront?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your contribution!
docs/userguide/datafiles.rst
Outdated
Old packaging installation methods in the Python ecosystem have | ||
traditionally allowed the inclusion of "data files" (files beyond | ||
:ref:`the default set <manifest>` ), which are placed in a platform-specific | ||
location. However, the most common use case for data files distributed | ||
with a package is for use *by* the package, usually by including the | ||
data files **inside the package directory**. | ||
|
||
``Setuptools`` focuses on this most common type of data files and offers three ways |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. I understand that your main objective was to clarify what “data files” are.
While this intention is very appreciated and welcome, the proposed parenthesis seem to suggest: data file = file not included in the "default set". This could lead to confusion, as what defines a data file is not fundamentally related to whether a file is included in the default set or not. For example, we could change setuptools to start including .json files automatically, but that would not make them more or less “data file”-y.
Could you please have a look at the following suggestion (which adds a new paragraph before the original text)? Would this address your concerns?
Old packaging installation methods in the Python ecosystem have | |
traditionally allowed the inclusion of "data files" (files beyond | |
:ref:`the default set <manifest>` ), which are placed in a platform-specific | |
location. However, the most common use case for data files distributed | |
with a package is for use *by* the package, usually by including the | |
data files **inside the package directory**. | |
``Setuptools`` focuses on this most common type of data files and offers three ways | |
In the Python ecosystem, the term "data files" is used in various complex scenarios | |
and can have nuanced meanings. | |
For the purposes of this documentation, we define "data files" as non-Python files | |
that are installed alongside Python modules and packages on the user's machine | |
when they install a :term:`distribution <Distribution Package>` from PyPI | |
or via a ``.whl`` file. | |
These files are typically intended for use at runtime by the package itself or | |
to influence the behavior of other packages or systems. | |
They may also be referred to as "resource files." | |
Old packaging installation methods in the Python ecosystem | |
have traditionally allowed installation of "data files", which | |
are placed in a platform-specific location. However, the most common use case | |
for data files distributed with a package is for use *by* the package, usually | |
by including the data files **inside the package directory**. | |
Setuptools focuses on this most common type of data files and offers three ways |
Does this look good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. I understand that your main objective was to clarify what “data files” are.
Yes exactly. Because this is the starting sentence of the Data Files Support
section, and we have to make sure users understand what is considered "data file" in the first place, such that they could decide whether they even need these keywords at all.
For example C source code/READMEs are not "Python" files, but we don't need to declare them to include them into the package. And therefore we might need to clarify the following definition of "data files":
we define "data files" as non-Python files
the proposed parenthesis seem to suggest: data file = file not included in the "default set". This could lead to confusion, as what defines a data file is not fundamentally related to whether a file is included in the default set or not.
Thanks a ton for the clarification! Yes my sketch indeed would need more discussion and check.
They may also be referred to as "resource files."
Personally I think the following terminologies are already very confusing: "package data", "data file", "source distribution", "files inside the wheel", and their difference is pretty unclear, and can we somehow define/clarify them, and perhaps avoid introducing more synonyms ("resource files" in this case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problems with omitting resource files
, I just mentioned before because other packages started calling them that way (e.g. importlib_resources
). Please feel free to use and modify my suggestion to align with your vision in the contribution.
For example C source code/READMEs are not "Python" files, but we don't need to declare them to include them into the package.
Yes, in this case C/README files are not data files, because they don't fit in the remaining part of the suggested definition:
... that are installed alongside Python modules and packages on the user's machine
when they install a :term:`distribution <Distribution Package>` from PyPI
or via a ``.whl`` file.
These files are typically intended for use at runtime by the package itself or
to influence the behavior of other packages or systems.
C-files are not installed along side Python files in the user's machine and also are not intended to be used in runtime, right?
I just noticed this from official Python docs:
MANIFEST.in does not affect binary distributions such as wheels.
That is not 100% precise is it? MANIFEST.in
determines what goes into the sdist
, and then the contents of the sdist
influence what goes into the wheel (specially when include-package-data = true
which is the default)... So there is a potential indirect effect there. That is because the build process works more or less like in the mermaidjs diagram below1:
graph LR
src[fa:fa-laptop-code source code] --> b1(fa:fa-hammer)
subgraph build process
b1 --> sdist(fa:fa-file-code sdist) --> b2(fa:fa-hammer)
b2 --> wheel(fa:fa-box wheel)
build([build dependencies]) --> b1
build([build dependencies]) --> b2
end
subgraph installation process
wheel --> pkg
pkg(fa:fa-box-open) --> inst[fa:fa-python installed packages]
runtime([runtime dependencies]) --> inst
end
classDef deps fill:#2aa198,stroke:#333
classDef env fill:#6c71c4,stroke:#333
build:::deps
runtime:::deps
Footnotes
-
The "build process" creates distribution artifacts/packages.
sdist
distributions are meant to be platform independent (but may contain varying levels of optimisation - e.g. compiling Python to C via Cython).wheel
distributions contain the final files meant to be directly copied to the user's machine, and therefore may be platform specific. ↩
ae0bde3
to
515c698
Compare
515c698
to
3a84300
Compare
I just noticed this from official Python docs:
|
docs/userguide/datafiles.rst
Outdated
For the purposes of this documentation, we define "data files" as non-Python files | ||
that are installed alongside Python modules and packages on the user's machine | ||
when they install a :term:`distribution <Distribution Package>` from PyPI | ||
or via a ``.whl`` file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would tag and explain something I want to change (I find myself lose track of what I was originally intended if I overthink), please free feel to comment.
I believe the source (PyPI vs file) is not important here, but whether we're installing from a binary distribution (.whl
) or source distribution that matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are probably better off if we don't define data files in terms of source distributions or source trees... Because then we enter in muddy waters (as source distributions and source tree may have any kinds of files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are probably better off if we don't define data files in terms of source distributions or source trees
Sorry it looks like I might have caused confusion here. I was suggesting:
- When they install a distribution from PyPI or via a `.whl` file
+ When they install a distribution from source distribution or via a `.whl` file
Because install from PyPI
is not important here, because by install from PyPI
, we could install both from .whl
file (if wheel is built for that particular platform) or source distribution. I hope I didn't misunderstand anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the former phrase is better for the definition than the second. Alternatively only the following is also fine:
- When they install a distribution from source distribution or via a `.whl` file
+ when they install a distribution via a `.whl` file
I believe We should not define define data files as per source distribution
. A Python package always need to be transformed into a wheel, so the files inside of the wheel is what actually matters.
In the previous version install from PyPI
is being used as an equivalent of pip install package
, which in the end of the day downloads (preferentially) a .whl
file.
@abravalheri I believe I have included everything I want to include, perhaps it's time to polish the language and check the correctness and finalize this PR? Some questions unrelated to this PR:
|
include-package-data
on inclusion of package data filesinclude-package-data/package_data/exclude_package_data
on package data files
include-package-data/package_data/exclude_package_data
on package data filesinclude_package_data/package_data/exclude_package_data
on package data files
Thank you very much |
No problem, thanks a ton for guiding me through all these changes :) I learned a lot! |
I believe that this one should be OK, spaces are valid characters in Linus files, also if there are links to this URL we should avoid changing it to avoid invalidating the links.
There is not much we can do about this one, because the definitions are maintained by build. The rest should be fixed in #4661. |
Cool! Thanks for letting me know. You're an amazing reviewer, thank you! |
Summary of changes
include_package_data/package_data/exclude_package_data
inpyproject.toml
on inclusion of package data files, as a follow up of [BUG] Selection of package data in sdist and wheels withinclude-package-data
keyword #4638include-package-data
keyword #4638 (comment)