forked from python/peps
-
Notifications
You must be signed in to change notification settings - Fork 0
/
pep-0262.txt
347 lines (267 loc) · 12.2 KB
/
pep-0262.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
PEP: 262
Title: A Database of Installed Python Packages
Version: $Revision$
Last-Modified: $Date$
Author: A.M. Kuchling <[email protected]>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Jul-2001
Post-History: 27-Mar-2002
Introduction
============
This PEP describes a format for a database of the Python software
installed on a system.
(In this document, the term "distribution" is used to mean a set
of code that's developed and distributed together. A "distribution"
is the same as a Red Hat or Debian package, but the term "package"
already has a meaning in Python terminology, meaning "a directory
with an ``__init__.py`` file in it.")
Requirements
============
We need a way to figure out what distributions, and what versions of
those distributions, are installed on a system. We want to provide
features similar to CPAN, APT, or RPM. Required use cases that
should be supported are:
* Is distribution X on a system?
* What version of distribution X is installed?
* Where can the new version of distribution X be found? (This can
be defined as either "a home page where the user can go and
find a download link", or "a place where a program can find
the newest version?" Both should probably be supported.)
* What files did distribution X put on my system?
* What distribution did the file x/y/z.py come from?
* Has anyone modified x/y/z.py locally?
* What other distributions does this software need?
* What Python modules does this distribution provide?
Database Location
=================
The database lives in a bunch of files under
<prefix>/lib/python<version>/install-db/. This location will be
called INSTALLDB through the remainder of this PEP.
The structure of the database is deliberately kept simple; each
file in this directory or its subdirectories (if any) describes a
single distribution. Binary packagings of Python software such as
RPMs can then update Python's database by just installing the
corresponding file into the INSTALLDB directory.
The rationale for scanning subdirectories is that we can move to a
directory-based indexing scheme if the database directory contains
too many entries. For example, this would let us transparently
switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some
similar hashing scheme.
Database Contents
=================
Each file in INSTALLDB or its subdirectories describes a single
distribution, and has the following contents:
An initial line listing the sections in this file, separated
by whitespace. Currently this will always be 'PKG-INFO FILES
REQUIRES PROVIDES'. This is for future-proofing; if we add a
new section, for example to list documentation files, then
we'd add a DOCS section and list it in the contents. Sections
are always separated by blank lines.
A distribution that uses the Distutils for installation should
automatically update the database. Distributions that roll their
own installation will have to use the database's API to
manually add or update their own entry. System package managers
such as RPM or pkgadd can just create the new file in the
INSTALLDB directory.
Each section of the file is used for a different purpose.
PKG-INFO section
----------------
An initial set of RFC-822 headers containing the distribution
information for a file, as described in PEP 241, "Metadata for
Python Software Packages".
FILES section
-------------
An entry for each file installed by the
distribution. Generated files such as .pyc and .pyo files are
on this list as well as the original .py files installed by a
distribution; their checksums won't be stored or checked,
though.
Each file's entry is a single tab-delimited line that contains
the following fields:
* The file's full path, as installed on the system.
* The file's size
* The file's permissions. On Windows, this field will always be
'unknown'
* The owner and group of the file, separated by a tab.
On Windows, these fields will both be 'unknown'.
* A SHA1 digest of the file, encoded in hex. For generated files
such as \*.pyc files, this field must contain the string "-",
which indicates that the file's checksum should not be verified.
REQUIRES section
----------------
This section is a list of strings giving the services required for
this module distribution to run properly. This list includes the
distribution name ("python-stdlib") and module names ("rfc822",
"htmllib", "email", "email.Charset"). It will be specified
by an extra 'requires' argument to the ``distutils.core.setup()``
function. For example::
setup(..., requires=['xml.utils.iso8601',
Eventually there may be automated tools that look through all of
the code and produce a list of requirements, but it's unlikely
that these tools can handle all possible cases; a manual
way to specify requirements will always be necessary.
PROVIDES section
----------------
This section is a list of strings giving the services provided by
an installed distribution. This list includes the distribution name
("python-stdlib") and module names ("rfc822", "htmllib", "email",
"email.Charset").
XXX should files be listed? e.g. $PREFIX/lib/color-table.txt,
to pick up data files, required scripts, etc.
Eventually there may be an option to let module developers add
their own strings to this section. For example, you might add
"XML parser" to this section, and other module distributions could
then list "XML parser" as one of their dependencies to indicate
that multiple different XML parsers can be used. For now this
ability isn't supported because it raises too many issues: do we
need a central registry of legal strings, or just let people put
whatever they like? Etc., etc...
API Description
===============
There's a single fundamental class, InstallationDatabase. The
code for it lives in distutils/install_db.py. (XXX any
suggestions for alternate locations in the standard library, or an
alternate module name?)
The InstallationDatabase returns instances of Distribution that contain
all the information about an installed distribution.
XXX Several of the fields in Distribution are duplicates of ones in
distutils.dist.Distribution. Probably they should be factored out
into the Distribution class proposed here, but can this be done in a
backward-compatible way?
InstallationDatabase has the following interface::
class InstallationDatabase:
def __init__ (self, path=None):
"""InstallationDatabase(path:string)
Read the installation database rooted at the specified path.
If path is None, INSTALLDB is used as the default.
"""
def get_distribution (self, distribution_name):
"""get_distribution(distribution_name:string) : Distribution
Get the object corresponding to a single distribution.
"""
def list_distributions (self):
"""list_distributions() : [Distribution]
Return a list of all distributions installed on the system,
enumerated in no particular order.
"""
def find_distribution (self, path):
"""find_file(path:string) : Distribution
Search and return the distribution containing the file 'path'.
Returns None if the file doesn't belong to any distribution
that the InstallationDatabase knows about.
XXX should this work for directories?
"""
class Distribution:
"""Instance attributes:
name : string
Distribution name
files : {string : (size:int, perms:int, owner:string, group:string,
digest:string)}
Dictionary mapping the path of a file installed by this distribution
to information about the file.
The following fields all come from PEP 241.
version : distutils.version.Version
Version of this distribution
platform : [string]
summary : string
description : string
keywords : string
home_page : string
author : string
author_email : string
license : string
"""
def add_file (self, path):
"""add_file(path:string):None
Record the size, ownership, &c., information for an installed file.
XXX as written, this would stat() the file. Should the size/perms/
checksum all be provided as parameters to this method instead?
"""
def has_file (self, path):
"""has_file(path:string) : Boolean
Returns true if the specified path belongs to a file in this
distribution.
"""
def check_file (self, path):
"""check_file(path:string) : Boolean
Checks whether the file's size, checksum, and ownership match,
returning true if they do.
"""
Deliverables
============
A description of the database API, to be added to this PEP.
Patches to the Distutils that 1) implement an InstallationDatabase
class, 2) Update the database when a new distribution is installed. 3)
add a simple package management tool, features to be added to this
PEP. (Or should that be a separate PEP?) See [2]_ for the current
patch.
Open Issues
===========
PJE suggests the installation database "be potentially present on
every directory in sys.path, with the contents merged in sys.path
order. This would allow home-directory or other
alternate-location installs to work, and ease the process of a
distutils install command writing the file." Nice feature: it does
mean that package manager tools can take into account Python
packages that a user has privately installed.
AMK wonders: what does setup.py do if it's told to install
packages to a directory not on sys.path? Does it write an
install-db directory to the directory it's told to write to, or
does it do nothing?
Should the package-database file itself be included in the files
list? (PJE would think yes, but of course it can't contain its
own checksum. AMK can't think of a use case where including the
DB file matters.)
PJE wonders about writing the package DB file
**first**, before installing any other files, so that failed partial
installations can both be backed out, and recognized as broken.
This PEP may have to specify some algorithm for recognizing this
situation.
Should we guarantee the format of installation databases remains
compatible across Python versions, or is it subject to arbitrary
change? Probably we need to guarantee compatibility.
Rejected Suggestions
====================
Instead of using one text file per distribution, one large text
file or an anydbm file could be used. This has been rejected for
a few reasons. First, performance is probably not an extremely
pressing concern as the database is only used when installing or
removing software, a relatively infrequent task. Scalability also
likely isn't a problem, as people may have hundreds of Python
packages installed, but thousands or tens of thousands seems
unlikely. Finally, individual text files are compatible with
installers such as RPM or DPKG because a binary packager can just
drop the new database file into the database directory. If one
large text file or a binary file were used, the Python database
would then have to be updated by running a postinstall script.
On Windows, the permissions and owner/group of a file aren't
stored. Windows does in fact support ownership and access
permissions, but reading and setting them requires the win32all
extensions, and they aren't present in the basic Python installer
for Windows.
References
==========
.. [1] Michael Muller's patch (posted to the Distutils-SIG around 28
Dec 1999) generates a list of installed files.
.. [2] A patch to implement this PEP will be tracked as
patch #562100 on SourceForge.
http://www.python.org/sf/562100 .
Code implementing the installation database is currently in
Python CVS in the nondist/sandbox/pep262 directory.
Acknowledgements
================
Ideas for this PEP originally came from postings by Greg Ward,
Fred L. Drake Jr., Thomas Heller, Mats Wichmann, Phillip J. Eby,
and others.
Many changes and rewrites to this document were suggested by the
readers of the Distutils SIG.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: